Process 15,000 docs per minute (about 250 per second). THIS CANDIDATE NEEDS TO HAVE WORKED WITH AROUND THIS AMOUNT OF DATA (Duncan was working with 8 data points per DAY).
They are trying to process the documents and extract enrichments from them.
They collect news from across the world, take it in in real time, and run an NLP algorithm to do things like identify companies mentioned, products mentioned, etc. They then link each mention to their records, e.g. to say that this company, or this person, was mentioned.
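Rough sketch of the "find a mention, link it to a record" step, to make it concrete for screening conversations. This is only a dictionary lookup standing in for their NLP model; the RECORDS table, the IDs, and the function name link_mentions are all made up for illustration and are not their system.

```python
import re

# Hypothetical records table: lowercase alias -> (record id, entity type).
# In the real system this would be the client's own company/person records.
RECORDS = {
    "acme corp": ("C001", "company"),
    "acme": ("C001", "company"),
    "globex": ("C002", "company"),
    "jane doe": ("P001", "person"),
}

def link_mentions(text):
    """Return sorted (record_id, entity_type) pairs for known aliases found in text."""
    lowered = text.lower()
    found = set()
    for alias, record in RECORDS.items():
        # Whole-word match so "acme" does not fire inside a longer word.
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            found.add(record)  # the set removes duplicates from overlapping aliases
    return sorted(found)

print(link_mentions("Jane Doe left Acme Corp for Globex."))
```

A real pipeline would presumably use a trained entity recognizer and handle ambiguous names; the point here is only the shape of the output: mentions resolved to record IDs.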
Sentiment analysis - it's complex.
They want to know generally if an article is positive or negative about a certain person or company.
It may be negative about one company and positive about another
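Toy illustration of why one score per article is not enough: score each sentence, then credit the score only to the entities named in that sentence. The word lists and the function name entity_sentiment are invented for this sketch; their actual approach was not described and is likely model-based.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"record", "growth", "praised", "strong"}
NEGATIVE = {"lawsuit", "losses", "fined", "weak"}

def entity_sentiment(text, entities):
    """Return {entity: score}; a score above 0 is positive, below 0 is negative."""
    scores = {name: 0 for name in entities}
    # Split into sentences on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        words = set(re.findall(r"[a-z]+", lowered))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for name in entities:
            if name.lower() in lowered:
                scores[name] += score  # only entities named in this sentence
    return scores

article = "Acme was fined after heavy losses. Globex reported strong growth."
print(entity_sentiment(article, ["Acme", "Globex"]))
```

Here the same article comes out negative for Acme and positive for Globex, which is the case the client described.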
They want to be able to extract additional things, e.g. when an event occurred.
If an article says "last month", when was that? They need to resolve it to actual dates so events can be categorized on a timeline.
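Minimal sketch of the timeline problem: a phrase like "last month" only means something relative to when the article was published. This assumes the publication date is known and covers just three phrases; the function name resolve_relative is hypothetical.

```python
from datetime import date, timedelta

def resolve_relative(phrase, published):
    """Return (start, end) dates for a relative phrase, anchored at the publication date.

    Returns None for phrases this sketch does not handle.
    """
    phrase = phrase.lower().strip()
    if phrase == "yesterday":
        day = published - timedelta(days=1)
        return day, day
    if phrase == "last month":
        # Last day of the previous month = first of this month minus one day.
        end = published.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if phrase == "last year":
        year = published.year - 1
        return date(year, 1, 1), date(year, 12, 31)
    return None

print(resolve_relative("last month", date(2024, 3, 15)))
```

For an article published on 15 March 2024, "last month" resolves to 1 to 29 February 2024. The same phrase in an article from a different date lands somewhere else on the timeline.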