Gistology

making sense of the world

Keyword Analysis and Named Entity Recognition

One of the most elementary yet important properties of a text is its subject matter. Keyword Extraction finds the terms in a text, or set of texts, that best characterize what the writer is discussing. A trivial approach is to perform a simple word count, but that can produce a lot of noise, even when a stopword list is used to filter out the most common words. Keyword Extraction, done well, performs a more nuanced analysis of terms, distinguishing, for example, between a mention of ice cream on the one hand and separate mentions of ice and cream on the other.
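The difference between a plain word count and a phrase-aware count can be sketched in a few lines of Python. This is only an illustration, not a description of any production system: the stopword list is a tiny hand-picked assumption, the function names are hypothetical, and "a two-word sequence that repeats" is a deliberately crude stand-in for real multiword term detection.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {"the", "a", "an", "and", "of", "on", "is", "in", "to",
             "with", "i", "my", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_counts(text, min_phrase_count=2):
    """Count terms, treating repeated stopword-free word pairs as one term."""
    tokens = tokenize(text)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Crude heuristic: a pair is a phrase if it repeats and has no stopwords.
    phrases = {pair for pair, n in pair_counts.items()
               if n >= min_phrase_count and not (set(pair) & STOPWORDS)}

    counts = Counter()
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            counts[" ".join(pair)] += 1   # count the phrase as a single term
            i += 2
        else:
            if tokens[i] not in STOPWORDS:
                counts[tokens[i]] += 1    # count an ordinary single word
            i += 1
    return counts

sample = ("I love ice cream. Ice cream with cream on top is great. "
          "The ice on the road was thick.")
counts = keyword_counts(sample)
print(counts.most_common(3))
```

On the sample text, the two mentions of ice cream are counted as one term, while the standalone mentions of ice and of cream are counted separately. A real system would replace the repetition heuristic with statistical or linguistic evidence.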

Named Entity Recognition (NER) extends the value of Keyword Extraction by classifying terms into important categories such as people, places, or organizations.
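To make the idea of classifying terms into categories concrete, here is a minimal sketch in Python that uses a lookup table of known names. It is an assumption-laden toy: the table entries and the function name are hypothetical, and practical NER systems typically rely on statistical models and context rather than a fixed list.

```python
import re

# Hypothetical lookup table of known names per category (illustration only).
GAZETTEER = {
    "PERSON": {"Marie Curie", "Alan Turing"},
    "PLACE": {"Paris", "London"},
    "ORGANIZATION": {"Sorbonne", "Royal Society"},
}

def classify_entities(text):
    """Return (name, category) pairs in the order they appear in the text."""
    found = []
    for category, names in GAZETTEER.items():
        for name in names:
            # Word boundaries keep "Paris" from matching inside "Parisian".
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), name, category))
    # Sort by position so the output follows the reading order of the text.
    return [(name, category) for _, name, category in sorted(found)]

entities = classify_entities("Marie Curie studied at the Sorbonne in Paris.")
print(entities)
```

A lookup table like this cannot handle names it has never seen or names that are ambiguous between categories, which is exactly where context-sensitive models earn their keep.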

In some cases, Keyword Extraction and NER systems can be quite accurate: over 90% correct, for example, in finding personal names in English text. Combining Sentiment Analysis with these techniques can be powerful for determining what writers are talking about and what they have to say about it.

These technologies can be much more challenging for other languages than they are for English. Approaches designed for English will often perform poorly if they are naively applied to other languages without substantial adaptation. The multilingual experience we have at Gistology is crucial for developing quality implementations for the languages of Europe, Asia, and elsewhere.


English German Spanish French Italian Portuguese Russian Arabic Chinese Japanese Dutch Swedish Norwegian Danish