Stanford Named Entity Recognizer integration

From WandoraWiki
Revision as of 17:01, 12 August 2011 by Akivela (Talk | contribs)


Stanford Named Entity Recognizer (NER) is an open source Java library for named entity recognition: it extracts named entities, such as persons, organizations, and locations, from a given text. Wandora features a tool, also called Stanford Named Entity Recognizer, that uses the Stanford NER Java library to extract topics and associations from a given text, such as an occurrence. The tool is located in the Wandora application menu under File > Extract > Classification. It is also available in the occurrence editor and the browser plugin.
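
To make the idea of extracting entities from text concrete, here is a minimal sketch in Java. It assumes the text has already been tagged in an inline XML style, for example <PERSON>...</PERSON>, which is one of the output formats Stanford NER can produce. The class NerTagParser and its parse method are hypothetical names used for illustration only; they are not part of Wandora or Stanford NER, and this page does not describe how Wandora itself turns entities into topics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Wandora or Stanford NER.
class NerTagParser {

    // Matches <TYPE>entity text</TYPE>. The backreference \1 requires the
    // closing tag to carry the same type name as the opening tag.
    private static final Pattern ENTITY =
            Pattern.compile("<([A-Z]+)>(.+?)</\\1>");

    // Groups entity names by entity type, keeping first-seen order
    // and skipping names already recorded for that type.
    static Map<String, List<String>> parse(String taggedText) {
        Map<String, List<String>> entitiesByType = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(taggedText);
        while (m.find()) {
            String type = m.group(1);
            String name = m.group(2).trim();
            List<String> names =
                    entitiesByType.computeIfAbsent(type, k -> new ArrayList<>());
            if (!names.contains(name)) {
                names.add(name);
            }
        }
        return entitiesByType;
    }

    public static void main(String[] args) {
        // Sample tagged text, assumed to resemble inline XML tagger output.
        String tagged = "<PERSON>Ada Lovelace</PERSON> worked with "
                + "<PERSON>Charles Babbage</PERSON> in <LOCATION>London</LOCATION>.";
        parse(tagged).forEach((type, names) ->
                System.out.println(type + ": " + names));
    }
}
```

Each entity type could then plausibly become a class topic and each entity name an instance topic, but that mapping is an assumption made for this sketch.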

Stanford NER is included in the Wandora distribution package, and the embedded Stanford Named Entity Recognizer tool processes the given text locally.

Configuring Stanford NER

Holding down the CTRL key while starting the tool in Wandora opens a configuration dialog window. In this window the user can change the sequence classifier (NER model) that Stanford NER uses. The sequence classifier contains all the information related to the recognized entities.

At the moment Wandora includes all the default sequence classifiers of Stanford NER. They are located in build/classes/org/wandora/application/tools/extractors/stanfordner/classifiers and are:

  • ner-eng-ie.crf-3-all2008.ser.gz
  • ner-eng-ie.crf-3-all2008-distsim.ser.gz
  • ner-eng-ie.crf-4-conll.ser.gz
  • ner-eng-ie.crf-4-conll-distsim.ser.gz
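
To check which classifier files a particular Wandora installation actually contains, a small Java sketch like the following could list them. The class ClassifierLister and its method are hypothetical names, and the sketch assumes the classifiers sit as plain .ser.gz files in a directory on disk; pass that directory as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Wandora or Stanford NER.
class ClassifierLister {

    // Returns the names of all regular files ending in ".ser.gz"
    // directly inside the given directory, sorted alphabetically.
    static List<String> listClassifiers(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".ser.gz"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the classifier directory is given as the first argument;
        // the current directory is used when no argument is supplied.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        listClassifiers(dir).forEach(System.out::println);
    }
}
```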

To train your own NER model, see the Stanford NER FAQ.
