Stanford Named Entity Recognizer integration
Latest revision as of 17:01, 12 August 2011
Stanford Named Entity Recognizer (NER) is an open source Java library for named entity recognition: it extracts named entities, such as persons, organizations, and locations, from a given text. Wandora features a tool, also called Stanford Named Entity Recognizer, that uses the Stanford NER Java library to extract topics and associations from a given text, such as an occurrence. The tool is located in the Wandora application menu under File > Extract > Classification. It is also available in the occurrence editor and the browser plugin.
Stanford NER is included in the Wandora distribution package, and the embedded Stanford Named Entity Recognizer tool processes the given text locally.
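To illustrate what "extracting named entities out of text" can look like in practice, here is a minimal sketch. It assumes the recognizer's result is available as text with inline tags such as <PERSON>...</PERSON>, which is one of the output styles Stanford NER can produce. The class TaggedEntityParser and its parse method are hypothetical helpers written for this example. They are not part of Wandora or Stanford NER, and Wandora's own extractor may work quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Wandora or Stanford NER.
class TaggedEntityParser {

    // Matches <TYPE>text</TYPE>, where TYPE is an upper-case tag such as PERSON.
    private static final Pattern ENTITY =
            Pattern.compile("<([A-Z]+)>(.+?)</\\1>");

    // Groups the tagged entity texts by entity type, in first-seen order.
    static Map<String, List<String>> parse(String tagged) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(tagged);
        while (m.find()) {
            byType.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                  .add(m.group(2));
        }
        return byType;
    }

    public static void main(String[] args) {
        // Assumed sample input; real tagger output depends on the model used.
        String tagged = "<PERSON>Ada Lovelace</PERSON> visited "
                + "<ORGANIZATION>Stanford University</ORGANIZATION> in "
                + "<LOCATION>California</LOCATION>.";
        parse(tagged).forEach((type, names) ->
                System.out.println(type + ": " + names));
    }
}
```

Each entity type and entity name collected this way roughly corresponds to what a topic map tool could turn into topics, with an association linking each entity to the text it was found in.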
Configuring Stanford NER
Holding down the CTRL key while starting the tool in Wandora opens a configuration dialog. In this dialog, the user can change the sequence classifier, that is, the NER model, used by Stanford NER. The sequence classifier contains all information related to the recognized entities.
At the moment, Wandora includes all default sequence classifiers of Stanford NER. They are located in build/classes/org/wandora/application/tools/extractors/stanfordner/classifiers and are:
- ner-eng-ie.crf-3-all2008.ser.gz
- ner-eng-ie.crf-3-all2008-distsim.ser.gz
- ner-eng-ie.crf-4-conll.ser.gz
- ner-eng-ie.crf-4-conll-distsim.ser.gz
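To check which serialized classifiers are actually present in a given installation, a small directory listing is enough. The sketch below collects all *.ser.gz entries from a directory passed on the command line. The class ClassifierLister is a hypothetical helper written for this example, not part of Wandora or Stanford NER, and it assumes nothing about where the classifiers directory is located on your system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Wandora or Stanford NER.
class ClassifierLister {

    // Returns the names of all *.ser.gz entries in the directory, sorted.
    static List<String> listClassifiers(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "*.ser.gz")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the classifiers directory as the first argument;
        // the current directory is used when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : listClassifiers(dir)) {
            System.out.println(name);
        }
    }
}
```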
To train your own NER model, see the Stanford NER FAQ.