GATE (General Architecture for Text Engineering) is a mature and actively used software framework for computational tasks involving human language. GATE has been developed at the University of Sheffield. It is free and open source software. ANNIE (A Nearly-New Information Extraction System) is a component of GATE used for information extraction; it extracts information from unstructured text. Wandora features a tool called GATE Annie that uses GATE and ANNIE to extract topics and associations from a given text, such as an occurrence. The tool is located in the Wandora application menu under File > Extract > Classification. It is also available in the occurrence editor and in the browser plugin.

GATE and ANNIE are included in the Wandora distribution package, and the embedded GATE Annie tool processes the given text locally.

Configuring GATE Annie

Keeping the CTRL key pressed while starting the tool in the Wandora application opens a configuration dialog. The configuration options allow the Wandora user to include and exclude ANNIE annotations of certain types.
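Conceptually, the include/exclude options act as a filter on ANNIE annotation types. The following is a minimal Python sketch of that idea under stated assumptions, not Wandora's actual implementation: the function `accept_annotation`, the default type set and the `accept_all` flag (standing in for the dialog's Accept ALL annotation types checkbox) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical default selection; Wandora's real defaults may differ.
DEFAULT_TYPES = frozenset({"Person", "Location", "Organization"})


@dataclass
class AnnieFilterConfig:
    """Hypothetical model of the GATE Annie configuration dialog."""
    accept_all: bool = False            # stands in for "Accept ALL annotation types"
    accepted_types: frozenset = DEFAULT_TYPES


def accept_annotation(annotation_type: str, config: AnnieFilterConfig) -> bool:
    """Return True if an annotation of this type should be turned into a topic."""
    return config.accept_all or annotation_type in config.accepted_types


annotations = ["Person", "Token", "Money", "Location", "Sentence"]

default = AnnieFilterConfig()
print([t for t in annotations if accept_annotation(t, default)])
# ['Person', 'Location']

everything = AnnieFilterConfig(accept_all=True)
print([t for t in annotations if accept_annotation(t, everything)])
# ['Person', 'Token', 'Money', 'Location', 'Sentence']
```

In this sketch the accept-all flag simply overrides the per-type selection, which matches the behaviour described in the example below, where ticking the checkbox makes types such as Money, Token and Sentence appear.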

Wandora keeps GATE-related resources in the build/lib/gate folder. ANNIE is located in the folder build/lib/gate/plugins/ANNIE. It may be possible to adjust some ANNIE settings by editing files directly in ANNIE's plugin folder, or to replace the default ANNIE plugin with a modified one by changing the files in that folder. The ANNIE plugin can be adjusted with GATE's own IDE, GATE Developer. The Wandora authors have not tried to modify the default ANNIE, hence the word "may".

GATE Annie extraction example

This example shows one way of using Wandora's GATE/ANNIE integration. In this example, the Wandora user has already created a topic named GATE. The topic has an occurrence of type description, whose value is the Introduction chapter found on GATE's website [1].

First, the Wandora user clicks the occurrence cell and Wandora opens the occurrence editor, which is used to modify the occurrence text.

The editor has the menu option Refine > Classify > Classify with GATE Annie. When the user selects it, Wandora starts GATE/ANNIE and extracts topics and associations from the occurrence text. After a while the extraction finishes and the Wandora window below the occurrence editor is refreshed.

The Wandora user closes the occurrence editor. The extracted topics and associations are visible in the topic's associations, under the association type GATE Annie entity type.
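To make the result structure more concrete, here is a minimal Python sketch of how extracted entities could be turned into associations of type GATE Annie entity type and grouped by entity type. It is an illustration under assumptions, not Wandora's code: the `Association` roles, the functions `to_associations` and `group_by_entity_type`, and the skipping of duplicate entities are hypothetical, and the input pairs are invented sample data rather than output of a real extraction run.

```python
from collections import defaultdict
from typing import NamedTuple

ASSOC_TYPE = "GATE Annie entity type"


class Association(NamedTuple):
    """Hypothetical association created for one extracted entity."""
    assoc_type: str
    source_topic: str   # topic whose occurrence was classified
    entity: str         # text of the extracted entity
    entity_type: str    # ANNIE annotation type, e.g. "Organization"


def to_associations(source_topic, entities):
    """Turn (entity text, annotation type) pairs into associations.

    Duplicate pairs are skipped (an assumption made for this sketch).
    """
    seen = set()
    result = []
    for text, ann_type in entities:
        if (text, ann_type) in seen:
            continue
        seen.add((text, ann_type))
        result.append(Association(ASSOC_TYPE, source_topic, text, ann_type))
    return result


def group_by_entity_type(associations):
    """Collect entity texts per annotation type, in first-seen order."""
    groups = defaultdict(list)
    for a in associations:
        groups[a.entity_type].append(a.entity)
    return dict(groups)


# Invented sample input, not real ANNIE output.
entities = [
    ("University of Sheffield", "Organization"),
    ("Sheffield", "Location"),
    ("University of Sheffield", "Organization"),
]
assocs = to_associations("GATE", entities)
print(len(assocs))  # 2
print(group_by_entity_type(assocs))
# {'Organization': ['University of Sheffield'], 'Location': ['Sheffield']}
```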

By default, GATE Annie extracts only a few entity types. To extract all entity types provided by GATE Annie, the Wandora user has to configure the extractor. The user opens the occurrence editor again and selects the menu option Refine > Classify > Classify with GATE Annie again, this time keeping the CTRL key pressed while selecting it. Wandora now opens a configuration dialog for the GATE Annie extractor. The user ticks the checkbox Accept ALL annotation types and confirms the change by pressing the OK button.

The Wandora user starts the extractor again by selecting the menu option Refine > Classify > Classify with GATE Annie. This time the extraction transforms all found entity types into topics and associations. When the extraction finishes, the association table contains more associations. New entity types include, for example, Money, Token and Sentence.

See also

