Simple Word Matching Extractor

From WandoraWiki

The simple word extractor associates each word in a list of keywords with the topics whose data contains that word. Several configuration options control how a match is determined. See Configuring tools for instructions on how to configure tools.

Regex
Whether to treat each word as a regular expression instead of a plain substring.
Case sensitive
Whether to match words in a case-sensitive manner.
Match word
Whether to require the word to match a complete, whitespace-delimited word. For example, if enabled, "white" would not match "whitespace".
Base Name
Whether to consider the topic's base name when determining a match.
Variant Name
Whether to consider the topic's variant names when determining a match.
Instance Data
Whether to consider the topic's instance data when determining a match.
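The options above can be pictured with a small Python sketch. It is a hypothetical illustration, not Wandora's implementation (Wandora itself is a Java application): the `Topic` and `Options` classes, their default values, and the reading of Match word as "bounded by whitespace" are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a Wandora topic (not Wandora's Java API)."""
    base_name: str
    variant_names: list[str] = field(default_factory=list)
    instance_data: list[str] = field(default_factory=list)  # e.g. occurrence texts


@dataclass
class Options:
    """One flag per configuration option; these default values are assumptions."""
    regex: bool = False
    case_sensitive: bool = False
    match_word: bool = False
    base_name: bool = True
    variant_name: bool = True
    instance_data: bool = True


def word_matches(word: str, text: str, opts: Options) -> bool:
    """Check one keyword against one piece of text under the given options."""
    # Regex off: escape the word so it is matched as a literal string.
    pattern = word if opts.regex else re.escape(word)
    if opts.match_word:
        # Assumed semantics: the match must be bounded by whitespace or the text edges.
        pattern = rf"(?<!\S)(?:{pattern})(?!\S)"
    flags = 0 if opts.case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


def topic_matches(word: str, topic: Topic, opts: Options) -> bool:
    """Collect the texts enabled by the options and test the word against each."""
    texts: list[str] = []
    if opts.base_name:
        texts.append(topic.base_name)
    if opts.variant_name:
        texts.extend(topic.variant_names)
    if opts.instance_data:
        texts.extend(topic.instance_data)
    return any(word_matches(word, text, opts) for text in texts)


topic = Topic("Colours", instance_data=["whitespace matters"])
print(topic_matches("white", topic, Options()))                 # True: substring match
print(topic_matches("white", topic, Options(match_word=True)))  # False: part of a longer word
```

Wrapping the word in a non-capturing group keeps alternations such as `red|white` intact when Regex and Match word are both enabled.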

The extractor is found in File > Extract > Classification > Word Extractor.

Example

The extractor's operation is demonstrated below using the current description of Wandora found on wandora.org:

Wandora is a tool for people who collect and process information, especially networked knowledge and knowledge about WWW resources. With Wandora you can aggregate and combine information from various different sources. You can manipulate the collected knowledge flexible and efficiently, and without programming skills. More generally speaking Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. Wandora suits well for constructing and maintaining vocabularies, ontologies and information mashups. Application areas include linked data, open data, data integration, business intelligence, digital preservation and data journalism. Wandora's license is GNU GPL. Wandora application is developed actively by a small number of experienced software developers. We call ourselves as the Wandora Team.


First create a topic to contain this text data via Topics > New topic.

[Screenshot: Simple word 1.png]

Open the topic and add the text as English occurrence data. Use the default occurrence type by pressing Use default.

[Screenshots: Simple word 2.png, Simple word 3.png]

Open the extractor via File > Extract > Classification > Word extractor.

[Screenshot: Simple word 4.png]

In the Raw panel, specify the words Java and GNU, which we expect to find in the description.

[Screenshot: Simple word 5.png]

The extractor reports two associations, one for each of the given words.

[Screenshot: Simple word 6.png]

These associations are then listed in the previously created topic.

[Screenshot: Simple word 7.png]
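The outcome of this example can be reproduced outside Wandora with a short Python sketch. It assumes plain, case-sensitive substring matching and uses "Wandora description" as a hypothetical name for the topic created in the first step; it only mirrors the result described above (one association per word found) and says nothing about how Wandora actually stores associations.

```python
# Occurrence text from the example (the wandora.org description of Wandora).
DESCRIPTION = (
    "Wandora is a tool for people who collect and process information, especially "
    "networked knowledge and knowledge about WWW resources. With Wandora you can "
    "aggregate and combine information from various different sources. You can "
    "manipulate the collected knowledge flexible and efficiently, and without "
    "programming skills. More generally speaking Wandora is a general purpose "
    "information extraction, management and publishing application based on Topic "
    "Maps and Java. Wandora suits well for constructing and maintaining vocabularies, "
    "ontologies and information mashups. Application areas include linked data, open "
    "data, data integration, business intelligence, digital preservation and data "
    "journalism. Wandora's license is GNU GPL. Wandora application is developed "
    "actively by a small number of experienced software developers. We call "
    "ourselves as the Wandora Team."
)


def find_associations(words: list[str], topics: dict[str, str]) -> list[tuple[str, str]]:
    """Return a (word, topic name) pair for every word found in a topic's text.

    Sketch only: literal, case-sensitive substring matching is assumed.
    """
    return [
        (word, name)
        for word in words
        for name, text in topics.items()
        if word in text
    ]


# "Wandora description" is a hypothetical name for the topic created earlier.
topics = {"Wandora description": DESCRIPTION}
associations = find_associations(["Java", "GNU"], topics)
for word, name in associations:
    print(f"{word} -> {name}")
print(len(associations))  # 2
```

Both words occur in the description, so each yields exactly one association with the single topic, matching the two associations the extractor reports.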

See also
