MobyThesaurusExtractor

From WandoraWiki
Revision as of 13:24, 10 May 2008 by Akivela (Talk | contribs)


The tool reads a Moby thesaurus file and converts it to a topic map. A Moby thesaurus file is a simple text file where each line defines a single word followed by its related words. For example:

word1 relatedWord1 relatedWord2 relatedWord3 relatedWord4 ...
word2 relatedWord1 relatedWord2 relatedWord3 relatedWord4 ...

The extractor creates a topic for each word (including related words) and a binary association for each word-relatedWord pair. If a word has four related words, the extractor creates four associations. Note that a word may also appear as a related word on other lines, which increases the total number of associations that word eventually gets.
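
The mapping described above can be sketched in a few lines of Java. This is only an illustration of the word-to-pairs logic, not Wandora's actual implementation: the class name MobyLineSketch and its methods are hypothetical, it uses plain Java collections instead of the Wandora topic map API, and the token delimiter is passed in as a parameter because it is an assumption about the file layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: models topics as strings and binary associations
// as two-element arrays {word, relatedWord}. Not part of Wandora.
final class MobyLineSketch {

    // Splits one line into tokens using the given delimiter regex.
    // The delimiter is an assumption; pass "\\s+" for the whitespace
    // layout shown in the example lines above.
    static String[] tokens(String line, String delimiterRegex) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split(delimiterRegex);
    }

    // Collects one topic per distinct word, related words included.
    static Set<String> topics(List<String> lines, String delimiterRegex) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : lines) {
            for (String token : tokens(line, delimiterRegex)) {
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    // Creates one pair per word-relatedWord combination on each line:
    // the first token is the word, the remaining tokens are related words.
    static List<String[]> associations(List<String> lines, String delimiterRegex) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            String[] t = tokens(line, delimiterRegex);
            for (int i = 1; i < t.length; i++) {
                if (!t[i].isEmpty()) {
                    result.add(new String[] { t[0], t[i] });
                }
            }
        }
        return result;
    }
}
```

With the two lines "word1 a b c d" and "word2 a word1" and the delimiter "\\s+", the sketch yields six topics and six pairs. The topic word1 takes part in five of those pairs: four from its own line and one because it is a related word of word2.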

The Moby thesaurus is in the public domain and can be downloaded from http://www.gutenberg.org/etext/3202

Because the Moby thesaurus contains hundreds of thousands of words, Wandora requires at least 2 GB of memory to extract the complete thesaurus. Even with 2 GB of memory, the application is rather unstable after extraction.

GUI name

Tool Class

org.wandora.application.tools.extractors.MobyThesaurusExtractor
