Moby thesaurus extractor

Wandora's Moby thesaurus extractor converts the Moby thesaurus to topic map format. The Moby thesaurus is a specially formatted text file in which each line contains a root word followed by similar words:

rootword similar1 similar2 similar3

The number of similar words varies from line to line. The extractor converts the example line above into three binary associations:

rootword, similar1
rootword, similar2
rootword, similar3
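
The conversion above can be sketched in a few lines of Python. This is only an illustration of the line-to-associations mapping, not Wandora's actual implementation. The function names `line_to_pairs` and `extract_pairs` are hypothetical, and the sketch assumes words are separated by whitespace as in the example line; a real thesaurus file may use a different delimiter.

```python
from typing import Iterable, Iterator, List, Tuple


def line_to_pairs(line: str) -> List[Tuple[str, str]]:
    """Turn one thesaurus line into (root word, similar word) pairs.

    Assumption: words are whitespace-separated, as in the example line.
    """
    words = line.split()
    if len(words) < 2:
        # Empty line, or a root word with no similar words: nothing to associate.
        return []
    root, similar = words[0], words[1:]
    # One binary association per similar word.
    return [(root, word) for word in similar]


def extract_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield pairs line by line, so a large file is never held in memory at once."""
    for line in lines:
        yield from line_to_pairs(line)


if __name__ == "__main__":
    for root, word in extract_pairs(["rootword similar1 similar2 similar3"]):
        print(f"{root}, {word}")
```

Each resulting pair corresponds to one binary association between the root word and a similar word.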

The association type and roles are the same in all associations. Because the Moby thesaurus is very large, you need to give the JRE at least 2 GB of memory to process the whole thesaurus successfully, for example with the standard JVM option -Xmx2g. To start Wandora's Moby thesaurus extractor, select the menu option File > Extract > Language > Moby thesaurus extractor.

The Moby thesaurus is not included in the Wandora application, but it is in the public domain and easy to find. It is available from Project Gutenberg, for example.

Note also that the input format is not limited to word relations. It can be used to construct any kind of association: just change the default association type and roles after extraction.
