Can Wandora automatically extract the topic of a text?

Posted: Tue Sep 15, 2009 4:34 pm
by Butty
Hello AKI,
I understand that Wandora provides many extractors. The Simple Text Document Extractor converts simple text documents into topic occurrences. It reads the document, creates a simple topic for it, and places the text content into an occurrence attached to the document topic.

I just wonder what "occurrence" means. Also, can the Simple Text Document Extractor automatically extract the topic of a text and visualize it?

Posted: Wed Sep 16, 2009 11:11 am
by akivela
Hello

Although the Topic Maps standard suggests that an occurrence is an information packet that instantiates the topic's subject, I tend to think of occurrences as simple free-text properties attached to a topic. Occurrences, in a way, allow you to store free text in your topic map. The way Wandora handles occurrences reflects this design decision.

The Simple Text Document extractor creates a topic and attaches the given text to it as an occurrence, just as you described. The extractor doesn't analyze the text; the text is simply stored as an occurrence.
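To make the result concrete, here is a minimal Python sketch of the data the extractor ends up with. It is not Wandora's API (Wandora itself is a Java application); the `Topic` class, the `extract_simple_text` function, and the "text content" occurrence type are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Topic:
    """Hypothetical stand-in for a topic; not a Wandora class."""
    name: str
    # Occurrences modeled as free-text properties keyed by occurrence type.
    occurrences: Dict[str, str] = field(default_factory=dict)


def extract_simple_text(document_name: str, text: str) -> Topic:
    """Create a topic for the document and store the text as an occurrence.

    The text is stored as-is; nothing is analyzed or extracted from it.
    """
    topic = Topic(name=document_name)
    topic.occurrences["text content"] = text  # assumed occurrence type label
    return topic


doc_topic = extract_simple_text("notes.txt", "Helsinki is the capital of Finland.")
print(doc_topic.name)
print(doc_topic.occurrences["text content"])
```

The point of the sketch is only that the document becomes one topic and its whole text becomes one free-text property of that topic.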

However, Wandora does contain features for digging topics out of text occurrences. Say you have used the Simple Text Document extractor and now have a topic with a text occurrence. You can open the occurrence editor by clicking the occurrence text cell (in the Traditional topic panel) that shows the first words of your occurrence. The occurrence editor is a simple text editor window where you can edit the occurrence text. The window has a menu bar with a Refine menu. If you look into the Refine menu you should see these options:

* Make topics
* Find topics
* Classify
* Insert

Classify contains the submenus Classify with OpenCalais and Classify with SemanticHacker. Selecting the OpenCalais option sends your occurrence text to OpenCalais (http://www.opencalais.com) and retrieves topics describing the text. Wandora then associates your topic (the occurrence carrier) with all retrieved topics. You might say that Wandora, in a way, resolves topics for free occurrence text, although an external web service (OpenCalais) does the actual topic extraction.
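The flow of the Classify step can be sketched roughly as follows. This is only an illustration in Python, not Wandora code and not the OpenCalais API: `classify_occurrence` and `fake_classifier` are hypothetical, and the fake classifier is a local stand-in for the external web service, so no network call is made.

```python
from typing import Callable, Dict, List, Set


def classify_occurrence(
    carrier: str,
    text: str,
    classifier: Callable[[str], List[str]],
    associations: Dict[str, Set[str]],
) -> None:
    """Send the occurrence text to a classifier and associate the
    carrier topic with every topic name the classifier returns."""
    for topic_name in classifier(text):
        associations.setdefault(carrier, set()).add(topic_name)


def fake_classifier(text: str) -> List[str]:
    """Hypothetical stand-in for an external classification service.

    It only reports which of a few known names appear in the text.
    """
    known_names = ["Helsinki", "Finland", "Sweden"]
    return [name for name in known_names if name in text]


associations: Dict[str, Set[str]] = {}
classify_occurrence(
    "notes.txt",
    "Helsinki is the capital of Finland.",
    fake_classifier,
    associations,
)
print(sorted(associations["notes.txt"]))
```

Passing the classifier in as a function mirrors the fact that the extraction itself happens outside the topic map: swapping one service for another changes which topics come back, not how they are attached to the carrier topic.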

Selecting Classify with SemanticHacker performs a similar operation but uses http://www.semantichacker.com to retrieve topics for the occurrence text. You need a valid SemanticHacker user token to use this option.

Wandora also contains separate extractors for OpenCalais and SemanticHacker under File > Extract > Classification. The OpenCalais extractor is discussed at http://www.wandora.org/wandora/wiki/index.php?title=OpenCalais_classifier. The SemanticHacker extractor is discussed at http://www.wandora.org/wandora/wiki/index.php?title=SemanticHacker_classifier.

I hope this answers your question. If not, please drop me a line.

Kind Regards,
Aki Kivelä
Wandora Team