Refining occurrences

From WandoraWiki

Revision as of 00:00, 1 December 2010

An occurrence is a resource attached to a topic. The Topic Maps standards define an occurrence as an instance of the subject that the topic represents. In practice, however, an occurrence is usually thought of more generally as a property of the topic. An occurrence resource can be almost anything. Usually it is either a URL addressing a networked resource or a literal, which is the resource itself. With a URL resource, the resource data is stored outside the topic map, while a literal resource is stored in the topic map. Wandora supports only literal resources, although Wandora may handle a literal as a URL. A literal resource is essentially text.
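
As a rough illustration of this model, the sketch below stores literal occurrences on a topic, keyed by occurrence type and scope, and checks whether a literal could be handled as a URL. The Topic class, the looks_like_url helper, and the sample occurrence texts are hypothetical names and data made up for this example; they are not Wandora's own API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Topic:
    """Hypothetical stand-in for a topic; not Wandora's own class."""
    base_name: str
    # Occurrences keyed by (type, scope); every value is literal text.
    occurrences: dict = field(default_factory=dict)


def looks_like_url(text: str) -> bool:
    """Return True if a literal occurrence could be treated as a web URL."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Sample data invented for illustration only.
topic = Topic("Wandora application")
topic.occurrences[("description", "English")] = "Wandora is a topic map application."
topic.occurrences[("homepage", "English")] = "http://wandora.org/"

for (occ_type, scope), text in topic.occurrences.items():
    kind = "URL-like literal" if looks_like_url(text) else "plain literal"
    print(f"{occ_type} ({scope}): {kind}")
```

Both occurrences are stored as text inside the topic; only the interpretation of the text differs.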

It is reasonable to conclude that an occurrence, as a text fragment, contains information that somehow relates to the topic carrying it. On the other hand, topics are connected with each other. These connections, called associations in Topic Maps vocabulary, also represent information related to the topics they connect. It is therefore natural to ask how one could distill associations out of occurrence resources and, conversely, whether one could pack the information in associations into occurrences. This page discusses the first option: transforming occurrences to associations (and topics, of course).

First we'll look at the refining options of the occurrence editor. After that, this page shows Wandora's general occurrence transformation options. Finally, we'll investigate Wandora's batch extractors used to refine occurrences.

Occurrence editor

Wandora's occurrence editor is a simple text editor window used to modify occurrence text. Wandora opens the occurrence editor when the user clicks an occurrence text cell in the occurrence table of the topic panel. For example, suppose the user has the topic Wandora application open in the topic panel (see image below). This topic has one occurrence of type description, and the occurrence's scope is English. The first words of the occurrence text are shown in the table cell.


Refining occurrence editor 01.gif


When the user clicks the cell in the occurrence table, an editor window opens (see image below). This window is the occurrence editor. It has a menu called Refine, which contains several refining options for the occurrence text. At the moment these refining options are:

  • Make topics with selection. This option transforms the selected text fragment into a topic. The topic's base name will be the selected text. Wandora creates a random subject identifier for the topic. The created topic is not associated with the occurrence carrier topic.
  • Make topics with selection and associate. This option creates a topic and sets the selected text fragment as the topic's base name. It also gives the topic a random subject identifier. The created topic is associated with the occurrence carrier topic. The association type is Occurrence distilled association and the roles are Occurrence carrier and Occurrence distilled topic.
  • Find topics in occurrences.... This option is useful when you have an existing vocabulary and want to spot vocabulary terms in a given occurrence. The option loops over all topics in Wandora and tries to find each topic's base name in the occurrence text. If the occurrence text contains the topic's base name, the topic is associated with the occurrence carrier. The association type is Occurrence association and the roles are Occurrence container and Topic in occurrence.
  • Find topics with similar occurrence.... This option searches for topics containing similar occurrence text. Occurrence texts are compared using Levenshtein distance with a threshold value of 0.75. To change the distance metric and threshold value, keep the CTRL key pressed while starting the menu option. Wandora uses the SimMetrics library to measure string similarity. The available metrics are those offered by the SimMetrics library.
  • Classify with OpenCalais. This option sends the occurrence text, or the selected fragment of it, to the OpenCalais web service for classification. The service returns keywords, which are transformed to topics and associated with the occurrence carrier topic. The OpenCalais classifier is also a general extractor in Wandora.
  • Classify with SemanticHacker. This option sends the occurrence text to the SemanticHacker web service. The service returns a set of weighted terms, and Wandora transforms these to topics and associates the topics with the occurrence carrier. The SemanticHacker classifier is also a general extractor in Wandora. A personal API key is required for SemanticHacker classification.
  • Classify with Bing search engine.... This option sends the selected occurrence text fragment to the Bing search engine, receives web addresses, and transforms these web addresses to topics. Finally, the option associates these web resource topics with the occurrence carrier. A personal Bing API key is required for Bing classification.
  • Alchemy entity extractor. This option sends the occurrence text to the AlchemyAPI web service. The service returns named entities and Wandora transforms these entities to topics. Finally, the option associates the created entity topics with the occurrence carrier. AlchemyAPI extractors also have several other uses in Wandora. A personal API key is required for AlchemyAPI extractions.
  • Alchemy keyword extractor. This option is similar to the Alchemy entity extractor but handles keywords instead of named entities.
  • Alchemy category extractor. This option sends the occurrence text to the AlchemyAPI web service, transforms the received category or categories to topics, and associates the category topics with the occurrence carrier.
  • Alchemy language extractor. This option sends the occurrence text to the AlchemyAPI web service, transforms the received language to a topic, and associates the language topic with the occurrence carrier.
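
The two purely local options above, Find topics in occurrences and Find topics with similar occurrence, can be sketched in a few lines. This is a minimal illustration and not Wandora's implementation: the function names are hypothetical, base names are matched as case-sensitive substrings (an assumption), and the Levenshtein distance is turned into a 0..1 similarity by dividing by the longer string's length (also an assumption about how the 0.75 threshold is applied).

```python
def find_topics_in_text(base_names, text):
    """Return the base names that appear verbatim in the occurrence text."""
    return [name for name in base_names if name and name in text]


def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_occurrences(target, candidates, threshold=0.75):
    """Return names of topics whose occurrence text is similar enough."""
    return [name for name, text in candidates.items()
            if similarity(target, text) >= threshold]


print(find_topics_in_text(["Wandora", "Topic Maps", "Java"],
                          "Wandora is a Topic Maps tool"))
print(similar_occurrences("topic map editor",
                          {"a": "topic map editors", "b": "something else"}))
```

Each returned name would then be associated with the occurrence carrier, as described in the list above.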


Refining occurrence editor 02.gif


In this chapter you have learned how a Wandora user can extract topics from a single occurrence using the Refine menu options in the occurrence editor. The next chapters discuss options used to refine multiple occurrences at once.


Batch refining occurrences

Wandora features several options for refining multiple occurrences at once. These options are located in the context menus that appear when you right-click a topic selection or a complete topic map layer. The actual context menu path is Topics > Occurrences > Refine. Under the Refine menu you'll find the options

  • With OpenCalais classifier...
  • With Alchemy entity extractor...
  • With Alchemy keyword extractor...
  • With Alchemy category extractor...
  • With Yahoo! YQL term extractor...
  • With GeoNames near by extractor...
  • Find associations in occurrences...
  • Find associations in occurrences using pattern...
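
Conceptually, a batch option applies one refiner to every occurrence of every topic in the selection. The sketch below shows that loop under stated assumptions: refine_batch, spot_vocabulary, the vocabulary, and the sample selection are all hypothetical and made up for illustration, and skipping a topic found in its own occurrence is a design choice of this example rather than documented Wandora behavior.

```python
def refine_batch(selection, refiner):
    """Apply refiner to every occurrence text of every selected topic.

    selection maps a carrier topic's base name to its occurrence texts.
    Returns a set of (carrier, found_topic) pairs, i.e. association candidates.
    """
    associations = set()
    for carrier, texts in selection.items():
        for text in texts:
            for found in refiner(text):
                if found != carrier:  # do not associate a topic with itself
                    associations.add((carrier, found))
    return associations


# Illustrative vocabulary and refiner; any callable returning names would do.
vocabulary = ["Wandora", "Topic Maps", "OpenCalais"]


def spot_vocabulary(text):
    """Return vocabulary terms that occur verbatim in the text."""
    return [term for term in vocabulary if term in text]


selection = {
    "Wandora": ["Wandora is a Topic Maps application."],
    "Refining": ["Occurrences can be refined with OpenCalais.",
                 "No known terms here."],
}
pairs = refine_batch(selection, spot_vocabulary)
for carrier, found in sorted(pairs):
    print(carrier, "->", found)
```

Passing the refiner as a parameter mirrors the menu structure, where the same selection can be refined with different extractors.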

Conclusions
