Refining occurrences

From WandoraWiki

Revision as of 22:48, 30 November 2010

An occurrence is a resource attached to a topic. The Topic Maps standards define an occurrence as an instance of the subject that the topic represents. In practice, however, an occurrence is usually treated more generally as a property of the topic. An occurrence resource can be almost anything. Usually it is either a URL addressing a networked resource or a literal, which is a resource in itself. With a URL resource, the resource data is stored outside the topic map, while a literal resource is stored in the topic map. Wandora supports only literal resources, although Wandora may handle the literal as a URL. A literal resource is essentially text.

It is reasonable to assume that an occurrence, as a text fragment, contains information that somehow relates to the topic carrying the occurrence. Topics, on the other hand, are connected with each other. These connections, called associations in Topic Maps vocabulary, also represent information related to the topics they connect. This makes it natural to ask how one could distill associations out of occurrence resources and, conversely, whether one could pack the information in associations into occurrences. This page discusses the first option: transforming occurrences into associations (and topics, of course).

First we'll look at the refining options of the occurrence editor. After that, this page shows Wandora's general occurrence transformation options. Finally, we'll investigate Wandora's batch extractors used to refine occurrences.

Occurrence editor

Wandora's occurrence editor is a simple text editor window used to modify occurrence text. Wandora opens the occurrence editor window when the user clicks an occurrence text cell in the occurrence table of the topic panel. For example, suppose the user has the topic Wandora application open in the topic panel (see image below). This topic has one occurrence of type description. The occurrence's scope is English. The first words of the occurrence data are shown in the table cell.


Refining occurrence editor 01.gif


When the user clicks the cell in the occurrence table, an editor window opens (see image below). This window is the occurrence editor. It has a menu called Refine, which contains a few refining options for the occurrence text. At the moment these refining options are:

  • Make topics with selection. This option transforms the selected text fragment into a topic. The topic's base name will be the text selection. Wandora creates a random subject identifier for the topic. The created topic is not associated with the occurrence carrier topic.
  • Make topics with selection and associate. This option creates a topic and sets the selected text fragment as the topic's base name. It also gives the topic a random subject identifier. The created topic is associated with the occurrence carrier topic. The association type is Occurrence distilled association and the roles are Occurrence carrier and Occurrence distilled topic.
  • Find topics in occurrences.... This option is useful when you have an existing vocabulary and want to spot vocabulary terms in a given occurrence. The option loops over all topics in Wandora and tries to find each topic's base name in the occurrence text. If the occurrence text contains a topic's base name, the topic is associated with the occurrence carrier. The association type is Occurrence association and the roles are Occurrence container and Topic in occurrence.
  • Find topics with similar occurrence.... This option searches for topics containing similar occurrence text. Occurrence texts are compared using Levenshtein distance with a threshold value of 0.75. To change the distance metric and threshold value, keep the CTRL key pressed while starting the menu option. Wandora uses the Simmetrics library (http://staffwww.dcs.shef.ac.uk/people/S.Chapman/simmetrics.html) for measuring string similarity. The available metrics are listed at http://staffwww.dcs.shef.ac.uk/people/S.Chapman/simmetrics/uk/ac/shef/wit/simmetrics/similaritymetrics/package-summary.html.
  • Classify with OpenCalais
  • Classify with SemanticHacker
  • Classify with Bing search engine...
  • Alchemy entity extractor
  • Alchemy keyword extractor
  • Alchemy category extractor
  • Alchemy language extractor
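
The base name spotting described for the Find topics in occurrences... option can be illustrated with a minimal sketch. This is not Wandora's code: the Association class, the function name and the use of plain case-sensitive substring matching are assumptions made for illustration. Only the association type and role names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical stand-in for a topic map association."""
    type: str
    players: tuple  # ((role name, topic base name), ...)


def find_topics_in_occurrence(carrier, occurrence_text, base_names):
    """Associate every topic whose base name appears in the occurrence text.

    Assumption: matching is a plain, case-sensitive substring test.
    """
    associations = []
    for name in base_names:
        if not name:
            # An empty base name would match any text, so skip it.
            continue
        if name in occurrence_text:
            associations.append(Association(
                "Occurrence association",
                (("Occurrence container", carrier),
                 ("Topic in occurrence", name)),
            ))
    return associations
```

For example, calling the function with the occurrence text "Wandora is a topic map application written in Java." and the vocabulary Java, Python and topic map would yield two associations, one for Java and one for topic map.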


Refining occurrence editor 02.gif
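
The comparison behind the Find topics with similar occurrence... option can be sketched as follows. This is an illustration, not the Simmetrics implementation: the function names are hypothetical, and normalizing the edit distance by the length of the longer text is an assumed way of mapping the distance to a 0..1 similarity that can be compared against the 0.75 threshold.

```python
def levenshtein_distance(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical texts, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def find_similar_occurrences(text, occurrences_by_topic, threshold=0.75):
    """Return the topics whose occurrence text is at least `threshold` similar to text."""
    return [topic for topic, other in occurrences_by_topic.items()
            if similarity(text, other) >= threshold]
```

With this normalization, a higher threshold demands closer matches: two texts that differ by one character out of eighteen score about 0.94 and pass the 0.75 threshold, while unrelated texts fall well below it.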

Occurrence refining options

Batch refining occurrences

Conclusions
