Finding a topic
Revision as of 21:03, 6 April 2010
Finder is used to locate and open topics. It is located beside the Topics tab, as shown above.
Finder is a simple free text search that tries to locate the given search word in topics. You can search with any topic element or combination of elements. Search results appear below the search field. Double-clicking a topic in the search results opens the topic in the topic panel. Right-clicking a topic opens a context menu with a large number of topic tools.
Search words used in Finder can contain Java regular expression metacharacters such as the dot. Finder doesn't restrict the length of the search word, so even a one-character search is allowed. As an extreme example, you could search with a single dot, and the search would return every topic in Wandora. Viewing very large result sets is time consuming and may cause out of memory errors (OutOfMemoryError) in Wandora. This is especially true when you are accessing database topic maps.
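The following minimal sketch, which is not Wandora's code, illustrates why a single dot returns everything. It assumes that the search word is applied with Java's `Matcher.find()`, that is, it may match anywhere in a topic name; the class `RegexFinderSketch`, its `search` method and the sample names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: filters topic names with a Java regular expression.
// Assumes "match anywhere" semantics (Matcher.find()). Not Wandora's actual code.
class RegexFinderSketch {

    // Returns every name in which the pattern matches somewhere.
    static List<String> search(List<String> topicNames, String searchWord) {
        Pattern pattern = Pattern.compile(searchWord);
        return topicNames.stream()
                .filter(name -> pattern.matcher(name).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Wandora", "Topic map", "v1.0");

        // A single dot matches any character except a line terminator,
        // so every sample name is returned: [Wandora, Topic map, v1.0]
        System.out.println(search(names, "."));

        // Pattern.quote turns the dot into a literal character: [v1.0]
        System.out.println(search(names, Pattern.quote(".")));
    }
}
```

To search for a literal dot, escape it as `\.` or quote the whole search word; otherwise it acts as a wildcard.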
Find
In addition to the Finder tab, topics can be searched by selecting Edit > Find or by pressing CTRL-F in Wandora. This opens the Find dialog window, as shown below.
Typing a word into the search field and pressing OK starts the search. The search word is interpreted as a regular expression, which allows rather complicated searches. Search results are viewed in a separate dialog window, as shown below.
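A few search words show what "rather complicated searches" can look like. This is a sketch under the assumption that the Find dialog uses standard `java.util.regex` syntax and matches anywhere in a name; the class `RegexSearchWords`, its `hits` method and the sample names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical examples of regular expression search words, tested against
// sample topic names. Assumes "match anywhere" semantics (Matcher.find()).
class RegexSearchWords {

    static boolean hits(String searchWord, String topicName) {
        return Pattern.compile(searchWord).matcher(topicName).find();
    }

    public static void main(String[] args) {
        List<String> names = List.of("Wandora", "wandora.org", "Topic map", "Finder");
        String[] searchWords = {
            "^Wand",       // names starting with "Wand" (case-sensitive)
            "(?i)^wand",   // the same, ignoring case
            "map$|^Find",  // names ending in "map" or starting with "Find"
            "\\.org$"      // names ending in a literal ".org"
        };
        // Prints each search word together with the names it matches.
        for (String word : searchWords) {
            for (String name : names) {
                if (hits(word, name)) {
                    System.out.println(word + " -> " + name);
                }
            }
        }
    }
}
```

Anchors (`^`, `$`), alternation (`|`) and inline flags such as `(?i)` are ordinary Java regular expression features.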
To open any topic in the search results, double-click the topic name, or right-click it and select Open topic. The search result dialog is closed by pressing the Close button. Clicking Again restores the search dialog. In addition to traditional regular expression searches, Wandora also features string similarity search.
String similarity
The String similarity tab is used to perform similarity searches in the topic map. String similarity allows the Wandora user to search for topics with strings that only resemble the given string. Wandora uses Sam Chapman's SimMetrics open source library to calculate string similarities. Below is a screenshot of Wandora's String similarity tab.
The available similarity types are:
- Levenshtein distance
- Needleman-Wunsch distance
- Smith-Waterman distance
- Block distance
- Monge-Elkan distance
- Jaro distance
- Jaro-Winkler
- SoundEx distance
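As an illustration of the first similarity type, here is a minimal sketch of Levenshtein distance written from its textbook definition; it is not SimMetrics source code. The normalization to a 0..1 similarity (one minus distance divided by the longer length) is one common choice and is an assumption here, as is the class name `LevenshteinSketch`.

```java
// Minimal sketch of Levenshtein distance, written from the textbook definition.
// Not SimMetrics source code; the 0..1 normalization below is an assumption.
class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions and substitutions
    // needed to turn a into b (dynamic programming keeping only two rows).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(similarity("Wandora", "Wandera")); // one substitution out of 7
    }
}
```

The other types weigh differences in other ways (for example, SoundEx compares how words sound), so the same pair of strings can score quite differently depending on the chosen type.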
Similarity types are discussed in the documentation of SimMetrics (http://www.dcs.shef.ac.uk/~sam/stringmetrics.html). In addition to the similarity type, the Wandora user can adjust the similarity threshold, which is a value between 0 and 100. If the threshold is near 100, compared strings must be very similar, and only minimal differences are allowed. If the threshold is near 0, compared strings can be very different and still be considered similar.
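The threshold can be pictured as a cut-off on a scaled score. This sketch assumes that the chosen measure returns a similarity between 0 and 1 and that Wandora compares that value, multiplied by 100, against the threshold; the class `ThresholdSketch`, the method `similarTo` and the toy metric are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;

// Hypothetical sketch of a 0..100 similarity threshold. Assumes the metric
// returns 0..1 (1 = identical) and is scaled by 100. Not Wandora's actual code.
class ThresholdSketch {

    // Keeps the candidates whose scaled similarity to the query reaches the threshold.
    static List<String> similarTo(String query, List<String> candidates, int threshold,
                                  ToDoubleBiFunction<String, String> metric) {
        return candidates.stream()
                .filter(c -> metric.applyAsDouble(query, c) * 100.0 >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy metric for illustration only: share of positions holding equal characters.
        ToDoubleBiFunction<String, String> toy = (a, b) -> {
            int maxLen = Math.max(a.length(), b.length());
            if (maxLen == 0) return 1.0;
            int same = 0;
            for (int i = 0; i < Math.min(a.length(), b.length()); i++) {
                if (a.charAt(i) == b.charAt(i)) same++;
            }
            return (double) same / maxLen;
        };
        List<String> names = List.of("Wandora", "Wandera", "Topic");
        System.out.println(similarTo("Wandora", names, 95, toy)); // [Wandora]
        System.out.println(similarTo("Wandora", names, 10, toy)); // [Wandora, Wandera]
    }
}
```

Raising the threshold shrinks the result set toward exact matches; lowering it admits increasingly different strings.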
Some similarity measures use the Gap cost and Tokenizer settings. The former specifies the penalty for a gap, that is, a character that has to be inserted or deleted when the compared strings are aligned. The latter is used to split words out of text. For example, it is very common to separate words in text with the space character (" ").
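The two settings can be illustrated with two textbook measures: a Needleman-Wunsch distance whose gap cost is a parameter, and a Block (Manhattan) distance computed over tokens produced by a space tokenizer. This is a sketch written from the standard definitions, not SimMetrics code; the match cost 0 and mismatch cost 1 are assumptions, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch from textbook definitions (not SimMetrics code):
// - Needleman-Wunsch distance with a configurable gap cost,
// - Block (Manhattan) distance over tokens produced by a space tokenizer.
class GapAndTokenSketch {

    // Cheapest global alignment: match costs 0, mismatch costs 1, each gap costs gapCost.
    static double needlemanWunsch(String a, String b, double gapCost) {
        double[][] d = new double[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i * gapCost;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j * gapCost;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                double sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0.0 : 1.0;
                d[i][j] = Math.min(d[i - 1][j - 1] + sub,
                        Math.min(d[i - 1][j] + gapCost, d[i][j - 1] + gapCost));
            }
        }
        return d[a.length()][b.length()];
    }

    // Space tokenizer: splits on runs of spaces and counts each token, skipping empty ones.
    static Map<String, Integer> tokenCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split(" +")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Block distance: sum of absolute differences between the token counts.
    static int blockDistance(String a, String b) {
        Map<String, Integer> ca = tokenCounts(a);
        Map<String, Integer> cb = tokenCounts(b);
        Set<String> all = new HashSet<>(ca.keySet());
        all.addAll(cb.keySet());
        int sum = 0;
        for (String t : all) {
            sum += Math.abs(ca.getOrDefault(t, 0) - cb.getOrDefault(t, 0));
        }
        return sum;
    }

    public static void main(String[] args) {
        // A higher gap cost makes the missing "i" more expensive.
        System.out.println(needlemanWunsch("topic", "topc", 1.0)); // 1.0
        System.out.println(needlemanWunsch("topic", "topc", 2.0)); // 2.0
        // Word order does not matter for the token-based Block distance.
        System.out.println(blockDistance("topic map", "map topic")); // 0
        System.out.println(blockDistance("topic map", "topic maps")); // 2
    }
}
```

A different tokenizer, for example one that also splits on punctuation, would produce different tokens and therefore different Block distances for the same strings.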
The Difference instead similarity option changes the similarity measure into a difference measure. If it is selected, Wandora searches for strings that are maximally different from the given string.
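One way to picture this option is to turn a 0..1 similarity into a difference by subtracting it from 1 and then rank candidates from most to least different. This is a hypothetical sketch; the definition of difference, the class `DifferenceSketch` and the toy length-ratio metric are assumptions, not Wandora's implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;

// Hypothetical sketch of "difference instead of similarity".
// Assumes a metric with values in 0..1 (1 = identical) and defines
// difference as 1 - similarity. Not Wandora's actual code.
class DifferenceSketch {

    static double difference(String a, String b, ToDoubleBiFunction<String, String> similarity) {
        return 1.0 - similarity.applyAsDouble(a, b);
    }

    // Orders candidates from most different to least different from the query.
    static List<String> mostDifferentFirst(String query, List<String> candidates,
                                           ToDoubleBiFunction<String, String> similarity) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (String c) -> difference(query, c, similarity)).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy metric for illustration only: shorter length divided by longer length.
        ToDoubleBiFunction<String, String> lengthRatio = (a, b) -> {
            int max = Math.max(a.length(), b.length());
            return max == 0 ? 1.0 : (double) Math.min(a.length(), b.length()) / max;
        };
        System.out.println(mostDifferentFirst("Wandora", List.of("Wandora", "W", "Wand"), lengthRatio));
        // [W, Wand, Wandora]
    }
}
```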
Wandora shows similarity search results in a separate dialog window. To open any topic in the search results, the user can double-click the topic name in the results dialog. The results dialog is closed by clicking Close. If the search results are unsatisfying, the user can click the Again button and tune the similarity search settings.
Query scripts
The Query script tab is used to write and run queries against Wandora's topic maps. A query is a small script that returns topics and string data distilled from the topic map. Wandora uses a non-standard query language that resembles functional languages such as LISP. Wandora's query language has a tutorial page of its own.