Postby Aparna Lalingkar » Tue Jan 19, 2010 11:10 am

I am not able to extract a topic map, or even a list of topics, associations and occurrences, from any simple text file, though I am following the directions. I could load the Wordnet XTM file, but I cannot see the list of topics, associations and occurrences in it. How can I do that? I also tried to extract a doc file as well as a PDF, but they are not getting loaded. What could be the error?
Aparna Lalingkar
Posts: 13
Joined: Tue Jan 19, 2010 11:02 am

Postby akivela » Tue Jan 19, 2010 2:22 pm

Hello Aparna

You sure have many non-working features in your Wandora :)

Some extractors in Wandora really don't add (or link) topics to the topic tree, so you can't directly see the extracted topics or associations. To dig up extracted topics and associations you can use Wandora's Finder tab, next to the topic tree tab. Or you can try to search for topics with the Find feature [1].

I recall that extracting HTML lists as described in [2] and [3] doesn't add anything to the topic tree. You, as the Wandora user, need to link the extracted topics to the tree manually.

The Wordnet problem you described may be a consequence of
* Insufficient memory. To use the Wordnet topic map you need to start Wandora with the shell script "wandora-huge.bat" or "". Both scripts are in Wandora's bin folder.
* Opening Wordnet's XTM file with Wandora's Open Project feature. XTM files are not valid Wandora projects [4]. To import an XTM file into Wandora, use the File > Import > Topic Map import... feature.

If you still can't find the Wordnet or extracted topics and associations, you can check whether your topic map has more topics and associations after the extraction. See Topic Map Info [5] before and after the extraction, and compare the numbers of topics and associations. If the numbers are equal, your extraction has failed and you have to think of other solutions to the problem.
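If you would rather make this comparison outside Wandora, the following is a rough sketch, not a Wandora feature. It assumes you export the topic map to XTM once before and once after the extraction, and it simply counts the `topic` and `association` elements in each file. The function names and the file names in the usage note are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_topics_and_associations(xtm_source):
    """Return (topic_count, association_count) for an XTM file path or file object."""
    topics = 0
    associations = 0
    # iterparse reads the file incrementally, which helps with large maps
    for _event, elem in ET.iterparse(xtm_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "topic":
            topics += 1
            elem.clear()  # drop the element's children once it has been counted
        elif name == "association":
            associations += 1
            elem.clear()
    return topics, associations
```

Calling `count_topics_and_associations("before.xtm")` and `count_topics_and_associations("after.xtm")` (hypothetical file names) and comparing the two pairs gives the same kind of before/after check: equal numbers suggest the extraction added nothing.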

In general, a good starting point to detect problems in Wandora is to start the application with bat or sh scripts found in Wandora's bin folder, and see what is going on in the console window.

If these notes didn't help you, please drop a line and try to describe the problem in more detail (what specific actions you took), and I'll try to find a solution for you.

Kind Regards,
Aki Kivelä
Wandora Team

[1] ... ng_a_topic
[2] ... _extractor
[3] ... _extractor
[4] ... ad_project
[5] ... c_map_info
Site Admin
Posts: 260
Joined: Tue Sep 18, 2007 10:20 am
Location: Helsinki, Finland

Postby Aparna Lalingkar » Wed Jan 20, 2010 10:37 am

Many thanks Aki Kivelä,

I will try and get back to you if the problem continues. :)

Postby Aparna Lalingkar » Wed Jan 20, 2010 11:02 am

Hi Aki Kivelä,

Do you have a user guide for the Wandora tool kit? If yes, can you send me a soft copy? This tool kit would be very useful for my research. If I could create topic maps automatically with Wandora (with a little manual modification), it would be of great help.

I opened Wandora using the huge.bat file in the bin folder and imported the Wordnet topic map. While importing, it reports approximately 115000 topics and roles and 136000 associations. I want to see all the topics or associations listed. I am seeing only the general association types such as hypernym, part-meronym etc. I want to see, for example, topic types such as Living Things, Non-living Things, Mammals etc.

Can I just get a list of topics and associations for a given web page, text or PDF file by using Wandora?

Postby akivela » Wed Jan 20, 2010 11:39 am

Hi Aparna

Wandora's documentation is at [1]. As the page states, documentation is a work in progress and incomplete as our focus is on application development at the moment. However, if you have specific questions, I'll be happy to answer.

Yes, Wordnet contains about 115000 topics and 137000 associations. Wandora wiki's Wordnet download page at [2] contains more specific numbers.

To start using Wordnet, you probably want to open the Wordnet topic that appears below the Wandora class after you have imported the Wordnet XTM into Wandora. Click the blue handle next to the Wordnet topic and you'll see the instances of the topic. Next, click the handle next to the Association-Types topic and you'll see all association types in Wordnet. If you double click the MemberMeronymOf topic, Wandora opens the topic in the right column and you can browse all member-meronym associations of Wordnet. Depending on your computer, opening the topic may take a few seconds as the number of associations is high (>10000). Now you can start browsing any topic you see. Just double click a topic in the tree or in topic tables [3] and Wandora opens the topic for you.

If you wish to find a specific topic, say mammal, you can use the Finder tab. Just type mammal into the search field and hit the Search button. However, as Wordnet is just a general vocabulary, it doesn't cover biological phenomena completely. For example, Wordnet doesn't contain the types Living-Thing or Non-Living-Things at all. If you wish to examine such types, I suggest you take a look at the Topic Maps conversion of OpenCYC [4].

Kind Regards,
Aki / Wandora Team

[1] ... umentation
[2] ... of_WordNet
[3] ... pic_tables
[4] ... of_OpenCyc

Postby akivela » Wed Jan 20, 2010 3:58 pm

Oops, I forgot to answer your last question about getting topics and associations related to given web page, text or pdf.

Wandora itself has no such subject extractor for free text. However, Wandora contains several tools that can extract content with the help of an external web service. See [1], [2], and [3] for example.

Note that the quality of subject extraction depends on the web service used and varies a lot.

Kind Regards,
Aki / Wandora Team

[1] ... classifier
[2] ... classifier
[3] ... extractors

Postby Aparna Lalingkar » Thu Jan 21, 2010 11:03 am

Hi Aki,

Many thanks for all the quick replies. The documentation says that Wandora can extract a topic map from given Word, txt, PDF or JPEG files. I tried to extract a TM from a txt file. It does create a TM, but with only one useful node, i.e. the heading of the txt file. The rest of the nodes are super-class, role, Wandora language etc., which are not useful for me. I am not seeing any other useful node in the extracted TM. Why is that? Am I doing something wrong? I did: File > Extract > Simple Files > Txt, then a browse window appeared through which I selected a txt file, and then I clicked on extract. In the topic panel column, under the Wandora class, the heading of the txt file appears as a class, but nothing further happens. What should I do to create/extract a simple TM from a Word document or PDF file?



Postby akivela » Thu Jan 21, 2010 11:29 am

Hi Aparna

Yes, you're right. Wandora's Simple text document extractor creates only one topic that represents the extracted file, and attaches the file content to an occurrence typed document-text. In other words, Wandora doesn't analyze the file content.
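To make that concrete, here is a toy model of the result. It is an illustration only and has nothing to do with Wandora's actual classes or API: `Topic` and `simple_text_extract` are hypothetical names, and using the file name as the base name is an assumption. The point is that the whole file ends up in a single occurrence and no further topics are derived from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class Topic:
    """Toy stand-in for a topic; not a Wandora class."""
    base_name: str
    occurrences: dict = field(default_factory=dict)


def simple_text_extract(path):
    """Mimic the described behaviour: one topic per file, content stored as-is."""
    path = Path(path)
    topic = Topic(base_name=path.name)  # assumption: topic is named after the file
    # The whole file goes into one occurrence; nothing in the text is analyzed
    topic.occurrences["document-text"] = path.read_text(encoding="utf-8")
    topic.occurrences["extraction time"] = datetime.now().isoformat()
    return topic
```

However long the input file is, the result is still one topic with two occurrences, which matches the single useful node seen after extraction.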

If you wish to analyze the file content and extract additional topics that relate to the extracted file, you can use Wandora's OpenCalais, SemanticHacker, and AlchemyAPI extractors, for example. See my reply above for documentation links for these extractors.

If you have already used the Simple text document extractor, you can refine the file content into additional topics and associations using the occurrence editor's Refine > Classify menu options. Just click on the occurrence text and an occurrence editor window opens. Make a text selection, and choose Refine > Classify > Classify with OpenCalais, for example.

Kind Regards,
Aki / Wandora Team

Postby Aparna Lalingkar » Thu Jan 21, 2010 11:44 am

Thanks Aki for your quick reply.

Where are this occurrence editor and the Refine menu? I am not able to see them in Wandora.



Postby akivela » Thu Jan 21, 2010 12:05 pm

To open the occurrence editor

1. Open the topic that carries the occurrence. In our example that would be the extracted document topic. When the topic is open, you should see it in the right column of the Wandora window, and you can edit the topic's internal structures such as base name, variant names etc.

2. Locate the content box labeled Occurrences. In the content box, there should be a table with columns Occurrence type, English, Finnish etc. In our example, there should also be rows labeled extraction time and document-text. If your extraction was successful, you should also see the first words of your document in one cell of the occurrence table.

3. Click the left mouse button on the occurrence table cell you want to edit. In our example, that cell would be the one with the extracted text.

4. The occurrence editor opens and shows the complete occurrence text. The occurrence editor window has a menu bar, and the rightmost menu is named Refine.

Note that most refine options use the text selection. In other words, you have to select the occurrence text that is to be used in the refine operation.

Kind Regards,
Aki / Wandora Team

Postby Aparna Lalingkar » Thu Jan 21, 2010 12:18 pm

Thanks for the quick and useful reply.

I followed your instructions and was able to reach Refine > Classify > Classify with OpenCalais, but it showed an error window. The other two, i.e. SemanticHacker and AlchemyAPI, asked for input.

What should I do to rectify that error message and get the work done?

How can I get an API key for AlchemyAPI?

How can I get a valid SemanticHacker token?



Postby Aparna Lalingkar » Thu Jan 21, 2010 12:34 pm

Hi Aki,

I applied for an AlchemyAPI key and a SemanticHacker token.

I do not know how to rectify the error given by OpenCalais.



Postby akivela » Thu Jan 21, 2010 12:45 pm

Can you give me the actual error message Wandora generates for the OpenCalais classification?

Some general rules for successful OpenCalais classification are that your text should be ASCII, plain text (non-structured), short, and written in English.
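As a rough sketch, these rules of thumb could be checked before sending text to the classifier. This is not part of Wandora or OpenCalais: the function name is hypothetical, the `max_chars` default is an arbitrary placeholder since no actual length limit is stated here, the markup test is a crude heuristic, and the language is not checked at all.

```python
def opencalais_precheck(text, max_chars=10000):
    """Return a list of warnings; an empty list means no obvious problem was found."""
    warnings = []
    if not text.strip():
        warnings.append("text is empty")
    if not text.isascii():
        warnings.append("text contains non-ASCII characters")
    # max_chars is an assumed threshold for "short", not a documented limit
    if len(text) > max_chars:
        warnings.append(f"text is longer than {max_chars} characters")
    # Crude hint of markup; structured input such as HTML or XML should be avoided
    if "<" in text and ">" in text:
        warnings.append("text looks like it contains markup")
    return warnings
```

Selecting only a short plain-text passage in the occurrence editor before choosing the classify option has a similar effect without any code.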

Kind Regards,
Aki / Wandora Team

Postby Aparna Lalingkar » Thu Jan 21, 2010 1:03 pm

I can take a screenshot, save the image and send it to you, but it seems I cannot attach any file here.

The message is not getting selected. So I am not able to copy it.

The error message starts with:

I got the semanticHacker token and tried but it is giving the same error message.

For semanticHacker classification error message starts with: " Connection refused: connect at Method)........"

I hope this might help you to get the error.

thanks and regards,

Aparna Lalingkar

Postby akivela » Thu Jan 21, 2010 1:15 pm

Do you have a firewall? It looks like Wandora is trying to access the network but something prevents the connection.

If you have a firewall, you should allow Java to access the network. When Java can access the network, Wandora is also able to access the network. Unfortunately I can't give you instructions on how to tune your firewall rules.
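One way to narrow this down is to test whether the machine can open an outgoing TCP connection at all. The sketch below is a generic check and is unrelated to Wandora's own code; `can_connect` is a hypothetical helper, and the host and port you pass in are up to you, since the address of the web service is not given here. Note also that a firewall with per-application rules may let this script through while still blocking Java.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

If a call such as `can_connect("example.com", 80)` returns False for a host that a browser on the same machine can reach, a firewall or proxy setting is a likely cause.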

Kind Regards,
Aki / Wandora


