text files

Postby k080080 » Thu Mar 08, 2012 10:59 pm

Hello!
I am a new user and I find Wandora quite useful. Could you tell me whether I can use it for offline data?
I have some 5000 ".txt" files and I want to find their tags and tag values.

I also want to save the tag words and their values as a ".txt" file.

Thank You :)
k080080
 

Re: text files

Postby akivela » Fri Mar 09, 2012 4:06 pm

Hello

First, extract all text documents using File > Extract > Simple files > Simple document extractor...

In the Simple document extractor, select the Files tab and enter (or browse to) the directory where your text files are located. Press the Extract button.

Now Wandora contains a Document topic just below Wandora Class. You'll find instance topics below the Document topic. These topics represent your text files, and each one has a document-content occurrence that holds the content of the file.

Next, tag the documents with GATE Annie:

* Open the Document topic in the topic panel by double-clicking it. The last table in the topic panel lists the instances of Document.
* Select all rows in that table and right-click the selection.
* Choose menu option Topics > Occurrences > Refine > With GATE Annie.... Wandora opens a dialog window.
* Click the No topic button beside the label Type of occurrences and select the topic document-content. You can find it easily using the Finder tab.
* Click the No topic button beside the label Scope of occurrences and select the topic English language.
* Finally, press the OK button in the Extract information from occurrences window.

Wandora then extracts topics from your text documents. Because GATE Annie is bundled with the Wandora distribution package, the extraction is probably very fast.

After the extraction finishes, you can look at topics representing text documents. Each topic is now associated with topics given by GATE Annie.

Note that Wandora supports several different tags-out-of-text extractors, and GATE Annie is just one of them. Other extractors include Calais, AlchemyAPI, the Yahoo YQL term extractor, etc. You can apply different extractors to the same document set. Which extractor works well depends on your text documents.

Happy experimenting. If you have any problems with Wandora, please do drop a line.

Kind Regards,
Aki / Wandora Team

p.s. I almost forgot that you can also extract classifications from text documents using classifiers. For example, select menu option File > Extract > Classification > uClassify. A dialog with three tabs opens up. By default the tab labeled Raw is selected. Select the Files tab, enter your file names into the text field, and proceed as usual....

Re: text files

Postby k080080 » Sat Mar 10, 2012 8:13 pm

Wow !!
Such a detailed reply !! Thanks ! it worked :)

One remaining issue: how can I export the output (tags and their values) to a .txt file?

Thanks
k080080
 

Re: text files

Postby akivela » Tue Mar 13, 2012 11:56 am

Hi

I think the most convenient way of exporting tags is via clipboard [1]. You can paste clipboard data to any spreadsheet application or text editor. Specific instructions follow:

* First, open the association type topic used in the tagging associations. The association type topic is shown as the title of the association table. To open it, right-click the title label and select menu option Open association type topic in the popup menu. The association type topic depends on the extractor you used to tag your document topics. For example, if you used GATE Annie [2], the association type topic is GATE Annie entity type.
* Next, select all rows in the association table (the one titled with the association type topic). The table contains all associations of this type.
* Right-click the table selection and choose menu option Copy associations > Copy associations as tab text.
* Wandora now performs the copy operation. Once it has finished, all the data is in your clipboard, and you can paste it into any application you like for further analysis or processing.
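
If you would rather end up with a plain .txt file than tidy the pasted data by hand, the tab text can be post-processed with a short script. The following is only a sketch in Python and is not part of Wandora. It assumes you have pasted the clipboard contents into a text file, that each row holds a document name and a tag in tab-separated columns, and that the column positions (doc_col, tag_col) match your data. The function names, column indexes and file names are all hypothetical, so look at a few pasted rows first and adjust them.

```python
import csv
import io
from collections import defaultdict


def group_tags(tab_text, doc_col=0, tag_col=1, has_header=False):
    """Group tag cells by document cell from tab-separated text.

    doc_col and tag_col are assumptions about the pasted layout;
    change them to match the columns you actually see.
    """
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    for index, row in enumerate(reader):
        if has_header and index == 0:
            continue  # skip the column title row
        if len(row) <= max(doc_col, tag_col):
            continue  # skip blank or short rows
        doc = row[doc_col].strip()
        tag = row[tag_col].strip()
        if doc and tag and tag not in grouped[doc]:
            grouped[doc].append(tag)  # keep first-seen order, no duplicates
    return dict(grouped)


def write_tags(grouped, path):
    """Write one line per document: name, a tab, then comma-separated tags."""
    with open(path, "w", encoding="utf-8") as out:
        for doc, tags in grouped.items():
            out.write(doc + "\t" + ", ".join(tags) + "\n")
```

To use it, read the pasted file into a string, pass the string to group_tags, and pass the result to write_tags together with the name of the output file, for example a hypothetical tags.txt.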

Wandora has several different copy operations. Here is a slightly different way of copying tags out of Wandora:

* First, open the Document topic that has your files as instance topics.
* Select all instance topics (the ones that represent your files).
* Right-click the selection and choose menu option Topics > Copy also > Copy also players...
* Wandora asks for an association type. Locate the association type you want to copy. If you used GATE Annie, the association type is GATE Annie entity type.
* Next, Wandora asks for a role topic. The role topic specifies which column of the association table you want to copy. If you used GATE Annie, you probably want to copy GATE Annie entity.
* Wandora performs the copy. Once it has finished, you can paste the data into any application you wish.

You may need to do some testing to find the best way of exporting your tags. If you have any questions, please do drop a line.

Kind Regards,
Aki / Wandora Team


[1] http://www.wandora.org/wandora/wiki/ind ... _clipboard
[2] http://www.wandora.org/wandora/wiki/ind ... ntegration

