Drag and drop extractor

== Old documentation ==
 
  
'''Wandora versions released after January 2013 do not contain the Drag and drop extractor feature! The feature was removed temporarily because it was not compatible with the implementation of docking topic panels. We will look for new ways to implement the drag and drop extractor and report changes here.'''
  
If you look carefully at Wandora's startup screen, you may notice a special icon in the bottom right corner of the Wandora window. This icon of arrows entering a circle marks the drag and drop extractor in Wandora. Clicking the icon opens the extractor selection menu. The name of the selected extractor is shown to the left of the icon. After you have selected an extractor, you can simply drag and drop not only files and folders but also text from a WWW browser onto the icon, and Wandora starts the selected extractor for the given source. This is a very efficient way to construct a topic map out of files. All topics and associations created by the extractor are stored in the selected layer. Below is an example of the drag and drop extractor after the user has selected the JPG extractor:
  
[[Image:drag_drop_extrator.gif|center]]
  
If you have opened a topic in the topic panel, the drag and drop icon is not available. To get the drag and drop extractor back, close the current topic with '''Topics > Close topic''' or press '''CTRL+W'''.
 
  
== Drag and Drop Example ==
This example shows how you can convert BibTeX fragments to topics and associations with Wandora's BibTeX extractor. Open the Wandora application and activate the '''BibTeX extractor''' in the drag and drop extractor. The Wandora window should now look like this:

[[Image:dragdrop_example_02.gif|center]]

Start your WWW browser and open the page ''http://citeseer.ist.psu.edu/pepper00tao.html''. This page contains citation information about Steve Pepper's famous TAO article. If you scroll down, you may also notice a BibTeX entry for the article. Now select the BibTeX entry as shown below:

[[Image:dragdrop_example_01.gif|center]]

Then drag the selected text to Wandora's drag and drop extractor. Wandora automatically extracts the dropped entry and constructs topics for the article, author, and publishing year. Note that Wandora uses a temporary file to store the dropped text fragment.

[[Image:dragdrop_example_03.gif|center]]

If you now look at the article topic more closely, it should look something like this:

[[Image:dragdrop_example_04.gif|center]]

Latest revision as of 19:10, 4 February 2015

Wandora version 2015-02-03 and later contain the Drop extractor, which essentially behaves like Wandora's old Drag and drop extractor. There are differences, however. The user can now have any number of Drop extractors open simultaneously, and the Drop extractor includes a log tab that collects all log messages written during extractions. The extractor also looks slightly different from the original Drag and drop extractor.

To open a Drop extractor, choose the menu option '''View > New panel > Drop extractor'''. Wandora adds a new Drop extractor panel to the panel area. The screen capture below shows Wandora's application window with one Drop extractor open and no extractor selected.


[[File:Drop extractor in wandora.gif|center]]


To start using the Drop extractor, first select a specific extractor by clicking anywhere on the white background of the Drop extractor. The click opens a popup menu of the available extractors that consume files, URLs and text snippets. Once an extractor has been selected, the text "No extractor selected" on the Drop extractor changes to the name of the selected extractor. The user can now drop files, URLs and text onto the Drop extractor, which tries to apply the selected extractor to the dropped items. What happens during the extraction depends on the selected extractor. Usually the extractor creates topics and associations in the current topic map.

The Drop extractor pipes all log messages generated by the selected extractor to the log tab. Note that the Drop extractor may bypass some of the selected extractor's initialization, because it calls the extractor's extraction methods directly. This may change the behavior of some extractors, and some extractors may even become unusable because of the omitted initialization. Note also that most extractors assume a particular input format. Feeding a JPEG image file to the '''Firefox bookmark extractor''' is unlikely to produce anything but exception messages.
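Because most extractors assume a particular input format, it can help to check a file's type before dropping it. The following is a minimal sketch that is not part of Wandora: it inspects the leading bytes of a file against a few well-known file signatures. The names <code>SIGNATURES</code>, <code>sniff_format</code> and <code>sniff_file</code> are hypothetical and exist only for this illustration.

```python
# Illustrative only: Wandora itself does not perform this check.
# The leading "magic" bytes below are well-known file signatures.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}


def sniff_format(data: bytes) -> str:
    """Return a format name for the given leading bytes, or "unknown"."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Read the first bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A file reported as <code>jpeg</code> is a poor candidate for an extractor that expects a bookmark file, and the check costs only a few bytes of I/O.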

== Simple document extractor ==

The '''Simple document extractor''' is probably the easiest extractor for test driving Wandora's Drop extractor. Activate it by selecting '''Simple files > Simple document extractor'''.

The Simple document extractor takes a file or a set of files and creates one topic for each file. Every file topic gets a subject identifier and a subject locator derived from the file's location in the file system, as well as occurrences for the document content, the file name and the extraction time. Wandora also adds the Document topic as a type of every file topic. The next image shows Wandora after extracting Wandora's startup script '''Wandora.bat''' with the Simple document extractor.


[[File:Drop_extractor_simple_document_1.gif|center]]


The Simple document extractor handles not only text files but binary file formats too. It transforms pdf, rtf, doc, docx, ppt and vsd files to plain text. Other binary files are stored as such, so extracting large binary files is not recommended.

The Simple document extractor also accepts text drops. Wandora treats the dropped text as if it were an anonymous file: it creates a document topic for the text but constructs default identifiers for the topic.
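The shape of a file topic described in this section can be sketched as a plain data structure. This is only an illustration and does not use Wandora's API: the function name <code>document_topic</code>, the dictionary keys, and the choice of a <code>file://</code> URI for both the subject identifier and the subject locator are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path


def document_topic(path, content, now=None):
    """Build a dictionary that mimics the file topic described above.

    Hypothetical sketch: the keys and the file:// URI form are assumptions,
    not Wandora's actual data model.
    """
    resolved = Path(path).resolve()   # absolute path, required by as_uri()
    uri = resolved.as_uri()           # e.g. file:///.../Wandora.bat
    now = now or datetime.now(timezone.utc)
    return {
        "subject_identifier": uri,    # derived from the file location
        "subject_locator": uri,       # assumed to share the same URI
        "types": ["Document"],        # Wandora adds Document as a type
        "occurrences": {
            "content": content,
            "file name": resolved.name,
            "extraction time": now.isoformat(),
        },
    }
```

Calling <code>document_topic("Wandora.bat", text)</code> for each dropped file yields one such structure per file, which mirrors the one-topic-per-file behavior.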

== Alchemy image keywords extractor ==

This example shows how to tag local images with the Alchemy image keywords extractor. First, select the drop extractor '''Classification > Alchemy image keyword extractor'''. Then drop a local image file onto the Drop extractor. The Alchemy image keywords extractor asks for your personal API key before it proceeds with the extraction. Enter your API key in the input field. If the extraction succeeds, an Alchemy topic appears under the Wandora class. The next image shows Wandora after two Alchemy image keyword extractions. Note that Wandora does not ask for the Alchemy API key again once it has been entered. To remove the stored API key, either restart the application or hold the CTRL key down while selecting the top menu option '''File > Extract > Classifications > Alchemy image keyword extractor...'''.


[[File:Drop_extractor_alchemy_image.gif|center]]

== OCR extractor ==

Another image processing drop extractor is the [[OCR Extractor|OCR extractor]], which automates text recognition in images. First, ensure that you have installed the [https://code.google.com/p/tesseract-ocr/ Tesseract OCR engine] and edited Wandora's configuration file '''SetTesseract.sh|bat''' in Wandora's '''bin''' folder. After successful installation and configuration, select the drop extractor '''Media > OCR extractor''' and drop any image containing text onto the Drop extractor. The next image shows Wandora after a successful OCR extraction. The second column shows the extracted [http://wandora.org/wandora/download/other/poster09.gif poster image] and the third column shows the occurrences of the extracted topic. The poster image was 2481*3391 pixels in size.


[[File:drop_extractor_ocr.gif]]
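Before using the OCR extractor, it may be useful to confirm that Tesseract works on its own. The sketch below runs the <code>tesseract</code> command line tool from Python and returns the recognized text. It assumes that the <code>tesseract</code> executable is on the PATH and that the <code>eng</code> language data is installed. It is not how Wandora invokes Tesseract, and the function names are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the command line: tesseract <image> stdout -l <lang>.

    The output base name "stdout" tells Tesseract to print the recognized
    text instead of writing a file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

If <code>ocr_image</code> raises an error for an image that clearly contains text, the problem most likely lies in the Tesseract installation rather than in Wandora's configuration.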
