Drag and drop extractor

From WandoraWiki

Wandora version 2015-02-03 and later contains the Drop extractor, which behaves much like Wandora's old Drag and drop extractor. There are some differences, however. The user can now have any number of Drop extractors open simultaneously, and each Drop extractor includes a log tab that collects all log messages written during extractions. The Drop extractor also looks slightly different from the original Drag and drop extractor.

To open a Drop extractor, choose the menu option View > New panel > Drop extractor. Wandora adds a new Drop extractor panel to the panel area. The screen capture below shows Wandora's application window with one Drop extractor open and no extractor selected.

Drop extractor in wandora.gif

To start using the Drop extractor, the user first selects a specific extractor by clicking anywhere on the white background of the Drop extractor. The click opens a popup menu of the available extractors that consume files, URLs and text snippets. Once an extractor has been selected, the text No extractor selected on the Drop extractor changes to the name of the selected extractor. The user can then drop files, URLs and text onto the Drop extractor, and it tries to apply the selected extractor to the dropped items. What happens during the extraction depends on the selected extractor. Usually the selected extractor creates some topics and associations in the current topic map.

The Drop extractor pipes all log messages generated by the selected extractor to the log tab. Note that the Drop extractor may bypass some initialization of the selected extractor, because it calls the selected extractor's extraction methods directly. This may change the behavior of some extractors, and some extractors may even become unusable because of the omitted initialization. Note also that most extractors assume a specific input format. Feeding a JPEG image file to the Firefox bookmark extractor is unlikely to produce anything but exception messages.
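The dispatching and logging described above can be pictured with the following Python sketch. It is not Wandora's code (Wandora itself is written in Java): the classes `SketchExtractor` and `DropPanel`, their method names, and the rule that strings starting with `http://` or `https://` count as URLs are all hypothetical. What it tries to show is that the panel calls the extraction methods directly, so setup done in `initialize()` never runs, and every message from every extraction lands in one log list.

```python
from pathlib import Path
from typing import Callable, List, Optional, Union

Log = Callable[[str], None]


class SketchExtractor:
    """Hypothetical extractor with one extraction method per kind of drop."""

    def __init__(self) -> None:
        self.initialized = False

    def initialize(self) -> None:
        # Setup that a normal, menu-driven run might perform first.
        # The drop panel below never calls this method.
        self.initialized = True

    def extract_file(self, path: Path, log: Log) -> None:
        log(f"file: {path.name}")

    def extract_url(self, url: str, log: Log) -> None:
        log(f"url: {url}")

    def extract_text(self, text: str, log: Log) -> None:
        if not self.initialized:
            log("warning: extractor was not initialized")
        log(f"text: {len(text)} characters")


class DropPanel:
    """Hypothetical drop target that forwards drops to the selected extractor."""

    def __init__(self) -> None:
        self.selected: Optional[SketchExtractor] = None
        self.log_tab: List[str] = []  # collects messages from all extractions

    def drop(self, item: Union[Path, str]) -> None:
        if self.selected is None:
            return  # assumption: drops are ignored until an extractor is chosen
        log = self.log_tab.append
        # Extraction methods are called directly; initialize() is skipped.
        if isinstance(item, Path):
            self.selected.extract_file(item, log)
        elif item.startswith(("http://", "https://")):
            self.selected.extract_url(item, log)
        else:
            self.selected.extract_text(item, log)


if __name__ == "__main__":
    panel = DropPanel()
    panel.selected = SketchExtractor()
    panel.drop(Path("Wandora.bat"))
    panel.drop("hello")
    print(panel.log_tab)
```

Running the usage part prints the file message, then the warning caused by the skipped initialization, then the text message, all from the same log list.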

Simple document extractor

The Simple document extractor may be the easiest extractor for test-driving Wandora's Drop extractor. To activate it, select Simple files > Simple document extractor.

The Simple document extractor takes a file or a set of files and creates one topic for each file. Each file topic gets a subject identifier and a subject locator derived from the file's location in the file system. Each file topic also gets occurrences for the document content, the file name and the extraction time. Wandora adds the Document topic as a type of every file topic. The next image shows Wandora after extracting Wandora's startup script Wandora.bat with the Simple document extractor.
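As a rough illustration of the data produced per file, the Python sketch below models one file topic. The `FileTopic` class and `file_to_topic` function are hypothetical, the occurrence keys simply reuse the names from the paragraph above, and deriving both the subject identifier and the subject locator from the file's absolute `file:` URI is an assumption; the identifiers Wandora actually generates may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


@dataclass
class FileTopic:
    """Hypothetical, simplified view of the topic created for one file."""
    subject_identifier: str
    subject_locator: str
    types: List[str]
    occurrences: Dict[str, str] = field(default_factory=dict)


def file_to_topic(path: Path, content: str) -> FileTopic:
    # Assumption: both identifiers come from the file's absolute file: URI.
    uri = path.resolve().as_uri()
    return FileTopic(
        subject_identifier=uri,
        subject_locator=uri,
        types=["Document"],  # every file topic gets the Document type
        occurrences={
            "document content": content,
            "file name": path.name,
            "extraction time": datetime.now(timezone.utc).isoformat(),
        },
    )


if __name__ == "__main__":
    topic = file_to_topic(Path("Wandora.bat"), "@echo off")
    print(topic.subject_identifier)  # a file: URI ending in /Wandora.bat
    print(topic.occurrences["file name"])  # Wandora.bat
```

A set of dropped files would simply yield one such record per file.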

Drop extractor simple document 1.gif

The Simple document extractor handles not only text files but binary file formats too. It converts pdf, rtf, doc, docx, ppt and vsd files to plain text. Other binary files are stored as such, so extracting large binary files is not recommended.
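The rule for choosing between text conversion and raw storage amounts to a small lookup. In this hypothetical Python sketch, `content_handling` and its `is_text_file` flag are illustrative names only; how Wandora actually detects a text file is not described here, so the caller supplies that flag. Only the extension list comes from the paragraph above.

```python
from pathlib import Path

# Formats the Simple document extractor is said to convert to plain text.
CONVERTED_TO_TEXT = {".pdf", ".rtf", ".doc", ".docx", ".ppt", ".vsd"}


def content_handling(path: Path, is_text_file: bool) -> str:
    """Describe how a dropped file's content would be stored (hypothetical helper)."""
    if is_text_file:
        return "text"
    if path.suffix.lower() in CONVERTED_TO_TEXT:
        return "converted to plain text"
    # Any other binary file is kept unchanged, so a large file stays large.
    return "stored as is"


if __name__ == "__main__":
    for name, is_text in [("Wandora.bat", True), ("report.PDF", False), ("photo.jpg", False)]:
        print(name, "->", content_handling(Path(name), is_text))
```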

The Simple document extractor also accepts text drops. Wandora treats the dropped text as if it were an anonymous file and creates a document topic for it, but constructs default identifiers for the topic.

Alchemy image keywords extractor

This example shows how to tag local images with the Alchemy image keywords extractor. First, select the drop extractor Classification > Alchemy image keyword extractor, then drop a local image file onto the Drop extractor. Before it proceeds with the extraction, the Alchemy image keywords extractor asks for your personal API key. Enter your API key in the input field. If everything goes well, an Alchemy topic appears under the Wandora class. The next image shows Wandora after two Alchemy image keyword extractions. Note that once the Alchemy API key has been entered, Wandora doesn't ask for it again. To remove the stored API key, either restart the application or hold down the CTRL key while selecting the top menu option File > Extract > Classifications > Alchemy image keyword extractor....

Drop extractor alchemy image.gif

OCR extractor

Another image-processing drop extractor is the OCR extractor, which automates text recognition in images. First, make sure you have installed the Tesseract OCR engine and configured Wandora's configuration file SetTesseract.sh (SetTesseract.bat on Windows) in Wandora's bin folder. After successful installation and configuration, select the drop extractor Media > OCR extractor and drop any image containing text onto the Drop extractor. The next image shows Wandora after a successful OCR extraction. The second column shows the extracted poster image and the third column the occurrences of the extracted topic. The poster image was 2481 × 3391 pixels.

Drop extractor ocr.gif
