Drag and drop extractor
Revision as of 15:41, 4 February 2015
Wandora version 2015-02-03 and later contain the Drop extractor, which behaves essentially like Wandora's old Drag and drop extractor. There are differences, however. The user can now have any number of Drop extractors open simultaneously, and the Drop extractor includes a log tab that collects all log messages written during extractions. The extractor also looks slightly different from the original Drag and drop extractor.
To open a Drop extractor, choose the menu option View > New panel > Drop extractor. Wandora adds a new Drop extractor panel to the panel area. The screen capture below shows Wandora's application window with one Drop extractor open. The Drop extractor has no extractor selected.
To start using the Drop extractor, the user should first select a specific extractor by clicking anywhere on the white background of the Drop extractor. The click opens a popup menu of available extractors that consume files, URLs and text snippets. Once the user has selected an extractor, the text No extractor selected on the Drop extractor should change to the name of the selected extractor. The user can now drop files, URLs and text into the Drop extractor, which tries to apply the selected extractor to the dropped items. What happens during the extraction depends on the selected extractor. Usually the selected extractor creates some topics and associations in the current topic map.
The Drop extractor pipes all log messages generated by the selected extractor to the log tab. Note that the Drop extractor may bypass some initialization of the selected extractor, because it calls the selected extractor's extraction methods directly. This may change the behavior of some extractors, and some extractors may even become unusable because of the omitted initialization. Note also that most extractors assume a specific input format. Feeding a JPEG image file to the Firefox bookmark extractor may produce nothing but exception messages.
Simple Document Extractor
Simple Document Extractor may be the easiest extractor for test driving Wandora's Drop extractor. Activate it by selecting Simple files > Simple document extractor.
Simple Document Extractor takes a file or a set of files and creates one topic for each file. Each created file topic has a subject identifier and a locator derived from the location of the file in the file system. Each file topic also has occurrences for the document content, the file name and the extraction time. Wandora adds the Document topic as a type for all file topics. The next image shows Wandora after extracting Wandora's startup script Wandora.bat with the Simple Document Extractor.
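The shape of such a file topic can be illustrated with a small Python sketch. This is not Wandora's code or API: the function name describe_file_topic, the dictionary keys and the use of a file: URI as the identifier are assumptions made only to illustrate the description above, and the sketch reads the file as plain text without any binary format conversion.

```python
from datetime import datetime, timezone
from pathlib import Path


def describe_file_topic(path):
    """Build a plain dict that mirrors the file topic described above.

    Hypothetical model for illustration only, not Wandora's implementation.
    """
    file_path = Path(path).resolve()
    # Assumption: the identifier is a file: URI derived from the file location.
    identifier = file_path.as_uri()
    return {
        "subject_identifier": identifier,
        "subject_locator": identifier,
        "types": ["Document"],
        "occurrences": {
            # Undecodable bytes are replaced instead of raising an error.
            "content": file_path.read_text(encoding="utf-8", errors="replace"),
            "file name": file_path.name,
            "extraction time": datetime.now(timezone.utc).isoformat(),
        },
    }
```

Calling describe_file_topic("Wandora.bat") in a directory that contains the file would return one such record, with the script text stored in the content occurrence.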
Simple Document Extractor extracts not only text files but also binary file formats. It converts pdf, rtf, doc, docx, ppt and vsd files to plain text. Other binary files are stored as such, so extracting large binary files is not recommended.
Simple Document Extractor also accepts text drops. Text can be dragged from a WWW browser, for example. Wandora treats the dropped text as if it were an anonymous file: it creates a document topic for the text and constructs default identifiers for the topic.
Old documentation
Wandora versions released after January 2013 don't contain the Drag and drop extractor feature! The feature has been removed temporarily because it was not compatible with the implementation of docking topic panels. We'll be looking for new ways to implement the drag and drop extractor and will report changes here.
If you look carefully at Wandora's startup screen, you may notice a special icon in the bottom right corner of the Wandora window. This icon, arrows entering a circle, marks the drag and drop extractor in Wandora. Clicking the icon opens the extractor selection menu. The name of the selected extractor is shown to the left of the icon. After you have selected the extractor, you can simply drag and drop not only files and folders but also text from a WWW browser onto the icon, and Wandora starts the selected extractor for the given source. This is a very efficient way to construct a topic map out of files. All topics and associations created by the extractor are stored in the selected layer. Below is an example of the drag and drop extractor after the user has selected the JPG extractor:
If you have opened a topic in the topic panel, the drag and drop icon is not available. To get the drag and drop extractor back, close the current topic with Topics > Close topic or press CTRL+W.
Drag and Drop Example
This example shows how you can convert BibTeX fragments to topics and associations with Wandora's BibTeX extractor. Open the Wandora application and activate the BibTeX extractor in the drag and drop extractor. The Wandora window should now look like this:
Start your WWW browser and open the page http://citeseer.ist.psu.edu/pepper00tao.html. This page contains citation information about Steve Pepper's famous TAO article. If you scroll down, you may also notice a BibTeX entry for the article. Now select the BibTeX entry as shown below:
Then drag the selected text to Wandora's drag and drop extractor. Wandora automatically extracts the dropped entry and constructs topics for the article, the author and the publishing year. Notice that Wandora uses a temporary file to store the dropped text fragment.
If you look more closely at the article topic, it should now look something like this: