<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://wandora.org/w/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://wandora.org/w/index.php?action=history&amp;feed=atom&amp;title=Simple_Document_Extractor</id>
		<title>Simple Document Extractor - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://wandora.org/w/index.php?action=history&amp;feed=atom&amp;title=Simple_Document_Extractor"/>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;action=history"/>
		<updated>2026-04-18T10:07:24Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.19.1</generator>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8370&amp;oldid=prev</id>
		<title>Akivela at 11:11, 8 February 2011</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8370&amp;oldid=prev"/>
				<updated>2011-02-08T11:11:02Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 11:11, 8 February 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple document extractor extracts text out of PDF, Office, and HTML (including XML) documents. You may also use extractor for binary documents but resulting document content occurrences contain binary data and are probably unusable. Wandora doesn't really support binary occurrences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple document extractor extracts text out of PDF, Office, and HTML (including XML) documents. You may also use extractor for binary documents but resulting document content occurrences contain binary data and are probably unusable. Wandora doesn't really support binary occurrences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;del style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Simple document extractor example ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Simple document extractor example ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8369&amp;oldid=prev</id>
		<title>Akivela at 11:10, 8 February 2011</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8369&amp;oldid=prev"/>
				<updated>2011-02-08T11:10:49Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 11:10, 8 February 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Wandora's Simple document extractor is used to create a topic out of given document and attach document content as an occurrence to the created topic. Simple document extractor can convert several documents at once. Simple document extractor starts with a menu option '''File &amp;gt; Extract &amp;gt; Simple files &amp;gt; Simple document extractor...'''. You &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;may &lt;/del&gt;use the extractor as a drag'n'drop extractor also, or as a browser extractor.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Wandora's Simple document extractor is used to create a topic out of given document and attach document content as an occurrence to the created topic. Simple document extractor can convert several documents at once. Simple document extractor starts with a menu option '''File &amp;gt; Extract &amp;gt; Simple files &amp;gt; Simple document extractor...'''. You &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;can &lt;/ins&gt;use the extractor as a drag'n'drop extractor also, or as a browser extractor.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple document extractor extracts text out of PDF, Office, and HTML (including XML) documents. You may also use extractor for binary documents but resulting document content occurrences contain binary data and are probably unusable. Wandora doesn't really support binary occurrences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple document extractor extracts text out of PDF, Office, and HTML (including XML) documents. You may also use extractor for binary documents but resulting document content occurrences contain binary data and are probably unusable. Wandora doesn't really support binary occurrences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8367&amp;oldid=prev</id>
		<title>Akivela: /* Simple document extractor example */</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8367&amp;oldid=prev"/>
				<updated>2011-02-08T11:07:30Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Simple document extractor example&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 11:07, 8 February 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 12:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 12:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;* Document-content&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;* Document-content&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Occurrence typed as '''document-content''' contains the content of that document. Notice, documents were HTML files and Wandora has stripped all HTML tags away.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Occurrence typed as '''document-content''' contains the content of that document. Notice, documents were HTML files and Wandora has stripped all HTML tags away. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;To continue, Wandora user might be interested in [[Refining occurrences|filtering and refining extracted occurrences]].&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_01.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_02.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_03.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_04.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_05.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Image:simple_document_extractor_06.gif|center]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8360&amp;oldid=prev</id>
		<title>Akivela at 10:57, 8 February 2011</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8360&amp;oldid=prev"/>
				<updated>2011-02-08T10:57:34Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 10:57, 8 February 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Text Document Extractor helps you when you need &lt;/del&gt;to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;enclose bunch of separate text documents into &lt;/del&gt;a topic &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;map. Simple Text Document Extractor reads the document, creates simple topic for the &lt;/del&gt;document and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;places the text &lt;/del&gt;content &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;into &lt;/del&gt;an occurrence &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;attached &lt;/del&gt;to the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;document &lt;/del&gt;topic. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Extractor works best &lt;/del&gt;with drag &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;and &lt;/del&gt;drop &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;but &lt;/del&gt;may also &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;be started with &lt;/del&gt;'''File &amp;gt; Extract &amp;gt; Simple &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;text &lt;/del&gt;document extractor...'''.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Wandora's &lt;/ins&gt;Simple &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;document extractor is used &lt;/ins&gt;to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;create &lt;/ins&gt;a topic &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;out of given &lt;/ins&gt;document and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;attach 
document &lt;/ins&gt;content &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;as &lt;/ins&gt;an occurrence to the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;created &lt;/ins&gt;topic. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Simple document extractor can convert several documents at once. Simple document extractor starts &lt;/ins&gt;with &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;a menu option '''File &amp;gt; Extract &amp;gt; Simple files &amp;gt; Simple document extractor...'''. You may use the extractor as a &lt;/ins&gt;drag&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;'n'&lt;/ins&gt;drop &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extractor also, or as a browser extractor.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Simple document extractor extracts text out of PDF, Office, and HTML (including XML) documents. You &lt;/ins&gt;may also &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;use extractor for binary documents but resulting document content occurrences contain binary data and are probably unusable. Wandora doesn't really support binary occurrences.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;== Simple document extractor example ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;In this example, Wandora user has downloaded a document collection known as CableGate of Wikileaks. Wandora user aims to build a topic map out of CableGate documents. User has the document collection available in her file system. All CableGate documents are in a folder called '''cable'''. Wandora user chooses menu option &lt;/ins&gt;'''File &amp;gt; Extract &amp;gt; Simple &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;files &amp;gt; Simple &lt;/ins&gt;document extractor...'''&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. A dialog opens. User selects '''Files''' tab and presses '''Browse''' button. User points to the folder containing all CableGate documents and starts the extractor by pressing '''Extract''' button. Wandora reads the given folder, its subfolders and all enclosed files, and creates a topic for each file. When the extraction ends, Wandora user can find all extracted documents in the topic tree, below the '''Document''' topic. Each document topic contains three occurrences:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* Extraction-time&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* File-name&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* Document-content&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Occurrence typed as '''document-content''' contains the content of that document. Notice, documents were HTML files and Wandora has stripped all HTML tags away&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8358&amp;oldid=prev</id>
		<title>Akivela: Simple Text Document Extractor moved to Simple Document Extractor</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=8358&amp;oldid=prev"/>
				<updated>2011-02-08T10:19:37Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;a href=&quot;/wiki/Simple_Text_Document_Extractor&quot; class=&quot;mw-redirect&quot; title=&quot;Simple Text Document Extractor&quot;&gt;Simple Text Document Extractor&lt;/a&gt; moved to &lt;a href=&quot;/wiki/Simple_Document_Extractor&quot; title=&quot;Simple Document Extractor&quot;&gt;Simple Document Extractor&lt;/a&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='1' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='1' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 10:19, 8 February 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=3693&amp;oldid=prev</id>
		<title>Akivela at 16:51, 5 November 2007</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=3693&amp;oldid=prev"/>
				<updated>2007-11-05T16:51:51Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 16:51, 5 November 2007&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple Text Document Extractor helps you when you need to enclose bunch of separate text documents into a topic map. Simple Text Document Extractor reads the document, creates simple topic for the document and places the text content into an occurrence attached to the document topic. Extractor works best with drag and drop but may also started with '''File &amp;gt; Extract &amp;gt; Simple text document extractor...'''.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Simple Text Document Extractor helps you when you need to enclose bunch of separate text documents into a topic map. Simple Text Document Extractor reads the document, creates simple topic for the document and places the text content into an occurrence attached to the document topic. Extractor works best with drag and drop but may also &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;be &lt;/ins&gt;started with '''File &amp;gt; Extract &amp;gt; Simple text document extractor...'''.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	<entry>
		<id>http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=3692&amp;oldid=prev</id>
		<title>Akivela at 16:51, 5 November 2007</title>
		<link rel="alternate" type="text/html" href="http://wandora.org/w/index.php?title=Simple_Document_Extractor&amp;diff=3692&amp;oldid=prev"/>
				<updated>2007-11-05T16:51:33Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Simple Text Document Extractor helps you when you need to enclose bunch of separate text documents into a topic map. Simple Text Document Extractor reads the document, creates simple topic for the document and places the text content into an occurrence attached to the document topic. Extractor works best with drag and drop but may also started with '''File &amp;gt; Extract &amp;gt; Simple text document extractor...'''.&lt;/div&gt;</summary>
		<author><name>Akivela</name></author>	</entry>

	</feed>