MediaWiki extractor

From WandoraWiki

Revision as of 11:28, 2 May 2007

MediaWiki extractor lets you gather topics and associations from large knowledge repositories such as Wikipedia. It reads the XML export of a MediaWiki page and creates a topic for the page. The page content is attached to the topic as a text data occurrence. Start the extractor with File > Extract > MediaWiki extractor. You can extract data from local XML files or directly from a MediaWiki site using the export URL of a page. For example, the export URL of this page is

http://www.wandora.net/wandora/wiki/index.php?title=Special:Export/MediaWiki_extractor
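To make the export format more concrete, here is a minimal Python sketch of how an export URL of the form above can be built and how the page title and text can be read from the exported XML. It is not Wandora's implementation (Wandora is a Java application); the function names export_url and read_pages and the sample document are assumptions made for illustration. The sketch only relies on the page, title and text elements of a MediaWiki export and ignores the XML namespace, whose version differs between MediaWiki releases.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def export_url(index_php: str, page_title: str) -> str:
    """Build a Special:Export URL in the same form as the example above."""
    # MediaWiki writes spaces in page titles as underscores.
    title = page_title.strip().replace(" ", "_")
    return f"{index_php}?title=Special:Export/{quote(title, safe='_:/')}"


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_pages(xml_text: str) -> list[tuple[str, str]]:
    """Return a (title, wikitext) pair for each <page> in an export document."""
    pages = []
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = "", ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                # If several revisions are present, the last one wins.
                text = node.text or ""
        pages.append((title, text))
    return pages


# Hypothetical, heavily shortened export document used only for this sketch.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>MediaWiki extractor</title>
    <revision>
      <text>MediaWiki extractor reads the XML export of a page.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(export_url("http://www.wandora.net/wandora/wiki/index.php",
                     "MediaWiki extractor"))
    for title, text in read_pages(SAMPLE):
        print(title, "->", text)
```

Each (title, text) pair corresponds to what the extractor description above calls a topic and its text data occurrence.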

Postprocessing MediaWiki extracted topics

The MediaWiki extractor does not process the content of extracted pages. However, you can create associations from page content with another Wandora tool. The context menu tool Topics > Associations > Find associations in text datas... extracts associations from text data. It requires the type and scope of the text data to process, the role to use in the generated associations, and a regular expression that recognizes topic names in the text data.
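As an illustration of the kind of regular expression the tool could be given, the following Python sketch finds wiki link targets such as [[Wandora]] or [[Topic map|topic maps]] in page text and pairs them with the source page. This is only a sketch of the idea, not the tool's actual behaviour: the pattern, the function names find_topic_names and associations_for, and the choice to treat wiki links as topic names are all assumptions. Wandora is written in Java, so a pattern should be checked against Java's regular expression syntax before use; the simple pattern below uses only constructs common to both.

```python
import re

# Assumed pattern: capture the target of [[Target]], [[Target|label]] or
# [[Target#section]], stopping at the first ']', '|' or '#'.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_topic_names(text_data: str) -> list[str]:
    """Return the distinct link targets found in text_data, in order."""
    names = []
    for match in WIKILINK.finditer(text_data):
        name = match.group(1).strip()
        if name and name not in names:
            names.append(name)
    return names


def associations_for(source_topic: str, text_data: str) -> list[tuple[str, str]]:
    """Pair the source topic with every other topic name found in its text."""
    return [(source_topic, name)
            for name in find_topic_names(text_data)
            if name != source_topic]


if __name__ == "__main__":
    text = "See [[Wandora]] and [[Topic map|topic maps]]. Also [[Wandora]]."
    for pair in associations_for("MediaWiki extractor", text):
        print(pair)
```

In Wandora itself the pairs would become associations with the role chosen in the tool's dialog; here they are plain tuples.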
