MediaWiki Content Extractor
The MediaWiki content extractor found in File > Extract > Wiki > MediaWiki API Extractor enables extraction of wiki content using the MediaWiki API. The API should be available on Wikipedia and most other MediaWiki instances by default.
The extraction is filtered using a selection of query methods exposed by the MediaWiki API. Content may be restricted by its category or title prefix. Content may also be found with a free-text search or specified explicitly by title.
Stub category articles may also be extracted along with the targeted articles to construct a class hierarchy of the categories in Wandora.
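How Wandora builds this hierarchy internally is not described here. As a rough illustration only, assuming the category memberships of each extracted page have already been fetched into a dictionary of the form `{page title: [category titles]}`, the hypothetical helpers below invert that data into category-to-member relations (the class hierarchy) and list the categories that are referenced but were never fetched themselves (the stubs).

```python
from typing import Dict, List, Set


def category_members(page_categories: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert {page: [its categories]} into {category: {its member pages}}.

    Category pages can appear as keys too, so a category may itself be a
    member of a parent category; this is what forms the hierarchy.
    """
    members: Dict[str, Set[str]] = {}
    for page, categories in page_categories.items():
        for category in categories:
            members.setdefault(category, set()).add(page)
    return members


def stub_categories(page_categories: Dict[str, List[str]]) -> Set[str]:
    """Categories referenced by some extracted page but not extracted themselves."""
    referenced = {c for cats in page_categories.values() for c in cats}
    return referenced - set(page_categories)
```

For example, if only the article `Physics` and the page `Category:Physics` were extracted, `Category:Physics` becomes the class of `Physics`, while its parent categories remain stubs until they are fetched explicitly.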
The extractor needs the URL of the MediaWiki instance to reach the targeted wiki. More specifically, the URL should point to the path that contains api.php and index.php. For the English Wikipedia this URL is http://en.wikipedia.org/w/ as of this writing.
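The sketch below shows how both endpoints can be derived from that single base URL. The helper names are hypothetical, and using `index.php?action=raw` to fetch a page's raw wikitext is an assumption about why index.php is needed; the `action=raw` parameter itself is a standard MediaWiki feature.

```python
from typing import Tuple
from urllib.parse import urlencode, urljoin


def endpoint_urls(base_url: str) -> Tuple[str, str]:
    """Return the api.php and index.php URLs under a wiki's script path."""
    if not base_url.endswith("/"):
        base_url += "/"  # without it, urljoin would replace the last path segment
    return urljoin(base_url, "api.php"), urljoin(base_url, "index.php")


def raw_content_url(base_url: str, title: str) -> str:
    """URL returning a page's raw wikitext via index.php?action=raw."""
    _, index_php = endpoint_urls(base_url)
    return index_php + "?" + urlencode({"action": "raw", "title": title})
```

Usage: `endpoint_urls("http://en.wikipedia.org/w/")` yields `http://en.wikipedia.org/w/api.php` and `http://en.wikipedia.org/w/index.php`.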
The extractor processes the matching topics in batches of 100, in the order the API returns them. The extraction can be terminated at any time by pressing Stop.
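Batched retrieval from the MediaWiki API relies on its continuation mechanism: each response that has more results carries a `continue` object whose values are sent back with the next request. The generator below is a minimal sketch of that loop, not Wandora's code. The `fetch` callable (which sends parameters to api.php and returns decoded JSON) is a hypothetical injection point so that any HTTP client, or a fake for testing, can be plugged in.

```python
from typing import Callable, Dict, Iterator, List

Fetch = Callable[[Dict[str, str]], dict]


def iterate_batches(fetch: Fetch, params: Dict[str, str],
                    list_key: str) -> Iterator[List[dict]]:
    """Yield one batch of results per API request, following continuation.

    The batch size comes from the module's limit parameter in `params`,
    e.g. cmlimit=100 for list=categorymembers.
    """
    # An empty "continue" opts in to the newer continuation style on older wikis.
    continuation: Dict[str, str] = {"continue": ""}
    while True:
        response = fetch({**params, "format": "json", **continuation})
        yield response.get("query", {}).get(list_key, [])
        if "continue" not in response:
            return  # the API reports no further results
        continuation = response["continue"]
```

Because it is a generator, the caller can stop between batches simply by breaking out of its loop, which loosely mirrors pressing Stop during an extraction.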
The article data is stored as a MediaWiki content occurrence in Wandora.
Extraction
- Category search
- The category search extracts the members of a given category in the wiki. Categories are denoted in the wiki as Category:<categoryname> in English, or its equivalent in other languages. Omit the Category: prefix in the search field.
- Blank input is interpreted as a search for all articles in the wiki.
- Prefix search
- The prefix search extracts the topics whose titles begin with a given prefix, for example all articles beginning with the letter 'c'.
- Free search
- The free search extracts the articles matching a free-text query. This is roughly equivalent to using the search box on the wiki's web page.
- Titles search
- The titles search extracts specific articles given as a comma-separated list of titles.
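One plausible mapping of these four search types onto standard MediaWiki query modules is sketched below: `list=categorymembers` for categories, `list=allpages` for prefixes and blank input, `list=search` for free text, and `titles` for explicit titles. Which modules Wandora actually calls is an assumption, and the mode names are hypothetical. Note that the API separates multiple titles with `|`, so the comma-separated input is converted.

```python
from typing import Dict


def build_query(mode: str, value: str, batch_size: int = 100) -> Dict[str, str]:
    """Build api.php query parameters for one of four hypothetical search modes."""
    base = {"action": "query", "format": "json"}
    limit = str(batch_size)
    if mode == "category":
        if not value.strip():
            # Blank input: all articles in the main namespace.
            return {**base, "list": "allpages", "aplimit": limit}
        # Assumes the English namespace name "Category:".
        return {**base, "list": "categorymembers",
                "cmtitle": "Category:" + value.strip(), "cmlimit": limit}
    if mode == "prefix":
        return {**base, "list": "allpages", "apprefix": value, "aplimit": limit}
    if mode == "free":
        return {**base, "list": "search", "srsearch": value, "srlimit": limit}
    if mode == "titles":
        titles = [t.strip() for t in value.split(",") if t.strip()]
        return {**base, "titles": "|".join(titles)}
    raise ValueError(f"unknown search mode: {mode!r}")
```

The titles branch only identifies the pages; the sketch does not cover which additional properties (such as revision content) are requested alongside them.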
Example
The extractor is found in File > Extract > Wiki > MediaWiki API Extractor.
Search for the members of the category Physics in the English Wikipedia, and also crawl the categories of each extracted article.
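Crawling the categories of extracted articles corresponds, at the API level, to a `prop=categories` query. The hypothetical helpers below build such a query for a batch of titles and flatten the response into a `{page title: [category titles]}` dictionary. They are a sketch of the API round trip, not the extractor's own code; a real client would also follow `clcontinue` continuation for pages with many categories.

```python
from typing import Dict, Iterable, List


def categories_query(titles: Iterable[str]) -> Dict[str, str]:
    """Parameters asking api.php for the categories of the given pages."""
    return {
        "action": "query",
        "format": "json",
        "prop": "categories",
        "titles": "|".join(titles),  # the API separates multiple titles with '|'
        "cllimit": "max",
    }


def parse_categories(response: dict) -> Dict[str, List[str]]:
    """Map each page title in a prop=categories response to its category titles."""
    pages = response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # the default JSON format keys pages by page id
        pages = pages.values()
    return {
        page["title"]: [c["title"] for c in page.get("categories", [])]
        for page in pages
    }
```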
The extraction results are shown when the extraction ends.
Fill in the extracted Category:Physics stub by searching for it explicitly by title.
The article content is now visible on the category article.
Supplement the original extraction by using the free search to find articles related to physics.