MediaWiki Content Extractor

From WandoraWiki

The MediaWiki content extractor, found in File > Extract > Wiki > MediaWiki API Extractor, extracts wiki content using the MediaWiki API. The API should be available by default on Wikipedia and most other MediaWiki instances.

The extraction is filtered using selected methods exposed by the MediaWiki API. Content may be restricted by its category or title prefix. Content may also be found with a free-text search or specified explicitly by title.

Stub category articles may also be extracted along with the targeted articles to construct a class hierarchy of the categories in Wandora.
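As a rough illustration of how category stubs can form a hierarchy, the sketch below inverts a mapping from pages to their categories. The function name `build_hierarchy` and the sample data are hypothetical, and this is not a description of how Wandora stores the hierarchy internally.

```python
from collections import defaultdict

def build_hierarchy(page_categories):
    """Map each category to the set of pages filed under it.

    A page may itself be a category, which is what produces a hierarchy.
    """
    members = defaultdict(set)
    for page, categories in page_categories.items():
        for category in categories:
            members[category].add(page)
    return dict(members)

# Illustrative data only, not real API output.
sample = {
    "Quantum mechanics": ["Category:Physics"],
    "Optics": ["Category:Physics"],
    "Category:Physics": ["Category:Physical sciences"],
}
hierarchy = build_hierarchy(sample)
```

Here the articles end up as members of Category:Physics, which is in turn a member of its own parent category.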

The extractor needs the MediaWiki instance URL to reach the targeted wiki instance. More specifically, the URL should point to the path that contains api.php and index.php. For the English Wikipedia this URL would be http://en.wikipedia.org/w/ as of this writing.
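A minimal sketch of how the two script URLs follow from the instance URL is shown below. The helper name `endpoint_urls` is hypothetical and the code only illustrates the path convention described above, not Wandora's own implementation.

```python
from urllib.parse import urljoin

def endpoint_urls(base_url: str) -> dict:
    """Derive the api.php and index.php URLs from a wiki instance URL."""
    # A trailing slash makes urljoin append to the path instead of
    # replacing its last segment.
    if not base_url.endswith("/"):
        base_url += "/"
    return {
        "api": urljoin(base_url, "api.php"),
        "index": urljoin(base_url, "index.php"),
    }

urls = endpoint_urls("http://en.wikipedia.org/w/")
```

If api.php cannot be reached at the resulting address, the instance URL most likely points to the wrong path.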

The extractor processes the given topics in batches of 100, in the order they are returned from the API. The extraction may be terminated by pressing Stop at any time during the extraction.
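The batching behaviour can be sketched as follows. The names `batches`, `process` and `should_stop` are hypothetical, and checking the stop flag between batches is an assumption made for illustration rather than a statement about when Wandora actually checks it.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 100  # batch size stated in the text above

def batches(titles: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` titles, preserving order."""
    it = iter(titles)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def process(titles: Iterable[str],
            should_stop: Callable[[], bool] = lambda: False) -> List[str]:
    """Collect titles batch by batch until done or until a stop is requested."""
    done: List[str] = []
    for chunk in batches(titles):
        if should_stop():  # assumption: stop is honoured between batches
            break
        done.extend(chunk)  # stand-in for the real extraction work
    return done
```

With 250 titles this yields batches of 100, 100 and 50, in the original order.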

The article data is stored as a MediaWiki content occurrence in Wandora.

Extraction

Category search
The category search allows extraction by a given category in the wiki. Categories are denoted in the wiki as Category:<categoryname> in English, or its equivalent in other languages. The Category: prefix should be omitted in the search field.
Blank input is interpreted as a search for all articles in the wiki.
Prefix search
The prefix search allows extraction of topics whose titles begin with a given prefix. This allows, for example, searching for all articles beginning with the letter 'c'.
Free search
The free search method allows searching for articles with free text. This is more or less equivalent to searching for topics with the search box available on the wiki web page.
Titles search
The titles search allows extracting specific articles given as a comma-separated list of titles.
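For orientation, the four search methods correspond roughly to the following MediaWiki API query parameters. This is a sketch of the public API modules (list=categorymembers, list=allpages, list=search and the titles parameter), not a description of the exact requests Wandora sends. The function names and the default limit are assumptions, and the English Category: prefix is hard-coded.

```python
from urllib.parse import urlencode

BASE = {"action": "query", "format": "json"}

def category_params(category: str, limit: int = 100) -> dict:
    # The Category: prefix is omitted in the search field, so add it here.
    return {**BASE, "list": "categorymembers",
            "cmtitle": "Category:" + category, "cmlimit": limit}

def prefix_params(prefix: str, limit: int = 100) -> dict:
    return {**BASE, "list": "allpages", "apprefix": prefix, "aplimit": limit}

def search_params(text: str, limit: int = 100) -> dict:
    return {**BASE, "list": "search", "srsearch": text, "srlimit": limit}

def titles_params(comma_separated: str) -> dict:
    # The API expects titles separated by the pipe character.
    titles = [t.strip() for t in comma_separated.split(",") if t.strip()]
    return {**BASE, "titles": "|".join(titles)}

# Query string that would be appended to api.php for a category search.
query = urlencode(category_params("Physics"))
```

Each dictionary would be sent as the query string of a request to api.php on the target wiki.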

Example

The extractor is found in File > Extract > Wiki > MediaWiki API Extractor.

Mediawiki e 1.png

Search for members of the category Physics in the English Wikipedia. Also crawl the categories of each extracted article.

Mediawiki e 2.png

The extraction results are shown when the extraction ends.

Mediawiki e 3.png

Fill the extracted category stub of Category:Physics by explicitly searching for it.

Mediawiki e 4.png

The article content is now visible on the category article.

Mediawiki e 5.png

Supplement the original extraction by freely searching for articles related to physics.

Mediawiki e 6.png

