Atom extractor

From WandoraWiki

Wandora's Atom extractor converts generic Atom syndication feeds to topic map format. Start the extractor with the menu option File > Extract > News feeds > Atom extractor.... Wandora requires a feed file or a URL resource before the transformation is possible.

See also

RSS 2.0 Extractor

Example

In this example a Wandora user extracts an Atom feed provided by Citeseer. The user has searched Citeseer for Topic Maps and receives the result set (or its first 100 matches) as an Atom feed, shown below. The feed contains information about scientific papers related to Topic Maps.


Atom extract example 01.gif


The user copies the feed URL, starts the Wandora application, and chooses the Atom extractor with the File > Extract > Atom extractor option. The user then selects the Urls tab, pastes the Atom feed URL into the text area, and clicks the Extract button.


Atom extract example 02.gif


After a successful extraction Wandora displays some data about the extraction process.


Atom extract example 03.gif


Wandora now has topics for the extracted Atom feed and for every feed entry. The feed entry topics are associated with the feed topic.


Atom extract example 04.gif


Next the Wandora user opens one of the extracted entry topics by clicking it.


Atom extract example 05.gif


The Atom feed also contains abstracts of the Topic Maps related papers. Each abstract is stored as an occurrence attached to the entry topic.
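The structure described so far (one feed topic, entry topics associated with it, and an abstract stored as an occurrence of each entry) can be pictured with a small Python sketch. This is not Wandora's implementation; it only illustrates the mapping. The sample feed, its titles and names, the use of the summary element for the abstract, and the function name feed_to_topics are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom syndication format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; all titles, names, and text are invented.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example search results</title>
  <entry>
    <title>Example paper</title>
    <author><name>Jane Doe, John Smith</name></author>
    <summary>Example abstract text.</summary>
  </entry>
</feed>
"""


def feed_to_topics(xml_text):
    """Build a plain dict that mimics a feed topic with associated entry topics."""
    root = ET.fromstring(xml_text)
    feed_topic = {
        "name": root.findtext("atom:title", default="", namespaces=ATOM_NS),
        "entries": [],
    }
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_topic = {
            "name": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            # One author element may hold several names in a single string.
            "authors": [
                name.text or ""
                for name in entry.findall("atom:author/atom:name", ATOM_NS)
            ],
            # The abstract is kept as an occurrence of the entry topic
            # (assumed here to come from the summary element).
            "occurrences": {
                "summary": entry.findtext(
                    "atom:summary", default="", namespaces=ATOM_NS
                )
            },
        }
        feed_topic["entries"].append(entry_topic)
    return feed_topic


topics = feed_to_topics(SAMPLE_FEED)
print(topics["name"], len(topics["entries"]))
```

Plain dictionaries are used here only to keep the sketch short; a real topic map also carries subject identifiers, types, and scoped names.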


Atom extract example 08.gif


As you might have noticed above, the authors of an Atom feed entry are the authors of the scientific paper that the entry represents. If a paper has multiple authors, as in our example above, all authors are stacked into a single topic. This is a consequence of the original Atom feed structure, where a single XML element is used to represent the authors of the entry. The next images show how a Wandora user can split up the author topic. The user must first select the author topic, then right click the author topic cell and choose Topics > Split topics > Split topic with base name... in the context menu. As the authors are separated by a comma and a white space character, the regular expression used to split the topic is

\, 

Although it is not visible, a white space character must follow the comma in the regular expression above.
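To see what this regular expression does to a stacked author name, the following Python sketch applies the same pattern outside Wandora. The author string and the function name split_authors are hypothetical, and Wandora's own split feature may differ in details such as trimming.

```python
import re
from typing import List

# Same pattern as above: an escaped comma followed by one space character.
AUTHOR_SEPARATOR = re.compile(r"\, ")


def split_authors(base_name: str) -> List[str]:
    """Split a stacked author base name into individual author names."""
    parts = AUTHOR_SEPARATOR.split(base_name)
    # Drop empty fragments and trim stray white space around each name.
    return [part.strip() for part in parts if part.strip()]


# Hypothetical base name of a stacked author topic.
print(split_authors("Jane Doe, John Smith"))
```

A name written without the space after the comma would not be split by this pattern, which is why the trailing white space in the expression matters.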

Atom extract example 06.gif


After the split the Atom feed entry has two authors instead of one, and each author topic represents one person instead of a group of persons. The result of the split is shown below. Note that extra effort is required to complete the details of the split topics.


Atom extract example 07.gif


Note that Wandora also contains a BibTeX extractor, which may be better suited to bibliographical extractions than the Atom feed extractor.
