Any23 extractor

From WandoraWiki
Revision as of 22:28, 17 February 2011 by Akivela (Talk | contribs)


Any23 (Anything To Triples) is a library for extracting structured data, such as microformats, from web documents such as HTML pages. Wandora's Any23 extractor uses the Any23 library and enables a Wandora user to extract RDF from web resources and convert it to a topic map. In particular, the Any23 extractor gives Wandora a full-featured microformat extractor. To start the extractor, select the Wandora menu option File > Extract > Microformats > Any23 extractor.... The extractor opens a dialog for input files and URLs. Extraction starts when the user presses the Extract button.

The Any23 library generates RDF triples. Wandora is based on Topic Maps technology and converts RDF triples to Topic Maps associations. The conversion schema is very simple. Each RDF triple is converted to a binary association in which the RDF predicate becomes the association type and the RDF subject and object become the association players. The association roles are predefined topics. The conversion schema is described in detail on the wiki page Importing RDF. In addition to this simple schema, the source of an RDF triple plays a very important role. The source is the web resource or file the RDF triple originates from. Wandora's Any23 extractor creates a topic for this source and adds it as a third player to every association the extractor generates. This addition matters when similar triples are extracted from different sources, because it enables the user to track and verify where each piece of information came from.
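The conversion described above can be illustrated with a minimal Python sketch. It is not Wandora's implementation or API: the Triple and Association classes, the function triple_to_association, and the role names are all hypothetical and exist only to show how the predicate becomes the association type and how the source is added as a third player.

```python
from dataclasses import dataclass

# Hypothetical role names. In Wandora the roles are predefined topics,
# but their actual identifiers are not given here.
SUBJECT_ROLE = "subject-role"
OBJECT_ROLE = "object-role"
SOURCE_ROLE = "source-role"


@dataclass(frozen=True)
class Triple:
    """A plain RDF triple (illustrative, not an Any23 class)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Association:
    """A simplified topic map association (illustrative, not a Wandora class)."""
    type: str
    players: tuple  # (role, player) pairs


def triple_to_association(triple: Triple, source: str) -> Association:
    """Map a triple to an association and attach its source as a third player."""
    return Association(
        type=triple.predicate,  # the predicate becomes the association type
        players=(
            (SUBJECT_ROLE, triple.subject),
            (OBJECT_ROLE, triple.obj),
            (SOURCE_ROLE, source),  # the extra player that records provenance
        ),
    )


if __name__ == "__main__":
    triple = Triple(
        "http://example.org/alice",
        "http://xmlns.com/foaf/0.1/knows",
        "http://example.org/bob",
    )
    # The same triple found on two different pages (example URLs).
    first = triple_to_association(triple, "http://example.org/page1.html")
    second = triple_to_association(triple, "http://example.org/page2.html")
    for association in (first, second):
        print(association.type, dict(association.players))
```

Because the source is one of the players, the same triple extracted from two pages produces two distinct associations, so each statement can be traced back to the page it came from.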
