HCard microformat extractor
The HCard microformat extractor reads HTML documents and fragments and creates topics and associations for the HCard structures it finds. HCard structures mark up data about people, companies, organizations, and places in HTML and XML documents. An HCard structure resembles an address book record, with fields such as name, address, and phone number. Start the extractor with the menu option File > Extract > Microformats > HCard microformat extractor..., or select it as the Drag and drop extractor and drop HTML fragments directly from a web browser.
HCard extraction example
Let's say I have an HTML document with the following HCard fragment:
<div class="vcard">
  <span class="n" style="display:none">
    <span class="family-name">Kivelä</span>
    <span class="given-name">Aki</span>
  </span>
  <span class="fn">Aki Kivelä</span>
  <span class="nickname" style="display:none">akivela</span>
  <span class="org">Grip Studios Interactive Inc.</span>
  <span class="adr">
    <span class="street-address">Kristianinkatu 15</span>
    <span class="postal-code">FIN-00170 Helsinki</span>
    <span class="country-name">Finland</span>
  </span>
  <span class="email">akivela@gripstudios.com</span>
</div>
Now I activate Wandora's HCard extractor and drop the HTML file onto Wandora's Drag and drop extractor. Wandora recognizes the HCard structure and creates an equivalent topic map structure for it. A single topic is created for the HCard, and all related data, such as names and addresses, is associated with that topic. The HCard topic is labeled with the HCard's fn element.
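The mapping described above can be approximated outside Wandora with a few lines of Python. This is only a rough sketch and not Wandora's actual implementation: the class `HCardSketch`, the function `extract_hcard`, and the dictionary layout are hypothetical names chosen for illustration, and the sketch assumes each element carries a single class name, as in the example fragment. It collects the card's properties into one record, keyed by the fn value, and keeps the address details in a separate nested record, mirroring the separate address topic.

```python
from html.parser import HTMLParser

# Property names taken from the example fragment; a full hCard has more.
ADR_FIELDS = {"street-address", "postal-code", "country-name"}
NAME_FIELDS = {"family-name", "given-name"}
CARD_FIELDS = {"fn", "nickname", "org", "email"}


class HCardSketch(HTMLParser):
    """Collects hCard properties into a plain dict (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class attribute of every currently open element
        self.card = {"name": {}, "adr": {}}

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        prop = self.stack[-1]  # class of the innermost open element
        if prop in ADR_FIELDS and "adr" in self.stack:
            self.card["adr"][prop] = text   # goes to the address record
        elif prop in NAME_FIELDS and "n" in self.stack:
            self.card["name"][prop] = text  # structured name parts
        elif prop in CARD_FIELDS:
            self.card[prop] = text          # attached directly to the card


def extract_hcard(html):
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    return parser.card


FRAGMENT = """
<div class="vcard">
  <span class="n" style="display:none">
    <span class="family-name">Kivelä</span>
    <span class="given-name">Aki</span>
  </span>
  <span class="fn">Aki Kivelä</span>
  <span class="nickname" style="display:none">akivela</span>
  <span class="org">Grip Studios Interactive Inc.</span>
  <span class="adr">
    <span class="street-address">Kristianinkatu 15</span>
    <span class="postal-code">FIN-00170 Helsinki</span>
    <span class="country-name">Finland</span>
  </span>
  <span class="email">akivela@gripstudios.com</span>
</div>
"""

card = extract_hcard(FRAGMENT)
topics = {card["fn"]: card}  # the card record is labeled with its fn value
```

Here `topics` holds a single entry labeled "Aki Kivelä", and `card["adr"]` plays the role of the separate address topic. Wandora itself produces real topics and associations rather than dictionaries.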
The address data forms a separate topic, with the address details associated with it:
The complete extraction viewed as a graph: