HTML instance list extractor

From WandoraWiki

The HTML instance list extractor is started with the menu option File > Extract > HTML structures > HTML instance list extractor..., or by selecting the extractor as the active Drag and Drop extractor and dropping HTML fragments from a WWW browser.

The extractor reads an HTML file or fragment and interprets the ordered and unordered HTML lists it encounters as nested instance relations. A topic is created for each list element (the content of an li element). Each created topic is set as an instance of the enclosing outer list element, so the outer element acts as the type of the inner element. The depth of the list structure is unlimited. A topic named ListRoot is set as the type of the first-level topics. As an example, consider this simple list:

  • Movies
    • Blade Runner
    • 2001: A Space Odyssey
    • Spaceballs
  • Directors
    • Ridley Scott
    • Stanley Kubrick
    • Mel Brooks

If given to the extractor, this HTML fragment generates topics for Movies and Blade Runner, for example, and the Blade Runner topic is set as an instance of the Movies topic. Movies and Directors are set as instances of the ListRoot topic.
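The mapping from nested lists to instance relations can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Wandora's own implementation; the class name InstanceListSketch, the function extract_instance_pairs and the FRAGMENT markup are hypothetical, and the sketch assumes well-formed HTML in which every li element is explicitly closed.

```python
from html.parser import HTMLParser

ROOT = "ListRoot"  # type of first-level topics, as described above


class InstanceListSketch(HTMLParser):
    """Hypothetical sketch: collects (instance, type) pairs from ul/ol lists."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # (instance topic name, type topic name)
        self.type_stack = []  # one entry per open list: the type of its items
        self.item_stack = []  # one entry per open li: [text parts, name or None]

    def _name_current_item(self):
        # Turn the text collected so far into the topic name (only once per li).
        item = self.item_stack[-1]
        if item[1] is None:
            item[1] = " ".join("".join(item[0]).split())
            if item[1]:
                self.pairs.append((item[1], self.type_stack[-1]))
        return item[1]

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            # A list nested in an li uses that li as the type of its items;
            # a top-level list uses ListRoot.
            outer = self._name_current_item() if self.item_stack else ROOT
            self.type_stack.append(outer)
        elif tag == "li" and self.type_stack:
            self.item_stack.append([[], None])

    def handle_data(self, data):
        # Only text that precedes a nested list contributes to the topic name.
        if self.item_stack and self.item_stack[-1][1] is None:
            self.item_stack[-1][0].append(data)

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.type_stack:
            self.type_stack.pop()
        elif tag == "li" and self.item_stack:
            self._name_current_item()
            self.item_stack.pop()


def extract_instance_pairs(html):
    parser = InstanceListSketch()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Assumed HTML markup for the movie and director list shown above.
FRAGMENT = """
<ul>
  <li>Movies
    <ul>
      <li>Blade Runner</li>
      <li>2001: A Space Odyssey</li>
      <li>Spaceballs</li>
    </ul>
  </li>
  <li>Directors
    <ul>
      <li>Ridley Scott</li>
      <li>Stanley Kubrick</li>
      <li>Mel Brooks</li>
    </ul>
  </li>
</ul>
"""

if __name__ == "__main__":
    for instance, type_ in extract_instance_pairs(FRAGMENT):
        print(f"{instance} is an instance of {type_}")
```

For the example list, the returned pairs include ("Movies", "ListRoot") and ("Blade Runner", "Movies"), which mirrors the topics and instance relations described above.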

You can try this example right away if you have already installed Wandora. Open Wandora and follow these steps:

  • In Wandora, select HTML instance list extractor as the active Drag and Drop extractor.
  • Select the movie and director list above in your WWW browser.
  • Drag the selection onto Wandora's Drag and Drop extractor.
  • Wandora extracts the topics and leaves the log dialog open.
  • Close the log dialog and use the finder to locate the root topic by searching for ListRoot.
  • Open the located ListRoot topic and browse its instances.

Wandora also has a similar extractor named HTML superclass-subclass list extractor, which generates superclass-subclass associations instead of instance-of relations.
