Hello Aparna
I am a bit surprised that you couldn't get the extractor working. If you have access to another machine, perhaps on a different network, you might give it a try there.
About automatic document extraction: unfortunately, all extractions in Wandora require manual user action at the moment. The user has to specify which extractor to use and which document to extract. The closest thing to automatic extraction is Wandora's Drag and Drop Extractor, described at [1].
PDF extraction is similar to simple text file extraction: the text in the PDF document is stored as an occurrence of a document topic. The JPG metadata extractor reads metadata out of JPG images and constructs occurrences for the metadata fields. Note that Wandora doesn't support image-data extraction (looking at image or video pixels) at all at the moment. In general, extractors can't read text out of MS doc files. However, dropping an MS doc file into a Raw text area converts the doc to text, and you can then continue the extraction with the raw text.
Kind Regards,
Aki / Wandora Team
[1] http://www.wandora.org/wandora/wiki/ind ... _extractor
-----Edit-----
Oh, I nearly forgot. If you need to extract WWW pages, you might also try Wandora's Firefox/Thunderbird plugin, which allows extractions directly in the Firefox WWW browser (or the Thunderbird email client). See details at
http://www.wandora.org/wandora/wiki/ind ... fox_plugin