Email extractor

Wandora's email extractor converts standard email files and MBOX email repositories into a topic map. MBOX is an email repository format used by Thunderbird and by many Linux and Unix email applications.
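To make the two resource types more concrete, the following is a minimal sketch that reads a single email file and an MBOX repository with Python's standard library. It is not Wandora's implementation (Wandora itself is a Java application); the function names `read_single_email`, `read_mbox`, and `summarize` are hypothetical and chosen only for illustration.

```python
import mailbox
from email import policy
from email.parser import BytesParser


def read_single_email(path):
    """Parse one standard email file (for example a .eml file)."""
    with open(path, "rb") as handle:
        return BytesParser(policy=policy.default).parse(handle)


def read_mbox(path):
    """Return all messages stored in one MBOX repository file."""
    # create=False: fail instead of silently creating an empty repository.
    box = mailbox.mbox(path, create=False)
    try:
        return list(box)
    finally:
        box.close()


def summarize(message):
    """Collect header fields that an extractor would typically care about."""
    return {
        "subject": str(message.get("Subject", "")),
        "from": str(message.get("From", "")),
        "date": str(message.get("Date", "")),
    }
```

An MBOX repository is a single file in which each message starts with a separator line beginning with `From `, which is why `read_mbox` can return many messages from one path while `read_single_email` returns exactly one.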

To start Wandora's email extractor, choose the menu option File > Extract > Other > Simple Email Extractor. A dialog opens where you select the email resource type (either a single email file or an MBOX repository) and the email resource file.

Once the extraction finishes, you can find the results in the topic tree below the Emails topic.

Email extraction example

In this example, a Wandora user first extracts a single email file saved from the Thunderbird email client as the file [topicmapmail] Opera and the TAO.eml. The email was sent to the Topic Maps mailing list by Steven Pepper on 2 November 2010. After the extraction, the user looks at the email body, which is stored as an occurrence, and applies the OpenCalais classifier to that occurrence. The resulting topics are merged with the topic representing Steve's email.

Next, the user starts Wandora's email extractor again and chooses to extract an MBOX email repository from her local Thunderbird folder. This repository, called topicmaps, contains a little over 1000 emails sent to the Topic Maps mailing list. After the extraction, the user can browse, for example, email topics, email address topics, and date topics. The example stops here, but the reader is encouraged to try Wandora's batch extraction options on all the extracted emails. Occurrence batch extraction is described in the tutorial Refining occurrences.

Screenshots: Email extractor v2 01.gif to Email extractor v2 13.gif


  • Emails may contain a very wide range of attachments. Wandora's email extractor handles only text attachments and gives a notification of unrecognized attachment types.
  • Wandora's email extractor can't handle HTML representations of email repositories. The web archive of the topicmapmail mailing list is one example of such an HTML representation.
  • Users have reported that Java's file dialog doesn't show folders whose names start with a dot character on Linux. You may therefore need to copy your email repository out of its default folder before you can open it with the email extractor on Linux.
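Two of the notes above can be illustrated with a short sketch, again using only Python's standard library and not Wandora's own code. `split_attachments` shows one plausible way to separate text attachments from other attachment types, and `copy_out_of_hidden_folder` shows the workaround of copying a repository file out of a dot folder. Both function names are hypothetical, and the sketch assumes the message was parsed with `email.policy.default` so that `iter_attachments` is available.

```python
import shutil
from email.message import EmailMessage
from pathlib import Path


def split_attachments(message):
    """Separate text attachments from attachments of other types.

    Returns (texts, skipped): the decoded text attachments, and the
    content types of attachments a text-only extractor would only report.
    """
    texts, skipped = [], []
    for part in message.iter_attachments():
        if part.get_content_maintype() == "text":
            texts.append(part.get_content())
        else:
            skipped.append(part.get_content_type())
    return texts, skipped


def copy_out_of_hidden_folder(repository, destination_dir):
    """Copy a repository file into a folder that a file dialog can show."""
    source = Path(repository).expanduser()
    target_dir = Path(destination_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps; the original file is left untouched.
    return Path(shutil.copy2(source, target_dir / source.name))
```

The location of the repository depends on the email client and profile, so any path passed to `copy_out_of_hidden_folder` has to be looked up on the local system first.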
