Email extractor

From WandoraWiki
 
Latest revision as of 13:33, 27 January 2011

Wandora's email extractor converts standard email files and MBOX email repositories into a topic map. MBOX is an email repository format used by Thunderbird and many Linux and Unix email applications.
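To make the MBOX format concrete, the sketch below reads a tiny hand-written MBOX file with Python's standard `mailbox` module. It is an illustration only: Wandora is a Java application, and nothing here reflects its internals. It shows that an MBOX repository is a single file in which messages are concatenated, each starting with a `From ` separator line. The file name `topicmaps` and the message contents are made up.

```python
import mailbox
import os
import tempfile

# A hand-written MBOX repository: messages are simply concatenated in one
# file, and each message starts with a "From " separator line.
RAW_MBOX = """\
From alice@example.org Tue Nov  2 10:00:00 2010
From: alice@example.org
To: list@example.org
Subject: First message
Date: Tue, 02 Nov 2010 10:00:00 +0000

Hello list.

From bob@example.org Wed Nov  3 11:30:00 2010
From: bob@example.org
To: list@example.org
Subject: Re: First message
Date: Wed, 03 Nov 2010 11:30:00 +0000

Hello Alice.

"""


def list_subjects(path):
    """Return the Subject header of every message in an MBOX file, in order."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Thunderbird's MBOX folder files typically have no file extension.
        path = os.path.join(tmp, "topicmaps")
        with open(path, "w", encoding="ascii", newline="\n") as f:
            f.write(RAW_MBOX)
        print(list_subjects(path))  # prints ['First message', 'Re: First message']
```

Because the separator is just a line prefix, any tool can split the repository without an index, which is one reason many mail clients share the format.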

To start Wandora's email extractor, choose the menu option File > Extract > Other > Simple Email Extractor. A dialog opens in which you select the email resource type (either a single email file or an MBOX repository) and the email resource file.

Once the extraction finishes, you can find the results in the topic tree below the Emails topic.


Email extraction example

In this example, a Wandora user first extracts a single email file, saved from the Thunderbird email client as "[topicmapmail] Opera and the TAO.eml". This email was sent to the Topic Maps mailing list by Steven Pepper on 2 November 2010 (archived at http://www.infoloom.com/pipermail/topicmapmail/2010q4/008588.html). After the extraction, the user opens the email body, which is stored as an occurrence, and applies the OpenCalais classifier to it. The resulting topics are merged with the topic representing Steve's email.
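For a plain-text message, the body that ends up in an occurrence is essentially the decoded text/plain part of the email. As an illustration only (this uses Python's standard `email` package, not Wandora's extractor), the sketch below parses the bytes of a saved `.eml` file and returns its subject and plain-text body. The sample message and the function name `body_as_occurrence` are hypothetical.

```python
from email import policy
from email.parser import BytesParser

# A made-up single email in RFC 5322 form, like a message saved as .eml.
SAMPLE_EML = b"""\
From: Sender <sender@example.org>
To: list@example.org
Subject: [topicmapmail] Example subject
Date: Tue, 02 Nov 2010 12:00:00 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This is the body text that would become an occurrence.
"""


def body_as_occurrence(raw_bytes):
    """Return (subject, plain-text body) for one raw email message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # get_body picks the best matching body part; here only text/plain.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return msg["Subject"], body.strip()


if __name__ == "__main__":
    subject, body = body_as_occurrence(SAMPLE_EML)
    print(subject)
    print(body)
```

Reading the file from disk only changes the input: pass `open(path, "rb").read()` instead of the sample bytes.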

Next, the user starts the email extractor again and chooses to extract the MBOX email repository in her local Thunderbird folder. This repository, called topicmaps, contains a little over 1000 emails sent to the Topic Maps mailing list. After the extraction, the user can browse, for example, email topics, email address topics and date topics. The example stops here, but the reader is encouraged to apply Wandora's batch extraction options to all extracted emails. Occurrence batch extraction is described in the tutorial Refining occurrences.
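Browsing email address topics and date topics amounts to grouping messages by address and by day. The following is a hypothetical sketch that does not reproduce Wandora's actual topic structure: it indexes message subjects by every address in the From, To and Cc headers and by the calendar day of the Date header, with plain dictionaries standing in for topics. The messages are invented.

```python
import email
from collections import defaultdict
from email.utils import getaddresses, parsedate_to_datetime

# Two made-up messages standing in for an extracted mailing-list archive.
RAW_MESSAGES = [
    b"From: alice@example.org\nTo: list@example.org\n"
    b"Subject: Topic Maps question\nDate: Tue, 02 Nov 2010 10:00:00 +0000\n\nHi\n",
    b"From: Bob <BOB@example.org>\nTo: list@example.org\nCc: alice@example.org\n"
    b"Subject: Re: Topic Maps question\nDate: Wed, 03 Nov 2010 11:30:00 +0000\n\nHello\n",
]


def index_emails(raw_messages):
    """Map addresses and send dates to the subjects of the related messages."""
    by_address = defaultdict(list)
    by_date = defaultdict(list)
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        subject = msg["Subject"]
        # Every address in From, To and Cc becomes an "address" key.
        headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses(headers):
            by_address[addr.lower()].append(subject)
        # The calendar day of the Date header becomes a "date" key.
        day = parsedate_to_datetime(msg["Date"]).date().isoformat()
        by_date[day].append(subject)
    return dict(by_address), dict(by_date)


if __name__ == "__main__":
    addresses, dates = index_emails(RAW_MESSAGES)
    for addr, subjects in sorted(addresses.items()):
        print(addr, subjects)
    print(dates)
```

Lower-casing addresses is a design choice of this sketch so that "BOB@example.org" and "bob@example.org" land under one key.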


[Thirteen images illustrating the example: Email extractor v2 01.gif to Email extractor v2 13.gif]

Limitations

  • Emails may contain a very wide range of attachment types. Wandora's email extractor handles only text attachments and gives a notification for unrecognized attachment types.
  • Wandora's email extractor can't handle HTML representations of email repositories, such as the topicmapmail archive (http://www.infoloom.com/pipermail/topicmapmail/).
  • It has been reported that on Linux, Java's file dialog does not show folders whose names start with a dot. You may therefore need to copy your email repository out of its default folder before the email extractor can access it.
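The attachment rule in the first item can be illustrated with Python's standard `email` package. This is a sketch of the described behaviour, not Wandora's code: it keeps attachments whose MIME main type is `text` and, for any other type, records the type instead of failing. The helper names `split_attachments` and `build_sample` are hypothetical.

```python
from email.message import EmailMessage


def split_attachments(msg):
    """Split msg's attachments into decoded text parts and skipped MIME types.

    Attachments whose main type is "text" are decoded and kept; for any other
    type only the MIME type is recorded, as a notification rather than an error.
    """
    text_parts, skipped_types = [], []
    for part in msg.iter_attachments():
        if part.get_content_maintype() == "text":
            text_parts.append((part.get_filename(), part.get_content()))
        else:
            skipped_types.append(part.get_content_type())
    return text_parts, skipped_types


def build_sample():
    """Build a made-up message with one text and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg.set_content("See the attachments.")
    msg.add_attachment("line one\nline two\n", filename="notes.txt")
    msg.add_attachment(b"\x89PNG not a real image", maintype="image",
                       subtype="png", filename="chart.png")
    return msg


if __name__ == "__main__":
    texts, skipped = split_attachments(build_sample())
    for name, content in texts:
        print(f"kept {name}: {len(content)} characters")
    for mime_type in skipped:
        print(f"notification: unrecognized attachment type {mime_type}")
```

Collecting skipped types rather than raising lets a long run over a large repository finish and report problems at the end.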

See also

  • Wandora Firefox plugin can be used to extract topics directly from the Thunderbird email client.
