Yahoo! YQL term extractor

From WandoraWiki

Wandora's Yahoo! YQL term extractor is a text classifier based on Yahoo's YQL (Yahoo Query Language) interface and web service. The YQL web service accepts short text fragments and returns the concepts it finds in the given text. Wandora creates topics and associations from the given text and the concepts returned by the Yahoo! YQL web service. The extractor can be used, for example, to refine occurrences.

Yahoo! YQL term extractor example

In this example, a Wandora user extracts concepts from the first chapter of the Wandora Wiki's first page. First, the user starts the Yahoo! YQL term extractor by selecting the menu option File > Extract > Classification > Yahoo! YQL term extractor... Wandora opens a dialog window with File, URL and Raw tabs.

Yahoo yql tem extractor 01.gif

The user chooses to extract raw text and writes her text in the text area.

Yahoo yql tem extractor 02.gif

When the user clicks the Extract button, Wandora sends the user's text to the web service and receives an XML document containing the concepts found in the text. Wandora converts the XML document to topics and associations.
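
The conversion step can be sketched roughly as follows. This is a minimal illustration in Python, not Wandora's actual implementation. The sample XML layout (a `Result` element per concept in the `urn:yahoo:cate` namespace) is an assumption about the response format, and the names `extract_terms` and `to_topics_and_associations` are hypothetical. Topics are modelled as plain strings and associations as (source, concept) pairs.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a term extraction response; the real service output may differ.
SAMPLE_RESPONSE = """<query>
  <results>
    <Result xmlns="urn:yahoo:cate">topic maps</Result>
    <Result xmlns="urn:yahoo:cate">knowledge management</Result>
  </results>
</query>"""


def extract_terms(xml_text):
    """Collect the text of every Result element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    terms = []
    for element in root.iter():
        # Tags look like "{namespace}Result"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Result" and element.text and element.text.strip():
            terms.append(element.text.strip())
    return terms


def to_topics_and_associations(source_name, terms):
    """Model one topic for the source text, one per concept, and link them."""
    topics = [source_name] + terms
    associations = [(source_name, term) for term in terms]
    return topics, associations


terms = extract_terms(SAMPLE_RESPONSE)
topics, associations = to_topics_and_associations("text fragment", terms)
```

Here the topic named "text fragment" stands in for the topic Wandora creates for the submitted text, and each pair in `associations` links it to one concept topic.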

Yahoo yql tem extractor 03.gif

After the extraction, the user can explore the created topics and associations. The user opens the topic representing the text fragment and reviews the associated concept topics.

Yahoo yql tem extractor 04.gif

Additional notes

I originally spotted Yahoo's term extraction service and wrote an extractor for it. However, shortly after I finished the extractor, Yahoo announced that it was going to shut down the term extraction web service API. The message on the term extraction page suggests using YQL based term extraction instead. Luckily, creating a YQL based extractor from the original term extractor was only a small job. The original Yahoo term extractor is still in Wandora as a hidden feature. You'll find it in the Tool manager (tab All Tools) under the name Yahoo Term Extractor.

The Yahoo! YQL term extractor wraps the user's text in a URL parameter and sends it to the service using the GET method. The maximum length of a URL therefore also limits the length of the text that can be extracted. Note that Wandora does NOT truncate the user's text to limit the URL length.
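
Because Wandora does not truncate the text, it can be useful to check beforehand whether a text is likely to fit. The following Python sketch builds a GET URL and compares its length to a limit. It is an illustration only and does not contact any service. The endpoint, the `search.termextract` query, the quote escaping and the 2000 character limit are all assumptions, and `build_extraction_url` and `fits_in_url` are hypothetical helper names rather than part of Wandora.

```python
from urllib.parse import urlencode

# Assumption: the historical public YQL endpoint, not taken from Wandora's source.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"
# Assumption: a conservative limit; actual limits depend on the server and client.
MAX_URL_LENGTH = 2000


def build_extraction_url(text):
    """Wrap the text in a YQL query and encode it as GET parameters."""
    # Assumed escaping so the text can sit inside a double-quoted YQL string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    query = 'select * from search.termextract where context="%s"' % escaped
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "xml"})


def fits_in_url(text, limit=MAX_URL_LENGTH):
    """Return True if the encoded URL stays within the assumed length limit."""
    return len(build_extraction_url(text)) <= limit


short_ok = fits_in_url("Wandora is a topic map application.")
long_ok = fits_in_url("word " * 1000)
```

Percent-encoding makes the URL longer than the raw text, so the check measures the encoded URL rather than the text itself.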

Wandora contains five different concept extraction tools at the moment. Each extractor can be used independently of the others on the same text document. As a consequence, Wandora can be used to compare different concept extraction services.

