New York Times Article Search API extractor

From WandoraWiki
(Difference between revisions)
(Example)
 
Line 1: Line 1:
Wandora's New York Times Article Search API extractor performs an API request to the [http://developer.nytimes.com/docs/read/article_search_api New York Times Article Search API] and transforms the response JSON into topics and associations. After a successful extraction, Wandora contains information about New York Times articles, and the extracted information can be visualized, exported and modified in Wandora.
+
The New York Times API Extractor is used to parse articles from the [http://developer.nytimes.com/docs/read/article_search_api_v2 New York Times Article Search service] into Topic Map data. The service requires credentials for access. API keys for the article search as well as other NYT endpoints can be requested on the NYT [http://developer.nytimes.com/ Developer Pages]. In Wandora, the API key is not saved between sessions.
 
+
A Wandora user has to [http://developer.nytimes.com/apps/register sign up for a personal API key] in order to make extractions with Wandora. Wandora doesn't store your API key between sessions. If you find the extracted information useful and want to use it, read the [http://developer.nytimes.com/Api_terms_of_use API Terms of Use] carefully.
+
  
 
== Example ==
 
== Example ==
  
In this example, a Wandora user performs an article search against the New York Times Article Search API with the query '''obama'''. First, the user locates the extractor in '''File > Extract > News > New York Times Article Search API extractor''' and selects the menu option.
+
As an example, the service is queried for articles related to Barack Obama. Extractors using NYT services are found in '''File > Extract > News > New York Times API Extractor'''.
 
+
 
+
[[Image:nytimes_example_01.gif|center]]
+
 
+
 
+
Wandora opens an extractor dialog. The user fills in a search query and selects the fields to be extracted. The listed fields are a subset of the fields described in the [http://developer.nytimes.com/docs/read/article_search_api New York Times Article Search API documentation]. We experienced some difficulties when extracting too many fields: the number of result rows in the response dropped to one instead of the default ten. Note that you can always perform sequential extractions with different sets of fields. As Wandora merges article topics automatically, later results merge cleanly with earlier ones. In this example, the user selects the fields ''author'', ''body'', ''byline'', ''column_facet'', ''date'' and ''title''. Wandora adds the article ''url'' to the selected fields automatically. The article URL is used as the subject identifier of the article topic.
+
 
+
 
+
[[Image:nytimes_example_02.gif|center]]
+
 
+
 
+
The user clicks the '''Extract''' button and Wandora asks for the user's API key. Wandora keeps the API key in memory during a session; it is not stored between sessions. If you need to change the API key during a session, press the '''Forget api-key''' button in the query dialog. Wandora asks for the key again the next time you perform an extraction.
+
 
+
 
+
[[Image:nytimes_example_03.gif|center]]
+
 
+
 
+
If Wandora gets a valid response, it parses the response JSON and transforms the included entities into topics and associations. One response JSON contains information about 10 articles. Wandora notices if more articles are available and asks how to proceed.
+
 
+
 
+
[[Image:nytimes_example_04.gif|center]]
+
 
+
 
+
Available options are
+
 
+
* Do not extract any more pages.
+
* Extract only next page
+
* Extract next page
+
* Extract 10 next pages
+
* Extract all next pages
+
 
+
Note that Article Search API usage is limited to 5000 requests per day and that extracting one page takes one request. Use the option ''Extract all next pages'' with care.
+
 
+
After a successful extraction, the user finds the topic '''New york Times API''' just below '''Wandora class'''. Opening the topic reveals the extracted articles and the type topics used.
+
 
+
 
+
[[Image:nytimes_example_05.gif|center]]
+
 
+
 
+
Opening a single article topic in the [[Traditional topic panel]] shows all information (i.e. the selected fields) associated with the article.
+
 
+
 
+
[[Image:nytimes_example_06.gif|center]]
+
 
+
 
+
For example, the article body is modeled as an occurrence attached to the article topic.
+
 
+
 
+
[[Image:nytimes_example_07.gif|center]]
+
 
+
 
+
By-lines are modeled as topics associated with the article topic. Similarly, facets are generally modeled as topics associated with the article topic.
+
  
 +
[[File:nyt_article_1.png]]
  
[[Image:nytimes_example_08.gif|center]]
+
The Article Search dialog mirrors options available in the Article Search service. In addition to a search query, the service allows selecting a subset of fields to return for each article. The results may also be restricted to a certain date range, offset with a page number (i.e. a multiple of 10 articles) or sorted newest or oldest first.
  
 +
[[File:nyt_article_2.png]]
  
All facet topics are typed with the article-facet topic.
+
If an API key isn't already stored, it is requested.
  
 +
[[File:nyt_article_3.png]]
  
[[Image:nytimes_example_09.gif|center]]
+
Often the query matches more than one page of results (a page holds 10 articles), and the extractor prompts for further action. The pagination options allow for:
  
 +
* Stopping and not retrieving additional articles
 +
* Extracting one more page and stopping
 +
* Extracting one more page and prompting again
 +
* Extracting 10 more pages and prompting again
 +
* Extracting all remaining pages
  
Opening one facet topic, say '''Letter''', lists all articles that have the facet value.
+
The NYT article database is large, and vague queries might return result sets on the order of thousands of articles. Review the API rate limits before running large queries to avoid exhausting the available quota.
  
 +
[[File:nyt_article_4.png]]
  
[[Image:nytimes_example_10.gif|center]]
+
The fetched articles are then parsed into topics and associations representing relations between articles, keywords, etc.
  
== See also ==
+
[[File:nyt_article_5.png]]
  
Once you have performed an article search using the New York Times Article Search API extractor, you might be interested in classifying articles:
+
In particular, the keyword structure used by the service is parsed into a topic structure associating articles with common keywords.
  
* [[GATE/ANNIE integration|GATE/ANNIE]]
+
[[File:nyt_article_6.png]]
* [[Stanford Named Entity Recognizer integration|Stanford Named Entity Recognizer (NER)]]
+
* [[OpenCalais classifier]]
+
* [[AlchemyAPI extractors]]
+
* [[Yahoo! YQL term extractor]]
+
* [[Tagthe extractor]]
+
* [[SemanticHacker classifier]]
+
* [[Zemanta extractor]]
+
* [[UClassify integration]]
+

Latest revision as of 12:16, 2 March 2015

The New York Times API Extractor is used to parse articles from the New York Times Article Search service into Topic Map data. The service requires credentials for access. API keys for the article search as well as other NYT endpoints can be requested on the NYT Developer Pages. In Wandora, the API key is not saved between sessions.

Example

As an example, the service is queried for articles related to Barack Obama. Extractors using NYT services are found in File > Extract > News > New York Times API Extractor.

Nyt article 1.png

The Article Search dialog mirrors options available in the Article Search service. In addition to a search query, the service allows selecting a subset of fields to return for each article. The results may also be restricted to a certain date range, offset with a page number (i.e. a multiple of 10 articles) or sorted newest or oldest first.
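
The dialog options above map naturally onto request parameters. The following is a minimal sketch of how such a request URL could be assembled, not Wandora's actual implementation. The endpoint and the parameter names (`q`, `fl`, `begin_date`, `end_date`, `page`, `sort`, `api-key`) are assumptions based on version 2 of the Article Search API and should be checked against the NYT documentation; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Article Search API v2; it is not stated on this page.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, fields=None, begin_date=None,
                     end_date=None, page=0, sort=None):
    """Return the request URL for one page (10 articles) of search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    if fields:
        params["fl"] = ",".join(fields)    # subset of fields to return
    if begin_date:
        params["begin_date"] = begin_date  # assumed format: YYYYMMDD
    if end_date:
        params["end_date"] = end_date      # assumed format: YYYYMMDD
    if sort:
        if sort not in ("newest", "oldest"):
            raise ValueError("sort must be 'newest' or 'oldest'")
        params["sort"] = sort
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(
    "obama",
    "YOUR-API-KEY",                        # placeholder, not a real key
    fields=["web_url", "headline", "keywords"],
    begin_date="20150101",
    sort="newest",
)
print(url)
```

Only the URL is built here; no request is sent, so the sketch can be run without credentials.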

Nyt article 2.png

If an API key isn't already stored, it is requested.
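
The key handling described here (ask once, keep the key in memory, never persist it) can be sketched as a small session-scoped holder. This is an illustration only: `SessionApiKey` and `fake_prompt` are hypothetical names, and the `forget` method mirrors the Forget api-key button mentioned in an earlier revision of this page.

```python
class SessionApiKey:
    """Hold an API key in memory only; nothing is written to disk."""

    def __init__(self, ask):
        self._ask = ask      # callable that prompts the user and returns a key
        self._key = None

    def get(self):
        # Prompt only when no key is cached for this session.
        if self._key is None:
            self._key = self._ask()
        return self._key

    def forget(self):
        # Drop the cached key so that the next get() prompts again.
        self._key = None


prompts = []


def fake_prompt():
    # Stand-in for a dialog; records how often the user was asked.
    prompts.append("asked")
    return "demo-key-%d" % len(prompts)


store = SessionApiKey(fake_prompt)
store.get()
store.get()      # served from memory, no second prompt
store.forget()
store.get()      # prompts again after forget()
print(len(prompts))
```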

Nyt article 3.png

Often the query matches more than one page of results (a page holds 10 articles), and the extractor prompts for further action. The pagination options allow for:

  • Stopping and not retrieving additional articles
  • Extracting one more page and stopping
  • Extracting one more page and prompting again
  • Extracting 10 more pages and prompting again
  • Extracting all remaining pages
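
The five options above can be read as a small control loop. The sketch below is one possible interpretation under stated assumptions, not Wandora's code: `extract`, `fetch_page`, `choose` and the option constants are hypothetical names, and the total hit count is passed in directly rather than read from a first response.

```python
import math

PAGE_SIZE = 10  # articles per page, as stated above

# Hypothetical constants for the five pagination options.
STOP, ONE_AND_STOP, ONE_AND_ASK, TEN_AND_ASK, ALL = range(5)


def extract(total_hits, fetch_page, choose):
    """Fetch page 0, then ask `choose` how to proceed while pages remain."""
    last_page = max(math.ceil(total_hits / PAGE_SIZE) - 1, 0)
    fetched = [fetch_page(0)]
    page = 0
    while page < last_page:
        choice = choose(page, last_page)
        if choice == STOP:
            break
        if choice in (ONE_AND_STOP, ONE_AND_ASK):
            count = 1
        elif choice == TEN_AND_ASK:
            count = 10
        elif choice == ALL:
            count = last_page - page
        else:
            raise ValueError("unknown pagination option: %r" % (choice,))
        # Never request pages beyond the last available one.
        for _ in range(min(count, last_page - page)):
            page += 1
            fetched.append(fetch_page(page))
        if choice in (ONE_AND_STOP, ALL):
            break
    return fetched


# Usage with a fake fetcher that just returns the page number.
pages = extract(45, lambda p: p, lambda page, last: ONE_AND_STOP)
print(pages)
```

Separating the decision (`choose`) from the fetching keeps the loop testable without a dialog or a network connection.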

The NYT article database is large, and vague queries might return result sets on the order of thousands of articles. Review the API rate limits before running large queries to avoid exhausting the available quota.
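
A quick estimate shows how fast a broad query consumes quota. The sketch below assumes one request per page of 10 articles and a daily limit of 5000 requests; that figure is taken from an earlier revision of this page and may no longer apply, so check the current NYT limits. The function names are hypothetical.

```python
import math


def requests_needed(total_hits, page_size=10):
    """Number of requests required to fetch every page of a result set."""
    return math.ceil(total_hits / page_size)


def fits_quota(total_hits, used_today=0, daily_limit=5000):
    # daily_limit is an assumption taken from an older revision of this page.
    return used_today + requests_needed(total_hits) <= daily_limit


print(requests_needed(12345))
print(fits_quota(12345))
print(fits_quota(60000))
```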

Nyt article 4.png

The fetched articles are then parsed into topics and associations representing relations between articles, keywords, etc.

Nyt article 5.png

In particular, the keyword structure used by the service is parsed into a topic structure associating articles with common keywords.
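
The idea of associating articles through shared keywords can be illustrated with plain Python dictionaries. This sketch does not use Wandora's topic map API. It assumes that each article in the response carries a `web_url` and a `keywords` list of `name`/`value` pairs, as in version 2 of the Article Search API; the sample documents are invented for illustration, and `group_by_keyword` is a hypothetical name.

```python
from collections import defaultdict


def group_by_keyword(docs):
    """Map (keyword type, keyword value) to the URLs of articles sharing it."""
    index = defaultdict(list)
    for doc in docs:
        url = doc["web_url"]               # stands in for the article topic
        for kw in doc.get("keywords", []):
            index[(kw["name"], kw["value"])].append(url)
    return dict(index)


# Invented sample data, shaped like the assumed response format.
sample_docs = [
    {"web_url": "https://example.com/a1",
     "keywords": [{"name": "persons", "value": "Obama, Barack"},
                  {"name": "subject", "value": "Health Insurance"}]},
    {"web_url": "https://example.com/a2",
     "keywords": [{"name": "persons", "value": "Obama, Barack"}]},
    {"web_url": "https://example.com/a3"},  # article without keywords
]

associations = group_by_keyword(sample_docs)
for keyword, urls in associations.items():
    print(keyword, urls)
```

In a topic map, each key would become a keyword topic and each URL in its list an article topic associated with it.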

Nyt article 6.png
