Extracting Reuters using OpenCalais

Post by manno » Sat Feb 15, 2014 4:24 pm

When I try to extract the Reuters dataset using the OpenCalais extractor, I get the following error:
Wait while seeking files with metadata!
Extracting from C:\Users\reuters21578.tar.gz
org.apache.axis2.AxisFault: api.opencalais.com
at org.apache.axis2.AxisFault.makeFault(AxisFault.java:430)
at org.apache.axis2.transport.http.HTTPSender.sendViaPost(HTTPSender.java:203)
at org.apache.axis2.transport.http.HTTPSender.send(HTTPSender.java:76)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.writeMessageWithCommons(CommonsHTTPTransportSender.java:400)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.invoke(CommonsHTTPTransportSender.java:225)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:438)
at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:402)
at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:229)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:165)
at org.wandora.application.tools.extractors.opencalais.webservice.CalaisStub.enlighten(CalaisStub.java:190)
at org.wandora.application.tools.extractors.opencalais.webservice.CalaisClient.enlighten(CalaisClient.java:62)
at org.wandora.application.tools.extractors.opencalais.OpenCalaisClassifier._extractTopicsFrom(OpenCalaisClassifier.java:158)
at org.wandora.application.tools.extractors.opencalais.OpenCalaisClassifier._extractTopicsFrom(OpenCalaisClassifier.java:128)
at org.wandora.application.tools.extractors.opencalais.OpenCalaisClassifier._extractTopicsFrom(OpenCalaisClassifier.java:122)
at org.wandora.application.tools.extractors.AbstractExtractor.extractTopicsFrom(AbstractExtractor.java:760)
at org.wandora.application.tools.extractors.AbstractExtractor.handleFiles(AbstractExtractor.java:329)
at org.wandora.application.tools.extractors.AbstractExtractor.execute(AbstractExtractor.java:278)
at org.wandora.application.tools.AbstractWandoraTool.run(AbstractWandoraTool.java:209)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.UnknownHostException: api.opencalais.com
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.<init>(Unknown Source)
at java.net.Socket.<init>(Unknown Source)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:91)
at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:158)
... 1 more

Ok.
manno
 
Posts: 3
Joined: Fri Feb 14, 2014 8:44 pm

Re: Extracting Reuters using OpenCalais

Post by akivela » Mon Feb 17, 2014 7:23 pm

Hello Manno

It appears the OpenCalais extractor is broken at the moment. I have to investigate what the problem is.

In general, the output in your post shows that you are passing a gzipped tar file to the OpenCalais extractor. Unfortunately, neither Wandora nor OpenCalais can process gzipped tar files directly; both assume the input is a text file. You need to decompress and unpack the archive before you can extract named entities from the data inside it. To do that, use the 7-Zip application on Windows or the gunzip and tar commands on Linux.
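If you prefer a script over 7-Zip or gunzip and tar, the unpacking step can be sketched in Python with the standard tarfile module. This is only a minimal sketch: the function name extract_archive and the target folder are made up for illustration, and the files that come out of the archive may still need further cleanup before they are plain text.

```python
import shutil
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a gzipped tar archive and return the extracted regular files.

    extract_archive is a hypothetical helper name, not part of Wandora.
    """
    root = Path(target_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Only regular files are copied; directories and links are skipped.
            if not member.isfile():
                continue
            destination = (root / member.name).resolve()
            # Ignore entries whose path would end up outside the target folder.
            if root not in destination.parents:
                continue
            destination.parent.mkdir(parents=True, exist_ok=True)
            source = archive.extractfile(member)
            with source, open(destination, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(destination)
    return sorted(extracted)


# Example call (the target folder name is an assumption):
# files = extract_archive(r"C:\Users\reuters21578.tar.gz", r"C:\Users\reuters21578")
```

Each path in the returned list can then be given to the extractor as an ordinary file, instead of the archive itself.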

But as I said at the very beginning, it looks like Wandora's OpenCalais extractor has some other problems too.
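One thing worth ruling out on your side: the root cause in the trace is java.net.UnknownHostException, which Java raises when a host name cannot be resolved to an IP address. That may point to a DNS, proxy, or connectivity issue on the machine running Wandora. A quick way to check name resolution outside Wandora is sketched below in Python. The helper name can_resolve and the port number are assumptions for illustration; the optional resolver argument only exists so that the function can be tried without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the host name resolves to at least one address.

    can_resolve is a hypothetical helper; resolver defaults to the
    standard socket.getaddrinfo and can be replaced for offline testing.
    """
    try:
        # Port 80 is an arbitrary choice; only the name lookup matters here.
        return len(resolver(hostname, 80)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False


# Example call, using the host name from the stack trace:
# print(can_resolve("api.opencalais.com"))
```

If this returns False on your machine, the problem is in the network setup rather than in the input file.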

Kind Regards,
Aki / Wandora Team
akivela
Site Admin
 
Posts: 256
Joined: Tue Sep 18, 2007 10:20 am
Location: Helsinki, Finland

Re: Extracting Reuters using OpenCalais

Post by manno » Tue Feb 18, 2014 9:33 pm

Hello Aki

Thanks for your reply.
I also passed a simple text file and got the same error.
Can you please tell me how long it will take to fix the OpenCalais issue?

Re: Extracting Reuters using OpenCalais

Post by akivela » Wed Feb 19, 2014 1:09 pm

Hello Manno

It looks like the OpenCalais API key stored in the Wandora application has been suspended. To run the OpenCalais extractor in Wandora, you need to do the following:

1. Request a personal OpenCalais API key at http://www.opencalais.com. First create a new account at http://www.opencalais.com/apps/register. Once you have logged in with your credentials, you can get or retrieve an API key. The API key will be sent to you by email.

2. Start the Wandora application. Select the menu option File > Extract > Classification > OpenCalais Classifier... while holding down the CTRL key. Wandora opens a configuration dialog for the OpenCalais extractor. The dialog has only one button, labeled Forget OpenCalais APIKEY.

3. Press the Forget OpenCalais APIKEY button in the configuration dialog. Wandora resets the pre-stored API key. Close the configuration dialog.

4. Start the OpenCalais extractor with the menu option File > Extract > Classification > OpenCalais Classifier.... Select the Raw tab and enter some text in the text area. Press the Extract button.

5. Wandora now asks for your personal OpenCalais API key. Enter it in the text field and proceed.

If a network connection is available, Wandora sends your text to the OpenCalais SOAP API and creates topics and associations that model the tags and topics found in your text.

I hope this helps you use Wandora's OpenCalais classifier/extractor.

Wandora keeps your API key only for the duration of the session. If you restart the Wandora application, you need to reset the pre-stored API key again. We'll remove the pre-stored API key in the next Wandora release. At the moment we have not decided on the schedule for that release.

Remember that you can't extract gzipped tar files. Also read about the limitations of the OpenCalais service: there is a maximum content length, and the number of daily API requests is limited.
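Because of the content length limit, a long document may have to be cut into smaller pieces before extraction. A rough sketch in Python follows. The helper name split_text is made up, and the limit used in the example call is a placeholder, not the real OpenCalais limit, so check the service documentation for the actual value.

```python
def split_text(text, max_chars):
    """Split text into chunks of at most max_chars characters.

    Chunks break at whitespace where possible. split_text is a
    hypothetical helper, not part of Wandora or OpenCalais.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Example call with a placeholder limit (an assumption, not the real value):
# pieces = split_text(open("article.txt", encoding="utf-8").read(), 10000)
```

Keep in mind that each piece is a separate request, so splitting a large collection this way uses up the daily request quota faster.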

Kind Regards,
Aki / Wandora Team

Re: Extracting Reuters using OpenCalais

Post by manno » Sun Feb 23, 2014 12:05 pm

Hello Aki

Thanks for your kind help. It worked and solved my problem.

