public class ExtractSIFFKeywords extends AbstractExtractor implements WandoraTool
Implements a special extractor for Sinebrychoff's artwork files. A Sinebrychoff artwork file is essentially a database dump containing artwork-specific information. The class extracts only the artworks and keywords found in the given file(s).
About the file format: artworks in the file are expected to be separated by an empty line (the characters "\n\n"). Each artwork record begins with a line containing the artwork identifier (inventory code); this line is recognized by the string "@@NO:". The artwork's keyword line is recognized by the string "AH:" and may contain multiple semicolon-separated keywords. Below is an example fragment of a Sinebrychoff artwork file.
NO:A IV 3299
MU:SFF
OM:Valtion taidemuseo
NI:INTIALAINEN MINIATYYRI ; Kaksi naista soittaa kahdelle pyhälle miehelle (sadhulle), Rajput-miniatyyri
VV:n 1800
MA:vesiväri, kulta ja hopea
MI:20x12,7
ME:merk.
MJ:takana
MS:Rajasthan School
SI:V;SFF
HA:osto
HT:taiteilija Per Stenius
OA:1.6.1959
HH:30.000
VY:MV
VN: 11085
PÄ:maalaus
ER:miniatyyri
AH:KOHTAUS;musiikki;naiset;pyhät miehet;intialaiset;koira
KI:B.Robinson
LI:RV/87
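The format description above can be sketched in a few lines of Java. This is a minimal illustration, not the extractor's actual code: the class name `SiffKeywordSketch`, the method `parse`, and the second record in `main` are hypothetical. It also assumes that the markers appear at the start of a line, whereas the description only says the lines are "recognized with" them. Note that the example fragment shows the identifier as "NO:" without the "@@" prefix; the sketch follows the prose description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SiffKeywordSketch {
    // Markers as given in the format description (assumed to start the line).
    static final String ID_MARKER = "@@NO:";
    static final String KEYWORD_MARKER = "AH:";

    /** Maps each artwork identifier to its keywords, preserving file order. */
    static Map<String, List<String>> parse(String dump) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Artwork records are separated by an empty line ("\n\n").
        for (String record : dump.split("\n\n")) {
            String id = null;
            List<String> keywords = new ArrayList<>();
            for (String line : record.split("\n")) {
                line = line.trim();
                if (line.startsWith(ID_MARKER)) {
                    id = line.substring(ID_MARKER.length()).trim();
                } else if (line.startsWith(KEYWORD_MARKER)) {
                    // A keyword line may hold several semicolon-separated keywords.
                    for (String k : line.substring(KEYWORD_MARKER.length()).split(";")) {
                        if (!k.trim().isEmpty()) {
                            keywords.add(k.trim());
                        }
                    }
                }
            }
            // Records without an identifier line are skipped.
            if (id != null) {
                result.put(id, keywords);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The second record is made-up sample data for illustration only.
        String dump = "@@NO:A IV 3299\nMU:SFF\nAH:KOHTAUS;musiikki;naiset\n\n"
                    + "@@NO:A IV 3300\nAH:maisema\n";
        System.out.println(parse(dump));
    }
}
```

The real extractor turns identifiers and keywords into topics in a TopicMap; the sketch stops at a plain map so that it runs without Wandora on the classpath.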
Modifier and Type | Field and Description |
---|---|
boolean | createArtworkTopics |
CUSTOM_EXTRACTOR, DONE_FAILED, DONE_MANY, DONE_ONE, EXACTLY_GIVEN_URLS, FILE_EXTRACTOR, FILE_PATTERN, GIVEN_URLS_AND_ALL_CRAWLED_DOCUMENTS, GIVEN_URLS_AND_CRAWLED_DOCUMENTS_IN_URL_DOMAIN, GIVEN_URLS_AND_LINKED_DOCUMENTS, GIVEN_URLS_AND_URL_BELOW, INFO_WAIT_WHILE_WORKING, LOG_TITLE, POINT_START_URL_TEXT, RAW_EXTRACTOR, SELECT_DIALOG_TITLE, STRING_EXTRACTOR_NOT_SUPPORTED_MESSAGE, URL_EXTRACTOR
CLOSE, EXECUTE, INVISIBLE, VISIBLE, WAIT
RETURN_ERROR, RETURN_INFO
Constructor and Description |
---|
ExtractSIFFKeywords(): Creates a new instance of ExtractSIFFKeywords. |
Modifier and Type | Method and Description |
---|---|
boolean | _extractTopicsFrom(java.io.BufferedReader breader, TopicMap topicMap) |
boolean | _extractTopicsFrom(java.io.File keywordFile, TopicMap topicMap) |
boolean | _extractTopicsFrom(java.lang.String str, TopicMap topicMap) |
boolean | _extractTopicsFrom(java.net.URL url, TopicMap topicMap) |
java.lang.String | getDescription(): AdminToolManager shows tool descriptions while the user browses available tools and builds user-customizable GUI elements such as the Tools menu. |
java.lang.String | getGUIText(int textType) |
java.lang.String | getName(): The tool's name represents the tool in the UI unless the tool has explicitly been given another GUI name. |
boolean | useTempTopicMap() |
acceptBrowserExtractRequest, addCrawlerUrl, browserExtractorConsumesPlainText, buildSI, buildSL, clearMasterSubject, createAssociation, createAssociation, createTopic, createTopic, createTopic, createTopic, createTopic, createTopic, createTopic, croppedFilename, croppedFilename, croppedUrlString, croppedUrlString, doBrowserExtract, dropExtract, dropExtract, dropExtract, execute, extractTopicsFrom, extractTopicsFrom, extractTopicsFrom, extractTopicsFrom, extractTopicsFromText, getBrowserExtractorName, getContentTypes, getCrawlerMode, getExtractorType, getForceContent, getForceFiles, getForceUrls, getGUIText, getIcon, getInterruptsHandled, getMasterSubject, getType, getWandora, handle, handleContent, handleCustomType, handleFiles, handleForcedContent, handleInterrupt, handleStringContent, handleUrls, initializeCustomType, instantDropHandle, makeSubclassOfWandoraClass, runInOwnThread, setData, setDisplayName, setForceContent, setForceFiles, setForceUrls, setMasterSubject, setMasterSubject, setTopicMap, setupCrawler, setWandora, takeNap, urlEncode, useURLCrawler
addUndoMarker, addUndoMarker, allowMultipleInvocations, clearAllThreads, clearThreads, clearThreads, clearToolLock, clearToolLock, clearToolLocks, configure, execute, execute, forceStop, forceStop, getContext, getCurrentLogger, getDefaultLogger, getHistory, getLastLogger, getState, getThreads, getThreads, getToolMenuItem, getToolMenuItem, getTopicName, hlog, initialize, interruptAllThreads, interruptThreads, interruptThreads, isConfigurable, isRunning, isRunning, lockLog, log, log, log, log, requiresRefresh, run, setContext, setDefaultLogger, setLogTitle, setProgress, setProgressMax, setState, setToolLogger, singleLog, singleLog, singleLog, solveContextTopicMap, solveNameForTopicMap, writeOptions
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
configure, execute, execute, execute, getContext, getIcon, getToolMenuItem, getType, hlog, initialize, isConfigurable, isRunning, log, log, log, log, requiresRefresh, setContext, setToolLogger, writeOptions
forceStop, getHistory, getState, lockLog, setLogTitle, setProgress, setProgressMax, setState
public ExtractSIFFKeywords()
public java.lang.String getName()
Description copied from class: AbstractWandoraTool
getName in interface WandoraTool
getName in class AbstractExtractor
public java.lang.String getDescription()
Description copied from class: AbstractWandoraTool
getDescription in interface WandoraTool
getDescription in class AbstractExtractor
public java.lang.String getGUIText(int textType)
getGUIText in class AbstractExtractor
public boolean _extractTopicsFrom(java.lang.String str, TopicMap topicMap) throws java.lang.Exception
_extractTopicsFrom in class AbstractExtractor
Throws: java.lang.Exception
public boolean _extractTopicsFrom(java.net.URL url, TopicMap topicMap) throws java.lang.Exception
_extractTopicsFrom in class AbstractExtractor
Throws: java.lang.Exception
public boolean _extractTopicsFrom(java.io.File keywordFile, TopicMap topicMap) throws java.lang.Exception
_extractTopicsFrom in class AbstractExtractor
Throws: java.lang.Exception
public boolean _extractTopicsFrom(java.io.BufferedReader breader, TopicMap topicMap) throws java.lang.Exception
Throws: java.lang.Exception
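The BufferedReader variant is the only `_extractTopicsFrom` overload not declared in AbstractExtractor, which suggests (but does not confirm) that the String, URL, and File overloads funnel into it. The sketch below illustrates that funnel pattern with standard java.io classes only. The class `ReaderFunnelSketch` and the method `countKeywordLines` are hypothetical stand-ins, and the UTF-8 charset is an assumption; the encoding the real extractor uses is not stated here.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReaderFunnelSketch {
    // Core variant: every other overload ends up here.
    // As a stand-in for real extraction it only counts lines containing "AH:".
    static int countKeywordLines(BufferedReader reader) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains("AH:")) {
                count++;
            }
        }
        return count;
    }

    // String input: wrap the text in a reader and delegate.
    static int countKeywordLines(String str) throws IOException {
        try (BufferedReader r = new BufferedReader(new StringReader(str))) {
            return countKeywordLines(r);
        }
    }

    // File input: open the file (charset assumed to be UTF-8) and delegate.
    static int countKeywordLines(File file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            return countKeywordLines(r);
        }
    }

    // URL input: open the stream (charset assumed to be UTF-8) and delegate.
    static int countKeywordLines(URL url) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return countKeywordLines(r);
        }
    }
}
```

Keeping the parsing logic in the reader-based variant means each input type only has to decide how to open its source, and try-with-resources closes the reader even when parsing fails.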
public boolean useTempTopicMap()
useTempTopicMap in class AbstractExtractor
Copyright 2004-2015 Wandora Team