Topic map conversion of OpenCyc
OpenCyc is a large general knowledge base and commonsense reasoning engine, and a limited version of Cyc. The topic map conversion of OpenCyc is based on the RDF conversion of OpenCyc provided by Stephen L. Reed for his Texai project. The topic map was created with Wandora's RDF import feature together with some light manual processing.
Download
There are two versions of the OpenCyc topic map available:
- OpenCyc Wandora project file (14.6 MB) is intended for Wandora users. Wandora requires at least 1.4 GB of memory to open the OpenCyc project file successfully.
- OpenCyc XTM dump (14.8 MB zipped, 250 MB uncompressed) is intended for any topic map application capable of importing the XTM format.
History
- 2008-07-15. First version published.
Metrics
The metrics were measured on the OpenCyc layer of the Wandora project file. The metrics of the XTM dump may differ slightly.
- Number of topics: 120410
- Number of associations: 424064
- Number of topic base names: 120409
- Number of subject identifiers: 120415
- Number of subject locators: 0
- Number of occurrences: 244173
- Number of distinct topic classes: 1
- Number of distinct types of associations: 73
- Number of distinct roles in associations: 4
- Number of distinct players in associations: 116212
- Average clustering coefficient: 0.16878
Conversion details
The topic map conversion of OpenCyc has the following navigation structure:
- OpenCyc (http://www.wandora.org/opencyc) is a subclass of the Wandora class topic. It collects together the OpenCyc types and the root node of OpenCyc, i.e. the Thing topic.
- OpenCyc Types (http://www.wandora.org/opencyc/types) is a subclass of OpenCyc. It collects all of OpenCyc's association and occurrence types as instances.
- Thing (http://www.w3.org/2002/07/owl#Thing) is the root node of the OpenCyc ontology. It can be used to navigate anywhere in the ontology. However, it appears to have many more subclasses than OpenCyc Upper Ontology diagrams usually suggest. Thing is also a subclass of the OpenCyc topic.
Each OpenCyc topic has a subject identifier of the form http://sw.cyc.com/2006/07/27/cyc/Concept, where Concept is the CycLConstant, i.e. #$Concept. The subject identifier resolves to a web page describing the concept. In some cases the subject identifier is the same as that of a concept in the RDFS or OWL vocabulary. Examples of such concepts are domain, with SI http://www.w3.org/2000/01/rdf-schema#domain, and subPropertyOf, with SI http://www.w3.org/2000/01/rdf-schema#subPropertyOf.
Each OpenCyc topic has a base name equal to its CycLConstant. For example, the topic for the concept #$DistributedFilesystem has the base name DistributedFilesystem.
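The naming rules above can be sketched in a few lines of Python. This is only an illustration, not code from Wandora: the function names are made up for this example, and the sketch ignores the concepts that use RDFS or OWL subject identifiers instead of the Cyc namespace.

```python
# Namespace quoted in the text above.
CYC_NAMESPACE = "http://sw.cyc.com/2006/07/27/cyc/"


def strip_prefix(constant):
    """Drop the CycL '#$' prefix if it is present."""
    return constant[2:] if constant.startswith("#$") else constant


def subject_identifier(constant):
    """Hypothetical helper: subject identifier for a CycLConstant."""
    return CYC_NAMESPACE + strip_prefix(constant)


def base_name(constant):
    """Hypothetical helper: base name for a CycLConstant."""
    return strip_prefix(constant)


if __name__ == "__main__":
    print(subject_identifier("#$DistributedFilesystem"))
    print(base_name("#$DistributedFilesystem"))
```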
Most OpenCyc topics contain occurrences for the prettyStrings of the OpenCyc concept. A prettyString is a string representation of the concept. You could think of it as a variant name of the OpenCyc topic. However, variant names are not used to model prettyStrings. This design decision follows from the aim of keeping changes to OpenCyc minimal. The occurrence type is prettyString (http://sw.cyc.com/2006/07/27/cyc/prettyString) and the scope is Lang.indep. (http://www.wandora.org/core/langindependent).
Most OpenCyc topics also contain occurrences for prettyString-Canonical. These occurrences are similar to prettyStrings, except that the name is canonical.
Many OpenCyc topics contain an occurrence for a comment. A comment is a free-text description of the OpenCyc concept. A comment usually contains references to other OpenCyc concepts, marked with the prefix #$. Topic maps have no standard mechanism for linking to a topic from occurrence text, and Wandora offers no automated way to follow such occurrence links. However, Wandora features a special tool, Topics > Associations > Find associations in occurrences..., that extracts associations from occurrence texts. The user has to run the tool manually.
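As a rough illustration of what such references look like to a program, the following Python sketch collects the #$-prefixed names from a comment text. It does not describe how Wandora's tool works. The set of characters allowed in a constant name is an assumption made for this example, and the comment text in the usage part is invented, not taken from OpenCyc.

```python
import re

# Assumption: a constant name consists of letters, digits, hyphens
# and underscores, and starts with a letter or a digit.
CYC_REFERENCE = re.compile(r"#\$([A-Za-z0-9][A-Za-z0-9_-]*)")


def referenced_constants(comment):
    """Return the distinct constants referenced in a comment, in order."""
    seen = []
    for name in CYC_REFERENCE.findall(comment):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Invented comment text, for illustration only.
    text = ("A #$FictionalThing is any #$Thing that exists only in "
            "fiction; see also #$FictionalThing.")
    print(referenced_constants(text))
```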
The two basic relations in OpenCyc are isa and genls. The first one, isa, is an individual-collection relation identical to the class-instance relation specified in the Topic Map standard. However, the standard Topic Map relation was not used to represent OpenCyc's isa relations. The problem is that Wandora's data model contains no explicit association type topic or role topics for the class-instance relation. This limitation makes it impossible to attach other associations to the class-instance association type. For example, it is impossible to specify a subclass for the class-instance relation. OpenCyc contains not only subclasses for the class-instance relation but also inverse and subproperty relations. A separate association type and separate roles were therefore constructed to represent OpenCyc's isa relations. The association type's base name is isa and its SI is http://www.w3.org/1999/02/22-rdf-syntax-ns#type. The role topics are discussed below. If you require a different association type or different roles, use Wandora's Change type and Change role tools in the context of the isa association table.
Note: Some other knowledge representation languages, such as OBO, use isa to represent the superclass-subclass relation.
Note 2: Yes, the Topic Map standard specifies explicit topics for the class-instance relation, but Wandora doesn't use them at the moment.
The other typical OpenCyc relation is genls (generalizes). It is equivalent to the superclass-subclass relation. As the superclass-subclass relation has explicit type and role topics, genls relations were mapped to the standard Topic Map constructs. However, the association type's base name has been overwritten and is genls.
Other widely used association types are genlInverse, domain, range, quotedIsa, etc. The association type names are identical to the original OpenCyc concepts. See the OpenCyc documentation for a detailed description of the relation semantics.
OpenCyc represents relations using Lisp-style formulas such as
( #$isa #$BramStoker #$FantasyWriter )
Converting this triplet to a binary association is rather straightforward: the first list member is the association type and the other two members are the association players. Choosing the roles is less straightforward. One option is to derive the roles from the association type; the isa relation, for example, would always use the same roles, class and instance. Another option is to derive the roles from the player topics: if #$BramStoker is a #$FantasyWriter, then #$BramStoker's role is #$FantasyWriter.
OpenCyc, however, addresses relation slots by number: #$BramStoker is the first argument of the #$isa relation and #$FantasyWriter is the second. OpenCyc uses these numbers when specifying which kinds of terms can be used in a slot and what format the argument in the slot must have. These OpenCyc schema topics are instances of OpenCyc Types and are named argXFormat and argXGenl, where X is the number of the addressed slot. For this reason almost all associations in the topic map conversion of OpenCyc simply use the roles arg1 and arg2. Otherwise the implicit link between the schema and the slot would be lost, and one could not tell which role is the first one. Only superclass-subclass associations use the roles specified in the Topic Map standard, to ease navigation within the OpenCyc topic map.
Note: The Topic Map standard specifies no equivalent constructs for defining a schema. OpenCyc's schema is therefore simply stored within the topic map conversion as ordinary topics.
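The role assignment described above can be sketched as follows. This is a minimal Python illustration, not the code used for the conversion. The class and function names are hypothetical, the parser handles only flat three-element formulas, and the role labels "subclass" and "superclass" stand in for the standard Topic Map role topics. The sketch also assumes that the first argument of genls is the subclass.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """Hypothetical container for a binary association."""
    assoc_type: str
    players: dict  # maps role name -> player name


def parse_formula(formula):
    """Convert a flat CycL formula '( #$rel #$a #$b )' to an Association."""
    tokens = formula.strip().strip("()").split()
    names = [t[2:] if t.startswith("#$") else t for t in tokens]
    if len(names) != 3:
        raise ValueError("only binary relations over atomic terms are handled")
    relation, first, second = names
    if relation == "genls":
        # Assumption: arg1 of genls is the subclass, arg2 the superclass.
        roles = ("subclass", "superclass")
    else:
        # All other relations keep the slot numbers as role names.
        roles = ("arg1", "arg2")
    return Association(relation, {roles[0]: first, roles[1]: second})


if __name__ == "__main__":
    print(parse_formula("( #$isa #$BramStoker #$FantasyWriter )"))
```

Keeping the slot numbers as role names preserves the link to schema topics such as arg1Format and arg2Genl, which is the reason given above for not using more descriptive roles.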
Below is a screenshot of Wandora with the OpenCyc topic map open. The left column contains the topic tree, opened at Thing > FictionalThing > Fictional Character. FictionalThing has been opened in the topic panel in the right column. The topic panel shows the topic's subject identifier, its prettyString and comment occurrences, and its associations. There are both genls (superclass-subclass) and isa (class-instance) associations in which FictionalThing is a player.
Limitations
- The topic map conversion contains only OpenCyc's binary relations.
- The topic map conversion contains only atomic terms of OpenCyc.
- Topic Maps do not support the semantics of many OpenCyc relations.
- Each Cyc topic contains at most one arbitrarily selected prettyString and one prettyString-Canonical.
Development ideas and additional notes
- With both OpenCyc and WordNet converted to topic maps, it would be very straightforward to construct an adapter topic map that merges equivalent topics in OpenCyc and WordNet. This adapter topic map would contain only topic stubs with two subject identifiers each. The first subject identifier would refer to the topic in WordNet. The second subject identifier would refer to the equivalent topic in OpenCyc. Importing the WordNet topic map, the adapter topic map, and the OpenCyc topic map into Wandora would produce a seamless knowledge package merging WordNet and OpenCyc.
- I assume such a system requires more than 1.4 GB of memory, so one would have to give the JRE more than 1.4 GB. The JRE is able to address only about 1.5 GB on current 32-bit operating systems. Thus you need a 64-bit operating system and JRE to access enough memory to import WordNet, OpenCyc, and the adapter simultaneously. The other option is to use the slower database topic maps.
- As the Wandora Team has no plans to implement such an adapter topic map, I pass the ball to the Topic Maps community. If you are interested in such a "hobby", please share your experiences at the Wandora Forum.
- The current Wandora version also features an OpenCyc web API extractor. Topic map fragments generated by the OpenCyc extractor are not compatible with the complete topic map conversion of OpenCyc reviewed in this document.
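The adapter idea in the first item above relies on the Topic Maps rule that topics sharing a subject identifier are merged. The following Python sketch shows the principle with topics reduced to sets of subject identifiers. It is an illustration only, not Wandora code: the function name is made up, and the WordNet identifiers are hypothetical placeholders.

```python
def merge_topics(topics):
    """Merge topics (sets of subject identifiers) that share an identifier."""
    merged = []
    for identifiers in topics:
        identifiers = set(identifiers)
        remaining = []
        for existing in merged:
            if existing & identifiers:
                # Shared identifier: absorb the existing topic.
                identifiers |= existing
            else:
                remaining.append(existing)
        remaining.append(identifiers)
        merged = remaining
    return merged


if __name__ == "__main__":
    # Hypothetical WordNet identifiers, for illustration only.
    wordnet = [{"http://example.org/wordnet/dog"},
               {"http://example.org/wordnet/cat"}]
    opencyc = [{"http://sw.cyc.com/2006/07/27/cyc/Dog"}]
    # Adapter stub: one topic carrying both subject identifiers.
    adapter = [{"http://example.org/wordnet/dog",
                "http://sw.cyc.com/2006/07/27/cyc/Dog"}]
    for topic in merge_topics(wordnet + opencyc + adapter):
        print(sorted(topic))
```

In the example the adapter stub pulls the WordNet and OpenCyc topics for the same concept into one merged topic, while the unrelated topic stays separate.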
License
GNU General Public License (GPL)