OBO flat file import
Revision as of 16:03, 29 January 2008
OBO import converts an OBO flat file v1.2 ontology to a topic map and merges the converted topic map into Wandora. Start the import with the menu option File > Import > OBO import.... Wandora also accepts OBO files dropped onto the Wandora window. If an OBO file is dropped onto the layer stack, a new layer is created for the imported file.
The OBO flat file format is used mainly in bioinformatics to store and share ontologies related to the biosciences. It was initially developed for The Gene Ontology. However, over 60 different public ontologies exist in OBO format today. These ontologies can be browsed and downloaded at the Open Biological Ontologies Foundry.
As an example of OBO import, we have converted the Gene Ontology to a topic map. Below is a screenshot of Wandora with the Gene Ontology topic map open. In addition to OBO import, Wandora can also export a topic map back to the OBO flat file format. Read more at OBO flat file export and OBO round trip.
Conversion details
OBO import creates a root topic for each namespace. The root topic is named obo (namespace), where namespace is the name of the namespace. Each namespace-specific root topic is associated with a collection of meta-topics:
- category (namespace) collects all subsets specified in the OBO file. A subset is a named category of terms, defined in the OBO file header with the tag subsetdef. If the namespace doesn't define any categories, category (namespace) is not created.
- header (namespace) is a header topic collecting the OBO header properties. Each header property is stored as a text occurrence whose type is generated from the header tag name.
- term (namespace) collects all OBO terms in the given namespace. An OBO term is an instance of the term (namespace) topic.
- obsolete (namespace) collects all obsolete terms in the given namespace. An obsolete OBO term is an instance of the obsolete (namespace) topic. An obsolete term topic is also an instance of term (namespace).
- synonym (namespace) collects all OBO term synonyms in the given namespace. An OBO term synonym is an instance of the synonym (namespace) topic.
- description (namespace) collects all dbxref description topics.
- typedef collects all relationship topics used in the OBO file. Typical relationships are part_of and develops_from. The namespace is not used in relationship names.
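The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the names listed here, not Wandora's actual code; the function name meta_topic_names and the has_subsets flag are assumptions made for the example.

```python
# Kinds of namespace-specific meta-topics that are always listed above.
# "category" is handled separately because it exists only when the
# OBO header defines subsets.
META_KINDS = ["header", "term", "obsolete", "synonym", "description"]


def meta_topic_names(namespace, has_subsets):
    """Return (root topic name, list of meta-topic names) for a namespace.

    Hypothetical helper: it mirrors the naming pattern described in the
    text and nothing more.
    """
    root = f"obo ({namespace})"
    metas = []
    if has_subsets:
        metas.append(f"category ({namespace})")
    metas.extend(f"{kind} ({namespace})" for kind in META_KINDS)
    # Relationship topics are collected under a typedef topic whose
    # name carries no namespace.
    metas.append("typedef")
    return root, metas


root, metas = meta_topic_names("biological_process", has_subsets=False)
print(root)
for name in metas:
    print(" ", name)
```

With has_subsets=False the category topic is left out, matching the rule that a namespace without categories has no category (namespace) topic.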
OBO terms
An OBO term is described with a [Term] stanza in the OBO file. Each OBO term is converted to a topic. Wandora gives the term topic a subject identifier constructed from the term id. The subject identifier pattern is
http://www.wandora.org/obo/ID
where ID is the term's id. For example, the term GO:0010480 gets the subject identifier http://www.wandora.org/obo/GO:0010480. Wandora also gives the term topic a base name constructed from the term name and id. For example, the term GO:0010480 is given the base name microsporocyte differentiation (GO:0010480). The term name is set as the English display variant name of the term topic.
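As a rough illustration of the naming rules above, the following Python sketch reads a minimal [Term] stanza and derives the subject identifier, base name, and English display name. The helper names parse_stanza and term_topic and the dictionary layout are assumptions for this example; they do not reflect Wandora's internal API.

```python
SI_PREFIX = "http://www.wandora.org/obo/"

STANZA = """
[Term]
id: GO:0010480
name: microsporocyte differentiation
"""


def parse_stanza(text):
    """Collect tag/value pairs of one stanza into a dict of lists."""
    tags = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        # Split on the first colon only, since ids contain colons too.
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def term_topic(tags):
    """Build the identifying names of a term topic (hypothetical layout)."""
    term_id = tags["id"][0]
    name = tags["name"][0]
    return {
        "subject_identifier": SI_PREFIX + term_id,
        "base_name": f"{name} ({term_id})",
        "display_name_en": name,
    }


topic = term_topic(parse_stanza(STANZA))
print(topic["subject_identifier"])  # http://www.wandora.org/obo/GO:0010480
print(topic["base_name"])  # microsporocyte differentiation (GO:0010480)
```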
Additionally, the term id is attached to the term topic as an obo-id text occurrence. The term definition and comment are also attached to the term topic as text occurrences of type definition and comment. Wandora creates topics for the definition origin and the origin description. An association of type definition-origin is created to link the term topic, the defining authorship, and the description of the authorship. Wandora creates one definition-origin association for each definition origin.
Wandora creates a stub topic for each alternative id using the subject identifier schema described above and links the term topic with the alternative term using alternative-term associations.
Wandora creates an association of type Namespace to link the term topic and its namespace.
Xrefs are used in the OBO format to link similar terms in external ontologies. Wandora creates a stub topic for the xref term and links the term topic and the xref topic with an xref association. It is assumed that the xref term gets its detailed structure and properties when the ontology describing the term is merged in.
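The handling of a definition line can be sketched as follows. This is a simplified illustration, not Wandora's parser: it assumes the usual def: "text" [dbxref "description", ...] shape, ignores escaped characters inside dbxref names, and uses the illustrative id EX:0000001 and origins EX:src1 and EX:src2. The role names term, origin, and description are placeholders, since the text does not name the roles of the definition-origin association.

```python
import re

# "definition text" followed by a bracketed dbxref list.
DEF_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*\[(.*)\]\s*$')
# One dbxref: a name, optionally followed by a quoted description.
XREF_RE = re.compile(r'\s*([^\s,"]+)(?:\s+"((?:[^"\\]|\\.)*)")?\s*(?:,|$)')


def parse_def(value):
    """Split a def value into its text and a list of (origin, description)."""
    match = DEF_RE.match(value)
    if match is None:
        raise ValueError("unexpected def format: " + value)
    text, xrefs = match.group(1), match.group(2)
    origins = [(x.group(1), x.group(2)) for x in XREF_RE.finditer(xrefs)]
    return text, origins


def definition_structures(term_id, def_value):
    """Return the occurrences and definition-origin associations of a term."""
    text, origins = parse_def(def_value)
    occurrences = {"obo-id": term_id, "definition": text}
    associations = []
    for origin, description in origins:  # one association per origin
        players = {"term": term_id, "origin": origin}
        if description is not None:
            players["description"] = description
        associations.append({"type": "definition-origin", "players": players})
    return occurrences, associations


occ, assocs = definition_structures(
    "EX:0000001",
    '"A sample definition." [EX:src1 "curated note", EX:src2]',
)
print(occ)
for assoc in assocs:
    print(assoc)
```

Note that the description is attached to the association rather than to the origin itself, so the same origin can carry a different description in another association.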
An OBO term may have multiple synonym names. Each synonym name has a scope, a type, and an origin. Although scope and type are also features of variant names in Topic Maps, an OBO synonym is not converted to a variant name but to a topic associated with the term. This design decision is due to the rather rigid variant name schema of Wandora. Instead of a variant name, Wandora creates a topic for each term synonym, scope, type, origin, and origin description. An association of type synonym is created to link the term and the synonym. If the synonym has a scope, a type, and a described origin, the association has 5 players.
Note that the origin description is not a real property of the origin but a floating property, i.e. a player of each synonym and definition-origin association. This design decision is due to the observation that dbxref descriptions used in synonyms and definitions are not consistent. In fact, some OBO ontologies, such as the Protein Modification ontology psi-mod, use dbxrefs and their descriptions as if they were slots and properties. I don't really know whether this was the intention of the OBO file format authors, but the format allows such usage. However, this means that the same dbxref may have different descriptions across the ontology, and each description is valid only in its given context.
An OBO term may also relate to other OBO terms. Typical relationships are is_a, intersection_of, union_of, disjoint_from, and the general-purpose relationship. Wandora creates an association for each term relation. The relationship type determines the association type and roles.
is_a
is_a relationships are converted to superclass-subclass associations with the standard roles given in the topic map specification.
intersection_of
intersection_of relationships are converted to intersection-of associations where the stanza term topic plays the role term and the relationship topic plays the role related-term. Wandora does not check that the relationship is used consistently. If the OBO relationship contains a modifier, a topic is created for the modifier and it is added to the association as a third player with the role modifier.
union_of
union_of relationships are converted to union-of associations where the stanza term topic plays the role term and the relationship topic plays the role related-term. Wandora does not check that the relationship is used consistently. If the OBO relationship contains a modifier, a topic is created for the modifier and it is added to the association as a third player with the role modifier.
disjoint_from
disjoint_from relationships are converted to disjoint-from associations using a schema similar to that of union_of.
part_of
part_of relationships are converted to part-of associations using a schema similar to that of union_of.
relationship
The relationship tag specifies a general relation between terms. Usually the relation type is given by a modifier. If Wandora finds such a modifier, a topic is created for it and it is used as the type of an association between the term topics. Within the association, the stanza term topic plays the role term and the relationship topic plays the role related-term.
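The mapping from relation tags to associations described in this section can be summarised with a small Python sketch. It is an illustration under several assumptions, not Wandora's implementation: the function name relation_to_association is hypothetical, the related-term player is taken to be the topic of the referenced term, the role names subclass and superclass stand in for the standard roles, the fallback association type for a relationship without a modifier is invented for the example, and the EX: ids are illustrative.

```python
def relation_to_association(term_id, tag, value):
    """Map one relation tag of a [Term] stanza to an association dict."""
    # Drop a trailing "! comment" and split the rest on whitespace.
    parts = value.split("!")[0].split()
    if not parts:
        raise ValueError("relation has no target: " + tag)

    if tag == "is_a":
        return {
            "type": "superclass-subclass",
            "players": {"subclass": term_id, "superclass": parts[0]},
        }

    if tag == "relationship":
        if len(parts) >= 2:
            # The modifier becomes the association type.
            return {
                "type": parts[0],
                "players": {"term": term_id, "related-term": parts[1]},
            }
        # Assumption: without a modifier, fall back to a generic type.
        return {
            "type": "relationship",
            "players": {"term": term_id, "related-term": parts[0]},
        }

    # intersection_of, union_of, disjoint_from, part_of share one schema.
    players = {"term": term_id, "related-term": parts[-1]}
    if len(parts) > 1:
        players["modifier"] = parts[0]  # optional third player
    return {"type": tag.replace("_", "-"), "players": players}


examples = [
    ("is_a", "EX:0000002 ! parent term"),
    ("intersection_of", "part_of EX:0000003"),
    ("union_of", "EX:0000004"),
    ("relationship", "develops_from EX:0000005"),
]
for tag, value in examples:
    print(relation_to_association("EX:0000001", tag, value))
```

As in the text, the sketch performs no consistency checks on how a relationship is used; it only shapes the association.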