OBO flat file import

From WandoraWiki

Wandora's OBO flat file import feature converts an OBO flat file ontology to a topic map and merges the converted topic map into Wandora. Start the import with the menu option File > Import > OBO import.... Wandora also accepts dropped OBO files. If an OBO file is dropped onto the layer stack, a new layer is created for the imported file.

The OBO flat file format is used mainly in bioinformatics to store and share ontologies related to the biosciences. It was initially developed for the Gene Ontology. Today, however, more than 60 public ontologies are available in OBO format. These ontologies can be browsed and downloaded at the Open Biological Ontologies Foundry.

As an example of OBO import, we have converted the Gene Ontology to a topic map. Below is a screenshot of Wandora with the Gene Ontology topic map open. In addition to OBO import, Wandora can also export a topic map back to the OBO flat file format. Read more at OBO flat file export and OBO round trip.


Gene ontology.gif


Conversion details

OBO import creates a root topic for each namespace. The root topic is named obo (namespace), where namespace is the name of the namespace. The root topic is associated with a collection of namespace-related meta-topics:

  • category (namespace) collects all subsets specified in the OBO file. A subset is a named category of terms and is defined in the OBO file header with the tag subsetdef. If the namespace doesn't define any categories, the collecting topic category (namespace) is not created.
  • header (namespace) is a header topic collecting the OBO header properties. Each header property is stored as a text occurrence whose type is generated from the header tag name.
  • term (namespace) collects all OBO terms in the given namespace. An OBO term is an instance of the term (namespace) topic.
  • obsolete (namespace) collects all obsolete terms in the given namespace. An obsolete OBO term is an instance of the obsolete (namespace) topic. An obsolete term topic is also an instance of term (namespace).
  • synonym (namespace) collects all OBO term synonyms in the given namespace. An OBO term synonym is an instance of the synonym (namespace) topic.
  • description (namespace) collects all dbxref description topics.
  • typedef collects all relationship topics used in the OBO file. Typical relationships used in an OBO file are part_of and develops_from. The namespace is not used in relationship names.
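The naming pattern of the meta-topics listed above can be sketched in a few lines of Python. This is only an illustration of the naming scheme described in the list, not Wandora's own code; the function name meta_topic_names and the has_subsets parameter are hypothetical.

```python
def meta_topic_names(namespace, has_subsets=True):
    """Return base names of the root topic and its meta-topics for one namespace.

    Hypothetical helper: it only mirrors the naming pattern described in the
    list above and is not part of Wandora.
    """
    kinds = ["header", "term", "obsolete", "synonym", "description"]
    if has_subsets:
        # category (namespace) exists only if the file defines subsets.
        kinds.insert(0, "category")
    names = {"root": f"obo ({namespace})"}
    for kind in kinds:
        names[kind] = f"{kind} ({namespace})"
    # Relationship topics are collected under a topic without a namespace.
    names["typedef"] = "typedef"
    return names


if __name__ == "__main__":
    for role, name in meta_topic_names("biological_process").items():
        print(role, "->", name)
```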

The following sections describe the conversion details of OBO terms, instances, and type definitions. It is assumed that the reader is familiar with Topic Maps and the OBO flat file format.

A longer version of the conversion details can be found in the paper Kivelä A.: OBO-ontologioiden kuvaaminen Topic Map-muotoon. MSc Thesis, 2008. (in Finnish). (English Abstract).

Converting OBO terms

An OBO term is specified with a [Term] stanza. OBO terms are also called OBO classes. For example, the Gene Ontology contains the term stanza

 [Term]
 id: GO:0010480
 name: microsporocyte differentiation
 namespace: biological_process
 def: "The process aimed at the progression of a microsporocyte cell over time, from initial commitment of the cell to a specific fate, to the fully functional differentiated cell. A microsporocyte is a diploid (2n) cell that undergoes meiosis and forms four haploid (1n) microspores; also called microspore mother cell and, in seed plants, pollen mother cell." [CL:000248 "Cell type ontology", PMID:16751349]
 synonym: "pollen mother cell differentiation" RELATED []
 is_a: GO:0030154 ! cell differentiation
 relationship: part_of GO:0048653 ! anther development

This stanza defines an OBO term with a unique id and name. It also gives a definition for the term and specifies one synonym, "pollen mother cell differentiation". The term specification includes two relationships. The is_a tag defines the superclass of the stanza term: microsporocyte differentiation is a subclass of GO:0030154, also named cell differentiation. An exclamation mark starts a comment, and it is customary to give the name of the related term in that comment. The example term also has a general relationship to GO:0048653. This relationship definition contains the modifier part_of, which describes the type of the relationship. For an overview of term conversion, examine the screenshot below, which shows Wandora's conversion of the example stanza.


Go0010480.gif
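Before any topics are created, the example stanza has to be read into tag-value pairs, with the trailing ! comments dropped. The following is a minimal Python sketch of that step under simplifying assumptions (one stanza per call, no trailing modifiers in braces, no line continuations). It is not Wandora's own parser, and the names strip_comment and parse_stanza are hypothetical.

```python
def strip_comment(value):
    """Drop a trailing '! comment' unless the '!' is inside a quoted string."""
    in_quotes = False
    prev = ""
    for i, ch in enumerate(value):
        if ch == '"' and prev != "\\":
            in_quotes = not in_quotes
        elif ch == "!" and not in_quotes:
            return value[:i].rstrip()
        prev = ch
    return value.strip()


def parse_stanza(text):
    """Return (stanza_type, [(tag, value), ...]) for a single OBO stanza."""
    stanza_type = None
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            stanza_type = line[1:-1]
            continue
        # Split on the first colon only; ids such as GO:0010480 contain colons.
        tag, _, value = line.partition(":")
        pairs.append((tag.strip(), strip_comment(value.strip())))
    return stanza_type, pairs


if __name__ == "__main__":
    example = """[Term]
id: GO:0010480
name: microsporocyte differentiation
is_a: GO:0030154 ! cell differentiation
relationship: part_of GO:0048653 ! anther development
"""
    kind, pairs = parse_stanza(example)
    print(kind)
    for tag, value in pairs:
        print(tag, "=", value)
```

Keeping the pairs in a list rather than a dictionary preserves repeated tags such as synonym and is_a, which a term stanza may contain several times.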


In general, each OBO term is converted to a topic. Wandora gives the term topic a subject identifier constructed from the term id. The subject identifier pattern is

http://www.wandora.org/obo/ID

where ID is the term's id. Thus the example term above, GO:0010480, gets the subject identifier http://www.wandora.org/obo/GO:0010480. Wandora also gives the term topic a base name constructed from the term name and id. For example, the term GO:0010480 is given the base name microsporocyte differentiation (GO:0010480). The term name is also set as the English display variant name of the term topic. If the term stanza doesn't specify a name for the term, the base name and variant name are left undefined.
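The identifier and naming rules above amount to simple string construction. The Python sketch below illustrates them; it assumes the id is appended to the prefix without any escaping, as the example identifier suggests, and the function names are hypothetical rather than taken from Wandora.

```python
OBO_SI_PREFIX = "http://www.wandora.org/obo/"


def subject_identifier(term_id):
    """Build the subject identifier for an OBO id (assumes no URL escaping)."""
    return OBO_SI_PREFIX + term_id


def base_name(name, term_id):
    """Build the base name 'name (id)'; undefined (None) if the stanza has no name."""
    if name is None:
        return None
    return f"{name} ({term_id})"


if __name__ == "__main__":
    print(subject_identifier("GO:0010480"))
    print(base_name("microsporocyte differentiation", "GO:0010480"))
```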

The term id is also attached to the term topic as an obo-id text occurrence. The term definition and comment are likewise attached to the term topic as text occurrences of type definition and comment. Wandora creates topics for each definition origin and origin description. An association of type definition-origin links the term topic, the definition origin, and the description of the origin. Wandora creates one definition-origin association for each definition origin.

Wandora creates a stub topic for each alternative id using the subject identifier schema described above and links the term topic with the alternative term using alternative-term associations.

Wandora creates an association of type namespace to link the term topic and its namespace.

Xrefs are used in the OBO format to link similar terms in external ontologies. Wandora creates a stub topic for the xref term and links the term topic and the xref topic with an xref association. It is assumed that the xref term gets its detailed structure and properties when the ontology describing the term is merged in. The deprecated variants xref_analog and xref_unknown, used in some OBO ontologies, are also converted to xref associations.

An OBO term may have multiple synonym names. Each synonym name has a scope, a type, and an origin. Although scope and type are also features of variant names in Topic Maps, an OBO synonym is not converted to a variant name but to a topic associated with the term. This design decision is due to the rather rigid variant name schema of Wandora. Instead of a variant name, Wandora creates a topic for each term synonym, scope, type, origin, and origin description. An association of type synonym is created to link the term and the synonym. If the synonym has a scope, a type, and a described origin, the association has six players. If a synonym has two or more described origins, they are treated as separate synonyms: a separate association with at most six players is created for the same synonym for each origin identifier and description.
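The rule of one synonym association per origin can be illustrated with a short Python sketch. It parses the value of a synonym tag and returns one player dictionary per origin. This is an approximation under stated assumptions, not Wandora's implementation: the role names (term, synonym, scope, type, origin, origin-description) are hypothetical labels, and the dbxref parsing ignores escaped characters.

```python
import re

# "text" SCOPE [TYPE] [dbxref list]
SYNONYM_RE = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"\s*(?P<mods>[^\[]*)\[(?P<xrefs>.*)\]\s*$'
)
# dbxref id followed by an optional quoted description
XREF_RE = re.compile(r'([^\s",]+)(?:\s+"([^"]*)")?')


def synonym_associations(term_id, value):
    """Return one dict of association players per synonym origin."""
    match = SYNONYM_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a synonym value: {value!r}")
    mods = match.group("mods").split()
    base = {"term": term_id, "synonym": match.group("text")}
    if mods:
        base["scope"] = mods[0]
    if len(mods) > 1:
        base["type"] = mods[1]
    origins = XREF_RE.findall(match.group("xrefs"))
    if not origins:
        return [base]
    associations = []
    for origin_id, description in origins:
        players = dict(base)
        players["origin"] = origin_id
        if description:
            players["origin-description"] = description
        associations.append(players)
    return associations


if __name__ == "__main__":
    value = '"pollen mother cell differentiation" RELATED []'
    for players in synonym_associations("GO:0010480", value):
        print(players)
```

For the synonym of the example stanza, which has a scope but no type and no origin, the sketch yields a single association with three players.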

Note that the origin description is not a real property of the origin but a floating property, i.e. a player of each synonym and definition-origin association. This design decision is due to the observation that the dbxref descriptions used in synonyms and definitions are not consistent. In fact, some OBO ontologies, such as the Protein Modification ontology psi-mod, use dbxrefs and their descriptions as if they were slots and properties. It is unclear whether the authors of the OBO file format intended this, but the format allows such usage. Consequently, the same dbxref may have a different description elsewhere in the ontology, and each description is valid only in its given context.

Some OBO ontologies also use the deprecated synonym tags exact_synonym, narrow_synonym, and broad_synonym. These synonym variants are converted to synonym associations with the scope modifier set to EXACT, NARROW, and BROAD, respectively.

If a term is considered obsolete, its stanza includes the tag is_obsolete with the value true. The term topic is marked as obsolete by classification: it is set as an instance of the obsolete (namespace) topic.

An OBO term may also relate to other OBO terms. Typical relationships are is_a, intersection_of, union_of, disjoint_from, and the general purpose relationship. Wandora creates an association for each term relation. The relationship type specifies the association type and roles. Relationship associations are usually binary; when a modifier is defined, the association has three players. The reader should also note that the Topic Maps standard doesn't restrict associations. For example, there is no automated mechanism to check that intersections do not contain internal conflicts.

is_a

is_a relationships are converted to superclass-subclass associations with the standard roles given in the topic map specification. Below is an example graph as viewed in Wandora (the blue arrows and labels are not part of Wandora's view) containing a few is_a, i.e. superclass-subclass, relations of the Gene Ontology.


Gene ontology graph example.gif

intersection_of

intersection_of relationships are converted to intersection-of associations where the stanza term topic plays the role term and the relationship topic plays the role related-term. Wandora does not ensure that the relationship is used consistently. If the OBO relationship contains a modifier, a topic is created for the modifier and added to the association as a third player with the role modifier.

union_of

union_of relationships are converted to union-of associations where the stanza term topic plays the role term and the relationship topic plays the role related-term. Wandora does not ensure that the relationship is used consistently. If the OBO relationship contains a modifier, a topic is created for the modifier and added to the association as a third player with the role modifier.

disjoint_from

disjoint_from relationships are converted to disjoint-from associations using a schema similar to that of union_of.

part_of

part_of relationships are converted to part-of associations using a schema similar to that of union_of. Some OBO ontologies also use the variant tags integral_part_of, proper_part_of, and improper_part_of. These variant tags are converted to part-of associations with the predefined modifier INTEGRAL, PROPER, or IMPROPER, respectively.

Some OBO ontologies also use the reverse relations has_part and has_improper_part. These reverse relations are also converted to part-of associations, but the association roles are swapped: the stanza topic plays the role related-term and the topic of the term named in the tag plays the role term.
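The handling of the part_of variants and reverse relations can be summarised in a small Python sketch. The dictionary layout and the function name part_of_association are hypothetical. The text above does not say whether has_improper_part also receives a modifier, so the sketch leaves it without one.

```python
PART_OF_MODIFIERS = {
    "integral_part_of": "INTEGRAL",
    "proper_part_of": "PROPER",
    "improper_part_of": "IMPROPER",
}
REVERSED_PART_OF = {"has_part", "has_improper_part"}
KNOWN_TAGS = {"part_of"} | set(PART_OF_MODIFIERS) | REVERSED_PART_OF


def part_of_association(stanza_id, tag, target_id):
    """Describe the part-of association created for one tag of a stanza."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"not a part_of variant: {tag}")
    if tag in REVERSED_PART_OF:
        # Reverse relation: the roles are swapped.
        players = {"term": target_id, "related-term": stanza_id}
    else:
        players = {"term": stanza_id, "related-term": target_id}
    modifier = PART_OF_MODIFIERS.get(tag)
    if modifier is not None:
        players["modifier"] = modifier
    return {"type": "part-of", "players": players}


if __name__ == "__main__":
    print(part_of_association("GO:0010480", "part_of", "GO:0048653"))
    print(part_of_association("GO:0048653", "has_part", "GO:0010480"))
```

With these inputs both calls describe the same association, which is the point of swapping the roles for reverse relations.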

relationship

The relationship tag specifies a general relation between terms. Usually the relation type is described using a modifier. If Wandora finds such a modifier, a topic is created for the modifier and used as the type of an association between the term topics. Within the association, the stanza term topic plays the role term and the relationship topic plays the role related-term.
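As a rough illustration, the value of a relationship tag such as part_of GO:0048653 can be split into a modifier and a target id, with the modifier becoming the association type. The Python sketch below assumes the trailing ! comment has already been removed. The function name is hypothetical, and the None type for a value without a modifier is a placeholder, since the text does not describe that case.

```python
def relationship_association(stanza_id, value):
    """Turn the value of a 'relationship:' tag into an association description."""
    parts = value.split()
    if not parts:
        raise ValueError("empty relationship value")
    if len(parts) == 1:
        # No modifier given; the association type is left open in this sketch.
        modifier, target_id = None, parts[0]
    else:
        modifier, target_id = parts[0], parts[1]
    return {
        "type": modifier,
        "players": {"term": stanza_id, "related-term": target_id},
    }


if __name__ == "__main__":
    print(relationship_association("GO:0010480", "part_of GO:0048653"))
```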

is_anonymous

is_anonymous is a boolean tag that specifies whether the term id is anonymous. The tag and its value are converted to a binary association of type is-anonymous with roles is-anonymous and term.

consider

If a consider relation is specified in the stanza with a term id, the specified term should be considered as a replacement for the stanza term. The consider relation is converted to a binary association of type consider-using with roles consider-using and term. The deprecated stanza tag use_term is processed as if it were a consider relation.

replaced_by

replaced_by specifies the id of the term that replaces an obsolete term. The replaced_by tag should appear only in obsolete stanzas. However, Wandora doesn't check this requirement and allows replaced_by tags in any stanza. The tag is converted to a binary association of type replaced-by with roles replaced-by and term.

Converting OBO Instances

An OBO instance stanza is similar to the term stanza described above, and in general the conversion process is similar to that of terms. An OBO instance also has a unique id and a name that identify the instance. A topic is created for each instance stanza. The topic's subject identifier is generated from the id and the prefix http://www.wandora.org/obo/. Notice that the same schema is used for the subject identifiers of term topics. Sharing the schema makes it possible for term and instance topics to merge. However, it is assumed that instances have different ids or a different id space than terms, so that instances don't merge with terms.

Wandora also gives the OBO instance topic a base name composed of the stanza name and id. The instance topic also gets an English variant name composed of the stanza name. The naming schema is the same as for terms.

An OBO instance may also contain alternative ids. A stub topic is created for each alternative id and associated with the instance topic. The association type is alternative-term. The instance topic plays the role instance and the alternative id plays the role alternative-term. The naming convention follows that of terms, which is admittedly a little confusing here.

The namespace of an OBO instance is associated with the instance topic using the association type namespace and the roles namespace and instance.

The xref, synonym, comment, definition, consider, replaced_by, is_obsolete, and is_anonymous tags are converted as described above in the section on terms.

The class of the instance is defined with the tag instance_of. The tag is followed by the id of an OBO term. Remember that OBO terms are considered classes. The instance_of tag is converted to an association of type instance-of with roles instance and term. Notice that the built-in typing feature of topic maps is not used to represent OBO instances! This is a design decision: built-in types and instances are used to group OBO topics.

OBO instances may contain property values. Property values are defined in the stanza with the property_value tag. The tag is followed by the property name and value, and an optional property type. At the moment Wandora handles this set as a whole: it creates one topic containing the property name, value, and optional type. This rather clumsy property handling may change in future versions of Wandora.
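The parts of a property_value tag can be separated as in the Python sketch below. It only shows how the name, the value, and the optional type could be pulled apart; it does not reproduce how Wandora names or stores the resulting topic, which the text does not specify. The function name parse_property_value and the sample property are hypothetical.

```python
import shlex


def parse_property_value(value):
    """Split a property_value tag value into name, value and optional type.

    shlex.split honours double quotes, so a quoted value containing spaces
    stays in one piece.
    """
    parts = shlex.split(value)
    if len(parts) < 2:
        raise ValueError(f"expected at least a name and a value: {value!r}")
    return {
        "name": parts[0],
        "value": parts[1],
        "type": parts[2] if len(parts) > 2 else None,
    }


if __name__ == "__main__":
    # Hypothetical property used only for illustration.
    print(parse_property_value('has_label "a quoted value" xsd:string'))
```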

Converting OBO type definitions

An OBO type definition specifies a relation between OBO terms. Wandora only stores type definitions! Type definitions do not restrict or guide topic map editing in Wandora. Below is an example of a type definition stanza defining the develops_from relation

 [Typedef]
 id: develops_from
 name: develops_from
 is_transitive: true

A topic is created for the type definition. A type definition has a unique id and name. The type topic is given a subject identifier composed of the prefix http://www.wandora.org/obo/ and the id. A base name and an English display variant are also set, as for terms.

In general, tags are converted as described above. However, type definitions may contain tags not found in terms and instances. These tags and their conversion procedures are

  • domain converts to a binary association domain where the type topic plays the role typedef and the domain plays the role domain.
  • range converts to a binary association range where the type topic plays the role typedef and the range plays the role range. The range value is handled as an entity.
  • inverse_of converts to a binary association inverse-of.
  • transitive_over converts to a binary association transitive-over.
  • is_cyclic converts to a binary association is-cyclic.
  • is_reflexive converts to a binary association is-reflexive.
  • is_symmetric converts to a binary association is-symmetric.
  • is_anti_symmetric converts to a binary association is-anti_symmetric.
  • is_transitive converts to a binary association is-transitive.
  • is_metadata_tag converts to a binary association is-metadata-tag.
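The tag-to-association mapping in the list above can be written down as a lookup table. The Python sketch below is hypothetical helper code, not Wandora's; the association type names are copied from the list as given, including the spelling is-anti_symmetric.

```python
TYPEDEF_TAG_TO_ASSOCIATION = {
    "domain": "domain",
    "range": "range",
    "inverse_of": "inverse-of",
    "transitive_over": "transitive-over",
    "is_cyclic": "is-cyclic",
    "is_reflexive": "is-reflexive",
    "is_symmetric": "is-symmetric",
    "is_anti_symmetric": "is-anti_symmetric",  # spelling as in the list above
    "is_transitive": "is-transitive",
    "is_metadata_tag": "is-metadata-tag",
}


def typedef_associations(pairs):
    """Pick the typedef-specific tags from (tag, value) pairs.

    Returns (association_type, value) tuples; other tags such as id and
    name are handled elsewhere and are skipped here.
    """
    return [
        (TYPEDEF_TAG_TO_ASSOCIATION[tag], value)
        for tag, value in pairs
        if tag in TYPEDEF_TAG_TO_ASSOCIATION
    ]


if __name__ == "__main__":
    develops_from = [
        ("id", "develops_from"),
        ("name", "develops_from"),
        ("is_transitive", "true"),
    ]
    print(typedef_associations(develops_from))
```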

Post-processing the OBO imported Topic Map

An OBO ontology should use is_a and part_of relations consistently and without conflicts (True Path Rule). Such usage ensures that the ontology has only a limited set of root terms. The root term of is_a relations is a superclass of every other term. The root term of part_of relations has no enclosing terms. Finding such root topics is important because they are convenient starting points for browsing the ontology. However, Wandora doesn't automatically search for and mark such term topics but leaves this to the user. To find the root topic of superclass-subclass associations

  • Open any topic containing such an association.
  • Select the subclass term in the association table.
  • Right-click and select the option Open edge of associations. Wandora follows the superclass-subclass associations as far as it can and opens the last superclass topic found in the association chain.

The same applies to part-of associations, but there you should select the part-of player in the association table. Some notes follow

  • If there are multiple root topics, Wandora opens the first one.
  • If the association path is cyclic, i.e. Wandora encounters a node it has already passed, Wandora stops and opens the topic.
  • The same method can be used to find a leaf node - the other edge - of the association tree. To open a leaf node with Open edge of associations, select the superclass term instead of the subclass term in the association table. The first leaf node is opened.

Wandora also features a tool, Copy edge path of associations, that copies each passed topic to the clipboard. As a result, the clipboard contains a tab-separated list of base names. The last name represents the edge of the association chain. Copy edge path of associations is located next to Open edge of associations.
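The traversal behind Open edge of associations and Copy edge path of associations can be approximated with the Python sketch below. It is an interpretation of the behaviour described above, not Wandora's code: it assumes that the first candidate is followed at each step and that, on meeting an already passed topic, the walk stops at the current topic. The function names and the sample graph are hypothetical.

```python
def edge_path(start, next_topics):
    """Follow associations from start until no further topic or a cycle.

    next_topics(topic) returns the candidate topics one step further,
    for example the superclasses of a topic. The first candidate is taken.
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        candidates = next_topics(current)
        if not candidates:
            break  # reached the edge of the chain
        following = candidates[0]
        if following in seen:
            break  # cyclic path: stop at the current topic
        path.append(following)
        seen.add(following)
        current = following
    return path


def clipboard_text(path, base_name=str):
    """Tab-separated base names; the last one is the edge of the chain."""
    return "\t".join(base_name(topic) for topic in path)


if __name__ == "__main__":
    # Hypothetical superclass graph used only for illustration.
    superclasses = {"leaf": ["middle"], "middle": ["root"], "root": []}
    path = edge_path("leaf", lambda topic: superclasses.get(topic, []))
    print(path[-1])              # topic opened by "Open edge of associations"
    print(clipboard_text(path))  # text copied by "Copy edge path of associations"
```

Passing a function that returns subclasses instead of superclasses walks the chain in the other direction and ends at a leaf topic.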

