The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements.
Bottom Line: The Orthologous Matrix (OMA) project is a method and associated database for inferring evolutionary relationships among, currently, 1706 complete proteomes (i.e. the protein sequences of all protein-coding genes in each genome). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes, in particular for homeologs in the wheat genome; (iv) a new synteny viewer showing the genomic context of orthologs; (v) statically computed subsets of hierarchical orthologous groups, downloadable in OrthoXML format; and (vi) the possibility to export parts of the all-against-all computations and combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.
Affiliation: University College London, Gower Street, London WC1E 6BT, UK; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland; ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland.
Mentions: To infer GO annotations, we start with curated annotations based on direct evidence from the literature: GO evidence codes EXP, IDA, IPI, IMP, IGI and IEP (http://geneontology.org/page/guide-go-evidence-codes). We then propagate them across OMA groups (sets of genes whose members are all inferred to be mutually orthologous), because such groups have previously been shown to be highly coherent in terms of functional annotation (25). Additionally, to avoid over-propagating clade-specific terms (e.g. ‘nematode larval development’ outside the nematodes), we require that a propagated term be used in at least one literature-based annotation within the target clade. For example, the OMA group with fingerprint ‘VWQCDTP’ contains a Caenorhabditis elegans gene annotated with the GO term ‘nematode larval development’ (Figure 2); this term is not appropriate for genes outside the phylum Nematoda. Therefore, when propagating this GO term to, for example, the poorly annotated Arabidopsis thaliana protein in the same OMA group, we only propagate those parent terms of ‘nematode larval development’ that are known to be associated with plant proteins; in this case, the most specific of these is ‘post-embryonic development’ (Figure 2). Indeed, the propagated annotation complements one of the known annotations of the A. thaliana protein, ‘embryo sac development’.
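The clade-restricted propagation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OMA's actual implementation. The GO graph is a hypothetical child-to-parents dictionary, annotations are hypothetical (clade, term, evidence code) tuples, and clade membership is taken as given rather than derived from a taxonomy. The sketch also treats ancestors of a literature-annotated term as "used" in the clade (following GO's true path rule); this is an assumption of the sketch, not something the text specifies.

```python
from collections import deque

# Literature-based evidence codes accepted as propagation sources (from the text).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def ancestors(term, parents):
    """All ancestors of `term` (excluding itself) in a DAG given as child -> list of parents."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, ()))
    return seen


def clade_vocabulary(annotations, clade, parents):
    """Terms backed by at least one literature-based annotation in `clade`.

    Assumption of this sketch: ancestors of a directly annotated term also count
    (GO true path rule). `annotations` is a hypothetical list of
    (clade, term, evidence_code) tuples.
    """
    direct = {term for c, term, code in annotations
              if c == clade and code in EXPERIMENTAL_CODES}
    vocab = set(direct)
    for term in direct:
        vocab |= ancestors(term, parents)
    return vocab


def most_specific(terms, parents):
    """Drop every term that is an ancestor of another term in the set."""
    redundant = set()
    for t in terms:
        redundant |= ancestors(t, parents)
    return set(terms) - redundant


def propagate_term(term, target_clade, annotations, parents):
    """Terms to transfer to a gene in `target_clade` from an orthologue annotated with `term`.

    Only the term itself or its ancestors that are used in the target clade
    qualify; of these, only the most specific ones are returned.
    """
    allowed = clade_vocabulary(annotations, target_clade, parents)
    candidates = ({term} | ancestors(term, parents)) & allowed
    return most_specific(candidates, parents)


# Toy, simplified GO fragment and annotations (hypothetical, not real GO data).
parents = {
    "nematode larval development": ["larval development"],
    "larval development": ["post-embryonic development"],
    "post-embryonic development": ["developmental process"],
    "embryo sac development": ["developmental process"],
}
annotations = [
    ("Nematoda", "nematode larval development", "IMP"),
    ("Viridiplantae", "post-embryonic development", "IMP"),
    ("Viridiplantae", "embryo sac development", "IDA"),
    ("Viridiplantae", "larval development", "IEA"),  # electronic evidence: ignored
]

print(propagate_term("nematode larval development", "Viridiplantae", annotations, parents))
# {'post-embryonic development'}
```

With this toy graph, propagating ‘nematode larval development’ to a plant gene yields ‘post-embryonic development’, mirroring the example in the text, while propagation within Nematoda keeps the original term. Keeping only the most specific allowed ancestor avoids transferring redundant general terms such as ‘developmental process’, which the retained term already implies.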