The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements.

Altenhoff AM, Škunca N, Glover N, Train CM, Sueki A, Piližota I, Gori K, Tomiczek B, Müller S, Redestig H, Gonnet GH, Dessimoz C - Nucleic Acids Res. (2014)

Bottom Line: The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequence associated with every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous groups subsets downloadable in OrthoXML format; and (vi) the possibility to export parts of the all-against-all computations and to combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.


Affiliation: University College London, Gower Street, London WC1E 6BT, UK; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland; ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland.

Figure 2: Gene Ontology propagation in the OMA pipeline. New Gene Ontology (GO) annotations for the sparsely annotated Arabidopsis thaliana protein Q8VYZ5 are inferred by propagating annotations from other members of the OMA group, taking into account implied parental terms and lineage-specific terms (see main text). For example, the inferred biological process GO term ‘post-embryonic development’ is based on the more specific GO term ‘nematode larval development’; the latter is itself inappropriate to assign to a protein in the plant clade. Proteins are labelled with their SwissProt/UniProt identifiers. The abbreviations ARATH, CAEEL, SCHIPO, DROME, HUMAN and YEAST refer to the species Arabidopsis thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, Drosophila melanogaster, Homo sapiens and Saccharomyces cerevisiae, respectively.

Mentions: To infer GO annotations, we start with curated annotations that are based on direct evidence from the literature: GO evidence codes EXP, IDA, IPI, IMP, IGI and IEP (http://geneontology.org/page/guide-go-evidence-codes). We then propagate them across OMA groups—sets of genes for which all members are inferred to be mutually orthologous—as these have been previously shown to be highly coherent in terms of functional annotations (25). Additionally, to avoid over-propagating clade-specific terms (e.g. ‘nematode larval development’ outside the nematodes), we require that propagated terms be used in at least one literature-based annotation in the clade in question. For example, the OMA group with fingerprint ‘VWQCDTP’ contains a Caenorhabditis elegans gene annotated with the GO term ‘nematode larval development’ (Figure 2); this term is not appropriate for genes outside of the Nematoda phylum. Therefore, when propagating this GO term to, for example, the poorly annotated Arabidopsis thaliana protein within the same OMA group, we only propagate those parent terms of ‘nematode larval development’ that are known to be associated with plant proteins; in this case, the most specific amongst those is ‘post-embryonic development’ (Figure 2). Indeed, the propagated annotation complements one of the known annotations for the A. thaliana protein, ‘embryo sac development’.
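The propagation rule described above can be illustrated with a minimal Python sketch. It is not the OMA implementation: the function names, the data layout and the tiny GO fragment in `TOY_PARENTS` are hypothetical and chosen only to mirror the ‘nematode larval development’ example. Only the evidence-code list and the clade filter are taken from the text.

```python
from typing import Dict, Iterable, Set, Tuple

# GO evidence codes the text treats as direct evidence from the literature.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical, heavily simplified fragment of the GO graph (child -> parents).
TOY_PARENTS: Dict[str, Set[str]] = {
    "nematode larval development": {"larval development"},
    "larval development": {"post-embryonic development"},
    "post-embryonic development": {"developmental process"},
    "embryo sac development": {"developmental process"},
    "developmental process": set(),
}


def with_ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its implied parent terms."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate_to_clade(
    group_annotations: Iterable[Tuple[str, str]],
    clade_terms: Set[str],
    parents: Dict[str, Set[str]],
) -> Set[str]:
    """Propagate (term, evidence_code) pairs from other group members.

    Only experimentally supported annotations are used, and a term is kept
    only if it already appears in at least one literature-based annotation
    in the target clade (`clade_terms`).
    """
    candidates: Set[str] = set()
    for term, evidence in group_annotations:
        if evidence in EXPERIMENTAL_CODES:
            candidates |= with_ancestors(term, parents)
    return candidates & clade_terms


def most_specific(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Drop every term that is a proper ancestor of another term in the set."""
    redundant: Set[str] = set()
    for term in terms:
        redundant |= with_ancestors(term, parents) - {term}
    return terms - redundant


# Annotations of other members of the (toy) OMA group; the IEA one is ignored.
group = [
    ("nematode larval development", "IMP"),
    ("developmental process", "IEA"),
]
# Terms assumed to occur in literature-based annotations of plant proteins.
plant_terms = {
    "post-embryonic development",
    "developmental process",
    "embryo sac development",
}

inferred = propagate_to_clade(group, plant_terms, TOY_PARENTS)
print(sorted(most_specific(inferred, TOY_PARENTS)))
# ['post-embryonic development']
```

In this toy setting the nematode-specific term and its parent ‘larval development’ are discarded because they are never used for plant proteins, which leaves ‘post-embryonic development’ as the most specific propagated term, as in Figure 2.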

