Exploiting ontology graph for predicting sparsely annotated gene function.

Wang S, Cho H, Zhai C, Berger B, Peng J - Bioinformatics (2015)

Bottom Line: There exist a variety of algorithms for function prediction, but none properly address this 'overfitting' issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information.

Affiliation: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA.


Fig. 1. A breakdown of GO labels by the number of annotated genes in (a) human and (b) yeast

Despite the success of existing algorithms, one major difficulty has not been sufficiently addressed: predicting rare labels. Because many molecular functions (MFs) are inherently specific in scope, a large number of functional labels have only a few annotated genes (positive annotations). For instance, the human GO annotation database (Ashburner et al., 2000) currently contains 8626 GO labels with at least 3 annotations. Of these, 4178 have fewer than 10 annotated genes and 7905 have fewer than 100. Figure 1 shows the distribution of GO labels by number of annotations in yeast and human. In both species, nearly half of the GO labels have fewer than 10 annotations.
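The breakdown above amounts to counting labels per annotation-count bin. The following is a minimal sketch of that bookkeeping, assuming the annotations have already been parsed into a mapping from each GO label to its set of annotated genes. The function name `annotation_breakdown`, its parameters, and the toy label names are hypothetical illustrations, not the authors' pipeline.

```python
def annotation_breakdown(label_to_genes, min_annotations=3, thresholds=(10, 100)):
    """Summarize how many GO labels fall below each annotation-count threshold.

    label_to_genes: hypothetical mapping {label: set_of_annotated_genes},
    e.g. built from a GO annotation file (parsing not shown here).
    Labels with fewer than `min_annotations` genes are dropped first,
    mirroring the "at least 3 annotations" filter described in the text.
    """
    # Number of annotated genes per label
    counts = {label: len(genes) for label, genes in label_to_genes.items()}
    # Keep only labels that pass the minimum-annotation filter
    kept = [n for n in counts.values() if n >= min_annotations]
    summary = {"total": len(kept)}
    for t in thresholds:
        # Retained labels with strictly fewer than t annotated genes
        summary[f"<{t}"] = sum(1 for n in kept if n < t)
    return summary


# Toy input with hypothetical label names (not real GO terms or data)
toy = {
    "label_A": {"g1", "g2"},                       # 2 genes: filtered out
    "label_B": {"g1", "g2", "g3"},                 # 3 genes: sparse
    "label_C": {f"g{i}" for i in range(12)},       # 12 genes
    "label_D": {f"g{i}" for i in range(150)},      # 150 genes
}
print(annotation_breakdown(toy))  # {'total': 3, '<10': 1, '<100': 2}

# Fractions implied by the human counts quoted in the text
print(round(4178 / 8626, 3))  # 0.484, i.e. "nearly half" below 10 genes
print(round(7905 / 8626, 3))  # 0.916 below 100 genes
```

The ratio 4178/8626 ≈ 0.48 is what backs the "nearly half" statement for human; the same function applied to a yeast mapping would give the corresponding yeast proportions.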

