GOParGenPy: a high throughput method to generate gene ontology data matrices.

Kumar AA, Holm L, Toronen P - BMC Bioinformatics (2013)

Bottom Line: It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. GOParGenPy is an easy-to-use software tool which can generate sparse or full binary matrices from GO-annotated gene sets.


Affiliation: Institute of Biotechnology, University of Helsinki, PO Box 56, (Viikinkaari 5), Helsinki 00014, Finland. ajay.kumar@helsinki.fi

ABSTRACT

Background: Gene Ontology (GO) is a popular standard for the annotation of gene products and provides information on genes across all species. The structure of GO is dynamic and is updated daily. However, popular existing methods use outdated versions of GO. Moreover, these tools are slow at processing large datasets of more than 20,000 genes.

Results: We have developed GOParGenPy, a platform-independent software tool that generates a binary data matrix showing the GO class membership, including parental classes, of a set of GO-annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis, and it can handle larger datasets than existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases.
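The class-membership matrix with parental classes can be sketched as follows. This is a minimal illustration, not GOParGenPy's own code: it assumes the GO structure has already been parsed into a dictionary mapping each class to its direct parent classes (for example from `is_a` lines), and the function names and toy class IDs such as `GO:leaf` and `GO:root` are hypothetical. Each gene's direct annotations are extended with every ancestor before the 0/1 matrix is filled.

```python
from collections import deque


def ancestors(term, parents):
    """Return all ancestors of `term`, following parent links breadth-first."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, ()))
    return seen


def membership_matrix(annotations, parents):
    """Return (genes, classes, matrix): one 0/1 row per gene, one column per GO class,
    with every direct annotation propagated to all of its parental classes."""
    propagated = {}
    for gene, terms in annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        propagated[gene] = closed
    genes = sorted(propagated)
    classes = sorted(set().union(*propagated.values()))
    matrix = [[1 if c in propagated[g] else 0 for c in classes] for g in genes]
    return genes, classes, matrix
```

Only classes that occur in at least one gene's propagated set become columns. A run over a full GO release would benefit from caching the result of `ancestors` per class, which this sketch omits for brevity.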

Conclusions: GOParGenPy is an easy-to-use software tool which can generate sparse or full binary matrices from GO-annotated gene sets. The resulting binary matrix can then be used in any analysis environment and with any analysis method.
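The difference between sparse and full output can be illustrated with a hypothetical sketch; the function names and the tab-separated layout are assumptions, not GOParGenPy's documented file format. Starting from each gene's already-propagated set of GO classes, the sparse form stores only the (row, column) coordinates of the 1-entries, while the full form expands them into a dense matrix that can be written out as plain text for another analysis environment.

```python
def to_sparse(propagated):
    """Return (genes, classes, coords): coords lists the (row, column) of every 1-entry."""
    genes = sorted(propagated)
    classes = sorted(set().union(*propagated.values()))
    col = {c: j for j, c in enumerate(classes)}
    coords = [(i, col[c]) for i, g in enumerate(genes) for c in sorted(propagated[g])]
    return genes, classes, coords


def to_full(genes, classes, coords):
    """Expand the coordinate list into a dense 0/1 matrix (genes x classes)."""
    dense = [[0] * len(classes) for _ in genes]
    for i, j in coords:
        dense[i][j] = 1
    return dense


def to_tsv(genes, classes, dense):
    """Write the dense matrix as tab-separated text with a header row."""
    rows = ["gene\t" + "\t".join(classes)]
    rows += [g + "\t" + "\t".join(str(v) for v in row) for g, row in zip(genes, dense)]
    return "\n".join(rows)
```

Because a gene usually belongs to only a small fraction of all GO classes, the coordinate list is typically much smaller than the dense matrix; the dense form is convenient for tools that expect a plain table.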

Figure 4: Venn diagram of the total number of GO classes present in the OBO file ‘gene_ontology_edit.obo’ (01.02.2012) and in the OBO file (01.04.2010) from agriGO. Figure 4A: number of non-obsolete GO classes in the OBO file and in agriGO. Figure 4B: number of obsolete GO classes in the OBO file and of distinct non-obsolete GO classes in agriGO.

Mentions: These files were parsed for GO classes using GOParGenPy. We then calculated 1) the number of current GO classes with unaltered definitions, 2) the number of GO classes that became obsolete, and 3) the number of GO classes with an altered definition, each with respect to the reference OBO file. Finally, we present Venn diagrams showing the percentages of missing GO classes and of classes still present (Figures 3, 4 and 5 in Results).
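The release comparison can be sketched with a minimal OBO reader. This relies on a simplified reading of the OBO format (only `[Term]` stanzas and their `id:`, `def:` and `is_obsolete:` lines are used) and on our own interpretation of the categories: non-obsolete classes of the older release are classified against the reference release, and "missing" counts non-obsolete reference classes absent from the older release. The function names are hypothetical; this is not the authors' script.

```python
def parse_obo_terms(text):
    """Map each [Term] id to (definition, is_obsolete); other stanzas are ignored."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A stanza header: only [Term] stanzas describe GO classes.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or not line:
            continue
        key, _, value = line.partition(": ")
        if key == "id":
            current = value
            terms[current] = ("", False)
        elif current is not None and key == "def":
            terms[current] = (value, terms[current][1])
        elif current is not None and key == "is_obsolete":
            terms[current] = (terms[current][0], value == "true")
    return terms


def compare_releases(reference_text, older_text):
    """Count how the non-obsolete classes of an older release fare in the reference release."""
    ref = parse_obo_terms(reference_text)
    old = parse_obo_terms(older_text)
    counts = {"unaltered": 0, "became_obsolete": 0, "altered": 0}
    for go_id, (definition, obsolete) in old.items():
        if obsolete or go_id not in ref:
            continue  # already obsolete in the older file, or unknown to the reference
        ref_definition, ref_obsolete = ref[go_id]
        if ref_obsolete:
            counts["became_obsolete"] += 1
        elif ref_definition != definition:
            counts["altered"] += 1
        else:
            counts["unaltered"] += 1
    # Current (non-obsolete) reference classes that the older release lacks.
    counts["missing_from_older"] = sum(
        1 for go_id, (_, obs) in ref.items() if not obs and go_id not in old
    )
    return counts
```

The four counts map directly onto the sets drawn in a Venn diagram of the two releases; definitions are compared as raw strings, so any textual edit counts as an altered definition.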


GOParGenPy: a high throughput method to generate gene ontology data matrices.

Kumar AA, Holm L, Toronen P - BMC Bioinformatics (2013)

Venn diagram representation of total number of GO classes present in OBO file ‘gene_ontology_edit.obo’ (01.02.2012) and OBO file (01.04.2010) from agriGO. Figure 4A: Venn representation of number of non obsolete GO classes from OBO file and agriGO. Figure 4B: Venn representation of number of obsolete GO classes in OBO file and distinct non obsolete GO classes in agriGO.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3750654&req=5

Figure 4: Venn diagram representation of total number of GO classes present in OBO file ‘gene_ontology_edit.obo’ (01.02.2012) and OBO file (01.04.2010) from agriGO. Figure 4A: Venn representation of number of non obsolete GO classes from OBO file and agriGO. Figure 4B: Venn representation of number of obsolete GO classes in OBO file and distinct non obsolete GO classes in agriGO.
Mentions: These files were parsed for GO classes using GOParGenPy. Next we calculated 1) the number of actual GO classes with unaltered definitions, 2) the number of GO classes which became obsolete and 3) the number of GO classes that have an altered definition with respect to the reference OBO file. Finally, we present a Venn diagram to show the percentage of missing GO classes and actual classes present (Figures 3, 4, 5 in Results).

Bottom Line: It can use any available version of the GO structure and allows the user to select the source of GO annotation.GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases.GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Biotechnology, University of Helsinki, PO Box 56, (Viikinkaari 5), Helsinki 00014, Finland. ajay.kumar@helsinki.fi

ABSTRACT

Background: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes.

Results: We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases.

Conclusions: GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.

Show MeSH