GOParGenPy: a high throughput method to generate gene ontology data matrices.

Kumar AA, Holm L, Toronen P - BMC Bioinformatics (2013)

Bottom Line: GOParGenPy can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. GOParGenPy is an easy-to-use software tool which can generate sparse or full binary matrices from GO-annotated gene sets.

Affiliation: Institute of Biotechnology, University of Helsinki, PO Box 56, (Viikinkaari 5), Helsinki 00014, Finland. ajay.kumar@helsinki.fi

ABSTRACT

Background: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated daily. However, popular existing tools use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes.

Results: We have developed GOParGenPy, a platform-independent software tool that generates a binary data matrix showing the GO class membership, including parental classes, of a set of GO-annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis, and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases.

Conclusions: GOParGenPy is an easy-to-use software tool which can generate sparse or full binary matrices from GO-annotated gene sets. The resulting binary matrix can then be used in any analysis environment and with any analysis method.
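To make the idea of a binary GO membership matrix "including parental classes" concrete, here is a minimal sketch in plain Python. It is not GOParGenPy's code: the function names `ancestors` and `build_matrix`, the toy parent map, and the placeholder class IDs are all assumptions made for illustration, and the sparse output is shown simply as a list of (row, column) index pairs.

```python
def ancestors(term, parents, cache):
    """Return the term itself plus every ancestor reachable in the parent map.

    Assumes the parent map is acyclic, as a GO structure is expected to be.
    """
    if term not in cache:
        found = {term}
        for parent in parents.get(term, ()):
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def build_matrix(gene_terms, parents, sparse=False):
    """Build a gene-by-class binary membership matrix (hypothetical helper).

    gene_terms: dict mapping gene name -> list of directly annotated classes
    parents:    dict mapping class -> list of its direct parent classes
    sparse:     if True, return sorted (row, col) pairs instead of a full 0/1 grid
    """
    cache = {}
    # Propagate each direct annotation up to all parental classes.
    expanded = {
        gene: set().union(*(ancestors(t, parents, cache) for t in terms))
        for gene, terms in gene_terms.items()
    }
    genes = sorted(expanded)
    classes = sorted(set().union(*expanded.values()))
    if sparse:
        col_index = {c: j for j, c in enumerate(classes)}
        cells = sorted(
            (i, col_index[c]) for i, gene in enumerate(genes) for c in expanded[gene]
        )
        return genes, classes, cells
    full = [[1 if c in expanded[gene] else 0 for c in classes] for gene in genes]
    return genes, classes, full


# Toy data: placeholder class names, not real GO terms.
toy_parents = {"GO:child": ["GO:mid"], "GO:mid": ["GO:root"]}
toy_genes = {"g1": ["GO:child"], "g2": ["GO:mid"]}
genes, classes, full = build_matrix(toy_genes, toy_parents)
_, _, cells = build_matrix(toy_genes, toy_parents, sparse=True)
```

In this sketch the full format stores one 0/1 value per gene-class pair, while the sparse format stores only the positions of the ones, which is usually far more compact because each gene belongs to a small fraction of all GO classes.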

Figure 1: Workflow of GOParGenPy. Matrix parameters are the row names (R: the number(s) of the input data column(s) that hold the gene names), the column names (C: the number(s) of the input data column(s) that hold the GO class associations), and the matrix format (S/F: sparse or full).

Mentions: Figure 1 shows the workflow of GOParGenPy. It takes a tab-separated input annotation file that contains a list of GO-annotated genes, the selected OBO file, and a set of parameters. These parameters give the column number of the gene name and the column number(s) of the linked GO classes. Depending on the input annotation file type, an intermediate tab-delimited annotation file is then parsed from the annotation file, in which each row holds a gene name and all the collected GO annotations of that gene.
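The intermediate file described above, with one row per gene followed by all of its GO annotations, can be sketched as follows. This is an illustrative sketch only, not the tool's implementation: the function name `collapse_annotations`, the 1-based column numbering, and the skipping of comment lines that start with "!" (a convention of GO annotation files) are assumptions, and the IDs in the sample are placeholders.

```python
import csv


def collapse_annotations(lines, gene_col, go_cols):
    """Group GO class IDs by gene name (hypothetical helper).

    lines:    iterable of tab-separated text lines
    gene_col: 1-based column number holding the gene name (assumed convention)
    go_cols:  list of 1-based column numbers holding GO class IDs
    """
    genes = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (assumed to start with "!").
        if not row or row[0].startswith("!"):
            continue
        terms = genes.setdefault(row[gene_col - 1], [])
        for col in go_cols:
            if col - 1 < len(row):
                go_id = row[col - 1].strip()
                # Keep each GO class once per gene, in first-seen order.
                if go_id and go_id not in terms:
                    terms.append(go_id)
    return genes


def to_intermediate_rows(genes):
    """Render one tab-delimited row per gene: name, then its GO classes."""
    return ["\t".join([gene] + terms) for gene, terms in genes.items()]


# Placeholder sample input, one annotation per line.
sample = [
    "!comment line",
    "geneA\tGO:0000001",
    "geneB\tGO:0000002",
    "geneA\tGO:0000003",
    "geneA\tGO:0000001",
]
collapsed = collapse_annotations(sample, gene_col=1, go_cols=[2])
rows = to_intermediate_rows(collapsed)
```

Here `gene_col` and `go_cols` play the role of the R and C parameters in Figure 1; how GOParGenPy itself names or parses them is not stated in this excerpt.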

