dcGOR: an R package for analysing ontologies and protein domain annotations.

Fang H - PLoS Comput. Biol. (2014)

Bottom Line: To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses in terms of functional, phenotypic and disease relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as RNAs (from Rfam).


Affiliation: Computational Genomics Group, Department of Computer Science, University of Bristol, Bristol, United Kingdom.

ABSTRACT
I introduce 'dcGOR', an open-source R package that makes it easier for the bioinformatics community to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structures to represent domains, ontologies, annotations, and all analytical outputs. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses in terms of functional, phenotypic and disease relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as RNAs (from Rfam). The package is freely available from CRAN for easy installation, and also on GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.

Figure 3 (pcbi-1003929-g003): Enrichment analysis of promiscuous Pfam domains using GOBP terms (left) and GOMF terms (right). Only the most significant terms/nodes (adjusted p-values < 0.05; outlined in black) are visualised along with their ancestral terms. Nodes are coloured according to adjusted p-values.
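
The adjusted p-values in the caption come from a domain-based enrichment analysis. The source does not say which test or multiple-testing correction was used for this figure, so the following is only a minimal sketch under assumptions: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg adjustment. The function names (`hypergeom_sf`, `bh_adjust`, `enrich`) and the input layout are hypothetical and are not part of the dcGOR API.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n domains are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    # comb() returns 0 for impossible draws, so no extra bounds checks are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, background, annotations):
    """Return {term: (p_value, adjusted_p_value)} for a list of query domains.

    annotations maps each ontology term to the domains annotated to it
    (an assumed input layout, chosen for illustration only).
    """
    background = set(background)
    query = set(query) & background
    terms = sorted(annotations)
    pvals = []
    for term in terms:
        annotated = set(annotations[term]) & background
        overlap = len(annotated & query)
        pvals.append(
            hypergeom_sf(overlap, len(background), len(annotated), len(query))
        )
    return dict(zip(terms, zip(pvals, bh_adjust(pvals))))
```

Terms whose adjusted p-value falls below 0.05 would correspond to the black-outlined nodes in the figure; the choice of background set strongly affects the result and is left to the caller here.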

Mentions: Next, I extend the analysis to a list of Pfam domains that tend to occur in diverse domain architectures; this tendency is called ‘promiscuous’. In a previous study [23], a total of 215 domains were identified as strongly promiscuous, of which 76 were taken from Pfam. Enrichment analysis of these 76 Pfam domains using GOBP and GOMF terms identifies the two most significant terms, ‘mismatch repair’ and ‘ATPase activity’ (Figure 3). These two functional categories are consistent with the previous report; however, there is no statistical support for the relevance to ‘signal transduction’ claimed previously [23]. Unlike DO, GO contains three sub-ontologies: GOBP, GOMF and GO Cellular Component (GOCC). Therefore, the semantic similarity between pairs of these 76 domains was first calculated separately for each GO sub-ontology and then summed to obtain the overall GO semantic similarity (Figure 4).
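
The additive combination of per-sub-ontology similarities described above can be illustrated with a short sketch. It assumes each sub-ontology yields a table of pairwise scores keyed by domain pair, and that a pair absent from one sub-ontology contributes zero; both the data layout and the function name `sum_similarity` are hypothetical, not taken from dcGOR.

```python
def sum_similarity(per_ontology):
    """Add pairwise similarity scores across sub-ontologies.

    per_ontology maps a sub-ontology name (e.g. "GOBP") to a dict of
    {(domain_a, domain_b): score}. Pairs are treated as unordered.
    Assumption: a pair missing from a sub-ontology contributes 0.
    """
    total = {}
    for scores in per_ontology.values():
        for (a, b), score in scores.items():
            key = tuple(sorted((a, b)))  # (a, b) and (b, a) share one entry
            total[key] = total.get(key, 0.0) + score
    return total
```

For example, a pair scoring 0.5 under GOBP and 0.25 under GOMF, with no GOCC score, would receive an overall similarity of 0.75 under these assumptions.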

