dcGOR: an R package for analysing ontologies and protein domain annotations.

Fang H - PLoS Comput. Biol. (2014)

Bottom Line: To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses in terms of functional, phenotypic and disease relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as RNAs (from Rfam).


Affiliation: Computational Genomics Group, Department of Computer Science, University of Bristol, Bristol, United Kingdom.

ABSTRACT
I introduce an open-source R package, 'dcGOR', that makes it easier for the bioinformatics community to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies, including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structures to represent domains, ontologies, annotations, and all analytical outputs. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses in terms of functional, phenotypic and disease relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as RNAs (from Rfam). The package is freely available from CRAN for easy installation, and also on GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.
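To make the idea of domain-based enrichment analysis more concrete, the following is a minimal Python sketch of a generic over-representation test. It is not dcGOR's code or API: the function names are hypothetical, and the choice of a one-sided hypergeometric test with Benjamini-Hochberg adjustment is an assumption made for illustration, since the abstract does not state which test or correction the package applies.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: all domains considered (hypothetical background set)
    n_annotated:  background domains annotated to the ontology term
    n_selected:   domains in the input list of interest
    n_overlap:    input domains that are annotated to the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers, not taken from the paper: 3 of 3 selected domains carry
    # a term that annotates 3 of 10 background domains.
    p = hypergeom_enrichment_pvalue(10, 3, 3, 3)
    print(p)
    print(benjamini_hochberg([p, 0.04, 0.03]))
```

In practice one p-value would be computed per ontology term and the whole vector adjusted together, so that terms can be filtered by adjusted significance.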

Figure 2 (pcbi-1003929-g002): In-depth analysis for network-level understanding. (A) Heatmap visualisation of the semantic similarity between pairs of domains according to their annotations by Disease Ontology (DO). (B) Network representation of the pairwise domain semantic similarity. It is a weighted and undirected network, with edge thickness indicating semantic similarity between a pair of domains/nodes. Nodes are labeled by both numeric id and textual description. (C) A table listing GOMF terms and their annotated domains (used as domain seeds for random walk with restart, RWR). Notably, terms used here are only those with at least 3 annotatable domains that are also in the domain network (see Figure 2B). (D) Contact (statistical significance) network between GOMF terms in Figure 2C, as estimated by RWR on the domain network in Figure 2B. Only those significant contacts/edges (adjusted p-values < 0.1) are shown, with thickness indicating the contact strength (z-score).


Mentions: To further understand the relevance of these 58 domains to diseases, I use dcDAGdomainSim to construct a domain network according to domain-centric annotations by DO. This is done by calculating the semantic similarity between pairs of domains (Figure 2A). The resulting domain (semantic similarity) network contains 11 disease domains; they are similar to one another, but to varying degrees (Figure 2B). Finally, based on this domain network, I use dcRWRpipeline to estimate the contact strength and significance between sets of domains. Each domain set used here comprises the domains annotated to one GO Molecular Function (GOMF) term (see Figure 2C). The statistically significant contacts between terms are visualised in Figure 2D. These results suggest that (i) domains gained de novo during the evolution of the human lineage tend to form a disease similarity domain network, and that (ii) this network has a functional preference. Taken together, this example encourages domain-centric approaches to genome evolution, function and phenotype/disease.
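The random walk with restart (RWR) step can be illustrated with a short, generic Python sketch. It shows only the standard RWR iteration on a weighted, undirected network seeded from a set of nodes. The function name, the restart probability of 0.75, and the toy three-node network are all assumptions for illustration, and the sketch does not attempt to reproduce how dcRWRpipeline derives z-scores or adjusted p-values from the resulting affinities.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.75, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adjacency: symmetric list-of-lists of non-negative edge weights
    seeds:     indices of seed nodes (e.g. domains annotated to one term)
    restart:   probability of jumping back to the seeds at each step (assumed value)
    """
    n = len(adjacency)
    # Column-normalise the weights so each column is a probability distribution.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The walker starts (and restarts) uniformly over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Toy path network 0 - 1 - 2 with unit weights; node 0 is the only seed.
    network = [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0],
    ]
    print(random_walk_with_restart(network, seeds=[0]))
```

Running the walk once per seed set yields one affinity vector per term; how strongly two terms' vectors agree is the kind of quantity a contact strength could then be built on.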

