AraNet v2: an improved database of co-functional gene networks for the study of Arabidopsis thaliana and 27 other nonmodel plant species.

Lee T, Yang S, Kim E, Ko Y, Hwang S, Shin J, Shim JE, Shim H, Kim H, Kim C, Lee I - Nucleic Acids Res. (2014)

Bottom Line: Recent advances in high-throughput experimental technology have enabled the generation of an unprecedented amount of data from A. thaliana, which has facilitated data-driven approaches to unravel the genetic organization of plant phenotypes. We previously published a description of a genome-scale functional gene network for A. thaliana, AraNet, which was constructed by integrating multiple co-functional gene networks inferred from diverse data types, and we demonstrated the predictive power of this network for complex phenotypes. To enhance the usability of the network, we implemented an AraNet v2 web server, which generates functional predictions for A. thaliana and 27 nonmodel plant species using an orthology-based projection of nonmodel plant genes on the A. thaliana gene network.


Affiliation: Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea.


Figure 1: Network assessment using a set of validation gene pairs based on SUBA3 (a) and GO-CC (b). The accuracy of the co-functional links of each network was calculated as the percentage of true positives for each bin of 1000 gene pairs. The resulting plots show that AraNet v2 outperforms AraNet over the entire genome coverage range. (c) A box-and-whisker plot of network prediction power for 212 GO-CC terms with more than four annotated genes, measured by the area under the curve (AUC) from ROC analysis. AraNet v2 is also superior to the previous network in prediction for GO-CC annotations. (d) The x-axis and y-axis represent the size of each GO-CC term and the prediction power for that term measured by AUC, respectively. These two variables have no significant correlation (r² = 0.012), indicating no impact of gene set size on network prediction power.
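The two summary statistics named in panels (c) and (d) can be illustrated with a minimal Python sketch. It assumes that each gene already has a network-derived score for a given GO-CC term; how AraNet v2 computes that score is not described in this excerpt, so the inputs, and the function names `roc_auc` and `r_squared`, are hypothetical and not the authors' code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen annotated gene is
    scored above a randomly chosen unannotated gene (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Toy data (hypothetical): four genes scored for one term, two of them annotated.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
```

Under these assumptions, panel (c) would summarize one such AUC per GO-CC term, and panel (d) would report `r_squared` between the term sizes and their AUC values.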

Mentions: If the accuracy of the co-functional links in AraNet v2 is equal to or higher than that of AraNet, then we expect that the expansion of genome coverage in AraNet v2 may lead to enhanced predictions for functions and phenotypes in A. thaliana. To assess the predictive power of each network, we used a validation set composed of gene pairs independent of the gold standard gene pairs used for network training. Assuming that two genes participating in the same pathway are likely to belong to the same protein complex or to be localized within the same subcellular compartment, we generated two distinct validation sets: (i) gene pairs that share subcellular localization annotations in the SUBcellular localization database for Arabidopsis proteins (SUBA3) (12) and (ii) gene pairs that share GO cellular component (GO-CC) annotations (11). To avoid misleading co-functional relationships due to ambiguous subcellular compartments, we ignored the SUBA3 annotations for ‘cytosol’ and ‘plasma membrane’. For the same reason, we excluded 16 GO-CC terms with more than 350 annotated genes from a total of 420 terms. These procedures resulted in 335 641 gene pairs by SUBA3, of which only 418 pairs overlapped with the gold standard gene pairs that we used for network training (∼0.3%), and 394 076 gene pairs by GO-CC, of which less than 9% overlapped with the gold standard gene pairs. The accuracy of the network links at a given genome coverage was compared between AraNet v2 and AraNet based on each validation set. AraNet v2 showed substantially higher accuracy over the entire genome coverage range for both SUBA3 and GO-CC annotations (Figure 1a and b). From this result, we conclude that AraNet v2 significantly improves both genome coverage and linkage accuracy.
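The per-bin accuracy described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes links are ranked by a score, and it counts every link absent from the validation set as a false positive, whereas the original assessment may have been restricted to pairs of annotated genes. The name `binned_accuracy` and the toy gene identifiers are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Link = Tuple[str, str, float]  # (gene_a, gene_b, link score)


def binned_accuracy(
    links: Iterable[Link],
    positives: Set[FrozenSet[str]],
    bin_size: int = 1000,
) -> List[float]:
    """Percentage of validated links in each successive bin of ranked links."""
    # Rank links from the highest score to the lowest.
    ranked = sorted(links, key=lambda link: link[2], reverse=True)
    accuracies = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        # A link is a true positive if its unordered gene pair is validated.
        hits = sum(1 for a, b, _ in chunk if frozenset((a, b)) in positives)
        accuracies.append(100.0 * hits / len(chunk))
    return accuracies


# Toy example with a bin size of 2 instead of 1000.
toy_links = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.7), ("D", "E", 0.6)]
toy_positives = {frozenset(("A", "B")), frozenset(("C", "B")), frozenset(("D", "E"))}
toy_result = binned_accuracy(toy_links, toy_positives, bin_size=2)
```

Storing each pair as a `frozenset` makes the lookup independent of gene order, so ("B", "C") and ("C", "B") are treated as the same link.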

