Identification of differentially expressed subnetworks based on multivariate ANOVA.

Hwang T, Park T - BMC Bioinformatics (2009)

Bottom Line: Our approach was successfully applied to human microarray datasets. Each identified subnetwork was annotated with the Gene Ontology (GO) term, resulting in the phenotype-related functional pathway or complex. We also compared these results with those of other scoring methods such as t statistic- and mutual information-based scoring methods.


Affiliation: Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, Republic of Korea. hwangty@snu.ac.kr

ABSTRACT

Background: Since high-throughput protein-protein interaction (PPI) data has recently become available for humans, there has been a growing interest in combining PPI data with other genome-wide data. In particular, the identification of phenotype-related PPI subnetworks using gene expression data has been of great concern. Successful integration for the identification of significant subnetworks requires the use of a search algorithm with a proper scoring method. Here we propose a multivariate analysis of variance (MANOVA)-based scoring method with a greedy search for identifying differentially expressed PPI subnetworks.

Results: Given the MANOVA-based scoring method, we performed a greedy search to identify the subnetworks with the maximum scores in the PPI network. Our approach was successfully applied to human microarray datasets. Each identified subnetwork was annotated with the Gene Ontology (GO) term, resulting in the phenotype-related functional pathway or complex. We also compared these results with those of other scoring methods such as t statistic- and mutual information-based scoring methods. The MANOVA-based method produced subnetworks with a larger number of proteins than the other methods. Furthermore, the subnetworks identified by the MANOVA-based method tended to consist of highly correlated proteins.

Conclusion: This article proposes a MANOVA-based scoring method to combine PPI data with expression data using a greedy search. This method is recommended for the highly sensitive detection of large subnetworks.
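The greedy search described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `greedy_subnetwork`, the `min_gain` stopping rule, and the toy data are all assumptions, and the scoring function is passed in as a parameter. The toy score used here (sum of per-protein values divided by the square root of the subnetwork size) is only a placeholder standing in for the MANOVA-based score, which the abstract does not specify in detail.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple


def greedy_subnetwork(
    seed: str,
    neighbors: Dict[str, Set[str]],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[FrozenSet[str], float]:
    """Grow a connected subnetwork from `seed`, adding the best neighbor each round.

    `min_gain` is a hypothetical stopping rule: expansion stops when no
    candidate improves the score by more than this amount.
    """
    current = frozenset([seed])
    best = score(current)
    while True:
        # Candidates are direct interactors of any protein already selected.
        frontier = set().union(*(neighbors.get(p, set()) for p in current)) - current
        if not frontier:
            break
        # Sort candidates so that ties are broken deterministically.
        scored = [(score(current | {c}), c) for c in sorted(frontier)]
        top_score, top_protein = max(scored)
        if top_score - best <= min_gain:
            break
        current, best = current | {top_protein}, top_score
    return current, best


# Toy PPI network (adjacency sets) and made-up per-protein signals.
ppi = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
z = {"A": 2.0, "B": 1.5, "C": -1.0, "D": 3.0}


def toy_score(members: FrozenSet[str]) -> float:
    # Placeholder aggregate score, NOT the MANOVA statistic.
    return sum(z[p] for p in members) / len(members) ** 0.5


subnet, subnet_score = greedy_subnetwork("A", ppi, toy_score)
print(sorted(subnet), round(subnet_score, 3))
```

Because the score is a plain callable, a MANOVA-based statistic computed over the expression profiles of the member proteins could be substituted for `toy_score` without changing the search loop.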


Figure 2: The box plots of correlation coefficients between seeds and proteins in identified subnetworks. (a) Serum response data. (b) Prostate cancer metastasis data. The correlation coefficients are absolute values.

Mentions: Finally, to determine the strength of correlation among the proteins in the subnetworks, we investigated the distribution of the correlation coefficients within the subnetworks identified as significant by all three scoring methods. For every protein in a subnetwork, the correlation coefficient with the seed protein was calculated. Figure 2 shows the distribution of the absolute values of these correlation coefficients for the subnetworks identified by each scoring method. The MI-based and MANOVA-based scoring methods yielded higher correlations than the TΣ-scoring method. We then calculated the percentages of correlation coefficients exceeding various thresholds. As shown in Tables 4 and 5, the MANOVA-based scoring method tended to have higher percentages than the other methods. This suggests that the MANOVA-based scoring method tends to construct subnetworks containing relatively larger numbers of highly co-regulated genes.
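The seed-correlation analysis described above can be illustrated with a short sketch. It assumes Pearson correlation (the excerpt does not state which coefficient was used), and the function names, protein labels, expression values, and thresholds are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def seed_correlations(
    expr: Dict[str, Sequence[float]], seed: str, members: Sequence[str]
) -> List[float]:
    """Absolute correlation of each non-seed member with the seed protein."""
    return [abs(pearson(expr[seed], expr[p])) for p in members if p != seed]


def percent_above(values: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Percentage of values strictly greater than each threshold."""
    return {t: 100.0 * sum(v > t for v in values) / len(values) for t in thresholds}


# Made-up expression profiles for one subnetwork (four samples each).
expr = {
    "seed": [1.0, 2.0, 3.0, 4.0],
    "p1": [2.0, 4.0, 6.0, 8.0],
    "p2": [4.0, 3.0, 2.0, 1.0],
    "p3": [1.0, 3.0, 2.0, 4.0],
}
corrs = seed_correlations(expr, "seed", ["seed", "p1", "p2", "p3"])
pct = percent_above(corrs, [0.5, 0.9])
print(corrs, pct)
```

Taking absolute values treats strongly anti-correlated proteins (such as `p2` here) the same as strongly correlated ones, matching the figure caption's note that the coefficients are absolute values.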

