Gene network landscape of the ciliate Tetrahymena thermophila.

Xiong J, Yuan D, Fillingham JS, Garg J, Lu X, Chang Y, Liu Y, Fu C, Pearlman RE, Miao W - PLoS ONE (2011)

Bottom Line: The TGN was partitioned, and 55 modules were found. We also investigated human disease orthologs in Tetrahymena that are missing in yeast and provide evidence indicating that some of these are involved in the same process in Tetrahymena as in human. This study constructed a Tetrahymena gene network, provided new insights into the properties of this biological network, and presents an important resource for studying Tetrahymena genes at the pathway level.


Affiliation: Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.

ABSTRACT

Background: Genome-wide expression data from gene microarrays can be used to infer gene networks. At a cellular level, a gene network provides a picture of the modules in which genes are densely connected, and of the hub genes, which are highly connected with other genes. A gene network is useful for identifying genes that are involved in the same pathway, belong to the same protein complex, or are co-regulated. In this study, we used different methods to infer gene networks in the ciliate Tetrahymena thermophila, and describe some important properties of the resulting network, such as modules and hubs.

Methodology/principal findings: Using 67 single channel microarrays, we constructed the Tetrahymena gene network (TGN) using three methods: the Pearson correlation coefficient (PCC), the Spearman correlation coefficient (SCC) and the context likelihood of relatedness (CLR) algorithm. The accuracy and coverage of the three networks were evaluated using four conserved protein complexes in yeast. The CLR network with a Z-score threshold of 3.49 was determined to be the most robust. The TGN was partitioned, and 55 modules were found. In addition, analysis of 1200 arbitrarily determined hubs showed that these hubs could be sorted into six groups according to their expression profiles. We also investigated human disease orthologs in Tetrahymena that are missing in yeast and provide evidence indicating that some of these are involved in the same process in Tetrahymena as in human.

Conclusions/significance: This study constructed a Tetrahymena gene network, provided new insights into the properties of this biological network, and presents an important resource for studying Tetrahymena genes at the pathway level.

Figure 2 (pone-0020124-g002): Overall performance of three methods for four protein complexes. The F-score is plotted against the cutoff value (X-axis) of each of the three methods for each protein complex. Blue, CLR method; Pink, PCC method; Green, SCC method. For the CLR method, the cutoff values are confidence levels of the FDR test; for the PCC and SCC methods, the cutoff values are correlation coefficients.

Mentions: The correlation coefficient was used as the cutoff value for the Pearson and Spearman correlation methods, and the Z-score was used for the CLR method. The numbers of nodes (genes) and edges (gene-to-gene interactions that pass the threshold) computed using the different methods are shown in Figure 1. With increasing correlation coefficient or Z-score, both the node and edge numbers decreased. However, once the cutoff reached a relatively high value, the edge number decreased more slowly than the node number, leading to an increase in network density. As shown in Figure 1, 0.6 was used as the minimal cutoff value for the two correlation methods and 3.34 (corresponding to a 60% confidence level in the FDR test) was used as the minimal cutoff Z-score for the CLR method. At these minimal values, the networks from the three methods contained about the same number of nodes (Figure 1), but their edge numbers were very different. For the two correlation methods, the edge number for the Pearson method was greater than that for the Spearman method with the same accuracy, suggesting a higher false positive rate for the PCC method. Both correlation networks were also larger than the CLR network: the PCC and SCC networks had 2.4 times and 1.5 times as many edges, respectively. This indicates that the CLR method may have higher prediction accuracy than the two correlation methods. To verify this and choose an appropriate cutoff, we selected four yeast protein complexes and identified the one-to-one orthologs between yeast and T. thermophila. The cytoplasmic ribosomal large subunit, the cytoplasmic ribosomal small subunit, the 20S proteasome core particle and the 19S proteasome regulatory particle were used as benchmarks to determine the best of the three methods and the appropriate cutoff value. Using these four complexes, the accuracy (p value), the coverage (r value) and the overall performance (F-score) (see Methods) were calculated and are shown in Figure 2 and Figure S1.
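The thresholding and benchmarking steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cutoff is applied to the signed Pearson coefficient, density is taken as edges divided by possible node pairs, and the accuracy, coverage and F-score definitions are common precision/recall-style choices, since the paper's exact formulas are in its Methods section and are not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_network(profiles, cutoff):
    """Keep every gene pair whose correlation is at least `cutoff`.

    `profiles` maps a gene name to its list of expression values.
    Assumption: the signed coefficient is thresholded, not its absolute value.
    """
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(profiles), 2)
        if pearson(profiles[g], profiles[h]) >= cutoff
    }


def density(edges):
    """Edges divided by the number of possible pairs among connected nodes."""
    nodes = set().union(*edges) if edges else set()
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0


def complex_scores(edges, members):
    """Return (accuracy p, coverage r, F-score) for one benchmark complex.

    Assumed definitions, for illustration only:
      p = edges inside the complex / edges touching any complex member
      r = edges inside the complex / all possible member pairs
      F = harmonic mean of p and r
    """
    members = set(members)
    touching = [e for e in edges if e & members]
    inside = [e for e in touching if e <= members]
    possible = len(members) * (len(members) - 1) // 2
    p = len(inside) / len(touching) if touching else 0.0
    r = len(inside) / possible if possible else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Scanning `cutoff` over a range of values and plotting the F-score returned by `complex_scores` for each benchmark complex would give curves of the kind shown in Figure 2, under the assumptions above.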
Comparing the three methods, the F-score, accuracy and coverage of CLR were consistently better than those of the other methods, especially for the 19S proteasome regulatory particle complex, which contained 19 orthologous genes. Seventeen genes were shown to exist in a Tetrahymena proteasome complex by mass spectrometry (see below, Module-19). It is worth noting that the PCC and SCC networks would have to be two times larger than the CLR network (data not shown) to achieve the same accuracy and coverage, so the specificity of the CLR method is also better than that of the correlation coefficient methods. Based on these results, CLR was used as the method of choice. For presentation of CLR gene network data, the X-axis represents the FDR test confidence level. It has been reported that the CLR algorithm performed best at a 60% confidence level [26]. In our study, the four complexes analyzed showed that the appropriate threshold is 77% for the cytoplasmic ribosomal large subunit, 81% for the cytoplasmic ribosomal small subunit, 99% for the 20S proteasome core particle and 86% for the 19S proteasome regulatory particle. Taking into account the accuracy and coverage, 77%, corresponding to a Z-score of 3.49, was used as the cutoff confidence level. At this threshold, the CLR network possessed 15,049 nodes and 1,958,477 edges, and was taken as the TGN.
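For readers unfamiliar with CLR, the following Python sketch shows how a pairwise Z-score of the kind thresholded above is commonly computed. It follows the usual published formulation of the algorithm (each pair's similarity is compared with the background distribution of both genes, and the two clipped z-scores are combined), not code from this study. The function names are hypothetical, the input is assumed to be a precomputed symmetric similarity matrix such as mutual information, and excluding the diagonal and using the population standard deviation are illustrative choices.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_zscores(sim):
    """Convert a symmetric similarity matrix into CLR-style pairwise Z-scores.

    `sim` is a square list of lists (for example mutual information values).
    Assumptions: the diagonal is ignored and the population standard
    deviation is used for each gene's background distribution.
    """
    n = len(sim)
    mu, sd = [], []
    for i in range(n):
        background = [sim[i][j] for j in range(n) if j != i]
        mu.append(mean(background))
        sd.append(pstdev(background))

    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # z-score of the pair against each gene's own background,
            # clipped at zero so only above-background scores count
            zi = max(0.0, (sim[i][j] - mu[i]) / sd[i]) if sd[i] else 0.0
            zj = max(0.0, (sim[i][j] - mu[j]) / sd[j]) if sd[j] else 0.0
            z[i][j] = z[j][i] = sqrt(zi * zi + zj * zj)
    return z


def threshold_edges(z, cutoff=3.49):
    """Return index pairs (i, j), i < j, whose Z-score reaches the cutoff."""
    n = len(z)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if z[i][j] >= cutoff]
```

The default `cutoff=3.49` mirrors the threshold chosen in the text; on a toy matrix with only a handful of genes the Z-scores cannot reach that value, so a smaller cutoff is needed when experimenting.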

