Infection and genotype remodel the entire soybean transcriptome.

Zhou L, Mideros SX, Bao L, Hanlon R, Arredondo FD, Tripathy S, Krampis K, Jerauld A, Evans C, St Martin SK, Maroof MA, Hoeschele I, Dorrance AE, Tyler BM - BMC Genomics (2009)

Bottom Line: However, understanding the results of these analyses, and in particular understanding the very wide range of levels of transcriptional changes observed, is still a significant challenge. We show that low amplitude modulation of gene expression (less than two-fold changes) is highly statistically significant and consistent across biological replicates, even for modulations of less than 20%. Our results are consistent through two different normalization methods and two different statistical analysis procedures.


Affiliation: Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. lzhou@vbi.vt.edu

ABSTRACT

Background: High throughput methods, such as high density oligonucleotide microarray measurements of mRNA levels, are popular and critical to genome scale analysis and systems biology. However, understanding the results of these analyses, and in particular understanding the very wide range of levels of transcriptional changes observed, is still a significant challenge. Many researchers still use an arbitrary cutoff such as two-fold in order to identify changes that may be biologically significant. We have used a very large-scale microarray experiment involving 72 biological replicates to analyze the response of soybean plants to infection by the pathogen Phytophthora sojae and to analyze transcriptional modulation as a result of genotypic variation.

Results: With the unprecedented level of statistical sensitivity provided by the high degree of replication, we show unambiguously that almost the entire plant genome (97 to 99% of all detectable genes) undergoes transcriptional modulation in response to infection and genetic variation. The majority of the transcriptional differences are less than two-fold in magnitude. We show that low amplitude modulation of gene expression (less than two-fold changes) is highly statistically significant and consistent across biological replicates, even for modulations of less than 20%. Our results are consistent through two different normalization methods and two different statistical analysis procedures.

Conclusion: Our findings demonstrate that the entire plant genome undergoes transcriptional modulation in response to infection and genetic variation. The pervasive low-magnitude remodeling of the transcriptome may be an integral component of physiological adaptation in soybean, and in all eukaryotes.



Figure 1: Venn diagram showing the intersection between the sub-groups A and F and the whole experiment (W). Sub-group A is the first four blocks and F is the last four blocks of the whole experiment, which includes a total of 24 blocks. GC-RMA preprocessed data of A, F, and W were analyzed separately using the same LMMA model in SAS Proc Mixed. Genes with significant genotype × treatment interaction were determined using a cutoff of a TST-FDR adjusted p ≤ 0.01, as described in the methods. AW indicates the intersection between A and W, FW the intersection between F and W, and AF the intersection of the two sub-groups A and F. W/(A+F) refers to genes in W but not in A or F, A/W to genes in A but not W, and F/W to genes in F but not W. AF/W refers to genes in A and F but not W; the three genes in this set were not found significant in the other four sub-groups (B, C, D, or E).
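The region labels in the Figure 1 caption map directly onto set operations. The following is a minimal sketch of that mapping, not the authors' analysis code: the function name venn_regions and the gene identifiers g1 to g7 are hypothetical, and AW, FW, and AF are computed as plain pairwise intersections, which is one reading of the caption.

```python
def venn_regions(a, f, w):
    """Return the regions named in the Figure 1 caption as Python sets.

    a, f: genes significant in sub-groups A and F
    w:    genes significant in the whole experiment W
    """
    a, f, w = set(a), set(f), set(w)
    return {
        "AW": a & w,             # significant in both A and W
        "FW": f & w,             # significant in both F and W
        "AF": a & f,             # significant in both sub-groups
        "W/(A+F)": w - (a | f),  # in W but in neither A nor F
        "A/W": a - w,            # in A but not in W
        "F/W": f - w,            # in F but not in W
        "AF/W": (a & f) - w,     # in both A and F but not in W
    }


# Hypothetical toy gene lists, for illustration only.
A = {"g1", "g2", "g3", "g7"}
F = {"g2", "g4", "g7"}
W = {"g1", "g2", "g4", "g5", "g6"}

regions = venn_regions(A, F, W)
for name, genes in regions.items():
    print(name, sorted(genes))
```

In this toy input, g7 plays the role of the AF/W set: it is called significant in both sub-groups but not in the whole experiment.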

Mentions: To compare the power of our experiment with experiments of a size comparable to those more commonly found in the literature, we again divided the 24 blocks into 6 sub-groups, each consisting of 4 consecutive blocks, and then reanalyzed the data by LMMA for each sub-group separately. As expected, each individual data set had less power than the overall data set. An average of 40.4%, 65.7% and 33.8% of genes were detected as significant for genotype, treatment and genotype × treatment interaction, respectively, with TST-FDR control at a level of 0.01 [see Additional file 2]. However, different sub-experiments detected different subsets of the genes detected in the overall data set (Figure 1). Thus genes significant in all 6 sub-groups comprised only 48.9% of those detected in the overall experiment [see Additional file 3]. This lack of agreement among the gene lists became worse at more stringent (lower) FDR levels. As is well known, increased stringency produces more precise results (fewer false positives) but at the expense of a sometimes major decrease in power, i.e. an increase in the number of false negatives. However, the union of all genes that were significant in any of the six sub-groups comprised 81.0% of the genes with significant changes detected in the overall experiment [see Additional file 3]. These results demonstrate that the most common error made with small experiments is false negatives, rather than false positives, and that the common practice of combining results of multiple experiments by considering only the intersection of the gene lists is far more conservative than necessary. A simple union, the techniques of meta-analysis, or where possible a joint analysis, may be most appropriate for combining data from multiple independent experiments.
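The contrast between combining sub-experiments by intersection and by union can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' procedure: the function name recovery_fractions and the toy gene lists are hypothetical, and both fractions are computed only over genes that also appear in the overall list, which the text does not state explicitly.

```python
def recovery_fractions(subgroup_hits, overall_hits):
    """Fraction of the overall gene list recovered by the intersection
    and by the union of the sub-group gene lists.

    Assumption: only genes that are also in the overall list are counted.
    """
    overall = set(overall_hits)
    subgroups = [set(s) for s in subgroup_hits]
    if not overall or not subgroups:
        raise ValueError("need a non-empty overall list and at least one sub-group")
    in_all = set.intersection(*subgroups) & overall  # significant in every sub-group
    in_any = set.union(*subgroups) & overall         # significant in at least one
    return len(in_all) / len(overall), len(in_any) / len(overall)


# Hypothetical toy data: 10 genes significant overall, 3 small sub-experiments.
overall = {f"g{i}" for i in range(1, 11)}
subgroups = [
    {"g1", "g2", "g3", "g4", "g11"},  # g11 is not in the overall list
    {"g1", "g2", "g5", "g6"},
    {"g1", "g2", "g7", "g8"},
]

frac_intersection, frac_union = recovery_fractions(subgroups, overall)
print(f"intersection recovers {frac_intersection:.0%}, union recovers {frac_union:.0%}")
```

With these toy lists the intersection recovers far fewer of the overall genes than the union, the same qualitative pattern as the 48.9% and 81.0% figures reported above.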
