Statistical analysis of differential gene expression relative to a fold change threshold on NanoString data of mouse odorant receptor genes.

Vaes E, Khan M, Mombaerts P - BMC Bioinformatics (2014)

Bottom Line: Our objectives for these data are to decrease false discoveries when formally assessing the genes relative to a fold change threshold, and to provide a guided selection in the choice of this threshold. We show the benefits on simulated and real data. Gene-wise statistical analyses of gene expression data, for which the significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes.


Affiliation: Max Planck Research Unit for Neurogenetics, Max-von-Laue-Strasse 3, 60438 Frankfurt, Germany. peter.mombaerts@biophys.mpg.de.

ABSTRACT

Background: A challenge in gene expression studies is the reliable identification of differentially expressed genes. In many high-throughput studies, genes are accepted as differentially expressed only if they simultaneously satisfy a p value criterion and a fold change criterion. A statistical method, TREAT, has been developed for microarray data to assess formally whether fold changes are significantly higher than a predefined threshold. We have recently applied the NanoString digital platform to study expression of mouse odorant receptor genes, which, with 1,200 members, form the largest gene family in the mouse genome. Our objectives for these data are to decrease false discoveries when formally assessing the genes relative to a fold change threshold, and to provide a guided selection in the choice of this threshold.

Results: Statistical tests have been developed for microarray data to identify genes that are differentially expressed relative to a fold change threshold. Here we report that another approach, which we refer to as tTREAT, is more appropriate for our NanoString data, where false discoveries lead to costly and time-consuming follow-up experiments. Methods that we refer to as tTREAT2 and the running fold change model improve the performance of the statistical tests by protecting or selecting the fold change threshold more objectively. We show the benefits on simulated and real data.

Conclusions: Gene-wise statistical analyses of gene expression data, for which the significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes. Because it can be difficult to set in advance a fold change threshold that is meaningful for the available data, we developed methods that enable a better choice (thus reducing false discoveries and/or missed genes) or avoid this choice altogether. This set of tools may be useful for the analysis of other types of gene expression data.
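
The idea of testing a gene formally against a fold change threshold, rather than against zero change, can be illustrated with a minimal Python sketch. It is not the authors' implementation: this excerpt does not define tTREAT, so the sketch assumes an ordinary pooled-variance two-sample t statistic on log2 expression values and borrows the TREAT-style p value (the sum of two one-sided tail probabilities around the threshold). The function name `threshold_t_test` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def threshold_t_test(log2_mutant, log2_control, fc_threshold=1.5):
    """Test H0: |log2 FC| <= log2(fc_threshold) for a single gene.

    Assumption: ordinary pooled-variance t statistic (no moderation),
    with a TREAT-style p value. Returns (M, p) where M is the observed
    log2 fold change (mutant minus control).
    """
    if fc_threshold < 1:
        raise ValueError("fc_threshold must be >= 1")
    x = np.asarray(log2_mutant, dtype=float)
    y = np.asarray(log2_control, dtype=float)
    n1, n2 = x.size, y.size
    df = n1 + n2 - 2

    # Observed log2 fold change and its pooled standard error.
    m = x.mean() - y.mean()
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / df
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

    # Threshold on the log2 scale, e.g. log2(1.5) for a 1.5-fold change.
    tau = np.log2(fc_threshold)

    # Two tail probabilities: distance of |M| beyond the threshold,
    # and distance from the opposite threshold boundary.
    t_right = (abs(m) - tau) / se
    t_left = (abs(m) + tau) / se
    p = stats.t.sf(t_right, df) + stats.t.sf(t_left, df)
    return m, p
```

With `fc_threshold=1` the threshold is zero on the log2 scale and the p value reduces to that of a regular two-sided t-test, so raising the threshold can only make the test more conservative for a given gene.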


Figure 5: Running fold change model on biological data: ΔHxΔP mice. (A) MA plot of 558 OR genes. Filled green squares represent genes that were identified as DE by tTREAT with a FC threshold of 1.5, comparing ΔHxΔP double mutant mice to control (wild-type) mice. Empty circles represent non-DE genes. The empty pink circles indicate genes that satisfy the combined criteria of a significant p value for a regular t-test (p < 0.01) and a FC value >1.5 or <1/1.5. The black vertical stippled line indicates the maximum of the negative controls. The red horizontal stippled line is the mean of the M values. The yellow stippled lines are equal to M values that represent 1.3 fold up or down: ± log2(1.3). (B) MA plot of 558 OR genes. Filled green squares represent genes that were identified as DE by a running FC model based on tTREAT. Empty circles represent non-DE genes. The grey vertical stippled lines delineate ranges of gene expression, for which various FC thresholds have been used. Orange numbers indicate the FC thresholds that are applied in the subsequent tTREAT analysis. (C) MC plot corresponding to the results shown in panel A. M values are arranged according to the relative gene order along the chromosomes, which are indicated in various colors. The black arrows indicate two OR genes that have been identified as DE, but reside outside of the H and P clusters; these are most likely false discoveries. (D) MC plot corresponding to the results shown in panel B. The solid black arrow indicates an OR gene that has been identified as DE, but resides outside of the H and P clusters, and is most likely a false discovery. The dashed black arrow indicates an OR gene that is identified as DE and resides within the P cluster; this gene is missed by tTREAT.

Mentions: We applied the running FC model to our NanoString data obtained from six ΔHxΔP mutant mice and 12 control mice. As these mice are from a mixed genetic background, we used a less-stringent tTREAT with τ = 1.5 and p = 0.01. The results are shown as an MA plot in Figure 5A and as an MC plot in Figure 5C. We find that nine genes from the P cluster and 12 genes from the H cluster have a FC significantly lower than 1/1.5. However, two additional genes, Olfr362 on Chromosome 2 and Olfr107 on Chromosome 17, are also identified as DE by tTREAT with τ = 1.5. When we apply a running FC model, a FC threshold value τ is not chosen in advance. Instead, the model calculates a different τ for each of ten gene expression levels. These values are ≥1.5 for the lower expression levels and drop to 1.36 for the very high expression levels (Figure 5B). After calculating the τ values, we applied them together with a tTREAT test in our running FC model. This model identified 23 DE genes, 22 of which were also identified by tTREAT with τ = 1.5 (Figure 5D). The lowly expressed gene Olfr107 (on Chromosome 17) is no longer identified as DE by the running FC model, but the model identified an additional OR gene, Olfr695, which is plausible as it resides within the P cluster.

The advantage of tests relative to a FC threshold over combined approaches (statistical test outcome + FC criterion) is illustrated in Figure 5A. By combining the regular t-test and a FC criterion of 1.5, a total of 49 genes are identified as DE. Of these, 20 reside outside the H and P clusters (pink circles in Figure 5A), and are thus in reality most likely not DE.
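
The step in the running FC model where each gene receives the threshold τ of its expression range can be sketched as a simple lookup. This excerpt does not say how the ten τ values are computed or where the range boundaries lie, so the sketch takes the τ values as given input and assumes, purely for illustration, that the ranges are equally populated quantile bins of the average expression (A values). The function name `assign_running_thresholds` is hypothetical.

```python
import numpy as np


def assign_running_thresholds(a_values, taus):
    """Give each gene the FC threshold of its expression range.

    a_values : average log2 expression (A value) per gene.
    taus     : one FC threshold per expression range, ordered from the
               lowest to the highest range.

    Assumption: ranges are equally populated quantile bins of A; the
    source does not specify how the ranges are delineated.
    """
    a = np.asarray(a_values, dtype=float)
    taus = np.asarray(taus, dtype=float)
    n_bins = taus.size

    # Interior bin edges at equally spaced quantiles of A.
    interior_quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(a, interior_quantiles)

    # np.digitize returns an index in 0..n_bins-1 for each gene.
    bin_index = np.digitize(a, edges)
    return taus[bin_index]
```

Each gene's threshold from this lookup would then be passed to a per-gene test relative to that threshold, in place of a single fixed τ for all genes.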

