Statistical analysis of differential gene expression relative to a fold change threshold on NanoString data of mouse odorant receptor genes.

Vaes E, Khan M, Mombaerts P - BMC Bioinformatics (2014)

Bottom Line: Our objectives are, on these data, to decrease false discoveries when formally assessing the genes relative to a fold change threshold, and to provide a guided selection in the choice of this threshold. We show the benefits on simulated and real data. Gene-wise statistical analyses of gene expression data, for which the significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes.


Affiliation: Max Planck Research Unit for Neurogenetics, Max-von-Laue-Strasse 3, 60438 Frankfurt, Germany. peter.mombaerts@biophys.mpg.de.

ABSTRACT

Background: A challenge in gene expression studies is the reliable identification of differentially expressed genes. In many high-throughput studies, genes are accepted as differentially expressed only if they simultaneously satisfy a p value criterion and a fold change criterion. A statistical method, TREAT, has been developed for microarray data to formally assess whether fold changes are significantly greater than a predefined threshold. We have recently applied the NanoString digital platform to study the expression of mouse odorant receptor genes, which, with 1,200 members, form the largest gene family in the mouse genome. Our objectives are, on these data, to decrease false discoveries when formally assessing genes relative to a fold change threshold, and to provide guidance in choosing this threshold.
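
To make the contrast between a double filter (a p value cutoff plus a fold change cutoff) and a formal test relative to a fold change threshold concrete, here is a minimal Python sketch of a TREAT-style p value for H0: |β| ≤ τ, where β is the difference in mean log-scale expression between test and control and τ is the threshold on the same scale. It is a hypothetical reconstruction, not the authors' code: it assumes an ordinary pooled-variance two-sample t statistic (TREAT for microarrays uses moderated t statistics, and the abstract does not spell out how tTREAT differs), and all function names are illustrative. The p value is evaluated at the null boundary |β| = τ, and the Student t tail probability is computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """P(T > t) for Student's t distribution with df degrees of freedom."""
    upper = 0.5 * betainc_reg(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper


def two_sample_summary(control, test):
    """Log fold change b, pooled-variance standard error and df (equal variances assumed)."""
    n1, n2 = len(control), len(test)
    b = statistics.mean(test) - statistics.mean(control)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(control)
              + (n2 - 1) * statistics.variance(test)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return b, se, df


def treat_pvalue(b, se, df, tau):
    """TREAT-style p value for H0: |beta| <= tau, evaluated at the boundary |beta| = tau."""
    return t_sf((abs(b) - tau) / se, df) + t_sf((abs(b) + tau) / se, df)


if __name__ == "__main__":
    # Hypothetical log2 expression values for one gene (illustrative only).
    control = [5.0, 5.2, 4.9, 5.1]
    test = [6.3, 6.1, 6.4, 6.2]
    b, se, df = two_sample_summary(control, test)
    print("ordinary p (tau = 0):", treat_pvalue(b, se, df, tau=0.0))
    print("threshold p (tau = 1):", treat_pvalue(b, se, df, tau=1.0))
```

For a hypothetical gene with b = 1.1, se = 0.1 and 6 degrees of freedom, `treat_pvalue(1.1, 0.1, 6, 0.0)` (the ordinary two-sided p value) is far below 0.01 and |b| exceeds a threshold of 1, so a double filter would accept the gene; `treat_pvalue(1.1, 0.1, 6, 1.0)` is above 0.1, because b lies only one standard error beyond τ. Raising τ can only increase this p value, so a stricter threshold never adds calls.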

Results: Statistical tests have been developed for microarray data to identify genes that are differentially expressed relative to a fold change threshold. Here we report that another approach, which we refer to as tTREAT, is more appropriate for our NanoString data, where false discoveries lead to costly and time-consuming follow-up experiments. Methods that we refer to as tTREAT2 and the running fold change model improve the performance of these statistical tests by protecting the fold change threshold or by selecting it more objectively. We show the benefits on simulated and real data.

Conclusions: Gene-wise statistical analyses of gene expression data, for which significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes. Because it can be difficult to set in advance a fold change threshold that is meaningful for the available data, we developed methods that either enable a better choice of threshold (thus reducing false discoveries and/or missed genes) or avoid this choice altogether. This set of tools may be useful for the analysis of other types of gene expression data.



Figure 2: The two-stage design in a stringent test situation. (A) Data simulation experiment: empirical density functions of the DE genes (solid curve), noisy non-DE genes (dashed curve), and non-DE genes (dotted curve). The vertical olive lines indicate the simulated FC difference ω between test and control groups. The choice of ω determines the peak of the density function of the DE genes, which lies slightly beyond ω. The non-DE genes are distributed around an FC difference of 0. A stringent test relative to an FC threshold τ, indicated by the purple dashed lines, identifies only the DE genes lying beyond these lines (to the right of the positive threshold or to the left of the negative one). (B) The positive y axis shows the average number of false discoveries, and the negative y axis the average number of missed genes, over 400 generated datasets for three tests relative to an FC threshold. The x axis shows the percentage of DE genes simulated in each case (1%, 10% and 20%). Significance is set at p = 0.01. The data are simulated with an FC difference ω of 1.5. The reference test, tTREAT with an FC threshold τ of 1.5 (i.e., τ = ω), is shown in gray and black. The stringent test, with an FC threshold τ of 2.5, is shown in cyan and blue. The decrease (prefixed with a minus sign) or increase (prefixed with a plus sign) in false discoveries or missed genes relative to the reference (tTREAT with τ = 1.5) is indicated above or below the respective colored bar. With tTREAT2 (orange and red), false discoveries decrease relative to the reference test, while missed genes increase less than they do with the stringent test.
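
The simulation design of panel B (counting false discoveries and missed genes over many generated datasets for tests with different thresholds τ) can be sketched as follows. This is a simplified, hypothetical stand-in for the authors' simulation, not a reproduction of it: expression is assumed Gaussian on a log scale; each DE gene's true |β| is ω plus a small half-normal excess (loosely mirroring the caption's note that the DE density peaks slightly beyond ω); and the test assumes a common, known σ, so it uses normal rather than t tail probabilities. The noisy non-DE genes are simulated with a larger σ that the test does not account for, which is what produces false discoveries here. Function and parameter names are illustrative, and the default of 20 datasets (rather than 400) is only for speed.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def threshold_z_pvalue(b, se, tau):
    """Known-variance analogue of a TREAT-style test of H0: |beta| <= tau."""
    return normal_sf((abs(b) - tau) / se) + normal_sf((abs(b) + tau) / se)


def simulate_counts(n_genes=1000, frac_de=0.10, frac_noisy=0.10, omega=1.5,
                    extra_sd=0.5, taus=(1.5, 2.5), n_rep=3, sigma=0.3,
                    noisy_sigma=1.5, alpha=0.01, n_datasets=20, seed=1):
    """Return {tau: (mean false discoveries, mean missed genes)} per dataset.

    Hypothetical model: genes [0, n_de) are DE with |beta| = omega + |N(0, extra_sd)|;
    the next n_noisy genes are non-DE (beta = 0) with noise SD noisy_sigma;
    the rest are non-DE with noise SD sigma. The test assumes SD sigma for every gene.
    """
    rng = random.Random(seed)
    n_de = round(frac_de * n_genes)
    n_noisy = round(frac_noisy * n_genes)
    se = sigma * math.sqrt(2.0 / n_rep)  # assumed SE of the difference in means
    false_disc = {tau: 0 for tau in taus}
    missed = {tau: 0 for tau in taus}
    for _ in range(n_datasets):
        for g in range(n_genes):
            is_de = g < n_de
            if is_de:
                sign = rng.choice((-1.0, 1.0))
                beta = sign * (omega + abs(rng.gauss(0.0, extra_sd)))
            else:
                beta = 0.0
            sd = noisy_sigma if n_de <= g < n_de + n_noisy else sigma
            control = [rng.gauss(0.0, sd) for _ in range(n_rep)]
            test = [rng.gauss(beta, sd) for _ in range(n_rep)]
            b = sum(test) / n_rep - sum(control) / n_rep
            for tau in taus:
                called = threshold_z_pvalue(b, se, tau) < alpha
                if called and not is_de:
                    false_disc[tau] += 1  # non-DE gene declared DE
                elif is_de and not called:
                    missed[tau] += 1      # DE gene not declared DE
    return {tau: (false_disc[tau] / n_datasets, missed[tau] / n_datasets)
            for tau in taus}


if __name__ == "__main__":
    for tau, (fd, miss) in simulate_counts().items():
        print(f"tau={tau}: false discoveries={fd:.1f}, missed genes={miss:.1f}")
```

Because the p value of this threshold test never decreases as τ grows, the genes called at τ = 2.5 are a subset of those called at τ = 1.5 on the same data; the stringent test can therefore only lower false discoveries and raise missed genes, which is the trade-off panel B quantifies.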

Mentions: Area under the ROC curve (AUC) for various methods on simulated data

