Distributional fold change test - a statistical approach for detecting differential expression in microarray experiments.

Farztdinov V, McDyer F - Algorithms Mol Biol (2012)

Bottom Line: This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment. Overall the DFC test performed the best: on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.


Affiliation: Almac Diagnostics, 19 Seagoe Industrial Estate, Craigavon, BT63 5QD, UK. vadim.farztdinov@almacgroup.com.

ABSTRACT

Background: Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment.
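The simple fold-change filter described in the Background can be sketched as follows. This is only an illustration of that baseline, not the DFC test. The function name, the cut-off of 2 and the toy intensities are assumptions made for the example and do not come from the article.

```python
from math import log2
from statistics import mean


def fold_change_filter(group_a, group_b, cutoff=2.0):
    """Select genes whose ratio of mean intensities exceeds `cutoff` in either direction.

    group_a, group_b: dict mapping gene name -> list of raw (non-log, positive)
    intensities for the two conditions. `cutoff` is an arbitrary threshold
    (assumed here to be 2-fold); the filter attaches no confidence to its calls.
    """
    selected = []
    for gene in group_a:
        ratio = mean(group_b[gene]) / mean(group_a[gene])
        # Compare on the log scale so up- and down-regulation are treated symmetrically.
        if abs(log2(ratio)) >= log2(cutoff):
            selected.append(gene)
    return selected


# Hypothetical toy data: three replicates per condition.
control = {
    "geneA": [100.0, 110.0, 90.0],
    "geneB": [200.0, 210.0, 190.0],
    "geneC": [50.0, 400.0, 30.0],  # one outlying replicate
}
treated = {
    "geneA": [400.0, 420.0, 380.0],
    "geneB": [220.0, 200.0, 210.0],
    "geneC": [20.0, 25.0, 30.0],
}

hits = fold_change_filter(control, treated)
print(hits)
```

In this toy example geneC passes the filter only because a single outlying control replicate inflates its mean. The filter cannot distinguish that case from the consistent change in geneA, which is the lack of a confidence measure that the Background points out.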

Results: A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic curves, tested on 11 data sets from the Gene Expression Omnibus database with independently verified differentially expressed genes, and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best: on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.
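The total and partial AUC used for the evaluation can be illustrated with a minimal sketch. It assumes each feature has a ranking score (higher means stronger evidence of differential expression) and a verified 0/1 label. The function names, the false-positive-rate limit of 0.25 and the toy data are assumptions for the example; the article excerpt does not state which range was used for the partial AUC.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) for scores where higher = more likely DE.

    labels: 1 for a verified differentially expressed feature, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last feature in a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy ranking: eight features, four of them verified as DE.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

points = roc_points(scores, labels)
total_auc = partial_auc(points)                  # full range, FPR in [0, 1]
low_fpr_auc = partial_auc(points, max_fpr=0.25)  # assumed partial range
print(total_auc, low_fpr_auc)
```

The partial AUC only rewards true positives found near the top of the ranking, which is why it is a natural summary when few features are expected to be differentially expressed.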

Conclusions: The distributional fold change test is an effective method for finding and ranking differentially expressed probe sets on microarrays. The application of this test is advantageous for data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation-adjusted methods to the whole feature set.


Figure 2: Application of an expression-dependent threshold (14). Scatterplot of features in the two-dimensional space of log2(variance) and average expression for two different pre-processing methods: MAS5 (left panel) and RMA (right panel). Data are from data set GSE6011 (see Table 1), consisting of 37 samples. Blue dots represent features satisfying condition (11) and therefore considered as coming from distribution. Green points represent features having total variance above the expression-dependent threshold and considered as non-s. On each panel, the marginalized distributions of all and non- features over variance are shown on the left side, and the marginalized distributions of all and non- features over average expression are shown at the bottom of the panel.

Mentions: and can be used as a boundary to set up a variance filter. Its application to remove features is shown in Figure 2. We supposed in the previous section that f_EEM(d/μ) ~ N(0, σ0(μ)^2). Based on approximation (10) and using definition (11), the dependence σ0(μ) can be estimated from the fit

