Distributional fold change test - a statistical approach for detecting differential expression in microarray experiments.

Farztdinov V, McDyer F - Algorithms Mol Biol (2012)

Bottom Line: This filtering approach is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment. Overall the DFC test performed best: on average it had higher sensitivity and partial AUC, and this improvement was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.


Affiliation: Almac Diagnostics, 19 Seagoe Industrial Estate, Craigavon, BT63 5QD, UK. vadim.farztdinov@almacgroup.com.

ABSTRACT

Background: Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and to consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment.
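The simple fold change filter described in the Background can be sketched in a few lines. This is only an illustration of the cut-off approach, not of the DFC test; the function name `fold_change_filter`, the two-fold default cut-off and the toy intensities are assumptions made for the example.

```python
import numpy as np

def fold_change_filter(cond_a, cond_b, cutoff=2.0):
    """Flag genes whose ratio of mean intensities exceeds `cutoff` in either direction.

    cond_a, cond_b: arrays of shape (genes, samples) holding positive,
    linear-scale intensities. The cut-off is arbitrary (assumed 2-fold here).
    """
    mean_a = np.asarray(cond_a, dtype=float).mean(axis=1)
    mean_b = np.asarray(cond_b, dtype=float).mean(axis=1)
    # Work on the log2 scale so up- and down-regulation are treated symmetrically.
    log2_fc = np.log2(mean_b / mean_a)
    return np.abs(log2_fc) >= np.log2(cutoff), log2_fc

# Toy data: three genes, three samples per condition (made up for illustration).
cond_a = np.array([[100.0, 120.0, 110.0],
                   [50.0, 55.0, 45.0],
                   [200.0, 210.0, 190.0]])
cond_b = np.array([[330.0, 340.0, 320.0],
                   [52.0, 48.0, 50.0],
                   [60.0, 70.0, 50.0]])
flags, log2_fc = fold_change_filter(cond_a, cond_b)
```

The filter returns only a yes/no flag per gene. It attaches no confidence level to that decision, which is the limitation the Background points out.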

Results: A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic (ROC) curves, tested on 11 data sets from the Gene Expression Omnibus database with independently verified differentially expressed genes, and compared with the t-test and the shrinkage t-test. Overall the DFC test performed best: on average it had higher sensitivity and partial AUC, and this improvement was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.
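As a rough illustration of the three-dimensional feature space mentioned in the Results, the sketch below maps each probe set to an average level, an average difference and a total variance. The abstract does not give the formulas, so the definitions used here (mean of the two group means, difference of group means, sum of within-group variances) and the name `feature_space` are assumptions, not the authors' definitions.

```python
import numpy as np

def feature_space(log_a, log_b):
    """Map each probe set to (average level, average difference, total variance).

    log_a, log_b: arrays of shape (probe_sets, samples) on a log2 scale.
    All three definitions below are illustrative assumptions.
    """
    log_a = np.asarray(log_a, dtype=float)
    log_b = np.asarray(log_b, dtype=float)
    mean_a = log_a.mean(axis=1)
    mean_b = log_b.mean(axis=1)
    avg_level = 0.5 * (mean_a + mean_b)   # assumed: mean of the two group means
    avg_diff = mean_b - mean_a            # assumed: log2 fold change
    # Assumed: sum of the unbiased within-group variances.
    total_var = log_a.var(axis=1, ddof=1) + log_b.var(axis=1, ddof=1)
    return np.column_stack([avg_level, avg_diff, total_var])

# Toy log2 intensities for two probe sets (made up for illustration).
features = feature_space([[8.0, 9.0, 10.0], [5.0, 5.0, 5.0]],
                         [[11.0, 12.0, 13.0], [5.0, 6.0, 7.0]])
```

Each row of `features` is one probe set. How the DFC test ranks points in this space by signal-to-noise ratio is not reproduced here.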

Conclusions: The distributional fold change test is an effective method for finding and ranking differentially expressed probe sets on microarrays. The test is advantageous for data sets that use formalin-fixed paraffin-embedded samples, or for other systems where degradation effects diminish the applicability of correlation-adjusted methods to the whole feature set.

Figure 5: Average SPA curves. Average standardized partial area (SPA) curves for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Data from 11 data sets having 284 true DEGs.

Mentions: Figure 3 shows ROC and SPA curves for 3 of the 11 analysed data sets, selected to represent different pre-processing methods and different numbers of features verified by RT-PCR. The first data set was pre-processed with MAS5 and has the highest number of samples. The other two data sets were pre-processed with RMA and have a reasonable number of samples and of features tested by RT-PCR. Curves for all data sets are provided in Additional file 1. One can see that, independent of the pre-processing method, the DFC test in general performs slightly better than CAT(diag) and much better than the t-test. This observation is confirmed when the 〈ROC/ν〉 and 〈SPA/ν〉 curves are compared. These curves are obtained by averaging the parametric dependences over all 11 data sets (indicated by angle brackets) at a fixed fraction ν of top-ranked features selected. The dependences are shown in Figures 4 and 5 by thick lines, and the plots are provided for both pre-processing methods, MAS5 and RMA. To reveal the extent of variance in the data for each method, Figure 4 also shows thin lines drawn at half of the standard error above and below the corresponding average curve.
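The averaging step described above can be sketched as follows, assuming each data set's curve has already been evaluated on a shared grid of ν values. The function name `average_with_band`, the grid and the toy curve values are hypothetical; only the averaging over data sets and the half-standard-error band come from the text.

```python
import numpy as np

def average_with_band(curves, band=0.5):
    """Average per-data-set curves at fixed nu and return a +/- band*SE envelope.

    curves: array of shape (data_sets, grid_points); row i is the curve of
    data set i evaluated on a common grid of nu values (assumed layout).
    """
    curves = np.asarray(curves, dtype=float)
    n_sets = curves.shape[0]
    mean = curves.mean(axis=0)                            # the thick line
    sem = curves.std(axis=0, ddof=1) / np.sqrt(n_sets)    # standard error of the mean
    return mean, mean - band * sem, mean + band * sem     # thin lines at half the SE

# Toy example: three data sets on a three-point nu grid (made-up values).
nu = np.array([0.01, 0.05, 0.10])
curves = np.array([[0.2, 0.5, 0.8],
                   [0.4, 0.7, 0.8],
                   [0.3, 0.6, 0.8]])
mean, lower, upper = average_with_band(curves)
```

With `band=0.5` the envelope is one standard error wide in total, matching the thin lines described for Figure 4.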

