IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics.

Rosikiewicz M, Robinson-Rechavi M - Bioinformatics (2014)

Bottom Line: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. Contact: Marta.Rosikiewicz@unil.ch. Supplementary data are available at Bioinformatics online.


Affiliation: Department of Ecology and Evolution, University of Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.

ABSTRACT

Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability.

Results: The lack of a good benchmark has hampered the evaluation of existing quality control methods. In this study, we propose a new, independent quality metric based on the evolutionary conservation of expression profiles. Using 11 large organ-specific datasets, we show that IQRray, a new quality metric that we developed, has the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms other methods in identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, methods designed to detect outliers within a single experiment, such as Normalized Unscaled Standard Error and Relative Log Expression, performed poorly, because they cannot detect datasets that contain only low-quality arrays and because their scores cannot be compared directly between experiments.

Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R.

Contact: Marta.Rosikiewicz@unil.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

Fig. 1. Distribution of probe set average ranks of (a) a simulated array with intensity values assigned randomly to probe sets, (b) a simulated array with consistent intensity values within probe sets, (c) a real array with a low IQRray score (GSM50702) and (d) a real array with a high IQRray score (GSM371402).

Because of the limitations of available methods, we propose a new method for multi-experiment quality control. In Affymetrix technology, the final expression level is computed from the intensities of several independent probes that match the same target messenger RNA. In the IQRray method, we transform all probe signal values into ranks and then compute the average rank of the probes that belong to each probe set. We expect that the higher the quality of an array, the more consistent the signal levels of probes from the same probe set. The average rank of probes from a probe set that matches a highly expressed gene should be high, whereas the average rank of probes from a probe set that matches a lowly expressed or non-expressed gene should be low. All factors that increase signal noise, such as non-specific hybridization or spatial artifacts, are expected to distribute probe signals more randomly among probe sets. Mixing low and high ranks within the same probe set shifts the average rank of that probe set toward the average rank of all probes on the array. Consequently, lower-quality microarrays should show a narrower distribution of probe set average ranks. As a measure of this tendency, we propose the interquartile range (IQR) of the probe set average ranks: the IQRray score.

Figure 1 shows the distributions of probe set average ranks for two idealized arrays: one in which the intensities of probes within each probe set are consistent, and one in which signal values were assigned randomly to probe sets. We also selected, from the microarrays in the Bgee database, examples of arrays with extreme IQRray scores. The IQR of probe set average ranks is much smaller when signal values are distributed randomly among probe sets than when they are consistent. The distribution of probe set average ranks of a presumptive low-quality array resembles that of the array with randomly assigned values. In contrast, the distribution for a presumptive high-quality array is bimodal, reflecting one group of probe sets that target lowly expressed or non-expressed genes and another group that targets highly expressed genes.
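The rank, average, IQR procedure described above can be sketched in Python. This is a minimal, hypothetical re-implementation for illustration only, not the published R script: the function names `average_ranks`, `iqrray_score` and `simulate_array`, the scaling of ranks to (0, 1] by the number of probes, mean ranks for ties, and R-style (type 7, "inclusive") quartiles are all assumptions. The simulation mimics the two idealized arrays of Fig. 1 by building a bimodal array with consistent probe sets and then shuffling its signal values among probes.

```python
import random
import statistics
from collections import defaultdict


def average_ranks(values):
    """Rank values in ascending order (1 = lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def iqrray_score(intensities, probe_set_ids):
    """Sketch of IQRray: IQR of the mean probe rank of each probe set."""
    n = len(intensities)
    # Assumption: scale ranks to (0, 1] so arrays with different probe counts are comparable.
    ranks = [r / n for r in average_ranks(intensities)]
    by_set = defaultdict(list)
    for rank, ps in zip(ranks, probe_set_ids):
        by_set[ps].append(rank)
    set_means = [statistics.fmean(rs) for rs in by_set.values()]
    # "inclusive" matches R's default quantile definition (type 7).
    q1, _, q3 = statistics.quantiles(set_means, n=4, method="inclusive")
    return q3 - q1


def simulate_array(n_sets=2000, probes_per_set=11, consistent=True, seed=0):
    """Hypothetical simulated array: each probe set is either lowly or highly expressed."""
    rng = random.Random(seed)
    intensities, ids = [], []
    for s in range(n_sets):
        level = rng.choice([4.0, 10.0]) + rng.gauss(0, 1)  # bimodal expression levels
        for _ in range(probes_per_set):
            intensities.append(level + rng.gauss(0, 0.5))  # small probe-level noise
            ids.append(s)
    if not consistent:
        # Same signal values, but assigned to probe sets at random.
        rng.shuffle(intensities)
    return intensities, ids


good = iqrray_score(*simulate_array(consistent=True))
noisy = iqrray_score(*simulate_array(consistent=False))
print(f"consistent probe sets: {good:.3f}  randomly assigned: {noisy:.3f}")
```

Shuffling keeps the overall intensity distribution unchanged and destroys only the agreement between probes of the same probe set, so any drop in the score comes from the mechanism the text describes: probe set averages collapse toward the array-wide mean rank.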