IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics.

Rosikiewicz M, Robinson-Rechavi M - Bioinformatics (2014)

Bottom Line: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. Contact: Marta.Rosikiewicz@unil.ch. Supplementary data are available at Bioinformatics online.


Affiliation: Department of Ecology and Evolution, University of Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.

ABSTRACT

Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability.

Results: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metric that we developed, exhibits the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms other methods in identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, methods designed for detecting outliers within a single experiment, such as Normalized Unscaled Standard Error (NUSE) and Relative Log Expression (RLE), performed poorly, because these methods cannot detect datasets that contain only low-quality arrays and because their scores cannot be directly compared between experiments.

Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R.

Contact: Marta.Rosikiewicz@unil.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

btu027-F4: Spearman ρ values from correlation test between quality metrics and HOC score for (a) human and (b) mouse organ-specific datasets

Mentions: Each array was evaluated separately by a set of microarray quality control methods. We then checked how well the quality metrics of each method agreed with the HOC score. The correlation with this external quality indicator varied widely between methods (Figs 3 and 4). For example, in the case of the human blood dataset, the largest dataset in the study (Table 1), the IQRray method displayed nearly perfect correlation with the HOC score (Spearman correlation of 0.97) (Figs 3a and 4a). In contrast, NUSE and RLE (McCall et al., 2011), which are frequently used quality control methods, showed only a weak positive correlation (Figs 3b and 4b). For this human blood dataset, only a low fraction of samples came from experiments with fewer than six samples (Supplementary Table S1); thus, the low correlation with HOC cannot be explained simply by a lack of power owing to an insufficient number of arrays per experiment. In general, across all analyzed datasets, NUSE and RLE performed poorly, which suggests that the scores returned by these methods are not directly comparable between independent experiments. All traditional single-array quality metrics, such as RNA degradation slope, average background (avbg), scaling factor and the ratios between signal for the 3′ and 5′ ends of actin and GAPDH transcripts, showed low performance, and for some methods and datasets the correlation was even negative (Fig. 4a and b). The score from the GNUSE method, the only published method dedicated to absolute quantification of microarray quality (McCall et al., 2011), correlated well with the external quality metric only for mouse datasets, whereas for humans, GNUSE obtained poor results for nearly all datasets (Fig. 4b). IQRray performed best in 8 of 11 datasets. The other methods that displayed high agreement with the HOC score for both mouse and human data were percent present and the PM/MM t-test.
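The comparison described above rests on Spearman rank correlation between a quality metric and a reference score. The following minimal sketch, in Python rather than the R used for IQRray, illustrates how a score that is only meaningful relative to other arrays of the same experiment can lose agreement with an absolute reference when arrays from several experiments are pooled. All numbers, the variable names (`reference_score`, `absolute_metric`, `relative_metric`) and the median-centering step are hypothetical illustrations; they are not HOC, NUSE, RLE or IQRray values and do not reproduce the authors' computation.

```python
from statistics import mean, median


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: experiment A contains good arrays,
# experiment B contains only low-quality arrays.
experiment = ["A", "A", "A", "B", "B", "B"]
reference_score = [0.90, 0.85, 0.80, 0.40, 0.35, 0.30]  # stand-in reference
absolute_metric = [9.0, 8.5, 8.0, 4.0, 3.5, 3.0]        # comparable across experiments

# A relative score: each array is expressed as its deviation from the
# median of its own experiment, so the A-versus-B difference disappears.
experiment_median = {
    name: median(v for v, e in zip(absolute_metric, experiment) if e == name)
    for name in set(experiment)
}
relative_metric = [
    v - experiment_median[e] for v, e in zip(absolute_metric, experiment)
]

rho_absolute = spearman(reference_score, absolute_metric)
rho_relative = spearman(reference_score, relative_metric)
print(f"absolute metric: rho = {rho_absolute:.2f}")
print(f"relative metric: rho = {rho_relative:.2f}")
```

In this toy case the absolute metric ranks the arrays exactly as the reference does, whereas the experiment-centered metric gives the arrays of the uniformly poor experiment B the same scores as those of experiment A, which lowers its rank correlation with the reference.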

