qcML: an exchange format for quality control metrics from mass spectrometry experiments.
Bottom Line: We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics, as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of quality control data to their existing schema. We describe here the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities.
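As a rough illustration of the flavor of the format: the sketch below shows a run-quality entry with controlled-vocabulary-referenced quality parameters. Element names follow the published qcML examples, but the accession numbers and values here are placeholders; the qcML XSD and the QC controlled vocabulary are the authoritative references.

```xml
<!-- Schematic qcML fragment (illustrative only; accessions are placeholders). -->
<qcML version="0.0.8">
  <runQuality ID="run_1">
    <qualityParameter ID="qp_1" cvRef="QC" accession="QC:0000000"
                      name="median m/z" value="573.1"/>
    <qualityParameter ID="qp_2" cvRef="QC" accession="QC:0000000"
                      name="ratio of 2+ charged features" value="0.50"/>
  </runQuality>
</qcML>
```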
Affiliation: From the ‡Applied Bioinformatics, Center for Bioinformatics, Quantitative Biology Center, and Dept. of Computer Science, University of Tuebingen, Germany.
Mentions: Using the automated implementation of qcML generation in OpenMS, we calculated quality control metrics for several thousand MS runs stored in ms_lims (31). A read-out of the metrics "median m/z" and "ratio of 2+ charged features", covering experiments acquired on a single Thermo Scientific LTQ Orbitrap Velos instrument over three consecutive months, is given in Fig. 4. To show the ranges of the calculated metrics over this period, we include all analyses run during it, comprising a heterogeneous collection of proteomic samples ranging from full human, yeast, and E. coli lysates over enolase standard samples to 2DE-purified samples. All of these mixtures were analyzed in shotgun mode with different LC-MS protocols, including COFRADIC (32), with custom-made LC columns built to the same specifications. In the OpenMS-KNIME workflow used to generate qcML, all spectra were searched with X!Tandem (33) against the complete SwissProt (34) database (note that the figures shown here include only MS1 features, so no peptide identification information is used). With color coding used to distinguish the different experimental protocols, it is clear that tolerance boundaries for these metrics are best specified separately for each type of experiment; applying global quality control thresholds would therefore be mostly counterproductive. Indeed, different instruments, samples, and protocols yield differences in a variety of metrics. This makes the recording and archiving of quality control information over time all the more important, because it allows tolerances and constraints on applicable parameters to be derived, in turn enabling the automatic flagging of experiments, a concept already implemented in the SIMPATIQCO software (13).
See Fig. 5 for example flagging measures, in which a simple standard mixture run between sets of normal samples has been analyzed. These data span one month of measurements performed on the same Orbitrap Velos instrument. Spectra were processed with the same OpenMS-KNIME workflow discussed above, with searches performed with X!Tandem against the complete SwissProt database and the false discovery rate set at 1%. The 95% confidence interval and the upper inner fence are both indicated as possible measures, with the first proving less stringent for flagging outliers.
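The two metrics read out in Fig. 4 are straightforward to compute once MS1 features are available. A minimal Python sketch, with an invented feature list standing in for data that would normally be parsed from a feature file (e.g. via pyOpenMS, not shown here):

```python
from statistics import median

# Hypothetical MS1 feature list as (m/z, charge) pairs -- invented values
# for illustration only; real features come from a feature-finding tool.
features = [
    (445.12, 2), (512.30, 2), (633.87, 3),
    (721.45, 2), (389.90, 1), (905.33, 3),
]

def median_mz(features):
    """Metric 'median m/z': median over all feature m/z values."""
    return median(mz for mz, _ in features)

def ratio_2plus(features):
    """Metric 'ratio of 2+ charged features': fraction with charge state 2."""
    return sum(1 for _, z in features if z == 2) / len(features)

print(median_mz(features))   # median of the six m/z values
print(ratio_2plus(features)) # 3 of 6 features are 2+
```

Both metrics use only MS1 feature information, consistent with the note above that no peptide identification is required for Fig. 4.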
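The two flagging bounds mentioned for Fig. 5 can be sketched as follows. The run values are invented for illustration, and a normal-theory mean + 1.96·SD upper bound is used as a simple stand-in for the paper's 95% confidence interval; the upper inner fence is Tukey's Q3 + 1.5·IQR:

```python
import statistics

# Hypothetical history of one QC metric (e.g. identified spectra for a
# standard mixture) across repeated runs; the last value is an outlier.
runs = [10, 10, 11, 11, 12, 12, 40]

def upper_inner_fence(values):
    """Tukey's upper inner fence: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

def ci95_upper(values):
    """Normal-theory ~95% upper bound: mean + 1.96 * sample SD."""
    return statistics.mean(values) + 1.96 * statistics.stdev(values)

def flag(values, bound_fn):
    """Return the runs exceeding the chosen upper bound."""
    bound = bound_fn(values)
    return [v for v in values if v > bound]
```

In this toy history the normal-theory bound sits well above the inner fence (a single large outlier inflates both the mean and the standard deviation), illustrating why the confidence-interval measure can prove less stringent for flagging.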