Integrated multi-level quality control for proteomic profiling studies using mass spectrometry.

Cairns DA, Perkins DN, Stanley AJ, Thompson D, Barrett JH, Selby PJ, Banks RE - BMC Bioinformatics (2008)

Bottom Line: Manual inspection of those spectral data that were identified as being of poor quality confirmed the efficacy of the algorithms. Variance components analysis demonstrated the relatively small amount of technical variance attributable to day of profile generation and experimental array. The removal of these spectra at the initial stages of the analysis substantially improves the confidence of putative biomarker identification and allows inter-experimental comparisons to be carried out with greater confidence.


Affiliation: Cancer Research UK Clinical Centre, Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK. d.a.cairns@leeds.ac.uk

ABSTRACT

Background: Proteomic profiling using mass spectrometry (MS) is one of the most promising methods for analysing complex biological samples such as urine, serum and tissue for biomarker discovery. Such experiments are often conducted using MALDI-TOF (matrix-assisted laser desorption/ionisation time-of-flight) and SELDI-TOF (surface-enhanced laser desorption/ionisation time-of-flight) MS. These profiling methods make it possible to identify changes in protein expression that differentiate disease states, and to identify individual proteins or patterns that may be useful as biomarkers. However, quality control (QC) processes that reliably identify low-quality spectra, so that such data can be removed before further analysis, are often overlooked. In this paper we describe rigorous methods for assessing the quality of spectral data. These procedures are implemented in a user-friendly, web-based program. The post-QC data are then examined using variance components analysis to quantify the variance attributable to some of the factors in the experimental design.

Results: Using data from a SELDI profiling study of serum from patients with different levels of renal function, we show how the algorithms described in this paper can be used to detect systematic variability within and between sample replicates, pooled samples, and SELDI chips and spots. Manual inspection of the spectra identified as being of poor quality confirmed the efficacy of the algorithms. Variance components analysis showed that the day of profile generation and the experimental array accounted for relatively little of the technical variance.
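As an illustration of what a variance components analysis estimates, the sketch below uses the classical one-way random-effects ANOVA (method-of-moments) estimators for a balanced design with a single factor, such as day of profile generation. This is an assumption made for illustration only: the study considers more than one factor (day and array), and the function name `one_way_variance_components`, its input layout and the toy values are hypothetical, not taken from the authors' program.

```python
def one_way_variance_components(groups):
    """Method-of-moments variance components for a balanced one-way random-effects design.

    groups: one list of measurements per factor level (e.g. per day), all the same length.
    Returns the between-level and within-level (residual) variance estimates.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least 2 factor levels")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with at least 2 replicates per level")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    level_means = [sum(g) / n for g in groups]
    # Between-level and within-level sums of squares
    ss_between = n * sum((m - grand_mean) ** 2 for m in level_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, level_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # E[MS_between] = sigma_within^2 + n * sigma_between^2; negative estimates truncated at 0
    between = max((ms_between - ms_within) / n, 0.0)
    return {"between": between, "within": ms_within}


# Hypothetical toy data: two days, two measurements per day
vc = one_way_variance_components([[1, 3], [5, 7]])
share = vc["between"] / (vc["between"] + vc["within"])
print(vc, round(share, 3))  # {'between': 7.0, 'within': 2.0} 0.778
```

The ratio `share` is the fraction of total variance attributed to the factor; a small value corresponds to the kind of finding reported above. Designs with several crossed or nested factors would normally be fitted with a mixed model (for example by REML) rather than these closed-form estimators.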

Conclusion: Using the techniques described in this paper it is possible to reliably detect poor-quality data within proteomic profiling experiments undertaken using MS. Removing these spectra at the initial stages of the analysis substantially increases confidence in putative biomarker identification and allows inter-experimental comparisons to be made more reliably.



Figure 2: The black solid bars show histograms of the coefficients of variation, expressed as percentages (CVs), for all possible pairs of QC spectra for each variable of interest in the 2–4 kDa mass region (clockwise from top left: TIC (total ion current), normalised TIC, total intensity of peaks and total number of peaks). The vertical dotted line marks the critical value for these empirical significance tests, taken as the 95% quantile of each distribution. The red hatched histograms show the distribution of CVs calculated for the duplicate technical replicates. CVs greater than the critical value are rejected at the 5% level, and these QC failures are indicated by crosses in Table 1.
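The empirical test described in this caption can be sketched in a few lines. This is a minimal sketch under stated assumptions: the CV of a pair uses the sample (n-1) standard deviation, the 95% quantile is taken by linear interpolation between order statistics, and the function names and example values are hypothetical rather than taken from the authors' web-based program.

```python
import itertools
import math


def pair_cv(a, b):
    """Percentage CV of two values, using the sample (n - 1) standard deviation."""
    mean = (a + b) / 2.0
    sd = abs(a - b) / math.sqrt(2.0)  # sample SD of exactly two values
    return 100.0 * sd / mean


def empirical_quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def critical_value(reference, q=0.95):
    """q-quantile of the CVs over all possible pairs of reference (QC) values."""
    cvs = [pair_cv(a, b) for a, b in itertools.combinations(reference, 2)]
    return empirical_quantile(cvs, q)


def flag_replicates(replicate_pairs, reference, q=0.95):
    """Map each replicate pair to True (QC failure) if its CV exceeds the critical value."""
    crit = critical_value(reference, q)
    return {name: pair_cv(a, b) > crit for name, (a, b) in replicate_pairs.items()}


# Hypothetical TIC values for the QC (reference) spectra and two duplicate pairs
reference_tic = [100, 102, 98, 101, 99, 103, 97]
pairs = {"S1": (100, 101), "S2": (100, 130)}
print(flag_replicates(pairs, reference_tic))  # {'S1': False, 'S2': True}
```

In practice this would be repeated for each variable (TIC, normalised TIC, total peak intensity, number of peaks) and each mass segment, with each `True` corresponding to a cross in the output table.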

Mentions: The final part of the integrated QC analysis is the analysis of replicates. Here the mass region was split into four equally sized segments, and the derived QC parameters were examined in each. The output of the analysis is a table with five columns devoted to each mass segment. Selected rows of this output are shown in Table 1 (the full table is given in Table S2 [see Additional file 4]). A cross, or a value greater than zero, indicates that for this pair of spectra the parameter in this mass segment departs from what would be expected from the reference set of spectra. Figure 2 clarifies the meaning of the crosses in the first four columns: its histograms show the distribution (in black) and the calculated statistics (in red) for each variable, and any calculated statistic greater than the red dotted vertical line is marked with a cross in Table 1 and Table S2.
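Splitting the mass range into four equally sized segments and computing per-segment QC parameters could look like the following. This is a hypothetical sketch: TIC is approximated as the summed intensity within a segment, the naive local-maximum peak picker and its threshold are assumptions, the function names are invented for illustration, and normalised TIC is omitted because its definition is not given here. Each segment's parameters for a pair of spectra would then be compared against the reference distribution, as in Figure 2.

```python
def segment_bounds(mz_min, mz_max, n_segments=4):
    """Split [mz_min, mz_max] into n_segments equal-width intervals."""
    width = (mz_max - mz_min) / n_segments
    return [(mz_min + i * width, mz_min + (i + 1) * width) for i in range(n_segments)]


def local_maxima(intensity, threshold):
    """Hypothetical naive peak picker: indices of local maxima above threshold."""
    return [
        i for i in range(1, len(intensity) - 1)
        if intensity[i] > threshold
        and intensity[i] > intensity[i - 1]
        and intensity[i] >= intensity[i + 1]
    ]


def segment_qc(mz, intensity, mz_min, mz_max, n_segments=4, threshold=0.0):
    """Per-segment TIC, number of peaks and total peak intensity for one spectrum."""
    peaks = set(local_maxima(intensity, threshold))
    bounds = segment_bounds(mz_min, mz_max, n_segments)
    results = []
    for k, (lo, hi) in enumerate(bounds):
        last = k == len(bounds) - 1
        # Half-open intervals, except the last one, which also includes mz_max
        idx = [i for i, m in enumerate(mz) if lo <= m < hi or (last and m == hi)]
        seg_peaks = [i for i in idx if i in peaks]
        results.append({
            "segment": (lo, hi),
            "tic": sum(intensity[i] for i in idx),
            "n_peaks": len(seg_peaks),
            "peak_intensity": sum(intensity[i] for i in seg_peaks),
        })
    return results


# Hypothetical toy spectrum (m/z values and intensities)
mz = [0, 1, 2, 3, 4, 5, 6, 7, 8]
intensity = [0, 5, 1, 0, 3, 0, 0, 7, 2]
for row in segment_qc(mz, intensity, 0, 8):
    print(row)
```

A real pipeline would use a proper baseline-corrected peak detector; the point here is only the equal-width segmentation and the per-segment summaries that feed the pairwise comparison.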
