Integrated multi-level quality control for proteomic profiling studies using mass spectrometry.

Cairns DA, Perkins DN, Stanley AJ, Thompson D, Barrett JH, Selby PJ, Banks RE - BMC Bioinformatics (2008)

Bottom Line: Manual inspection of those spectral data that were identified as being of poor quality confirmed the efficacy of the algorithms. Variance components analysis demonstrated the relatively small amount of technical variance attributable to day of profile generation and experimental array. The removal of these spectra at the initial stages of the analysis substantially improves the confidence of putative biomarker identification and allows inter-experimental comparisons to be carried out with greater confidence.

Affiliation: Cancer Research UK Clinical Centre, Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK. d.a.cairns@leeds.ac.uk

ABSTRACT

Background: Proteomic profiling using mass spectrometry (MS) is one of the most promising methods for the analysis of complex biological samples such as urine, serum and tissue for biomarker discovery. Such experiments are often conducted using MALDI-TOF (matrix-assisted laser desorption/ionisation time-of-flight) and SELDI-TOF (surface-enhanced laser desorption/ionisation time-of-flight) MS. Using such profiling methods it is possible to identify changes in protein expression that differentiate disease states, as well as individual proteins or patterns that may be useful as biomarkers. However, the incorporation of quality control (QC) processes that reliably identify low-quality spectra, and hence allow such data to be removed before further analysis, is often overlooked. In this paper we describe rigorous methods for assessing the quality of spectral data. These procedures are presented in a user-friendly, web-based program. The data obtained post-QC are then examined using variance components analysis to quantify the amount of variance due to some of the factors in the experimental design.

Results: Using data from a SELDI profiling study of serum from patients with different levels of renal function, we show how the algorithms described in this paper may be used to detect systematic variability within and between sample replicates, pooled samples and SELDI chips and spots. Manual inspection of those spectral data that were identified as being of poor quality confirmed the efficacy of the algorithms. Variance components analysis demonstrated the relatively small amount of technical variance attributable to day of profile generation and experimental array.

Conclusion: Using the techniques described in this paper it is possible to reliably detect poor quality data within proteomic profiling experiments undertaken by MS. The removal of these spectra at the initial stages of the analysis substantially improves the confidence of putative biomarker identification and allows inter-experimental comparisons to be carried out with greater confidence.

Figure 6: Variance components, mean and CV spectra for the proteomic profiling study of renal function (IMAC-Cu chips). Each bar in the top panel represents the proportion of variance that can be attributed to biological (within and between classes) and technical (within and between days) components of variation for each peak. The peak corresponding to each bar is denoted by a gray line from the mean/CV spectra in the lower panel.
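The mean and CV spectra in the lower panel summarise, point by point along the m/z axis, the average intensity and its relative spread (standard deviation divided by mean) across spectra. The following is a minimal NumPy sketch, assuming the spectra have already been aligned to a common m/z grid and stored as the rows of a matrix. The function name `mean_and_cv_spectra` is hypothetical, and reporting the CV as a percentage is an assumption rather than the authors' stated convention.

```python
from __future__ import annotations

import numpy as np


def mean_and_cv_spectra(intensities: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the mean spectrum and the CV spectrum (in %) of a set of spectra.

    Hypothetical helper. `intensities` has shape (n_spectra, n_points); each row
    is one spectrum, assumed to be aligned to a shared m/z grid already.
    """
    mean = intensities.mean(axis=0)
    # Sample standard deviation (ddof=1) at each m/z point across spectra
    sd = intensities.std(axis=0, ddof=1)
    # CV = 100 * sd / mean; points with zero mean intensity are set to NaN
    # instead of raising division warnings
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = np.where(mean != 0, 100.0 * sd / mean, np.nan)
    return mean, cv


# Three toy spectra, three m/z points each (illustrative values only)
spectra = np.array([
    [10.0, 0.0, 4.0],
    [12.0, 0.0, 6.0],
    [14.0, 0.0, 5.0],
])
mean, cv = mean_and_cv_spectra(spectra)
print(mean)  # [12.  0.  5.]
print(cv)    # CV is about 16.7% at the first point, NaN at the second, 20% at the third
```

Using the sample standard deviation (`ddof=1`) is a design choice that suits small numbers of replicate spectra. With `ddof=0` the CV values would be slightly smaller.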

Mentions: With high-quality data, it is possible to estimate the magnitude of the components of variation that can be attributed to different factors in the design of this experiment. The classical estimation method was used to estimate simple variance components attributing variation to technical and biological components. These results were compared with those from a similar model estimated in OpenBUGS and found to be similar. The more complex model, with terms for day, chip and spot within the technical variation, was fitted, and convergence was assessed for each parameter for each peak using the scale reduction factor. Figure 6 shows the magnitude of each variance component as a proportion of the total variation, presented alongside the mean and CV spectra for intensity so that portions of the variation can be attributed to each of these factors. Further details of the mean and CV spectra are provided by the summary statistics in Table S3 [see Additional file 5]. Together, Figure 6 and Table 2 summarise the results of the variance components analysis when variance is attributed to biological variation (both between and within groups) and technical variation (attributed to day, chip, spot and within day). Figure 6 shows that in most cases around half of the variation can be attributed to technical variation (black, white, blue and red bars) and half to biological variation (green and yellow bars). Expressed as percentages of the total variance, the median variance in peak intensity consists of approximately 49% technical variance and 51% biological variance. Within the biological share, the median between-class component is only 0.06% of the total variance (although it is clearly much higher for some peaks).
Similarly, the technical variation can be split into 4% between days, 1% attributable to chip and 7% attributable to spot, with the remaining 37% attributable to within-day variation that cannot be characterised further by this analysis. Of the technical factors that could be decomposed, the variances due to day of profile determination, chip and spot for the mean peak are very small compared with the unexplained technical variation (Table 2). This unexplained variation could be due to robot performance, laser or detector stability, or other factors in the experimental process.
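For the simple two-component model (biological versus technical), the classical estimation step can be illustrated with the ANOVA, or method-of-moments, estimator for a balanced one-way random-effects design. In this design each serum sample is a biological unit that is profiled several times in technical replicate. This is a sketch under stated assumptions. It assumes that "classical estimation" refers to this ANOVA estimator, that the design is balanced, and that the intensities for a single peak have already been extracted. The function name `variance_components` is hypothetical. The nested day/chip/spot model that the authors fitted in OpenBUGS is not reproduced here.

```python
from __future__ import annotations

import numpy as np


def variance_components(y: np.ndarray) -> dict[str, float]:
    """ANOVA (method-of-moments) variance components for one peak.

    Hypothetical helper. `y` has shape (k, n): k biological samples (rows),
    each profiled n times as technical replicates (columns). Assumes a
    balanced one-way random-effects model y_ij = mu + a_i + e_ij.
    """
    k, n = y.shape
    if k < 2 or n < 2:
        raise ValueError("need at least 2 samples and 2 replicates per sample")
    sample_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Mean squares from the one-way ANOVA table
    ms_between = n * np.sum((sample_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((y - sample_means[:, None]) ** 2) / (k * (n - 1))
    # Expected mean squares: E[MS_within] = s2_tech and
    # E[MS_between] = s2_tech + n * s2_bio; a negative s2_bio is truncated to 0
    s2_tech = float(ms_within)
    s2_bio = max(float((ms_between - ms_within) / n), 0.0)
    total = s2_bio + s2_tech
    if total == 0:
        raise ValueError("all intensities are identical; proportions undefined")
    return {
        "biological": s2_bio,
        "technical": s2_tech,
        "pct_biological": 100.0 * s2_bio / total,
        "pct_technical": 100.0 * s2_tech / total,
    }


# Three samples, two replicate profiles each (illustrative values only)
peak = np.array([[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]])
print(variance_components(peak))
```

For the toy data, MS_between = 200 and MS_within = 2. The estimates are therefore 2 for the technical variance and (200 - 2) / 2 = 99 for the biological variance, so roughly 98% of the variance is biological. Applied peak by peak, the per-peak percentages could then be summarised by their median across peaks, as in the text above.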