Time-series alignment by non-negative multiple generalized canonical correlation analysis.

Fischer B, Roth V, Buhmann JM - BMC Bioinformatics (2007)

Bottom Line: The alignment function is learned in a supervised fashion. We compare our approach with previously published methods for aligning mass spectrometry data on a large proteomics dataset. The proposed method significantly increases the number of proteins that are identified as being differentially expressed in different biological samples.


Affiliation: Institute of Computational Science, ETH Zurich, Switzerland. bernd.fischer@inf.ethz.ch

ABSTRACT

Background: Quantitative analysis of differential protein expression requires aligning temporal elution measurements from liquid chromatography coupled to mass spectrometry (LC/MS). We propose multiple Canonical Correlation Analysis (mCCA) as a method to align the non-linearly distorted time scales of repeated LC/MS experiments in a robust way.

Results: Multiple canonical correlation analysis is able to map several time series to a consensus time scale. The alignment function is learned in a supervised fashion. We compare our approach with previously published methods for aligning mass spectrometry data on a large proteomics dataset. The proposed method significantly increases the number of proteins that are identified as being differentially expressed in different biological samples.

Conclusion: Jointly aligning multiple liquid chromatography/mass spectrometry samples by mCCA substantially increases the detection rate of potential biomarkers, which significantly improves the interpretability of LC/MS data.


Figure 2: Precision-Recall-Curve for the labeled peaks. "Ridge-regression" refers to the method proposed in [6].

Mentions: The recall is defined as the number of peaks that are assigned to a peak with the same peptide sequence, relative to the total number of (labeled) peaks. Each labeled peak can be assigned to the correct peak, to a wrong peak, or to no peak. The precision is the number of peaks that are assigned to the correct peak, relative to the number of peaks that are assigned to any peak (excluding the peaks that could not be assigned; see Equation 8). The precision-recall curves are plotted in Figure 2. We conclude that robust multiple CCA consistently outperforms robust ridge regression by more than five percent in recall for a given precision value. The thin plate splines perform much worse than both robust mCCA and robust ridge regression. The runtimes of the different methods on the whole dataset are 33 sec. for robust ridge regression, 6 min. 28 sec. for robust mCCA, and 19 hours 45 min. for the thin plate spline implementation by Kirchner [5]. The runtime for the thin plate splines covers only one parameter setting, whereas the runtimes for ridge regression and CCA include a model selection over ten different parameters (polynomial degree and σ of the hyperbolic tangent functions). A better parameter choice for the thin plate splines may exist, but due to the enormous runtime we could only select the parameters on a small example.
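The precision and recall definitions above can be illustrated with a minimal Python sketch. The function name `precision_recall`, the input layout (one entry per labeled peak, with `None` for an unassigned peak), and the peptide strings are assumptions made only for illustration; they are not taken from the paper or its software.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    true_peptides: Sequence[str],
    assigned_peptides: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Compute (precision, recall) over labeled peaks.

    true_peptides[i]     -- peptide sequence of labeled peak i
    assigned_peptides[i] -- peptide sequence of the peak it was matched to,
                            or None if it could not be assigned (hypothetical encoding)
    """
    if len(true_peptides) != len(assigned_peptides):
        raise ValueError("both sequences must have one entry per labeled peak")

    total = len(true_peptides)
    # Peaks that were assigned to some peak, correct or not.
    assigned = sum(a is not None for a in assigned_peptides)
    # Peaks assigned to a peak carrying the same peptide sequence.
    correct = sum(
        a is not None and a == t
        for t, a in zip(true_peptides, assigned_peptides)
    )

    recall = correct / total if total else 0.0
    precision = correct / assigned if assigned else 0.0
    return precision, recall


# Made-up example: two correct assignments, one wrong, one unassigned.
truth = ["AAK", "GLR", "TTP", "VEK"]
matched = ["AAK", "GLR", "QQS", None]
precision, recall = precision_recall(truth, matched)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case three of the four labeled peaks are assigned and two of those are correct, so unassigned peaks lower the recall but leave the precision untouched, which is the distinction the definitions draw.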

