Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics.

Cao M, Fraser K, Huege J, Featonby T, Rasmussen S, Jones C - Metabolomics (2014)

Bottom Line: However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. The application of the QSRR model with the strategy we outlined enhanced the peak annotation process by reducing the number of false positives resulting from database queries by matching accurate mass alone, and enriching the reference library.


Affiliation: AgResearch Grasslands Research Centre, Palmerston North, 4442 New Zealand.

ABSTRACT

Liquid chromatography coupled to mass spectrometry (LCMS) is widely used in metabolomics due to its sensitivity, reproducibility, speed and versatility. Metabolites are detected as peaks that are characterised by mass-over-charge ratio (m/z) and retention time (rt), and one of the most critical but also most challenging tasks in metabolomics is to annotate the large number of peaks detected in biological samples. Accurate m/z measurements enable the prediction of molecular formulae, which provide clues to the chemical identity of peaks, but often a number of metabolites have identical molecular formulae. Chromatographic behaviour, reflecting the physicochemical properties of metabolites, should also provide structural information. However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. To this end, we conducted Quantitative Structure-Retention Relationship (QSRR) modelling between the calculated molecular descriptors (MDs) and the experimental retention times (rts) of 93 authentic compounds analysed using hydrophilic interaction liquid chromatography (HILIC) coupled to high resolution MS. A predictive QSRR model based on the Random Forests algorithm outperformed a Multiple Linear Regression-based model, and achieved a high correlation between predicted rts and experimental rts (Pearson's correlation coefficient = 0.97), with mean and median absolute errors of 0.52 min and 0.34 min (corresponding to 5.1 % and 3.2 % error), respectively. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. The application of the QSRR model with the strategy we outlined enhanced the peak annotation process by reducing the number of false positives resulting from database queries by matching accurate mass alone, and enriching the reference library. The predicted rts were validated using either authentic compounds or ion fragmentation patterns.
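The accuracy figures quoted above (Pearson's correlation, mean and median absolute error, and percent error) can be computed from paired predicted and experimental retention times. The following is a minimal sketch, not the authors' code: the function name `rt_error_summary` and the example values are hypothetical, and it assumes percent error means the absolute error divided by the experimental rt, which the text does not state explicitly.

```python
import numpy as np


def rt_error_summary(rt_pred, rt_ref):
    """Summarise agreement between predicted and experimental rts (minutes)."""
    rt_pred = np.asarray(rt_pred, dtype=float)
    rt_ref = np.asarray(rt_ref, dtype=float)

    abs_err = np.abs(rt_pred - rt_ref)       # |rtPred - rtRef| in minutes
    rel_err = 100.0 * abs_err / rt_ref       # assumed definition of percent error
    r = np.corrcoef(rt_pred, rt_ref)[0, 1]   # Pearson's correlation coefficient

    return {
        "pearson_r": float(r),
        "mean_abs_err_min": float(abs_err.mean()),
        "median_abs_err_min": float(np.median(abs_err)),
        "mean_rel_err_pct": float(rel_err.mean()),
        "median_rel_err_pct": float(np.median(rel_err)),
    }


# Made-up retention times for illustration only (not data from the study).
rt_ref = [2.0, 4.0, 5.0, 8.0, 10.0]
rt_pred = [2.5, 3.5, 5.0, 9.0, 10.0]
summary = rt_error_summary(rt_pred, rt_ref)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting the median alongside the mean is useful here because a few poorly predicted compounds can inflate the mean error while leaving the median largely unchanged.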


Fig. 2: Correlation between the predicted retention time (rtPred, min) and the experimental retention time (rtRef, min) for the 93 reference compounds by the established models: (a) Multiple Linear Regression (MLR) model (r = 0.85) and (b) Random Forest (RF) model (r = 0.97)

Mentions: In addition to XLogP, we performed feature selection to determine whether a set of MDs could better explain the recorded rts of these standards. Using an exhaustive search (the branch-and-bound algorithm implemented in the “leaps” package), we carried out model selection to find the best subset of MDs for predicting rt by MLR. Candidate MLR models were evaluated on four criteria: Mallows' Cp, the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and adjusted R². A model size of eleven MDs was selected as the best subset according to these four criteria (see Fig. S2). These 11 MDs (bpol, nHBDon, ATSc1, ATSp1, VP.0, fragC, VABC, VAdjMat, WPATH, WPOL and XLogP; see Data S2 for details on the descriptors) were then used to construct the final predictive MLR model. Repeated 10-fold cross-validation was applied to estimate the prediction performance of the model, giving a mean adjusted R² of 0.64. The predicted rt (rtPred) correlated with the measured rts of the reference compounds (rtRef) with r = 0.85 (Fig. 2a). The absolute prediction error (|rtPred − rtRef|) had a mean of 0.95 min and a median of 0.76 min, equivalent to percent relative errors of 9.4 % and 6.7 %, respectively. Six MDs (XLogP, bpol, nHBDon, VP.0, fragC and WPATH) were determined to be the most significant (p values < 0.001) for predicting rt.
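To illustrate the best-subset selection described above, here is a minimal Python sketch, not the authors' R workflow. It uses a plain brute-force search rather than the branch-and-bound algorithm of the “leaps” package, and it runs on synthetic data with hypothetical descriptor names (`d0` to `d4`). For each model size it keeps the subset with the lowest residual sum of squares, then reports Mallows' Cp, AIC, BIC (both up to an additive constant) and adjusted R². The repeated 10-fold cross-validation step is not covered.

```python
from itertools import combinations

import numpy as np


def fit_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, names):
    """Best descriptor subset of each size, scored by four selection criteria."""
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    sigma2_full = fit_rss(X, y) / (n - p - 1)  # error variance from the full model
    rows = []
    for k in range(1, p + 1):
        # Brute force: try every subset of size k and keep the lowest RSS.
        best = min(combinations(range(p), k),
                   key=lambda idx: fit_rss(X[:, list(idx)], y))
        rss = fit_rss(X[:, list(best)], y)
        n_params = k + 1  # k slopes plus the intercept
        rows.append({
            "size": k,
            "descriptors": [names[i] for i in best],
            "adj_r2": 1 - (rss / (n - k - 1)) / (tss / (n - 1)),
            "cp": rss / sigma2_full - n + 2 * n_params,
            "aic": n * np.log(rss / n) + 2 * n_params,
            "bic": n * np.log(rss / n) + np.log(n) * n_params,
        })
    return rows


# Synthetic example: 93 "compounds", 5 hypothetical descriptors, 2 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(93, 5))
y = 5.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=93)
names = [f"d{i}" for i in range(5)]

rows = best_subsets(X, y, names)
for row in rows:
    print(row["size"], row["descriptors"],
          f"adjR2={row['adj_r2']:.3f} Cp={row['cp']:.2f} "
          f"AIC={row['aic']:.1f} BIC={row['bic']:.1f}")
```

Brute force is practical only for a handful of descriptors, since the number of subsets doubles with each one added. Branch-and-bound reaches the same best subsets while skipping branches that cannot improve on the current best.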

