Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics.

Cao M, Fraser K, Huege J, Featonby T, Rasmussen S, Jones C - Metabolomics (2014)

Bottom Line: However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. The application of the QSRR model with the strategy we outlined enhanced the peak annotation process by reducing the number of false positives resulting from database queries by matching accurate mass alone, and enriching the reference library.


Affiliation: AgResearch Grasslands Research Centre, Palmerston North, 4442 New Zealand.

ABSTRACT

Liquid chromatography coupled to mass spectrometry (LCMS) is widely used in metabolomics due to its sensitivity, reproducibility, speed and versatility. Metabolites are detected as peaks which are characterised by mass-over-charge ratio (m/z) and retention time (rt), and one of the most critical but also the most challenging tasks in metabolomics is to annotate the large number of peaks detected in biological samples. Accurate m/z measurements enable the prediction of molecular formulae which provide clues to the chemical identity of peaks, but often a number of metabolites have identical molecular formulae. Chromatographic behaviour, reflecting the physicochemical properties of metabolites, should also provide structural information. However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. To this end, we conducted Quantitative Structure-Retention Relationship (QSRR) modelling between the calculated molecular descriptors (MDs) and the experimental retention times (rts) of 93 authentic compounds analysed using hydrophilic interaction liquid chromatography (HILIC) coupled to high resolution MS. A predictive QSRR model based on the Random Forests algorithm outperformed a Multiple Linear Regression based model, and achieved a high correlation between predicted rts and experimental rts (Pearson's correlation coefficient = 0.97), with mean and median absolute error of 0.52 min and 0.34 min (corresponding to 5.1 and 3.2% error), respectively. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. The application of the QSRR model with the strategy we outlined enhanced the peak annotation process by reducing the number of false positives resulting from database queries by matching accurate mass alone, and enriching the reference library. The predicted rts were validated using either authentic compounds or ion fragmentation patterns.
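
The agreement statistics quoted in the abstract (Pearson's correlation coefficient, mean absolute error and median absolute error between predicted and experimental rts) can be illustrated with a minimal sketch. The function names and the toy retention times below are assumptions made for illustration only; they are not the authors' code or data, and the sketch does not reproduce the Random Forests model itself.

```python
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def summarise_predictions(predicted, observed):
    """Return the three agreement statistics named in the abstract."""
    abs_errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return {
        "pearson_r": pearson_r(predicted, observed),
        "mean_abs_error": mean(abs_errors),      # in minutes
        "median_abs_error": median(abs_errors),  # in minutes
    }


# Hypothetical retention times in minutes (NOT values from the paper).
toy_observed = [2.0, 4.0, 6.0, 8.0, 10.0]
toy_predicted = [2.5, 4.0, 5.5, 8.5, 10.0]

if __name__ == "__main__":
    for name, value in summarise_predictions(toy_predicted, toy_observed).items():
        print(f"{name}: {value:.3f}")
```

Reporting the median alongside the mean is useful because a few poorly predicted compounds can inflate the mean absolute error while leaving the median largely unchanged.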


Fig. 3: The smoothed XIC of m/z 166.0532 ± 20 ppm from the eight samples. The boxplot shown (a) was based on the normalised peak heights from wavelet-based peak detection. Histogram (b) of the predicted retention time (pRT) of 216 PubChem compounds with the same chemical formula of C5H11NO3S

Mentions: In the first scenario we show that our model can considerably reduce false positives. Peak 166.0532/12.50 (mz/rt) was one of the significant peaks (Kruskal test, p value < 0.05) identified in L. perenne blade tissue in response to drought (Fig. 3a). Assessment of the mass spectra indicated that this is a singly charged species ([M+H]+) with an m/z of 166.0530. We undertook chemical formula prediction for the mass 165.0457 (the neutral form). When C, H, N, O, S, and P were included in the element search list and a few empirical rules, such as H/C ratios and isotopic ratio filtering, were implemented (Kind and Fiehn 2007), C5H11NO3S was the only candidate molecular formula for the accurate mass (see Data S3). However, a search of the formula in PubChem returned 269 compounds, preventing further annotation from the formula alone. The RF-based rt prediction model was therefore used to narrow down the candidates. After disconnected SMILES forms such as “C1CCS(=O)(=O)C1.C(=O)N” (separated by a period ‘.’) and redundant SMILES were removed, 216 compounds remained for rt prediction. The prediction results are summarized in Fig. 3b. Only two compounds, methionine sulfoxide (cid 847) and ethiin (cid 146416), with predicted rts of 12.67 and 12.59 min, respectively, matched this peak at 12.50 min (±0.68). Both compounds are also recorded in the PlantCyc compound database, suggesting their involvement in plant metabolism. We conducted an independent validation experiment (Method S1, Fig. S5) by spiking the authentic compound methionine sulfoxide (ethiin was not available for purchase) into a ryegrass extract, showing that the rt of the standard was 12.81 min (Data S3), thus enabling the peak 166.0532/12.50 to be annotated as methionine sulfoxide or ethiin.
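
The candidate-narrowing steps described in this passage can be sketched roughly as follows: convert the [M+H]+ m/z to a neutral mass, check it against the monoisotopic mass of C5H11NO3S, discard disconnected and redundant SMILES, and keep the candidates whose predicted rt falls within ±0.68 min of the observed peak. This is an illustrative sketch, not the authors' implementation. The helper names are hypothetical, the two sulfoxide SMILES strings were written by hand for illustration, the third candidate and its predicted rt are invented, and redundancy is detected by exact string equality (a real pipeline would probably canonicalise SMILES first). The proton mass and monoisotopic atomic masses are standard values.

```python
PROTON_MASS = 1.007276  # Da, mass of the proton carried by an [M+H]+ ion

# Standard monoisotopic atomic masses (Da) for the elements in C5H11NO3S.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}


def neutral_mass_from_mh(mz):
    """Neutral monoisotopic mass of a singly charged [M+H]+ ion."""
    return mz - PROTON_MASS


def monoisotopic_mass(composition):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def drop_disconnected_and_redundant(smiles_list):
    """Drop multi-component SMILES (containing '.') and exact duplicates."""
    seen, kept = set(), []
    for smiles in smiles_list:
        if "." in smiles or smiles in seen:
            continue
        seen.add(smiles)
        kept.append(smiles)
    return kept


def match_by_rt(predicted_rts, observed_rt, tolerance):
    """Keep candidates whose predicted rt lies within +/- tolerance (min)."""
    return {name: rt for name, rt in predicted_rts.items()
            if abs(rt - observed_rt) <= tolerance}


if __name__ == "__main__":
    print(round(neutral_mass_from_mh(166.0530), 4))
    print(round(monoisotopic_mass({"C": 5, "H": 11, "N": 1, "O": 3, "S": 1}), 4))

    smiles = ["CS(=O)CCC(N)C(=O)O",      # methionine sulfoxide (hand-written)
              "CCS(=O)CC(N)C(=O)O",      # ethiin (hand-written)
              "C1CCS(=O)(=O)C1.C(=O)N",  # disconnected form quoted in the text
              "CS(=O)CCC(N)C(=O)O"]      # redundant entry
    print(drop_disconnected_and_redundant(smiles))

    predicted = {"methionine sulfoxide": 12.67,  # pRT from the text
                 "ethiin": 12.59,                # pRT from the text
                 "hypothetical candidate": 9.80}  # invented for illustration
    print(match_by_rt(predicted, observed_rt=12.50, tolerance=0.68))
```

Filtering on '.' works because a period in a SMILES string separates components that are not bonded to each other, so such entries describe salts or mixtures rather than a single molecule.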

