A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments.

Robinson MD, De Souza DP, Keen WW, Saunders EC, McConville MJ, Speed TP, Likić VA - BMC Bioinformatics (2007)

Bottom Line: When two or more experiments are performed on different sample states, each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are aligned themselves (between-state alignment). This approach can produce the optimal alignment between an arbitrary number of peak lists, and models explicitly within-state and between-state peak alignment. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.

Affiliation: The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3050, Australia. mrobinson@wehi.edu.au

ABSTRACT

Background: Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies.

Results: A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and the similarity of each pair of peak lists is estimated; (2) a guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states, each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are aligned themselves (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs a guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites.
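
The pairwise step (1) can be illustrated with a generic global alignment by dynamic programming. The following is a minimal sketch, not the authors' implementation: the function name align_pair, the gap penalty of 0.3, the retention-time-only score, and the 0.05 min tolerance are all assumptions made for illustration. Matching two peaks costs one minus their similarity, and leaving a peak unmatched costs the gap penalty.

```python
import math

def align_pair(list_a, list_b, score, gap=0.3):
    """Globally align two peak lists; return (i, j) index pairs, None marks a gap.

    `score` is any function returning a similarity in [0, 1] for two peaks.
    The gap penalty value is an illustrative assumption.
    """
    n, m = len(list_a), len(list_b)
    # cost[i][j]: minimal cost of aligning the first i peaks of A with the first j of B
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + 1.0 - score(list_a[i - 1], list_b[j - 1]), "diag"),
                (cost[i - 1][j] + gap, "up"),    # peak i of A left unmatched
                (cost[i][j - 1] + gap, "left"),  # peak j of B left unmatched
            ]
            cost[i][j], back[i][j] = min(options, key=lambda o: o[0])
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs

def rt_score(rt_a, rt_b, tolerance=0.05):
    """Hypothetical similarity based on retention time only (minutes)."""
    return math.exp(-((rt_a - rt_b) ** 2) / (2.0 * tolerance ** 2))

# Hypothetical retention times: the second peak of A has no partner in B.
alignment = align_pair([5.00, 6.20, 7.10], [5.02, 7.08], rt_score)
print(alignment)
```

In a progressive scheme, the same routine would be reused to merge alignments in the order given by the guide tree, with the score computed between groups of already-aligned peaks rather than single peaks.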

Conclusion: We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. This approach can produce the optimal alignment between an arbitrary number of peak lists, and models explicitly within-state and between-state peak alignment. The accuracy of the proposed method was close to the accuracy of manually-curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
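
One plausible way to balance the two sources of evidence is to multiply a mass-spectrum similarity by a term that decays with the retention time difference. The sketch below assumes cosine similarity for the spectra and a Gaussian decay for retention time; neither the functional form nor the rt_tolerance value of 0.05 min is taken from the text above, and the function names are hypothetical.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two intensity vectors on the same m/z grid."""
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(x * x for x in spec_a))
    norm_b = math.sqrt(sum(y * y for y in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)

def peak_similarity(rt_a, spec_a, rt_b, spec_b, rt_tolerance=0.05):
    """Similarity in [0, 1] for non-negative intensities.

    It is high only when both the spectra and the retention times agree.
    rt_tolerance (minutes) is an assumed width for the Gaussian decay.
    """
    rt_term = math.exp(-((rt_a - rt_b) ** 2) / (2.0 * rt_tolerance ** 2))
    return cosine_similarity(spec_a, spec_b) * rt_term

# Hypothetical peaks: same spectrum, eluting close together versus far apart.
spectrum = [120.0, 0.0, 45.0, 300.0]
near = peak_similarity(10.00, spectrum, 10.02, spectrum)
far = peak_similarity(10.00, spectrum, 10.50, spectrum)
print(near > far)
```

With a product form, a good spectral match cannot rescue two peaks that elute far apart, and close elution cannot rescue dissimilar spectra.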


Figure 5: Errors in peak alignment. A portion of a hypothetical alignment table, with two types of errors that occur in within-state peak alignment highlighted. The rows of the table represent metabolites and the columns represent individual experiments. The numbers shown are peak retention times in minutes. Panel (a) shows the type A error (peak mixing), where one or more peaks are shifted to an incorrect metabolite row. Panel (b) shows the type B error (metabolite splitting), where a metabolite row is split to create an artificial metabolite in the alignment table. A minimum-peaks condition is often imposed in practice (see Methods), in which case this type of error results in the deletion of the artificial metabolite (nevertheless, the original metabolite is affected, as it contains one or more missing peaks). If, in the example shown in panel (b), the minimum peak cut-off is assumed to be three peaks, both metabolite1 and metabolite2 will be deleted from the alignment table as the net result of the splitting error and the removal of spurious metabolite rows.

Mentions: To assess the accuracy of the proposed approach, a series of alignments was constructed by applying dynamic programming alignment with different input parameters. The resulting alignment tables were compared to the correct alignment table and analyzed for errors. Two types of errors were observed (Figure 5): peak mixing (type A error) and metabolite splitting (type B error). In peak mixing errors, one or more peaks are shifted to a different metabolite row to take the position of a missing peak. Figure 5(a) shows the simplest case, in which one peak is shifted from the metabolite-1 row to the metabolite-2 row. In practice, mixing can involve more than two metabolite rows. In metabolite splitting, one or more peaks are moved to create an extra metabolite row (i.e. metabolite 3 in Figure 5(b)). Because spurious metabolites are removed (see Methods), most metabolite rows created artificially by splitting will be discarded. The only exception is a metabolite with all eight peaks present that is split into two metabolite rows with exactly four peaks per row.
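
The effect of the minimum-peaks condition on split rows can be sketched as follows. This is an illustration with hypothetical retention times, not the data in Figure 5. The cut-off of four peaks out of eight experiments is an assumption chosen to be consistent with the four-plus-four exception described above, and the function names are invented for this example.

```python
def count_peaks(row):
    """Number of experiments in which the metabolite row has a peak."""
    return sum(1 for rt in row if rt is not None)

def drop_sparse_rows(table, min_peaks):
    """Keep only metabolite rows with at least `min_peaks` peaks present."""
    return [row for row in table if count_peaks(row) >= min_peaks]

# One true metabolite split 5 + 3 across eight experiments (None = no peak).
split_5_3 = [
    [10.01, 10.02, None, 10.00, None, 10.03, None, 10.01],
    [None, None, 10.02, None, 10.01, None, 10.02, None],
]

# One true metabolite split exactly 4 + 4.
split_4_4 = [
    [10.01, 10.02, None, 10.00, None, 10.03, None, None],
    [None, None, 10.02, None, 10.01, None, 10.02, 10.01],
]

kept_5_3 = drop_sparse_rows(split_5_3, min_peaks=4)
kept_4_4 = drop_sparse_rows(split_4_4, min_peaks=4)
print(len(kept_5_3), len(kept_4_4))
```

Under these assumptions, the 5 + 3 split loses its artificial row but leaves the original metabolite with three missing peaks, while the 4 + 4 split keeps both rows, so the artificial metabolite persists in the table.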

