Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities.

Peabody MA, Van Rossum T, Lo R, Brinkman FS - BMC Bioinformatics (2015)

Bottom Line: Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs.

Affiliation: Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada. map1@sfu.ca.

ABSTRACT

Background: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method's accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions.

Results: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed.

Conclusions: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.
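
The clade exclusion idea described in the abstract can be illustrated with a minimal sketch. It assumes a reference database represented as a plain dictionary mapping genome names to lineages; the function name `clade_exclude`, the toy genomes, and the data layout are hypothetical and are not taken from the paper or from any of the evaluated programs. For a given query genome and rank, every reference genome sharing the query's taxon at that rank is removed before classification, so the classifier can no longer find an identical or closely related sequence.

```python
# Ranks at which exclusion is applied, from easiest to hardest scenario.
RANKS = ["species", "genus", "family", "order", "class"]


def clade_exclude(reference, query_lineage, rank):
    """Return the reference genomes that do NOT share the query's taxon at `rank`.

    reference:     dict of genome name -> {rank: taxon name} (hypothetical layout)
    query_lineage: {rank: taxon name} for the genome the test reads come from
    """
    if rank not in RANKS:
        raise ValueError(f"unsupported rank: {rank}")
    excluded_taxon = query_lineage[rank]
    return {
        name: lineage
        for name, lineage in reference.items()
        if lineage.get(rank) != excluded_taxon
    }


def lineage(species, genus, family, order, cls):
    """Small helper to build a lineage dictionary."""
    return dict(zip(RANKS, [species, genus, family, order, cls]))


# Toy reference database (illustrative only).
reference = {
    "E. coli K-12": lineage("Escherichia coli", "Escherichia",
                            "Enterobacteriaceae", "Enterobacterales",
                            "Gammaproteobacteria"),
    "E. fergusonii": lineage("Escherichia fergusonii", "Escherichia",
                             "Enterobacteriaceae", "Enterobacterales",
                             "Gammaproteobacteria"),
    "S. enterica": lineage("Salmonella enterica", "Salmonella",
                           "Enterobacteriaceae", "Enterobacterales",
                           "Gammaproteobacteria"),
    "B. subtilis": lineage("Bacillus subtilis", "Bacillus",
                           "Bacillaceae", "Bacillales", "Bacilli"),
}

query = reference["E. coli K-12"]
for rank in RANKS:
    kept = sorted(clade_exclude(reference, query, rank))
    print(f"{rank:>7} exclusion keeps: {kept}")
```

Each step up in rank removes more of the query's relatives from the reference, which is why higher exclusion levels are the progressively more difficult scenarios.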

Fig. 1: Performance as clade exclusion level is varied. Sensitivity (a) and precision (b) on the MetaSimHC dataset of simulated 250 bp reads. There is a wide range of variability in the sensitivity and precision of the methods, with sensitivity tending to decrease as the level of clade exclusion moves from species to class. Performance is calculated based on proportion of reads appropriately assigned and averaged per genome (see Methods).

Mentions: The quality of the assignments made by the different methods was further examined under clade exclusion scenarios at different taxonomic levels. Sensitivity and precision were computed on the MetaSimHC dataset (Fig. 1) and found to vary notably. To examine in greater detail what led to the differences in sensitivity and precision of these methods, the taxonomic distance for each method was evaluated (Additional file 2: Figure S3). Furthermore, the proportion of reads assigned at each taxonomic rank was determined. An example of the results under the genus clade exclusion scenario is shown in Fig. 2, with the data for the rest in Additional file 3. Additionally, the numbers of reads misassigned and correctly assigned or overpredicted for each rank were compiled (genus clade exclusion in Additional file 2: Figure S4; the rest of the data in Additional file 4). Many of the methods assign a considerable proportion of reads to the species level, even though species-level assignment is impossible because the species are excluded from the database. Also notable is that TACOA assigns the majority of reads to the superkingdom level, so the method will be of limited use for those interested in more specific taxonomic ranks, at least at these shorter read lengths.
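
The per-genome averaging mentioned in the figure caption can be sketched as follows. The exact definitions are in the paper's Methods, which are not reproduced here, so this sketch rests on labeled assumptions: sensitivity is taken as correctly assigned reads over all reads from a genome, precision as correctly assigned reads over all reads that received any assignment, and unassigned reads are assumed not to count against precision. The function name `per_genome_metrics` and the outcome labels are hypothetical.

```python
from collections import defaultdict


def per_genome_metrics(reads):
    """Average sensitivity and precision over genomes.

    reads: iterable of (genome, outcome) pairs, where outcome is one of
           "correct", "incorrect", or "unassigned" (assumed labels).
    Returns (mean sensitivity, mean precision).
    """
    counts = defaultdict(lambda: {"correct": 0, "incorrect": 0, "unassigned": 0})
    for genome, outcome in reads:
        counts[genome][outcome] += 1  # raises KeyError on an unknown outcome

    sensitivities, precisions = [], []
    for c in counts.values():
        total = sum(c.values())
        assigned = c["correct"] + c["incorrect"]
        # Assumption: sensitivity = correct / all reads from this genome.
        sensitivities.append(c["correct"] / total)
        # Assumption: precision = correct / reads that were assigned at all.
        # Genomes with no assigned reads are skipped for precision.
        if assigned:
            precisions.append(c["correct"] / assigned)

    mean_sens = sum(sensitivities) / len(sensitivities)
    mean_prec = sum(precisions) / len(precisions) if precisions else float("nan")
    return mean_sens, mean_prec


# Toy data: genome A contributes 10 reads, genome B only 4.
reads = (
    [("A", "correct")] * 6 + [("A", "incorrect")] * 2 + [("A", "unassigned")] * 2
    + [("B", "correct")] * 1 + [("B", "incorrect")] * 1 + [("B", "unassigned")] * 2
)
mean_sens, mean_prec = per_genome_metrics(reads)
print(f"sensitivity={mean_sens:.3f} precision={mean_prec:.3f}")
```

Averaging per genome gives each genome equal weight regardless of how many reads it contributes. In the toy data, pooling all reads would give a sensitivity of 7/14, whereas the per-genome mean is lower because the poorly classified genome B counts as much as genome A.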

