Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes.

Nayfach S, Bradley PH, Wyman SK, Laurent TJ, Williams A, Eisen JA, Pollard KS, Sharpton TJ - PLoS Comput. Biol. (2015)

Bottom Line: However, little is known about how decisions made during annotation affect the reliability of the results. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease.

Affiliation: Gladstone Institute of Cardiovascular Disease, San Francisco, California, United States of America.

ABSTRACT
Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

Fig 5 (pcbi.1004573.g005): Shallow sequencing enables accurate estimates of alpha and beta functional diversity. (A) Relative abundance error for 101-bp Illumina metagenomes from 10 mock communities using between 10,000 and 1 million reads. (B) Relative abundance error for a 101-bp Illumina metagenome from mock community 160319967-stool1 using between 10,000 and 100 million reads. (C) Expected versus observed functional distances for 10 mock communities using between 10,000 and 1.5 million 101-bp Illumina reads. (D) Distributions of Bray-Curtis dissimilarity error at each sequencing depth from (C).

Mentions: First, we determined the minimum number of reads for accurate estimates of within-sample protein family relative abundance. To address this question, we rarefied reads from our simulated 101 bp Illumina metagenomes, using between 10,000 and 1 million reads. At each sequencing depth, we used sampled reads to estimate the relative abundance of SFams, and compared these estimates to expected values using an L1 distance. As anticipated, we found that L1 error decreased with increasing sequencing depth and appeared to be close to an asymptote at a depth of ~1 million reads (Fig 5A). While relative abundance error varied between mock communities, it appeared to reach an asymptote for all ten communities analyzed (Fig 5A). To investigate this further, we increased sequencing depth by two orders of magnitude to 100 million reads for one of the mock communities. With this massive increase in sequencing depth, there was only a marginal reduction in relative abundance error, from 0.13 to 0.11 (Fig 5B).
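The rarefaction-and-compare procedure described above can be illustrated with a small sketch. This is not the authors' code or ShotMAP itself: the function names, the toy Dirichlet-distributed "expected" abundances, and the library size are all assumptions made for illustration. It also models only read-sampling noise and skips read annotation entirely, so it will not reproduce the error values reported in the text.

```python
import numpy as np


def rarefy_counts(counts, depth, rng):
    """Subsample `depth` reads without replacement from per-family read counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the number of available reads")
    # Drawing without replacement from labelled reads is a multivariate
    # hypergeometric draw over the per-family counts.
    return rng.multivariate_hypergeometric(counts, depth)


def relative_abundance(counts):
    """Convert per-family counts to proportions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return counts / total


def l1_error(observed, expected):
    """L1 distance between two relative-abundance vectors."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.abs(observed - expected).sum())


# Toy demonstration (all values hypothetical).
rng = np.random.default_rng(0)
n_families = 500
expected = rng.dirichlet(np.full(n_families, 0.5))  # assumed "true" abundances
library = rng.multinomial(2_000_000, expected)      # assumed full read library

for depth in (10_000, 100_000, 1_000_000):
    sample = rarefy_counts(library, depth, rng)
    error = l1_error(relative_abundance(sample), expected)
    print(f"depth={depth:>9,d}  L1 error={error:.4f}")
```

In this toy setting the error comes from sampling alone, so it keeps shrinking as depth grows. The plateau described in the text would need an additional, depth-independent error source, which the sketch deliberately leaves out.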

