Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics.

Jovel J, Patterson J, Wang W, Hotte N, O'Keefe S, Mitchel T, Perry T, Kao D, Mason AL, Madsen KL, Wong GK - Front Microbiol (2016)

Bottom Line: The two main approaches for analyzing the microbiome, 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics, are illustrated with analyses of libraries designed to highlight their strengths and weaknesses. To the extent that fluctuations in the diversity of gut bacterial populations correlate with health and disease, we emphasize various techniques for the analysis of bacterial communities within samples (α-diversity) and between samples (β-diversity). Finally, we demonstrate techniques to infer the metabolic capabilities of a bacterial community from these 16S and shotgun data.


Affiliation: Department of Medicine, University of Alberta, Edmonton, AB, Canada.

ABSTRACT
The advent of next-generation sequencing (NGS) has enabled investigations of the gut microbiome with unprecedented resolution and throughput. This has stimulated the development of sophisticated bioinformatics tools to analyze the massive amounts of data generated. Researchers therefore need a clear understanding of the key concepts required for the design, execution and interpretation of NGS experiments on microbiomes. We conducted a literature review and used our own data to determine which approaches work best. The two main approaches for analyzing the microbiome, 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics, are illustrated with analyses of libraries designed to highlight their strengths and weaknesses. Several methods for taxonomic classification of bacterial sequences are discussed. We present simulations to assess the number of sequences required to perform reliable appraisals of bacterial community structure. To the extent that fluctuations in the diversity of gut bacterial populations correlate with health and disease, we emphasize various techniques for the analysis of bacterial communities within samples (α-diversity) and between samples (β-diversity). Finally, we demonstrate techniques to infer the metabolic capabilities of a bacterial community from these 16S and shotgun data.
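As a minimal illustration of the two levels of diversity named above, the sketch below computes one common within-sample (α) metric, the Shannon index, and one common between-sample (β) metric, Bray-Curtis dissimilarity, from per-taxon read counts. The abstract does not state which metrics the article applies, so the choice of these two metrics, the function names and the read counts are assumptions for illustration only.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping taxa with zero reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = no shared taxa).

    Both count vectors must list the same taxa in the same order.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must cover the same taxa in the same order")
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Hypothetical read counts for the same four taxa in two samples.
sample_1 = [50, 30, 20, 0]
sample_2 = [10, 10, 10, 70]
print(round(shannon_index(sample_1), 3))              # alpha diversity of sample 1
print(round(bray_curtis(sample_1, sample_2), 3))      # beta diversity between samples
```

A higher Shannon index reflects more taxa and a more even spread of reads among them, while a Bray-Curtis value near 1 indicates two samples that share little of their composition.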

Figure 3: Number of sequences required for taxonomic classification of samples with varying diversity. A series of samples was chosen to assess the effect of library complexity on the accuracy of taxonomy assignments and on the estimation of bacterial population diversity. Kefir represents the lowest point in the bacterial diversity spectrum, followed by a patient with Crohn's disease (CD), another patient who recovered from C. difficile infection (C. diff), a healthy individual (Hthy1) and three artificial mixes of bacteria (Mix7-9). (A,B) Libraries were randomly sampled at depths of 500, 1000, 5000, 10,000, 50,000 and 100,000 reads. End1 16S rRNA gene sequences were classified with QIIME, using the closed-reference method to cluster OTUs at a similarity threshold of 97%. Paired-end shotgun metagenomics sequences were aligned with LAST and taxonomically classified with MEGAN5. Each random sampling was repeated 20 times. As an example, the relative abundance of taxa for one of these samplings at a depth of 1000 or 50,000 sequences is presented for the 16S and shotgun metagenomics libraries. A white asterisk indicates a group of bacterial sequences identified as Citrobacter in the shotgun panel and as Klebsiella in the 16S panel. Bifidobacterium is indicated with a white plus sign. Propionibacterium is indicated with a white circle. (C,D) For each taxon detected and for each random sample, the proportion error was calculated as the difference between the proportion that the taxon represented in the whole library (i.e., with the maximum number of reads) and its proportion in the random sample. This difference was weighted by the proportion that the taxon represented in the whole library. We present the arithmetic mean of all weighted differences for each of the 20 random samples.
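The proportion error described for panels C and D can be sketched as follows. This is a hedged reconstruction, not the authors' code: it assumes the difference is taken as an absolute value, that the mean runs over every taxon present in the whole library (a taxon absent from a subsample counts as proportion 0), and that per-taxon read counts are held in plain dictionaries; the function name and example counts are hypothetical.

```python
def weighted_proportion_error(full_counts, sub_counts):
    """Mean over taxa of |p_full - p_sub| * p_full for one random subsample.

    Assumptions (not stated in the caption): absolute differences, and the
    mean is taken over all taxa in the whole library; taxa missing from the
    subsample contribute a subsample proportion of 0.
    """
    full_total = sum(full_counts.values())
    sub_total = sum(sub_counts.values())
    weighted_diffs = []
    for taxon, count in full_counts.items():
        p_full = count / full_total                     # proportion in whole library
        p_sub = sub_counts.get(taxon, 0) / sub_total    # proportion in subsample
        weighted_diffs.append(abs(p_full - p_sub) * p_full)
    return sum(weighted_diffs) / len(weighted_diffs)


# Hypothetical whole-library and subsample counts for three taxa.
full_library = {"A": 60, "B": 30, "C": 10}
subsample = {"A": 7, "B": 3}
print(weighted_proportion_error(full_library, subsample))
```

Weighting by the whole-library proportion means that misestimating an abundant taxon costs more than misestimating a rare one; computing this value for each of the 20 replicate subsamples gives the spread plotted per depth.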

Mentions: To investigate the minimal sequencing depth sufficient for accurately profiling bacterial community composition, we randomly sampled our libraries at depths of 500, 1000, 5000, 10,000, 50,000, and 100,000 reads. At each depth, sampling and analyses were repeated 20 times. As an example, we show that the taxonomic classification of each type of library was surprisingly consistent at sequencing depths of 1000 and 50,000 (Figures 3A,B). Some divergence between the taxonomic classifications produced by the two methods is expected, because the resolution of the sequences used for taxonomic assignment differs between them and varies with the region of the genome captured in shotgun surveys, the variable region of the 16S rRNA gene that is used, and the species composition of the community under analysis. Nevertheless, the overall pattern of relative abundance of taxa was often similar, although the concordance between 16S and shotgun methods was higher for simpler bacterial communities, as seen with the Kefir community (Figures 3A,B). In the sample from the CD patient, the most abundant genus (Lactobacillus) was detected by both methods (gray bar), but the second most abundant was identified as Klebsiella in the 16S libraries and as Citrobacter in the shotgun libraries (Figures 3A,B). This ambiguity likely arises because the 16S rRNA gene sequences of these two genera share > 96% similarity. Many other taxa, such as Bifidobacterium (Figures 3A,B), were consistently identified because they are phylogenetically more distant from the other taxa present. For the mock populations, all genera (n = 12) were found in the shotgun libraries at both depths, but the 16S libraries did not detect the Akkermansia or Clostridium genera, even though they represented ~5% of Mix-9.
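The repeated random subsampling described above can be sketched as drawing reads without replacement from a library of per-read taxon labels and counting how many distinct taxa each draw contains. This is a minimal sketch under stated assumptions: the library is a hypothetical list with one classified label per read, the helper names are illustrative, and the classification step itself (QIIME or LAST/MEGAN5 in the article) is not modeled.

```python
import random
from collections import Counter


def subsample_reads(read_labels, depth, rng):
    """Draw `depth` reads without replacement and tally them per taxon."""
    if depth > len(read_labels):
        raise ValueError("depth exceeds library size")
    return Counter(rng.sample(read_labels, depth))


def mean_observed_taxa(read_labels, depth, replicates=20, seed=0):
    """Average number of distinct taxa seen across repeated random subsamples."""
    rng = random.Random(seed)  # seeded so replicate draws are reproducible
    richness = [len(subsample_reads(read_labels, depth, rng)) for _ in range(replicates)]
    return sum(richness) / replicates


# Hypothetical 10,000-read library: one taxon label per classified read.
library = (["Lactobacillus"] * 9000 + ["Klebsiella"] * 900
           + ["Bifidobacterium"] * 95 + ["Propionibacterium"] * 5)
for depth in (500, 1000, 5000):
    print(depth, mean_observed_taxa(library, depth))
```

Because this hypothetical library holds only 10,000 reads, depths of 50,000 and 100,000 cannot be drawn from it; a real library must contain at least as many reads as the largest depth requested. Rare taxa, such as the five-read label here, are the ones most often missed at shallow depths.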
As expected, increasing sampling depth led to the detection of more taxa: with 1000 sequences, 48 and 58 taxa were detected in the 16S and shotgun libraries, respectively, and with 50,000 sequences these numbers increased to 72 and 128. Our experimental mock populations make it clear that some of these assignments are spurious and that increasing sequencing depth amplifies this artifact. Of note, Propionibacterium was not included in our experimental mixes but was found in both types of libraries, indicating contamination (Figures 3A,B). Indeed, environmental contamination poses a serious challenge for the construction of NGS libraries (Laurence et al., 2014; Salter et al., 2014; Strong et al., 2014; Weiss et al., 2014).
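A mock community with known composition makes spurious and missed calls easy to enumerate with set operations. The sketch below is illustrative only: the genus lists are hypothetical and abbreviated (they are not the article's 12-genus mixes or its detection results), and the function name is an assumption.

```python
def audit_mock_detection(expected, detected):
    """Split detected genera into true positives, spurious calls and missed genera."""
    expected, detected = set(expected), set(detected)
    return {
        "true_positives": sorted(expected & detected),
        "spurious": sorted(detected - expected),  # contaminants or misassignments
        "missed": sorted(expected - detected),    # present in the mix but not detected
    }


# Hypothetical, abbreviated genus lists for one mock mix and one library.
expected_mix = ["Akkermansia", "Bifidobacterium", "Clostridium", "Lactobacillus"]
detected_16s = ["Bifidobacterium", "Lactobacillus", "Propionibacterium"]
print(audit_mock_detection(expected_mix, detected_16s))
```

Any genus in the spurious set, like Propionibacterium in this toy example, warrants scrutiny as a possible contaminant or misassignment, and repeating the audit at each subsampling depth shows whether spurious calls accumulate as depth increases.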