BioMiCo: a supervised Bayesian model for inference of microbial community structure.

Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, Bielawski JP - Microbiome (2015)

Bottom Line: We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.


Affiliation: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS Canada.

ABSTRACT

Background: Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex: the number of species is huge, and abundance information for many species is often sparse. Classical methods have limited value for identifying complex features within such data.

Results: Here, we describe a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). The model takes abundance data derived from environmental DNA, and models the composition of each sample by a two-level hierarchy of mixture distributions constrained by Dirichlet priors. BioMiCo is supervised, using known features for samples and appropriate prior constraints to overcome the challenges posed by many variables, sparse data, and large numbers of rare species. The model is trained on a portion of the data, where it learns how assemblages of species are mixed to form communities and how assemblages are related to the known features of each sample. Training yields a model that can predict the features of new samples. We used BioMiCo to build models for three serially sampled datasets and tested their predictive accuracy across different time points. The first model was trained to predict both body site (hand, mouth, and gut) and individual human host. It was able to reliably distinguish these features across different time points. The second was trained on vaginal microbiomes to predict both the Nugent score and individual human host. We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. The third was trained for the purpose of assessing seasonal transitions in a coastal bacterial community. Application of this model to a high-resolution time series permitted us to track the rate and time of community succession and accurately predict known ecosystem-level events.

Conclusion: BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.
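The two-level hierarchy described in the Results can be illustrated with a small generative sketch. This is not the authors' implementation and performs no inference: it only draws synthetic data from a simplified hierarchy in which each community is a Dirichlet-distributed mixture over assemblages and each assemblage is a Dirichlet-distributed mixture over OTUs. The function name, array sizes, and the hyperparameters `alpha` and `beta` are assumptions chosen for illustration.

```python
import numpy as np

def simulate_communities(n_communities=2, n_assemblages=5, n_otus=20,
                         reads_per_sample=1000, alpha=0.5, beta=0.1, seed=0):
    """Draw one synthetic OTU count vector per community.

    Hypothetical sketch: sizes and hyperparameters are illustrative,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Upper level: each community is a mixture over assemblages.
    community_to_assemblage = rng.dirichlet(
        np.full(n_assemblages, alpha), size=n_communities)
    # Lower level: each assemblage is a mixture over OTUs.
    # A small beta concentrates mass on a few OTUs (sparse assemblages).
    assemblage_to_otu = rng.dirichlet(
        np.full(n_otus, beta), size=n_assemblages)

    counts = np.zeros((n_communities, n_otus), dtype=int)
    for c in range(n_communities):
        # Marginal OTU probabilities: sum over assemblages.
        otu_probs = community_to_assemblage[c] @ assemblage_to_otu
        otu_probs = otu_probs / otu_probs.sum()  # guard against rounding drift
        counts[c] = rng.multinomial(reads_per_sample, otu_probs)
    return community_to_assemblage, assemblage_to_otu, counts

theta, phi, counts = simulate_communities()
print(theta.shape, phi.shape, counts.shape)
```

In a supervised setting such as the one the abstract describes, the known sample features would constrain the upper level during training; that step is omitted here.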



Figure 4: Posterior distribution and composition of microbial assemblages with respect to “normal” and “elevated” Nugent scores in asymptomatic human females. Distributions were inferred from the vaginal microbiomes of 32 individuals collected over a 16-week period [26]. (A) Mixing probabilities for the assemblages comprising 95% of the posterior distribution for “normal” and “elevated” Nugent scores. “Normal” was defined by a Nugent score of 0 to 3, and 47 assemblages were responsible for 95% of its posterior distribution. “Elevated” was defined by a Nugent score >4, and 12 assemblages were responsible for 95% of its posterior distribution. As five assemblages were shared between these two distributions, there are a total of 54 assemblages in this plot. (B) Relative magnitude of the mixing probabilities of 27 OTUs (identity is indicated along the x-axis of part (C)), marginalized over the 54 assemblages in part (A). Note that 14 OTUs accounted for 95% of the posterior density (PD) of the “normal” Nugent score category and 19 OTUs accounted for 95% of the PD of the “elevated” Nugent score category. As six OTUs were shared between these two distributions, there are mixing probabilities for a total of 27 unique OTUs in this plot. (C) Empirical relative abundance of the 27 OTUs in the 484 training samples. OTU identity is given along the x-axis. Each row represents an individual sample.
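The counts in the caption follow from two simple operations: selecting the fewest items that together hold 95% of a probability distribution, and taking the union of two such sets. The sketch below shows one way to compute both, assuming the mixing probabilities are available as plain arrays. The function names and the toy probability vectors are hypothetical; only the totals 47, 12, 5, 14, 19, and 6 come from the caption.

```python
import numpy as np

def smallest_set_covering(probs, mass=0.95):
    """Indices of the fewest entries whose probabilities sum to at least `mass`."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]            # largest probability first
    cumulative = np.cumsum(probs[order])
    # First position at which the running total reaches the threshold.
    k = int(np.searchsorted(cumulative, mass - 1e-12)) + 1
    return set(order[:k].tolist())

def union_size(n_a, n_b, n_shared):
    """Size of the union of two sets that share `n_shared` members."""
    return n_a + n_b - n_shared

# Toy (made-up) mixing probabilities over five assemblages.
normal = [0.40, 0.30, 0.20, 0.06, 0.04]
elevated = [0.01, 0.01, 0.02, 0.70, 0.26]
set_normal = smallest_set_covering(normal)
set_elevated = smallest_set_covering(elevated)
print(sorted(set_normal), sorted(set_elevated), len(set_normal | set_elevated))

# Totals reported in the caption.
print(union_size(47, 12, 5))   # assemblages in panel (A): 54
print(union_size(14, 19, 6))   # unique OTUs in panel (B): 27
```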

Mentions: Figure 4 summarizes the structure of the assemblages learned by the model and their mixing probabilities with respect to “normal” and “elevated” Nugent scores. The difference between assemblage distributions (Figure 4A) is due to differences in the depth of community structure. The group of individuals having a “normal” Nugent score had a relatively flat distribution of highly sparse assemblages. Sparse assemblages are characterized by only a few (in this case one to three) OTUs with non-trivial mixing probabilities. The flat assemblage distribution reflects the contribution of many low-abundance OTUs to the “normal” label, as well as the tendency of Lactobacillus iners and Lactobacillus crispatus to co-occur with different low-abundance OTUs in different individuals. The group of individuals having “elevated” Nugent scores had a deeper community structure, that is, more complex co-occurrence patterns concentrated in fewer assemblages, yielding a much more skewed assemblage distribution. As the Nugent score is based on the premise that lactobacilli decrease the score, and that Gardnerella or Bacteroides spp. or curved Gram-variable rods increase the score, it is not surprising that this signal is contained in the OTU assemblages (Figure 4). However, the OTU composition of the assemblages with the highest mixing probabilities provides additional information about individuals with elevated Nugent scores; they represent the central tendency of co-occurrence relationships over all the training samples having an “elevated” Nugent score. To the extent that these patterns have temporal stability, they can be used to make predictions about unlabeled data.
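The contrast between a "flat" and a "skewed" assemblage distribution, and the notion of a "sparse" assemblage, can be made concrete with two summary statistics. The sketch below uses Shannon entropy as one possible measure of flatness and a simple threshold count as one possible measure of sparsity. Neither is stated in the source as the authors' measure; the function names, the 0.01 threshold, and the example vectors are assumptions for illustration only.

```python
import numpy as np

def shannon_entropy(probs):
    """Entropy in nats; higher values indicate a flatter distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

def n_nontrivial(probs, threshold=0.01):
    """Number of entries above `threshold` (an assumed cut-off for 'non-trivial')."""
    return int((np.asarray(probs, dtype=float) > threshold).sum())

# Made-up examples: a flat mixture over many assemblages versus a skewed one.
flat = np.full(47, 1 / 47)
skewed = np.array([0.70, 0.20, 0.05] + [0.05 / 9] * 9)
print(shannon_entropy(flat) > shannon_entropy(skewed))

# A made-up sparse assemblage: two OTUs carry almost all the probability.
sparse_assemblage = [0.62, 0.37, 0.004, 0.003, 0.003]
print(n_nontrivial(sparse_assemblage))
```

Under these definitions, a label whose probability is spread evenly over many assemblages scores a high entropy, while a label dominated by a few assemblages scores a low one.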

