BioMiCo: a supervised Bayesian model for inference of microbial community structure.

Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, Bielawski JP - Microbiome (2015)

Bottom Line: We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.


Affiliation: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada.

ABSTRACT

Background: Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex: the number of species is huge, and abundance information for many species is often sparse. Classical methods have limited value for identifying complex features within such data.

Results: Here, we describe a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). The model takes abundance data derived from environmental DNA, and models the composition of each sample by a two-level hierarchy of mixture distributions constrained by Dirichlet priors. BioMiCo is supervised, using known features for samples and appropriate prior constraints to overcome the challenges posed by many variables, sparse data, and large numbers of rare species. The model is trained on a portion of the data, where it learns how assemblages of species are mixed to form communities and how assemblages are related to the known features of each sample. Training yields a model that can predict the features of new samples. We used BioMiCo to build models for three serially sampled datasets and tested their predictive accuracy across different time points. The first model was trained to predict both body site (hand, mouth, and gut) and individual human host. It was able to reliably distinguish these features across different time points. The second was trained on vaginal microbiomes to predict both the Nugent score and individual human host. We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. The third was trained for the purpose of assessing seasonal transitions in a coastal bacterial community. Application of this model to a high-resolution time series permitted us to track the rate and time of community succession and accurately predict known ecosystem-level events.

Conclusion: BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.
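
The two-level hierarchy described in the abstract can be illustrated with a small generative sketch. This is only one plausible reading of the description, not the authors' implementation: it assumes each sample label has a Dirichlet-distributed mixture over assemblages and each assemblage has a Dirichlet-distributed mixture over OTUs. The names `simulate_sample`, `theta`, and `phi`, the dimensions, and the concentration values are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_sample(label, theta, phi, n_reads, rng):
    """Draw OTU counts for one sample under a two-level mixture (illustrative).

    theta: (labels x assemblages) mixing probabilities, one row per label.
    phi:   (assemblages x OTUs) mixing probabilities, one row per assemblage.
    """
    n_assemblages = theta.shape[0] and theta.shape[1]
    # Level 1: assign every read to an assemblage using the label's mixture.
    picks = rng.choice(n_assemblages, size=n_reads, p=theta[label])
    reads_per_assemblage = np.bincount(picks, minlength=n_assemblages)
    # Level 2: within each assemblage, distribute its reads across OTUs.
    counts = np.zeros(phi.shape[1], dtype=int)
    for a, n in enumerate(reads_per_assemblage):
        counts += rng.multinomial(n, phi[a])
    return counts

rng = np.random.default_rng(0)
n_labels, n_assemblages, n_otus = 4, 6, 50  # assumed sizes, not from the paper

# Dirichlet priors with assumed concentration 0.5 (values below 1 favor sparsity).
theta = rng.dirichlet(np.full(n_assemblages, 0.5), size=n_labels)
phi = rng.dirichlet(np.full(n_otus, 0.5), size=n_assemblages)

counts = simulate_sample(label=0, theta=theta, phi=phi, n_reads=1000, rng=rng)
print(counts.sum(), "reads spread over", int((counts > 0).sum()), "OTUs")
```

Inference in a model of this kind runs in the opposite direction: given observed counts and known labels for training samples, the mixing probabilities are estimated and then used to predict the labels of new samples.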



Posterior distribution and composition of seasonal assemblages in a coastal bacterial community. (A) Mixing probabilities for microbial assemblages at four distinct seasonal time points. Colors used to define seasons were spring equinox (SE) - green; summer solstice (SS) - red; autumn equinox (AE) - orange; and winter solstice (WS) - blue. Samples were from 24 surface (1 m) collections taken from 2005 to 2010 in the Bedford Basin, a coastal inlet of the temperate northwest Atlantic Ocean [35]. (B) Hierarchical clustering of OTU-mixing probabilities from the four seasonal assemblages in part (A). For these assemblages, a very large number of OTUs contribute 95% of the posterior density (PD), but a small subset contributes a disproportionately large amount of that density. We refer to this influential subset as the predominant OTUs and define them according to the inflection point in their posterior OTU distribution. For clarity, we clustered only the predominant OTUs. For SE, there were 15 predominant OTUs (75% of PD). For SS, there were 15 predominant OTUs (73% of PD). For AE, there were 17 predominant OTUs (62% of PD). For WS, there were 19 predominant OTUs (66% of PD). (C) Model-based predictions for the high-resolution time series collected biweekly from January to December in 2009.

Mentions: As proof of principle, we applied our modeling framework to the data of El-Swais et al. [35], which consisted of a set of OTUs from bacterial communities in the Bedford Basin, a coastal inlet of the temperate northwest Atlantic Ocean sampled over a 6-year period. We trained the model on 24 samples collected from 2005 to 2010; each sample was labeled according to four distinct seasonal time points (thus, K = 4 in this model). The factor values for training consisted of the spring equinox (SE), summer solstice (SS), autumn equinox (AE), and winter solstice (WS). The training phase identified four highly informative assemblages, each contributing a high posterior mixing probability to one of the four seasons (Figure 5A). Hence, as expected, samples from different seasons were clearly distinguishable from each other. More interestingly, seasonal differences in the mixing probabilities of OTUs revealed putative patterns of ecological interaction between taxa. A number of OTUs had high mixture weights for only one assemblage, which is indicative of seasonal specificity. For example, Polaribacter, Cytophaga, and Alteromonadales OTUs were important members of the SE assemblage (Figure 4B), which supports earlier studies reporting that these taxa are often associated with the consumption of organic matter from spring phytoplankton blooms [36,37]. During the summer, the Bedford Basin surface waters are characterized by low nutrient availability [38]. As such, it follows that OTUs from the SAR11 and SAR86 clades were principal members of the SS assemblage (Figure 4B), in agreement with the known oligotrophic nature of these marine bacteria [39,40]. These results demonstrate that even with only a few training samples (six per season), the model can identify ecological structure in the data. Readers should consult El-Swais et al. [35] for additional details about the temporal variability of bacteria in the Bedford Basin.
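
The figure caption describes assemblages in which many OTUs together account for 95% of the posterior density while a small subset carries a disproportionate share. A minimal sketch of that bookkeeping follows, assuming a vector of posterior OTU mixing probabilities for one assemblage is available. The function names and the toy weights are hypothetical. The sketch does not reproduce the inflection-point rule the authors use to define predominant OTUs; it only finds the smallest set of OTUs reaching a chosen cumulative density and reports the share held by the top-ranked OTUs.

```python
import numpy as np

def smallest_covering_set(weights, mass=0.95):
    """Indices of the fewest OTUs whose weights sum to at least `mass`."""
    weights = np.asarray(weights, dtype=float)
    # Rank OTUs from largest to smallest weight; stable sort keeps ties in input order.
    order = np.argsort(-weights, kind="stable")
    cumulative = np.cumsum(weights[order])
    # First rank at which the running total reaches the target (small float tolerance).
    n = int(np.searchsorted(cumulative, mass - 1e-12)) + 1
    return order[:n]

def top_share(weights, n):
    """Fraction of the total density held by the n largest weights."""
    weights = np.sort(np.asarray(weights, dtype=float))[::-1]
    return float(weights[:n].sum() / weights.sum())

# Toy posterior OTU weights for one assemblage (made-up values that sum to 1).
weights = [0.5, 0.25, 0.125, 0.0625, 0.0625]

covering = smallest_covering_set(weights, mass=0.9)
print("OTUs needed for 90% of the density:", covering.tolist())
print("Share held by the top 2 OTUs:", top_share(weights, 2))
```

Applied to each seasonal assemblage in turn, the same two quantities would give the kind of summary the caption reports, such as a count of OTUs and the percentage of posterior density they hold.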

