BioMiCo: a supervised Bayesian model for inference of microbial community structure.

Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, Bielawski JP - Microbiome (2015)

Bottom Line: We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.


Affiliation: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada.

ABSTRACT

Background: Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex: the number of species is huge, and abundance information for many species is often sparse. Classical methods have limited value for identifying complex features within such data.

Results: Here, we describe a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). The model takes abundance data derived from environmental DNA, and models the composition of each sample by a two-level hierarchy of mixture distributions constrained by Dirichlet priors. BioMiCo is supervised, using known features for samples and appropriate prior constraints to overcome the challenges posed by many variables, sparse data, and large numbers of rare species. The model is trained on a portion of the data, where it learns how assemblages of species are mixed to form communities and how assemblages are related to the known features of each sample. Training yields a model that can predict the features of new samples. We used BioMiCo to build models for three serially sampled datasets and tested their predictive accuracy across different time points. The first model was trained to predict both body site (hand, mouth, and gut) and individual human host. It was able to reliably distinguish these features across different time points. The second was trained on vaginal microbiomes to predict both the Nugent score and individual human host. We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. The third was trained for the purpose of assessing seasonal transitions in a coastal bacterial community. Application of this model to a high-resolution time series permitted us to track the rate and time of community succession and accurately predict known ecosystem-level events.

Conclusion: BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.




Figure 1: Plate diagram of the mixed-membership model for BioMiCo. π is the probability distribution on the possible source environments, X represents the environments, Z represents the assemblages, W represents the OTUs, θ is the prior distribution of assemblages in environments, and ϕ is the prior distribution of OTUs in assemblages. The variable α represents the concentration parameters for prior distributions, with α_θ being the concentration parameter for the prior on the distribution of assemblages in environments, α_ϕ the concentration parameter for the prior on the distribution of OTUs in assemblages, and α_π the concentration parameter for the prior on the distribution of environments. N is the number of samples in a dataset, and N_n is the number of OTUs in sample n. K is the number of environments, and L is the number of assemblages.
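
The hierarchy in the caption can be read as a generative process, and a small simulation may make the roles of π, X, Z, W, θ, and ϕ easier to follow. The sketch below is only one plausible reading of the caption, not the authors' implementation: it assumes one π per sample, categorical draws at each level, symmetric Dirichlet priors throughout, and a hypothetical vocabulary size `V` (number of distinct OTUs) and `reads_per_sample` standing in for N_n. The function name and all default values are illustrative.

```python
import numpy as np

def simulate_mixed_membership(N=4, K=3, L=5, V=20, reads_per_sample=50,
                              alpha_pi=1.0, alpha_theta=0.5, alpha_phi=0.2,
                              seed=0):
    """Forward-simulate OTU counts from a two-level mixture (illustrative only).

    N: samples, K: environments, L: assemblages (names follow the caption).
    V and reads_per_sample are assumptions introduced for this sketch.
    """
    rng = np.random.default_rng(seed)
    # theta[k]: distribution of assemblages within environment k
    theta = rng.dirichlet(np.full(L, alpha_theta), size=K)
    # phi[l]: distribution of OTUs within assemblage l
    phi = rng.dirichlet(np.full(V, alpha_phi), size=L)
    # pi[n]: distribution over source environments for sample n (assumed per sample)
    pi = rng.dirichlet(np.full(K, alpha_pi), size=N)

    counts = np.zeros((N, V), dtype=int)
    for n in range(N):
        for _ in range(reads_per_sample):
            x = rng.choice(K, p=pi[n])      # X: environment for this read
            z = rng.choice(L, p=theta[x])   # Z: assemblage given environment
            w = rng.choice(V, p=phi[z])     # W: OTU given assemblage
            counts[n, w] += 1
    return pi, theta, phi, counts

pi, theta, phi, counts = simulate_mixed_membership()
```

Here `counts` plays the role of an OTU abundance table (samples by OTUs). Inference in the paper runs in the opposite direction, from observed abundances and known sample features back to θ and ϕ; this simulation shows only the forward direction.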

Mentions: We use a symmetric Dirichlet prior because we have no prior knowledge to favor a particular OTU in an assemblage. Thus, the model is used to differentiate assemblages according to their unique mixtures of OTUs. A plate diagram of the model is shown in Figure 1.
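
A short numerical check may help show why a symmetric Dirichlet expresses "no prior preference": every component shares the same concentration parameter, so the prior mean proportion of each OTU is 1/V. The sketch below assumes a hypothetical assemblage with `V = 5` OTUs and an illustrative concentration of 0.5; neither value comes from the paper.

```python
import numpy as np

def symmetric_dirichlet_mean(V=5, alpha=0.5, draws=20000, seed=0):
    """Estimate the mean of a symmetric Dirichlet(alpha, ..., alpha) over V OTUs."""
    rng = np.random.default_rng(seed)
    # Same concentration for every OTU: no OTU is favored a priori.
    samples = rng.dirichlet(np.full(V, alpha), size=draws)
    return samples.mean(axis=0)

mean_props = symmetric_dirichlet_mean()
# Each entry should be close to 1/V = 0.2, up to Monte Carlo error.
```

Individual draws can still be very uneven, especially for small `alpha`, which is how assemblages can end up dominated by different OTUs even though the prior treats all OTUs identically.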

