BioMiCo: a supervised Bayesian model for inference of microbial community structure.

Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, Bielawski JP - Microbiome (2015)

Bottom Line: We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.


Affiliation: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS Canada.

ABSTRACT

Background: Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex, the number of species is huge and abundance information for many species is often sparse. Classical methods have a limited value for identifying complex features within such data.

Results: Here, we describe a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). The model takes abundance data derived from environmental DNA, and models the composition of each sample by a two-level hierarchy of mixture distributions constrained by Dirichlet priors. BioMiCo is supervised, using known features for samples and appropriate prior constraints to overcome the challenges posed by many variables, sparse data, and large numbers of rare species. The model is trained on a portion of the data, where it learns how assemblages of species are mixed to form communities and how assemblages are related to the known features of each sample. Training yields a model that can predict the features of new samples. We used BioMiCo to build models for three serially sampled datasets and tested their predictive accuracy across different time points. The first model was trained to predict both body site (hand, mouth, and gut) and individual human host. It was able to reliably distinguish these features across different time points. The second was trained on vaginal microbiomes to predict both the Nugent score and individual human host. We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. The third was trained for the purpose of assessing seasonal transitions in a coastal bacterial community. Application of this model to a high-resolution time series permitted us to track the rate and time of community succession and accurately predict known ecosystem-level events.

Conclusion: BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.
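The two-level hierarchy described in the abstract can be illustrated with a small generative simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes each assemblage is a Dirichlet-distributed probability vector over OTUs, each community is a Dirichlet-distributed mixture over assemblages, and (as a simplification) each sample is drawn from a single labeled community. The function names, the dimensions, and the concentration values 0.1 and 0.5 are all hypothetical.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:
        # Very small alpha can underflow every draw to zero; in that case
        # put all mass on one randomly chosen component.
        draws = [0.0] * size
        draws[rng.randrange(size)] = 1.0
        return draws
    return [d / total for d in draws]


def simulate_sample(community, theta, phi, n_reads, rng):
    """Simulate OTU counts for one sample from one labeled community.

    theta[c] holds the assemblage weights of community c;
    phi[a] holds the OTU probabilities of assemblage a.
    """
    n_otus = len(phi[0])
    counts = [0] * n_otus
    for _ in range(n_reads):
        # Level 1: choose an assemblage according to the community's mixture.
        assemblage = rng.choices(range(len(phi)), weights=theta[community])[0]
        # Level 2: choose an OTU according to that assemblage.
        otu = rng.choices(range(n_otus), weights=phi[assemblage])[0]
        counts[otu] += 1
    return counts


rng = random.Random(0)
n_communities, n_assemblages, n_otus = 2, 3, 10  # hypothetical toy sizes

# Concentrations below 1 favor sparse vectors (few dominant entries).
phi = [sample_dirichlet(0.1, n_otus, rng) for _ in range(n_assemblages)]
theta = [sample_dirichlet(0.5, n_assemblages, rng) for _ in range(n_communities)]

counts = simulate_sample(0, theta, phi, 500, rng)
print(counts)
```

Training reverses this process: given observed counts and known sample labels, the model infers the assemblage and community distributions rather than simulating from them.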




Figure 3: Heatmap showing the contributions of particular OTUs that characterize fecal samples of individuals 1 and 2. (A) Posterior probabilities for OTUs (same labels as in (B)) as determined by the model. (B) Empirical abundance of OTUs in samples collected in October 2009 and used to train the model (sample IDs listed on the right are from Caporaso et al. [1]). Note that 41 OTUs accounted for 95% of the posterior density for individual 1, and 86 OTUs accounted for 95% of the posterior density for individual 2. For clarity, we present only the 20 OTUs that have the highest posterior probability in each individual. For individual 1, the top 20 OTUs account for 90% of the posterior density. For individual 2, the top 20 OTUs account for 79% of the posterior density. The top 20 OTUs can be thought of as the “predominant” OTUs for each individual.
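The two summaries quoted in the caption (how many OTUs are needed to reach 95% of the posterior density, and how much density the top 20 OTUs cover) are both cumulative sums over OTUs ranked by posterior probability. The sketch below shows that arithmetic on a made-up five-element probability vector; the function names and the toy values are hypothetical and are not taken from the paper.

```python
def otus_needed_for_mass(probs, target):
    """Smallest number of top-ranked OTUs whose probabilities reach `target`."""
    total = 0.0
    for rank, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        # Small tolerance guards against floating-point rounding at the boundary.
        if total >= target - 1e-12:
            return rank
    return len(probs)


def top_k_mass(probs, k):
    """Total probability carried by the k highest-probability OTUs."""
    return sum(sorted(probs, reverse=True)[:k])


# Hypothetical posterior probabilities for five OTUs (they sum to 1).
toy_posterior = [0.4, 0.3, 0.15, 0.1, 0.05]

n_for_95 = otus_needed_for_mass(toy_posterior, 0.95)  # 4 OTUs in this toy case
mass_top2 = top_k_mass(toy_posterior, 2)              # about 0.7
print(n_for_95, mass_top2)
```

Applied to the real posterior, the first function would correspond to the counts of 41 and 86 OTUs reported for individuals 1 and 2, and the second (with k = 20) to the 90% and 79% figures.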

Mentions: The original set of OTU counts generated by Caporaso et al. [1] was filtered to remove all singletons (that is, OTUs observed in only one sample). Our model can be run on data that includes singletons, but they were excluded because they increase computational costs without providing a useful signal (and could potentially dilute signal strength for the assemblage membership of other OTUs). Filtering yielded a matrix of 1,967 samples (rows) having 15,685 OTUs (columns). This matrix represented the samples obtained from 396 time points for two human hosts over four body sites; each sample was labeled according to both host and body site (thus, K = 2 × 4 = 8 in this model). As was already reported [1], we found pronounced variability in microbiota from both subjects across months, weeks, and even days. Also, like the original study, samples from different body sites were easily distinguished throughout the sampling interval (Figure 2 and Additional file 3). However, we also uncovered evidence of a hierarchical structure in the form of assemblages of OTUs; the values of both αϕ and αθ estimated using Metropolis-Hastings when training on seven different months were substantially less than 1.0 (αϕ: mean = 0.006, min = 0.005, max = 0.007; αθ: mean = 0.021, min = 0.012, max = 0.026). Further, the mixture weights revealed that (i) some OTUs had similar, and temporally stable, mixture weights in both individuals (for example, Bacteroides ovatus 119570 in Figure 3), (ii) some OTUs had consistently different mixture weights between individuals (for example, Bacteroides 577170 in Figure 3), and (iii) most OTUs had no temporal stability. For this dataset, assemblages are composed of OTUs that tend to have the same co-occurrence pattern across serially sampled microbiomes; thus, they can represent a temporally stable signature within these data.
Figure 3A shows OTUs from assemblages that are characteristic of fecal samples from individuals 1 and 2; note that they are composed of OTUs with varying degrees of abundance yet follow the same co-occurrence pattern.
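The preprocessing described above (dropping singleton OTUs, then labeling each sample by host and body site) can be sketched on a toy count matrix. This is an illustration only: the function name, the toy counts, and the placeholder host and site names are hypothetical. It also assumes that "observed" means a count greater than zero and, as a side effect, drops OTUs that appear in no sample at all.

```python
from itertools import product


def drop_singleton_otus(matrix):
    """Remove OTU columns that have a nonzero count in fewer than two samples.

    `matrix` is a list of rows (samples), each a list of OTU counts.
    Returns the filtered matrix and the indices of the retained columns.
    """
    if not matrix:
        return [], []
    n_otus = len(matrix[0])
    # Number of samples in which each OTU is present.
    occupancy = [sum(1 for row in matrix if row[j] > 0) for j in range(n_otus)]
    kept = [j for j in range(n_otus) if occupancy[j] > 1]
    filtered = [[row[j] for j in kept] for row in matrix]
    return filtered, kept


# Toy matrix: 3 samples (rows) by 4 OTUs (columns).
toy_counts = [
    [5, 0, 2, 0],
    [3, 0, 0, 0],
    [0, 7, 1, 0],
]
filtered, kept = drop_singleton_otus(toy_counts)
print(kept)      # [0, 2]: OTU 1 is a singleton, OTU 3 is never observed
print(filtered)  # [[5, 2], [3, 0], [0, 1]]

# One label per (host, body site) combination; names are placeholders.
hosts = ["host_1", "host_2"]
sites = ["site_1", "site_2", "site_3", "site_4"]
labels = list(product(hosts, sites))
print(len(labels))  # 8, matching K = 2 x 4
```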

