Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data.

Xu L, Paterson AD, Turpin W, Xu W - PLoS ONE (2015)

Bottom Line: We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on the structural zeros and the count component. We focus on type I error, power to detect the overall covariate effect, measures of model fit, and the bias and effectiveness of parameter estimation. We then discuss the model selection strategy for zero-inflated data and apply it to a gut microbiome study of more than 400 independent subjects.


Affiliation: Dalla Lana School of Public Health, University of Toronto, ON, M5T 3M7, Canada.

ABSTRACT
Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts characterized by excess zeros, a feature that investigators often ignore. In this paper, we compare the performance of competing methods for modeling zero-inflated data through extensive simulations and an application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero-inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on the structural zeros and the count component. We focus on type I error, power to detect the overall covariate effect, measures of model fit, and the bias and effectiveness of parameter estimation. We also evaluate how well model selection strategies based on the Akaike information criterion (AIC) or the Vuong test identify the correct model. The simulation studies show that hurdle and zero-inflated models have well-controlled type I error, higher power, and better goodness-of-fit measures, and are more accurate and efficient in parameter estimation. In addition, the hurdle models give goodness of fit and count-component parameter estimates similar to those of their corresponding zero-inflated models. However, the estimation and interpretation of the zero-component parameters differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero-inflated data and apply it to a gut microbiome study of more than 400 independent subjects.
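The difference between the zero-inflated and hurdle parameterizations becomes concrete once their probability mass functions are written out. The sketch below is not the authors' code. It makes several assumptions: a negative binomial (NB) count component with mean `mu` and size (inverse dispersion) `size`, hypothetical function names, and no covariates. In the zero-inflated NB (ZINB), a zero arises either as a structural zero with probability `pi` or as a sampling zero from the NB. In the NB hurdle, every zero comes from the binary part with probability `p0`, and positive counts follow a zero-truncated NB.

```python
import math


def nb_logpmf(y: int, mu: float, size: float) -> float:
    """Log-pmf of a negative binomial with mean mu > 0 and size (inverse dispersion) size > 0."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def zinb_pmf(y: int, pi: float, mu: float, size: float) -> float:
    """Zero-inflated NB: structural zero with probability pi, otherwise an NB draw (which can also be 0)."""
    nb = math.exp(nb_logpmf(y, mu, size))
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb


def nb_hurdle_pmf(y: int, p0: float, mu: float, size: float) -> float:
    """NB hurdle: P(Y = 0) = p0; positive counts follow an NB truncated at zero."""
    if y == 0:
        return p0
    nb0 = math.exp(nb_logpmf(0, mu, size))
    return (1 - p0) * math.exp(nb_logpmf(y, mu, size)) / (1 - nb0)


# With the same NB count component and no covariates, a hurdle with
# p0 = pi + (1 - pi) * NB(0) reproduces the ZINB distribution exactly.
pi, mu, size = 0.5, 2.0, 1.0
p0 = zinb_pmf(0, pi, mu, size)  # total probability of a zero under the ZINB
for y in range(4):
    print(y, round(zinb_pmf(y, pi, mu, size), 4), round(nb_hurdle_pmf(y, p0, mu, size), 4))
```

Here the two models describe the same distribution, but their zero-component parameters differ: `pi = 0.5` is the probability of a structural zero, while `p0` (about 0.667) is the probability of any zero. This is one way to see how a hurdle model and its zero-inflated counterpart can fit equally well and agree on the count component while estimating different zero-component parameters.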

Fig 11. Comparison of the observed and expected counts of Campylobacter, Anaerotruncus and Dehalobacterium for females and males, using the three best models as judged by AIC. The x axis shows the possible OTU count values, and the bars show the observed counts. The red, green and blue lines connect the expected counts produced by the models with the smallest, second-smallest and third-smallest AIC values, respectively. The first, second and third rows of plots are for Campylobacter, Anaerotruncus and Dehalobacterium, respectively.
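The expected counts drawn as lines in such a plot can be computed by summing, over subjects, the fitted probability of each count value. The sketch below illustrates the idea under stated assumptions and is not the authors' code. It assumes gender is the only covariate, so all subjects within one panel share a single fitted distribution. It uses a plain NB pmf with made-up parameters and toy counts, which are not data from the study. The helper name `observed_vs_expected` is hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def nb_pmf(y: int, mu: float, size: float) -> float:
    """Negative binomial pmf with mean mu and size (inverse dispersion) size."""
    return math.exp(
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def observed_vs_expected(
    counts: list[int], pmf: Callable[[int], float], max_count: int
) -> list[tuple[int, int, float]]:
    """Rows (k, observed, expected) for k = 0..max_count.

    observed: number of subjects whose OTU count equals k (the bars);
    expected: n * P(Y = k) under one fitted model (one line per model).
    """
    n = len(counts)
    observed = Counter(counts)
    return [(k, observed.get(k, 0), n * pmf(k)) for k in range(max_count + 1)]


# Toy counts for one sex group and hypothetical fitted NB parameters.
toy_counts = [0, 0, 0, 0, 1, 1, 2, 5]
for k, obs, expected in observed_vs_expected(toy_counts, lambda k: nb_pmf(k, 1.0, 1.0), 5):
    print(f"{k}: observed={obs}, expected={expected:.2f}")
```

To draw one line per candidate model, pass each fitted model's pmf (for example a ZINB or hurdle pmf) in turn. With continuous covariates, the expected count at `k` would instead be the sum over subjects of each subject's own fitted P(Y_i = k).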

Mentions: For Campylobacter, with 77% zeros (Table 3), NB, NBH and ZINB have the smallest, second-smallest and third-smallest AICs, respectively, and the three AIC values are very close. All three models consistently detect a significant gender effect, whereas the other models do not. Their predictions (Fig 11) are also similar and describe the observed sequence counts well, and the three models perform about equally in estimating γ1. However, ZINB gives a relatively large SE for , indicating that this parameter estimate is unstable under the ZINB parameterization. The Vuong test shows no particular preference among these three models. We therefore choose NB as the fitting model and conclude that gender is significantly associated with the OTU count levels of Campylobacter, with males having significantly lower mean counts than females.
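The two selection tools used in this comparison, AIC and the Vuong test, can each be sketched in a few lines. The sketch assumes every candidate model has already been fitted and supplies three things: its maximized log-likelihood, its number of estimated parameters, and its per-subject log-likelihood contributions. The function names and all numbers are hypothetical and are not the values from Table 3. The code implements the uncorrected Vuong statistic for two non-nested models, z = sqrt(n) * mean(m_i) / sd(m_i) with m_i = log f1(y_i) - log f2(y_i), and refers it to a standard normal distribution. Software packages may also report AIC- or BIC-corrected variants.

```python
import math
from statistics import NormalDist, fmean, stdev


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion; smaller values indicate a better fit-complexity trade-off."""
    return 2 * n_params - 2 * loglik


def rank_by_aic(fits: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Sort models by AIC. fits maps a model name to (maximized log-likelihood, number of parameters)."""
    scored = [(name, aic(ll, k)) for name, (ll, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])


def vuong_test(logf1: list[float], logf2: list[float]) -> tuple[float, float]:
    """Uncorrected Vuong test comparing two non-nested models.

    logf1, logf2 are per-observation log-likelihoods under models 1 and 2.
    Returns (z, two-sided p-value); z > 0 favours model 1, z < 0 favours model 2,
    and a large p-value means no clear preference for either model.
    """
    if len(logf1) != len(logf2) or len(logf1) < 2:
        raise ValueError("need two equal-length sequences with at least 2 observations")
    m = [a - b for a, b in zip(logf1, logf2)]
    sd = stdev(m)
    if sd == 0:
        raise ValueError("log-likelihood differences have zero variance")
    z = math.sqrt(len(m)) * fmean(m) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical fits. NB: 2 coefficients + dispersion; NBH/ZINB add 2 zero-part coefficients.
fits = {"NB": (-520.4, 3), "NBH": (-518.9, 5), "ZINB": (-519.1, 5)}
for name, value in rank_by_aic(fits):
    print(f"{name}: AIC = {value:.1f}")
```

When |z| is small (large p), as the text reports for the three Campylobacter models, the test does not favour either model of a pair. Other considerations, such as AIC and the stability of the estimates, then guide the choice.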

