Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data.

Xu L, Paterson AD, Turpin W, Xu W - PLoS ONE (2015)

Bottom Line: We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitude and direction of the covariate effect on structural zeros and the count components. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and bias and effectiveness of parameter estimations. We then discuss the model selection strategy for zero inflated data and implement it in a gut microbiome study of > 400 independent subjects.


Affiliation: Dalla Lana School of Public Health, University of Toronto, ON, M5T 3M7, Canada.

ABSTRACT
Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts, which are characterized by excess zeros that investigators often ignore. In this paper, we compare the performance of competing methods for modeling data with zero inflated features through extensive simulations and application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitude and direction of the covariate effect on structural zeros and the count components. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and bias and effectiveness of parameter estimations. We also evaluate the ability of model selection strategies using the Akaike information criterion (AIC) or the Vuong test to identify the correct model. The simulation studies show that hurdle and zero inflated models have well controlled type I errors, higher power, better goodness of fit measures, and are more accurate and efficient in parameter estimation. In addition, the hurdle models have similar goodness of fit and parameter estimation for the count component as their corresponding zero inflated models. However, the estimation and interpretation of the parameters for the zero components differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero inflated data and implement it in a gut microbiome study of > 400 independent subjects.
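
The difference in how the two model families treat zeros can be written out in their standard textbook form. This is a sketch rather than the paper's own notation: f denotes the count distribution (Poisson or negative binomial), ϕ the probability of a structural zero in a zero inflated model, and π (a symbol assumed here) the probability of any zero in a hurdle model.

```latex
% Zero inflated model: zeros arise from structural zeros and from the count distribution f
P(Y = 0) = \phi + (1 - \phi)\, f(0), \qquad
P(Y = y) = (1 - \phi)\, f(y), \quad y = 1, 2, \dots

% Hurdle model: all zeros come from one binary part; positive counts follow f truncated at zero
P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\, \frac{f(y)}{1 - f(0)}, \quad y = 1, 2, \dots
```

In the zero inflated form, ϕ describes only the structural zeros, whereas π in the hurdle form describes every observed zero, which is why the zero-component parameters of the two models are not directly comparable.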



pone.0129606.g004: The power of test for ZINB simulated data. The X axis is the value of the covariate effect on the count data, γ1, and the Y axis is the power of the test at a significance level of 0.05. Three cases of covariate effect, i.e., the consonant (ϕt = ϕc − 5%), neutral (ϕt = ϕc) and dissonant (ϕt = ϕc + 5%) effects, are presented in panels (A) and (B); (C) and (D); and (E) and (F), respectively. Each column reflects a different proportion of zero inflation in the unexposed group: 20% in (A), (C) and (E) (left column) and 50% in (B), (D) and (F) (right column).

Fig 3 and Fig 4 show the power of the test when different analysis methods are applied to the simulated ZIP- and ZINB-distributed data, respectively. Methods prone to severely inflated type I error (e.g., the Poisson model, or the PH/ZIP models for ZINB-distributed data) are excluded from these comparisons. The plots show that the hurdle and ZI models perform consistently well in all scenarios examined, whereas the behavior of the one-part models varies across methods and simulation scenarios. In the consonant effect case, one-part models such as LOLS and NB tend to do as well as the ZI or hurdle models, with WRS performing worse when the proportion of zeros is large. In the dissonant effect cases, however, the one-part models lack power to detect the overall covariate effect. This is consistent with the observation by Lachenbruch [17] for continuous non-negative responses with excess zeros. In the neutral effect case, when the proportion of structural zeros is 50% or more, the one-part models also have lower power than the two-part models.
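
The loss of power for one-part methods under a dissonant effect can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation code and it does not fit the hurdle or ZI regression models compared in the paper. As a stand-in for a two-part analysis it uses a simple two-part chi-square statistic (a z-test on the zero proportions plus a rank-sum test on the non-zero counts, 2 degrees of freedom), and as the one-part method it uses a rank-sum test on all counts (WRS is read here as the Wilcoxon rank-sum test). The sample size, NB mean `mu`, dispersion `size`, number of replicates, and the treatment of γ1 as a log-scale shift of the NB mean are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def simulate_zinb(n, phi, mu, size, rng):
    """Draw n counts: structural zero with probability phi, else NB with mean mu."""
    # NumPy parameterises the NB by (n, p) with mean n(1-p)/p, so p = size/(size+mu).
    counts = rng.negative_binomial(size, size / (size + mu), n)
    return np.where(rng.random(n) < phi, 0, counts)


def two_part_pvalue(x, y):
    """Two-part chi-square test (2 df): zero proportions plus non-zero ranks."""
    n1, n2 = len(x), len(y)
    z1, z2 = np.sum(x == 0), np.sum(y == 0)
    pooled = (z1 + z2) / (n1 + n2)
    b2 = 0.0
    if 0 < pooled < 1:
        se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        b2 = ((z1 / n1 - z2 / n2) / se) ** 2
    w2 = 0.0
    xp, yp = x[x > 0], y[y > 0]
    if len(xp) > 0 and len(yp) > 0:
        p = stats.mannwhitneyu(xp, yp, alternative="two-sided").pvalue
        if np.isfinite(p):
            w2 = stats.norm.isf(p / 2) ** 2  # two-sided p-value back to a squared z-score
    return stats.chi2.sf(b2 + w2, df=2)


def estimate_power(phi_c, phi_t, gamma1, n=100, mu=10.0, size=2.0,
                   reps=500, alpha=0.05, seed=0):
    """Monte Carlo rejection rates for a one-part and a two-part test."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(2)
    for _ in range(reps):
        x = simulate_zinb(n, phi_c, mu, size, rng)                   # unexposed group
        y = simulate_zinb(n, phi_t, mu * np.exp(gamma1), size, rng)  # exposed group
        p_wrs = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        hits += [p_wrs < alpha, two_part_pvalue(x, y) < alpha]
    return dict(zip(("WRS", "two_part"), hits / reps))


if __name__ == "__main__":
    # Dissonant case: more structural zeros in the exposed group, but larger non-zero counts.
    print(estimate_power(phi_c=0.50, phi_t=0.55, gamma1=0.5))
```

With these assumed settings the two effects pull the overall distribution in opposite directions, so a test on the pooled counts sees little net shift while the two-part statistic still picks up the change in the non-zero counts. Setting `phi_t = phi_c - 0.05` or `phi_t = phi_c` gives the consonant and neutral cases for comparison.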

