Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples.

Nahorniak M, Larsen DP, Volk C, Jordan CE - PLoS ONE (2015)

Bottom Line: In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. In all models, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.


Affiliation: South Fork Research, Inc., North Bend, Washington, United States of America.

ABSTRACT
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as stratified sampling. Design-based statistical analysis tools are appropriate for seamless integration of sample design into the statistical analysis. However, after a sampling design has been implemented, it is also common and necessary to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. To ensure unbiased estimation, such model-based tools may require data from simple random samples, which is problematic when analyzing data from unequal probability designs. Despite the numerous method-specific tools available to properly account for sampling design, too often in the analysis of ecological data the sample design is ignored and the consequences are not properly considered. We demonstrate here that violating the simple random sampling assumption can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
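
The resampling idea described in the abstract can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not the authors' implementation: it assumes resampling with replacement, with each sampled unit selected with probability proportional to the inverse of its inclusion probability, a resample size equal to the original sample size, and a simple average of the slopes fitted to each resample. The function names (`ipb_resample`, `ipb_slope`, `demo`), the simulated population, and the two inclusion probabilities are all hypothetical choices made for the example.

```python
import numpy as np


def ipb_resample(inclusion_prob, size, rng):
    """Draw row indices with replacement, with selection probability
    proportional to 1 / inclusion_prob (assumed IPB weighting)."""
    weights = 1.0 / np.asarray(inclusion_prob, dtype=float)
    return rng.choice(len(weights), size=size, replace=True,
                      p=weights / weights.sum())


def ipb_slope(x, y, inclusion_prob, n_boot=200, seed=0):
    """Average the least-squares slope over n_boot inverse-probability
    resamples of the sample (x, y)."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = ipb_resample(inclusion_prob, len(x), rng)
        # np.polyfit with degree 1 returns [slope, intercept]
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    return slopes.mean()


def demo(seed=1):
    """Simulate a population with true slope 1, draw an unequal
    probability sample, and compare a naive fit with the IPB fit."""
    rng = np.random.default_rng(seed)
    n_pop = 40_000
    x = rng.uniform(0.0, 10.0, n_pop)
    y = 2.0 + 1.0 * x + rng.normal(0.0, 3.0, n_pop)

    # Hypothetical design: two strata defined by the response, with the
    # upper stratum ten times more likely to be sampled. Each unit is
    # included independently with its stratum's probability.
    pi = np.where(y > np.median(y), 0.10, 0.01)
    sampled = rng.random(n_pop) < pi
    xs, ys, pis = x[sampled], y[sampled], pi[sampled]

    naive = np.polyfit(xs, ys, 1)[0]   # ignores inclusion probabilities
    ipb = ipb_slope(xs, ys, pis)       # uses inverse probability resamples
    return naive, ipb
```

Calling `demo()` returns the naive and IPB slope estimates. Under these assumptions the naive fit should be pulled well below the true slope of 1, because the sample over-represents high responses, while the IPB average should sit close to it.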


Fig 4 (pone.0131765.g004): Distribution of estimated slopes for quantile regression, using simple random samples (SRS), stratified samples fit ignoring sample inclusion probabilities (Strat), and regression using Inverse Probability Bootstrap samples (IPB).

Mentions: Using quantile regression, we again found bias to be present when sample inclusion probabilities were ignored. Errors in the estimates were higher for 5 of 6 coefficients and for the intercept when sample inclusion probabilities were ignored than when estimates were made using simple random sampling or IPB sampling (Table 6 and Fig 4).

