A new distribution-free approach to constructing the confidence region for multiple parameters.

Hu Z, Yang RC - PLoS ONE (2013)

Bottom Line: Construction of confidence intervals or regions is an important part of statistical inference. The usual approach to constructing a confidence interval for a single parameter or a confidence region for two or more parameters requires that the distribution of the estimated parameters is known or can be assumed. In reality, the sampling distributions of parameters of biological importance are often unknown or difficult to characterize.


Affiliation: Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada.

ABSTRACT
Construction of confidence intervals or regions is an important part of statistical inference. The usual approach to constructing a confidence interval for a single parameter or a confidence region for two or more parameters requires that the distribution of the estimated parameters is known or can be assumed. In reality, the sampling distributions of parameters of biological importance are often unknown or difficult to characterize. Distribution-free nonparametric resampling methods such as bootstrapping and permutation have been widely used to construct the confidence interval for a single parameter. There are also several parametric (ellipse) and nonparametric (convex hull peeling, bagplot and HPDregionplot) methods available for constructing confidence regions for two or more parameters. However, these methods have some key deficiencies, including biased estimation of the true coverage rate, failure to account for the shape of the distribution inherent in the data, and difficulty of implementation. The purpose of this paper is to develop a new distribution-free method for constructing the confidence region that is based only on a few basic geometrical principles and accounts for the actual shape of the distribution inherent in the real data. The new method is implemented in an R package, distfree.cr/R. The statistical properties of the new method are evaluated and compared with those of the other methods through Monte Carlo simulation. Our new method outperforms the other methods regardless of whether the samples are taken from normal or non-normal bivariate distributions. In addition, the superiority of our method is consistent across different sample sizes and different levels of correlation between the two variables. We also analyze three biological data sets to illustrate the use of our new method in genomics and other biological research.
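To make one of the nonparametric competitors named in the abstract concrete, here is a minimal sketch of convex hull peeling in Python. It is not the authors' distfree.cr method and not their R code. The function names (`convex_hull`, `peel_region`) and the stopping rule (stop before the retained share would drop below 1 - alpha) are illustrative assumptions.

```python
import random

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def peel_region(points, alpha=0.05):
    """Strip hull layers while at least (1 - alpha) of the points would remain.

    Returns (hull of the retained points, retained points).
    The stopping rule is an assumption made for this sketch.
    """
    remaining = list(set(points))
    target = (1 - alpha) * len(remaining)
    while True:
        hull = convex_hull(remaining)
        hull_set = set(hull)
        inner = [p for p in remaining if p not in hull_set]
        # Stop if peeling one more layer would leave too few points.
        if len(inner) < target or len(inner) < 3:
            return hull, remaining
        remaining = inner

# Illustrative data only: 500 draws from an uncorrelated bivariate normal.
rng = random.Random(1)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
hull, kept = peel_region(sample, alpha=0.05)
print(len(hull), "hull vertices enclose", len(kept), "of", len(sample), "points")
```

Because the result is always a convex polygon, a region built this way cannot follow a curved or multi-modal point cloud, which is one way a method can fail to account for the shape of the distribution.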



pone-0081179-g007: The 95% empirical confidence regions estimated by the four methods (distfree, ellipse, convex hull peeling and HPDregionplot) in simulation II, which is detailed in Figure 4.

Mentions: In all three simulations, our method outperforms the other methods (Figures 3, 4, and 5): its realized-α estimates are close to or coincide with the true significance levels for both small (n = 200) and large (n = 10,000) samples at all three ρ values. The classic ellipsoidal method overestimates when α is low and underestimates when α is high. All methods, including the ellipsoid approach, produce similar 95% CRs for data from the bivariate normal distribution in simulation I (Figure 6). However, the CRs determined by the ellipsoid approach fail to account for the actual shapes of the non-normal sampling distributions in simulations II and III (Figures 7 and 8). The HPDregionplot is the most sophisticated strategy for capturing the shape of a non-normal sampling distribution in all simulations. However, the realized-α estimates from the HPDregionplot approach are consistently lower than the true significance levels; the underestimation tends to increase with the significance level and the correlation (ρ), and it is more pronounced for the non-normal data in simulations II (Figure 4) and III (Figure 5) than for the normal data in simulation I (Figure 3). Somewhat surprisingly, the bagplot method performs as well as our method with the small sample (n = 200) but poorly with the large sample (n = 10,000), particularly when α is high.
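The comparison above rests on checking a region's realized significance level against the nominal one. The following Python sketch shows one way such a check could be run for the normal-theory ellipse on bivariate normal data. It is not the authors' simulation code. It assumes that "realized α" means the share of fresh draws from the same distribution that fall outside the region, and the ρ values in the loop are illustrative rather than taken from the paper. The ellipse uses the large-sample chi-square cutoff with 2 degrees of freedom, whose quantile has the closed form -2 ln(α).

```python
import math
import random

def bivariate_normal(n, rho, rng):
    """Draw n points from a standard bivariate normal with correlation rho."""
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pts.append((z1, rho * z1 + math.sqrt(1 - rho * rho) * z2))
    return pts

def ellipse_region(sample, alpha):
    """Return a membership test for the (1 - alpha) normal-theory ellipse."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Chi-square(2 df) quantile at 1 - alpha: CDF is 1 - exp(-x / 2).
    cutoff = -2.0 * math.log(alpha)

    def inside(p):
        dx, dy = p[0] - mx, p[1] - my
        # Squared Mahalanobis distance using the inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return d2 <= cutoff

    return inside

def realized_alpha(n, rho, alpha, n_test=20000, seed=0):
    """Fit the ellipse on n draws, then measure the share of fresh draws outside it."""
    rng = random.Random(seed)
    inside = ellipse_region(bivariate_normal(n, rho, rng), alpha)
    fresh = bivariate_normal(n_test, rho, rng)
    return sum(not inside(p) for p in fresh) / n_test

# Sample sizes follow the text; the rho grid is a hypothetical choice.
for n in (200, 10000):
    for rho in (0.0, 0.5, 0.9):
        print(n, rho, realized_alpha(n, rho, alpha=0.05))
```

Swapping `bivariate_normal` for a non-normal generator, or `ellipse_region` for another region constructor that returns an `inside` test, would give the same kind of comparison for other settings.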

