A new distribution-free approach to constructing the confidence region for multiple parameters.

Hu Z, Yang RC - PLoS ONE (2013)

Bottom Line: Construction of confidence intervals or regions is an important part of statistical inference. The usual approach to constructing a confidence interval for a single parameter, or a confidence region for two or more parameters, requires that the distribution of the estimated parameters is known or can be assumed. In reality, the sampling distributions of parameters of biological importance are often unknown or difficult to characterize.


Affiliation: Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada.

ABSTRACT
Construction of confidence intervals or regions is an important part of statistical inference. The usual approach to constructing a confidence interval for a single parameter, or a confidence region for two or more parameters, requires that the distribution of the estimated parameters is known or can be assumed. In reality, the sampling distributions of parameters of biological importance are often unknown or difficult to characterize. Distribution-free nonparametric resampling methods such as bootstrapping and permutation have been widely used to construct the confidence interval for a single parameter. There are also several parametric (ellipse) and nonparametric (convex hull peeling, bagplot and HPDregionplot) methods available for constructing confidence regions for two or more parameters. However, these methods have some key deficiencies, including biased estimation of the true coverage rate, failure to account for the shape of the distribution inherent in the data, and difficulty of implementation. The purpose of this paper is to develop a new distribution-free method for constructing the confidence region that is based on only a few basic geometrical principles and accounts for the actual shape of the distribution inherent in the real data. The new method is implemented in an R package, distfree.cr/R. The statistical properties of the new method are evaluated and compared with those of the other methods through Monte Carlo simulation. Our new method outperforms the other methods regardless of whether the samples are taken from normal or non-normal bivariate distributions. In addition, the superiority of our method is consistent across different sample sizes and different levels of correlation between the two variables. We also analyze three biological data sets to illustrate the use of our new method in genomics and other biological research.
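
To make the resampling idea concrete, the following is a minimal Python sketch of one of the existing nonparametric approaches named in the abstract, convex hull peeling applied to bootstrap estimates of two parameters. It is not the authors' new method and not the distfree.cr package. The simulated data, the choice of the two sample means as the parameters, and the function names (bootstrap_means, convex_hull, peeled_region) are all assumptions made for illustration.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def bootstrap_means(xy, n_boot, rng):
    """Resample the (x, y) pairs with replacement and return the pair of means each time."""
    n = len(xy)
    estimates = []
    for _ in range(n_boot):
        sample = [xy[rng.randrange(n)] for _ in range(n)]
        estimates.append((sum(p[0] for p in sample) / n,
                          sum(p[1] for p in sample) / n))
    return estimates


def peeled_region(points, level=0.95):
    """Strip successive hulls while at least `level` of the points would remain.

    Returns the hull of the retained points and the retained points themselves.
    """
    remaining = list(points)
    target = level * len(points)
    while True:
        hull = convex_hull(remaining)
        hull_set = set(hull)
        inner = [p for p in remaining if p not in hull_set]
        if len(inner) < target or len(inner) < 3:
            return hull, remaining
        remaining = inner


# Hypothetical correlated sample (not data from the paper).
rng = random.Random(2013)
data = []
for _ in range(40):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 0.6 * x + rng.gauss(0.0, 1.0)))

boot = bootstrap_means(data, n_boot=1000, rng=rng)
hull, kept = peeled_region(boot, level=0.95)
print(len(hull), "hull vertices enclose", len(kept), "of", len(boot), "bootstrap estimates")
```

The peeled hull is always convex, so it cannot follow a curved or skewed cloud of estimates; this is the kind of limitation ("failure to account for the shape of the distribution") that the abstract attributes to existing methods.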


Figure 11 (pone-0081179-g011): Biplot of 18 genotypic scores and nine environmental scores from the Ontario winter wheat data. The 95% confidence regions are constructed for the genotypic and environmental scores using 10,000 bootstrap samples.


Mentions: The biplot of PC1 vs. PC2 genotypic and environmental scores, along with their 95% confidence regions (CR), is presented in Figure 11. PC1 and PC2 together account for about 78% of the total variability. To highlight key features in the biplot, the CR are displayed only for those scores that are significantly different from the origin of the biplot [i.e., scores whose CR do not include the point (0,0)]. A hexagon is drawn to connect the six genotypes (G3, G7, G8, G12, G13 and G18) located at its corners (i.e., vertices). To further facilitate interpretation of the biplot, six line segments, each perpendicular to a different side of the hexagon, are drawn through the origin to subdivide the hexagon into six sectors involving different subsets of environments and genotypes: the genotype at the corner of each sector is considered the ‘best’ performer in the environments included in that sector, as often claimed in earlier studies (e.g., Yan et al. [30]). However, it is evident from the 95% CR of the scores that the ‘best’ genotypes are often not statistically different from other genotypes. For example, genotype G8 at the upper-right corner is indistinguishable from genotypes G4 and G10 in the same sector, judging from their overlapping CR. Simple visual inspection of the biplot [30] suggested that genotype G18 yielded more than genotype G8 in eastern Ontario (represented by E5 and E7) and that G8 yielded more than G18 in southwestern Ontario (represented by the other seven environments). With the 95% CR now attached to individual scores (Figure 11), this claim is no longer supported because the CR for G8 and G18 overlap. Thus, identification of superior genotypes or mega-environments based on an initial inspection of biplots is only a visual observation, and it must be substantiated by subsequent parametric or nonparametric statistical assessment before being recommended for practical use.
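
The two checks used in this passage (whether a CR includes the origin, and whether two CR overlap) can be sketched in Python as below. This is only an illustration under a strong assumption: each region is approximated by a convex polygon with vertices listed counter-clockwise, whereas the regions produced by the authors' method need not be convex. The function names and the coordinates are hypothetical and are not taken from the wheat data.

```python
def contains_point(polygon, point, eps=1e-12):
    """True if `point` lies inside or on a convex polygon given counter-clockwise."""
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        # A negative cross product means the point is to the right of edge a->b,
        # i.e. outside a counter-clockwise convex polygon.
        turn = (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])
        if turn < -eps:
            return False
    return True


def regions_overlap(p, q):
    """Separating axis test for two convex polygons (touching counts as overlap)."""
    for polygon in (p, q):
        n = len(polygon)
        for i in range(n):
            a, b = polygon[i], polygon[(i + 1) % n]
            nx, ny = b[1] - a[1], a[0] - b[0]  # normal to edge a->b
            proj_p = [nx * x + ny * y for x, y in p]
            proj_q = [nx * x + ny * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # found an axis that separates the two polygons
    return True


# Made-up square regions in PC1-PC2 coordinates, for illustration only.
region_a = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
region_b = [(2.0, 2.0), (4.0, 2.0), (4.0, 4.0), (2.0, 4.0)]
region_c = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]

origin = (0.0, 0.0)
print("region_a differs from origin:", not contains_point(region_a, origin))
print("region_c differs from origin:", not contains_point(region_c, origin))
print("region_a and region_b overlap:", regions_overlap(region_a, region_b))
```

Under this reading, a score is displayed with its CR only when contains_point returns False for the origin, and two genotypes are treated as indistinguishable when regions_overlap returns True.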

