Alphas, betas and skewy distributions: two ways of getting the wrong answer.
View Article:
PubMed Central - PubMed
Affiliation: Institute of Applied Health Sciences, School of Medicine and Dentistry, University of Aberdeen, Aberdeen, UK. p.fayers@abdn.ac.uk
ABSTRACT
Although many parametric statistical tests are considered to be robust, as recently shown in Methodologist's Corner, it still pays to be circumspect about the assumptions underlying statistical tests. In this paper I show that robustness mainly refers to α, the type-I error. If the underlying distribution of data is ignored there can be a major penalty in terms of β, the type-II error, representing a large increase in false negative rate or, equivalently, a severe loss of power of the test.
Mentions: Computer-generated random numbers were used to produce data following a log-normal distribution. Technically, this was accomplished by first generating random observations from a normal distribution with mean 0 and standard deviation 1, and then applying an exponential transformation. A constant was added to the first 50% of these observations, creating a group of observations with an increased mean value. The increase was measured in terms of effect sizes, where the effect size is the mean difference expressed as a multiple of the standard deviation. For example, to simulate a study with a sample size of 100 observations per group, 200 log-normal data items were generated and the first 100 increased. Thus, to produce an effect size of 0.5, which is generally regarded as an effect of moderate magnitude, a value of 0.5 was added to the normal values, because the standard deviation had been set at 1.0. A t test was then applied. A Wilcoxon two-sample rank-sum test was also used, and in addition a logarithmic transformation was applied before a second t test. This was repeated some 30,000 times to obtain a reasonably precise estimate of the proportion of times that such an effect size, in a study of this magnitude, would result in a difference being found significant (p < 0.05). The whole exercise was repeated over and again, for varying numbers of observations in the two comparison groups and with varying effect sizes. The effect sizes considered ranged from 0 (no difference in mean values) through 0.2, 0.5, 0.8 and 1.0, but results are displayed only for an effect size of 0.5 (Fig. 2) and zero (Fig. 3). Thus Fig. 2 shows the power to detect an effect size of 0.5, for varying sample sizes. It should be noted that when the effect size is zero (Fig. 3), there is no difference between the two simulated groups, and so the proportion of p values deemed "significant" represents false positives; that is, in a robust and reliable significance test, on average 5% of results ought to be declared "significant, p < 0.05".
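The simulation described above can be sketched as follows. This is a minimal illustration, not the paper's actual code: it assumes SciPy's `ttest_ind` and `mannwhitneyu` as stand-ins for the t test and the Wilcoxon two-sample rank-sum test, adds the shift on the normal (log) scale before exponentiating, as the text describes, and uses far fewer replicates than the paper's 30,000 for speed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power_estimate(n_per_group=100, effect_size=0.5, reps=1000, alpha=0.05):
    """Estimate the proportion of replicates with p < alpha for three tests
    applied to log-normal data (a sketch of the paper's simulation)."""
    hits = {"t": 0, "wilcoxon": 0, "t_log": 0}
    for _ in range(reps):
        # Normal(0, 1) values; shift the first group by the effect size
        # (SD = 1, so the shift equals the effect size), then exponentiate
        # to obtain log-normal observations.
        base = rng.standard_normal(2 * n_per_group)
        base[:n_per_group] += effect_size
        data = np.exp(base)
        g1, g2 = data[:n_per_group], data[n_per_group:]
        # Test 1: t test on the raw (skewed) data.
        if stats.ttest_ind(g1, g2).pvalue < alpha:
            hits["t"] += 1
        # Test 2: Wilcoxon rank-sum (Mann-Whitney) test.
        if stats.mannwhitneyu(g1, g2).pvalue < alpha:
            hits["wilcoxon"] += 1
        # Test 3: t test after a logarithmic transformation.
        if stats.ttest_ind(np.log(g1), np.log(g2)).pvalue < alpha:
            hits["t_log"] += 1
    return {k: v / reps for k, v in hits.items()}

print(power_estimate())
```

Calling `power_estimate(effect_size=0)` instead estimates the false-positive rate (the situation in Fig. 3), which for a well-behaved test should sit near the nominal 5%; with a non-zero effect size the comparison of the three proportions illustrates the power penalty of applying the t test to skewed data.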