Alphas, betas and skewy distributions: two ways of getting the wrong answer.

Fayers P - Adv Health Sci Educ Theory Pract (2011)

Affiliation: Institute of Applied Health Sciences, School of Medicine and Dentistry, University of Aberdeen, Aberdeen, UK. p.fayers@abdn.ac.uk

ABSTRACT
Although many parametric statistical tests are considered to be robust, as recently shown in Methodologist's Corner, it still pays to be circumspect about their underlying assumptions. In this paper I show that robustness mainly refers to α, the type-I error. If the underlying distribution of the data is ignored, there can be a major penalty in terms of β, the type-II error, representing a large increase in the false-negative rate or, equivalently, a severe loss of power of the test.
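
For reference, the two error rates contrasted in the abstract have their standard definitions, with H₀ the null hypothesis of "no difference": α = P(reject H₀ | H₀ is true), the false-positive rate; β = P(fail to reject H₀ | H₀ is false), the false-negative rate; and the power of the test is 1 − β.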

Fig. 3: The relation between type-I error and sample size when a t test, a t test on logarithms and a Wilcoxon rank-sum test are applied to log-normal data. Here the effect size is 0.0, implying that the null hypothesis of "no difference" is true, because the type-I error is "the probability of falsely rejecting the null hypothesis when it is true". The sample sizes shown are the number of subjects in each of the two groups.

Mentions: Computer-generated random numbers were used to produce data that followed a log-normal distribution. Technically, this was accomplished by first generating random observations from a normal distribution with mean 0 and standard deviation 1, and then applying an exponential transformation. A constant was added to the first 50% of these observations (on the normal scale, before exponentiation), creating a group of observations with an increased mean value. This increase was measured in terms of effect sizes, where the effect size is the mean difference expressed as a multiple of the standard deviation. For example, to simulate a study with a sample size of 100 observations per group, 200 log-normal data items were generated and the first 100 increased. Thus, to produce an effect size of 0.5, which is generally regarded as an effect of moderate magnitude, a value of 0.5 was added to the normal values because the standard deviation had been set at 1.0. A t test was then applied; a Wilcoxon two-sample rank-sum test was also used, and in addition a logarithmic transformation was applied before a second t test. This was repeated some 30,000 times to obtain a reasonably precise estimate of the proportion of times that such an effect size, in a study of this magnitude, would result in a difference being found significant (p < 0.05). The whole exercise was repeated over and over again, for varying numbers of observations in the two comparison groups and with varying effect sizes. The effect sizes considered ranged from 0 (no difference in mean values) through 0.2, 0.5, 0.8 and 1.0, but results are displayed only for an effect size of 0.5 (Fig. 2) and zero (Fig. 3). Thus Fig. 2 shows the power to detect an effect size of 0.5 for varying sample sizes. It should be noted that when the effect size is zero (Fig. 3), there is no difference between the two simulated groups, and so the proportion of p values deemed "significant" represents false positives; that is, in a robust and reliable significance test, on average 5% of results ought to be declared "significant, p < 0.05".
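
As a rough sketch of this simulation (not the author's original code), the following Python snippet reproduces the same procedure with NumPy and SciPy. It assumes the constant is added on the normal scale before exponentiation, as described above, and it uses SciPy's two-sample Wilcoxon rank-sum test; the function and variable names are illustrative only.

import numpy as np
from scipy import stats

def simulate(effect_size, n_per_group, n_reps=30_000, alpha=0.05, seed=1):
    """Proportion of simulated studies significant at `alpha` for each test."""
    rng = np.random.default_rng(seed)
    hits = {"t_raw": 0, "t_log": 0, "wilcoxon_ranksum": 0}
    for _ in range(n_reps):
        # Normal(0, 1) values; the first group is shifted by the effect size
        # (the standard deviation is 1, so the shift equals the effect size).
        z1 = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
        z2 = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        x1, x2 = np.exp(z1), np.exp(z2)  # exponentiate to obtain log-normal data
        hits["t_raw"] += stats.ttest_ind(x1, x2).pvalue < alpha
        hits["t_log"] += stats.ttest_ind(np.log(x1), np.log(x2)).pvalue < alpha
        hits["wilcoxon_ranksum"] += stats.ranksums(x1, x2).pvalue < alpha
    return {name: count / n_reps for name, count in hits.items()}

# Effect size 0.5 estimates power (as in Fig. 2); effect size 0.0 estimates the
# type-I error rate (as in Fig. 3), which should stay close to 0.05.
print(simulate(effect_size=0.5, n_per_group=100))
print(simulate(effect_size=0.0, n_per_group=100))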

