Power estimation of tests in log-linear non-uniform association models for ordinal agreement.

Valet F, Mary JY - BMC Med Res Methodol (2011)

Bottom Line: Even for large samples, marginal heterogeneities within raters led to a decrease in power estimates. This paper provided guidance on how many objects had to be classified by two independent observers (or by the same observer at two different times) to be able to detect a given scale structure defect. Our results also highlighted the importance of marginal homogeneity within raters to ensure optimal power when using non-uniform association models.


Affiliation: Institut Curie, Ecole des Mines de Paris, INSERM U900, Paris, France. fabien.valet@curie.net

ABSTRACT

Background: Log-linear association models have been extensively used to investigate the pattern of agreement between ordinal ratings. In 2007, log-linear non-uniform association models were introduced to estimate, from a cross-classification of two independent raters using an ordinal scale, varying degrees of distinguishability between distant and adjacent categories of the scale.

Methods: In this paper, a simple method based on simulations was proposed to estimate the power of non-uniform association models to detect heterogeneities across distinguishabilities between adjacent categories of an ordinal scale, illustrating some possible scale defects.

Results: Different scenarios of distinguishability patterns were investigated, as well as different scenarios of marginal heterogeneity within rater. For a sample size of N = 50, the probabilities of detecting heterogeneities within the tables were lower than .80, whatever the number of categories. In addition, even for large samples, marginal heterogeneities within raters led to a decrease in power estimates.

Conclusion: This paper provided guidance on how many objects had to be classified by two independent observers (or by the same observer at two different times) to be able to detect a given scale structure defect. Our results also highlighted the importance of marginal homogeneity within raters to ensure optimal power when using non-uniform association models.
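
The simulation-based approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the paper fits NUA models in R to tables with several ordinal categories, whereas this example assumes a single 2x2 table with uniform margins and swaps in a simple Wald test of log(OR) = log(3) as a stand-in for the NUA model test. The function names (`two_by_two_probs`, `rejects_null`, `estimate_power`), the number of replicates, and the 0.5 cell correction are all assumptions made for illustration. The general recipe is the same, though: simulate many tables of size N under an alternative, test each one, and report the rejection rate as the power estimate.

```python
import math
import random


def two_by_two_probs(odds_ratio):
    """Cell probabilities [p11, p12, p21, p22] of a symmetric 2x2 table
    with uniform margins and the requested odds ratio (illustrative setup)."""
    r = math.sqrt(odds_ratio)      # p11/p12 = sqrt(OR), so (p11*p22)/(p12*p21) = OR
    b = 1.0 / (2.0 * (1.0 + r))    # off-diagonal cell probability
    a = r * b                      # diagonal cell probability
    return [a, b, b, a]


def rejects_null(counts, null_or=3.0, z_crit=1.959964):
    """Two-sided Wald test of log(OR) = log(null_or) at the 5% level.
    0.5 is added to every cell so that empty cells do not break the log."""
    n11, n12, n21, n22 = (c + 0.5 for c in counts)
    log_or = math.log(n11 * n22 / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return abs(log_or - math.log(null_or)) / se > z_crit


def estimate_power(true_or, n, n_sim=500, seed=0):
    """Monte Carlo power: share of simulated tables of size n in which
    the null hypothesis OR = 3 is rejected when the true OR is true_or."""
    rng = random.Random(seed)
    probs = two_by_two_probs(true_or)
    rejections = 0
    for _ in range(n_sim):
        draws = rng.choices(range(4), weights=probs, k=n)   # one multinomial table
        counts = [draws.count(cell) for cell in range(4)]
        rejections += rejects_null(counts)
    return rejections / n_sim


if __name__ == "__main__":
    for n in (50, 100, 250):
        for k in (1, 3, 8, 16):
            print(f"N={n:>3}  K={k:>2}  power={estimate_power(k, n):.2f}")
```

With K = 3 the simulated tables are generated under the null hypothesis, so the rejection rate in that row approximates the type I error rate rather than power. The numbers produced by this toy setup are not comparable to those reported in the paper.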


Figure 1: Power estimates of tests with alternative hypotheses given by β1,2 ≠ β2,3 = β3,4 = β4,5 = log(3) for panels (a, d), β1,2 = β2,3 ≠ β3,4 = β4,5 = log(3) for panels (b, e), and β1,2 = β4,5 ≠ β2,3 = β3,4 = log(3) for panels (c, f). Marginal probabilities are given by .

Mentions: All simulations and power estimations were performed using R software [22]. Association parameters were equal to log(3) under the null hypothesis (i.e. OR equal to 3), and for each alternative hypothesis the values K of the tested OR ranged from 1 to 16, which corresponds to association parameters ranging from log(1) = 0 to log(16) = 2.77. Thus, for a specific alternative hypothesis, each set of association parameters {βk,k+1; k = 1,..., 4} contained some fixed parameters equal to log(3), depicting the null hypothesis, and some varying parameters ranging from 0 to 2.77, depicting the alternative hypothesis.

Simulation results are first displayed in Figure 1, which shows, for each simulated scenario, the power estimates of tests with alternative hypotheses corresponding to the different NUA models tested. In other words, this figure represents the probability of finding significant heterogeneities within the DDs between adjacent categories, according to the total sample size N, the three alternative hypotheses, and the different values K of the tested OR. The left panel (Figure 1, examples a to c) corresponds to simulated scenarios with homogeneous marginal distributions within rater, whereas the right panel (Figure 1, examples d to f) corresponds to simulated scenarios with three different sets of heterogeneous marginal distributions. Power estimates were consistently lower in scenarios with heterogeneous marginal distributions (right panel) than in those with homogeneous marginal distributions (left panel). In some cases, the influence of marginal heterogeneity was drastic and strongly penalized the ability of NUA models to detect significant heterogeneities within DDs between adjacent categories (Figure 1, example d). For total sample sizes of N ≤ 100, none of the simulated scenarios provided power estimates greater than 80%. Conversely, except for the scenario in Figure 1, example d, power estimates were greater than 80% for tested OR K ≥ 12, for all the tested hypotheses.

Power estimates are also given in Table 3. As in Figure 1, this table shows power estimates as a function of N, the three alternative hypotheses, and the different values K of the tested OR. Similarly, the left panel corresponds to simulated scenarios with homogeneous marginal distributions, whereas the right panel corresponds to different situations of heterogeneity within marginal distributions. For example, starting from the null hypothesis that all OR are equal to 3, i.e. DDs between all adjacent categories equal to 2/3, the power estimates of the test corresponding to i) the alternative β1,2 ≠ β2,3 = β3,4 = β4,5, ii) a homogeneous marginal distribution, and iii) a total sample size of N = 250 are greater than 80% for OR greater than or equal to 10. In other words, for N = 250, NUA models are able to detect, with a probability greater than 80%, a DD between adjacent categories 1 and 2 greater than 1 - 1/10 = .90. For the left panel of this table and for the hypothesis of a different DD between the first two adjacent categories as compared to the others, NUA models are able to detect, with a probability greater than 80%, DDs greater than .92 for N ≥ 200 and DDs greater than .94 for N ≥ 150. Similarly, for N = 200, NUA models are able to detect a different DD between close and symmetric adjacent categories (β1,2 = β2,3 ≠ β3,4 = β4,5 and β1,2 = β4,5 ≠ β2,3 = β3,4, respectively) with a probability greater than 80% for DDs greater than .90.
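
The passage above moves back and forth between three scales: the odds ratio (OR), the association parameter β = log(OR), and the degree of distinguishability DD = 1 - 1/OR (the text gives OR = 3 as DD = 2/3 and OR = 10 as DD = .90). The short Python sketch below makes these conversions explicit. The helper names are hypothetical and are not taken from the authors' R code.

```python
import math


def or_to_beta(odds_ratio):
    """Association parameter on the log scale: beta = log(OR)."""
    return math.log(odds_ratio)


def or_to_dd(odds_ratio):
    """Degree of distinguishability: DD = 1 - 1/OR."""
    return 1.0 - 1.0 / odds_ratio


def dd_to_or(dd):
    """Inverse of or_to_dd: OR = 1 / (1 - DD), defined for DD < 1."""
    return 1.0 / (1.0 - dd)


if __name__ == "__main__":
    # Endpoints of the tested range (1 and 16), the null value (3),
    # and the threshold quoted for N = 250 (10).
    for k in (1, 3, 10, 16):
        print(f"OR={k:>2}  beta={or_to_beta(k):.2f}  DD={or_to_dd(k):.2f}")
```

Applying `dd_to_or` to the DD thresholds quoted above shows which tested OR values they correspond to; for instance, DD = .90 maps back to OR = 10.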

