Elucidating the foundations of statistical inference with 2 x 2 tables.

Choi L, Blume JD, Dupont WD - PLoS ONE (2015)

Bottom Line: To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.


Affiliation: Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.

ABSTRACT
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.




pone.0121263.g005: The standardized conditional, modified profile, and profile likelihood functions are depicted for the log odds ratio ψ using the data in Figs. 1–4. The example numbers in this figure correspond to the examples described in Figs. 1–4. The profile likelihood is represented by a dashed black line, while the conditional and modified profile likelihoods are represented by thick red and black dotted lines, respectively. The horizontal lines represent 1/6.8 (upper), 1/8 (middle) and 1/32 (lower) likelihood support intervals (SIs). The maximum likelihood estimate (MLE) of each likelihood is also shown. For normally distributed data, a 1/6.8 SI and a frequentist 95% confidence interval are identical. Note that the modified profile and conditional likelihoods are indistinguishable for all examples, while the profile and conditional likelihoods are similar for the examples with ψ = 0 (Examples 1 & 3). In these two examples, the profile likelihood is not visible because it is overlain by the conditional likelihood.
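
The quantities named in this caption can be illustrated with a minimal sketch. It is not the authors' code: the 2 × 2 counts below are hypothetical (the data of Figs. 1–4 are not reproduced on this page), the function names are made up for illustration, and the support interval is found by a simple grid search. The sketch evaluates the conditional likelihood of the log odds ratio ψ (the noncentral hypergeometric probability of the observed table given its margins), reads off 1/8 and 1/32 support intervals from the standardized likelihood, and checks the 1/6.8 benchmark: for a normal likelihood, the 1/k interval matches a 95% confidence interval when k = exp(1.96²/2).

```python
import math


def conditional_likelihood(psi, x, y, n1, n2):
    """Conditional likelihood of the log odds ratio psi for a 2 x 2 table.

    Group 1 has x successes in n1 trials, group 2 has y successes in n2.
    Conditioning on the total number of successes t = x + y gives a
    noncentral hypergeometric probability that depends on psi only.
    """
    t = x + y
    support = range(max(0, t - n2), min(n1, t) + 1)

    def weight(u):
        return math.comb(n1, u) * math.comb(n2, t - u) * math.exp(psi * u)

    return weight(x) / sum(weight(u) for u in support)


def support_interval(psis, liks, k):
    """Range of grid values whose standardized likelihood is at least 1/k."""
    peak = max(liks)
    inside = [p for p, lik in zip(psis, liks) if lik / peak >= 1.0 / k]
    return min(inside), max(inside)


# Hypothetical table (an assumption, not data from the article):
# 7 of 10 successes in group 1, 3 of 10 in group 2.
x, y, n1, n2 = 7, 3, 10, 10

psis = [i / 100 for i in range(-600, 601)]  # grid for psi on [-6, 6]
liks = [conditional_likelihood(p, x, y, n1, n2) for p in psis]

psi_hat = psis[liks.index(max(liks))]    # grid approximation to the conditional MLE
si_8 = support_interval(psis, liks, 8)   # 1/8 support interval
si_32 = support_interval(psis, liks, 32) # 1/32 support interval

# Benchmark linking support intervals to a 95% CI under normality.
k_95 = math.exp(1.96 ** 2 / 2)

print(f"conditional MLE ~ {psi_hat:.2f}")
print(f"1/8 SI:  {si_8}")
print(f"1/32 SI: {si_32}")
print(f"k for a 95% CI under normality: {k_95:.2f}")
```

A grid search is used only to keep the sketch dependency-free; a root finder would give the interval endpoints more precisely.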

Mentions: To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 × 2 contingency tables, ubiquitous in the scientific literature, is a case in point. A problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The analysis of 2 × 2 contingency tables has generated controversy and dispute for more than a half-century in the statistical literature, so perhaps ‘deceptively simple’ would be a better description. For an illustration, consider the data from an example in the right panel of Table 1. Many p-values, including that from Fisher’s exact test, are associated with this one table despite the fact that they all appear to test the same hypothesis. Table 2 shows these p-values, which vary in magnitude and may lead to different conclusions. As such, this controversy is often viewed, too simplistically, as a problem of selecting the ‘right p-value’.
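
To make the point about multiple p-values concrete, the sketch below computes three of them for a single table. It is an illustration under stated assumptions, not a reproduction of Table 2: the counts are hypothetical, the function name is made up, the two-sided Fisher p-value uses the common "sum of tables no more probable than the observed one" rule, and the two-sided mid-p is taken as twice the smaller one-sided mid-p (other definitions exist).

```python
import math


def two_by_two_p_values(x, y, n1, n2):
    """Return Pearson, Fisher (two-sided) and mid-p values for one table.

    Group 1 has x successes in n1 trials, group 2 has y successes in n2.
    """
    t = x + y
    n = n1 + n2
    support = range(max(0, t - n2), min(n1, t) + 1)

    # Central hypergeometric probabilities given both margins.
    total = math.comb(n, t)
    prob = {u: math.comb(n1, u) * math.comb(n2, t - u) / total for u in support}

    # Fisher's exact test, two-sided: sum over tables that are
    # no more probable than the observed one (small tolerance for ties).
    fisher = sum(p for p in prob.values() if p <= prob[x] * (1 + 1e-7))

    # Mid-p: count only half the probability of the observed table.
    upper = sum(p for u, p in prob.items() if u > x) + 0.5 * prob[x]
    lower = sum(p for u, p in prob.items() if u < x) + 0.5 * prob[x]
    mid_p = min(1.0, 2 * min(upper, lower))

    # Pearson chi-square without continuity correction, 1 degree of freedom.
    a, b, c, d = x, n1 - x, y, n2 - y
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * t * (n - t))
    pearson = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)

    return {"pearson": pearson, "fisher": fisher, "mid_p": mid_p}


# Hypothetical table (an assumption, not the data in Table 1):
# 7 of 10 successes in group 1, 3 of 10 in group 2.
p_values = two_by_two_p_values(7, 3, 10, 10)
for name, value in p_values.items():
    print(f"{name}: {value:.4f}")
```

For this hypothetical table the three values differ noticeably even though each nominally tests the same hypothesis of no association, with the mid-p value falling between the Pearson and Fisher values.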

