Elucidating the foundations of statistical inference with 2 x 2 tables.

Choi L, Blume JD, Dupont WD - PLoS ONE (2015)

Bottom Line: To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.


Affiliation: Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.

ABSTRACT
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.
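The claim that one table can carry several p-values can be illustrated with a minimal sketch. It assumes the conditional (hypergeometric) null distribution of y1 given the success total y+, and it uses common textbook definitions of the two-sided and mid-p-values, which are not necessarily the exact set compared in the paper. The function names and the example table (4 of 5 successes versus 3 of 15) are hypothetical.

```python
from math import comb

def null_conditional_pmf(n1, n2, y_plus):
    """Hypergeometric distribution of y1 given y+ when the odds ratio is 1."""
    lo, hi = max(0, y_plus - n2), min(n1, y_plus)
    total = comb(n1 + n2, y_plus)
    return {k: comb(n1, k) * comb(n2, y_plus - k) / total
            for k in range(lo, hi + 1)}

def p_values(y1, n1, y2, n2):
    """Several p-values for the same 2 x 2 table (illustrative definitions)."""
    pmf = null_conditional_pmf(n1, n2, y1 + y2)
    p_obs = pmf[y1]
    upper = sum(p for k, p in pmf.items() if k >= y1)
    lower = sum(p for k, p in pmf.items() if k <= y1)
    one_sided = min(upper, lower)
    # Two-sided, convention 1: double the smaller tail.
    doubling = min(1.0, 2 * one_sided)
    # Two-sided, convention 2: sum all tables no more probable than the
    # observed one (small relative tolerance guards against rounding ties).
    min_lik = min(1.0, sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9)))
    # Mid-p: count only half the probability of the observed table.
    mid_one = one_sided - 0.5 * p_obs
    return {
        "one_sided": one_sided,
        "two_sided_doubling": doubling,
        "two_sided_min_likelihood": min_lik,
        "mid_p_one_sided": mid_one,
        "mid_p_two_sided_doubling": min(1.0, 2 * mid_one),
    }

# Hypothetical unbalanced table: 4/5 successes versus 3/15 successes.
res = p_values(y1=4, n1=5, y2=3, n2=15)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

For this hypothetical table the two two-sided conventions disagree (about 0.061 with doubling, about 0.031 with the minimum-likelihood rule), so a 0.05 threshold would give opposite verdicts, while the one-sided mid-p-value is about 0.016.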



Fig. 1. The data are shown in the top left panel. In the top right panel, all possible configurations of tables (y1 and y2) are listed when only y+ is known. The corresponding maximum likelihood estimate of the log odds ratio ψ for each possible table, denoted as ψ̂, is also shown. The nuisance parameter λ* = (n1π1 + n2π2)/(n1 + n2) is the marginal probability of success among all treated subjects. (A) Contour plot of the likelihood L = L(ψ,λ*;y1,y2), which is the joint likelihood of different values of ψ and λ* given the observed values of y1 and y2. Lighter colors denote higher values of L; (B) Contour plot of the marginal likelihood L2 = L(ψ,λ*;y+) given the success total y+ as a function of ψ and λ*; (C) The likelihood L given y1 and y2 plotted against ψ at five different fixed values of λ*. The profile likelihood function is also plotted; (D) The marginal likelihood L2 given y+ plotted against ψ at fixed values of λ*. The conditional likelihood L1 = L(ψ;y1∣y+) is also plotted in red. These graphs demonstrate that for balanced sample sizes the marginal success total tells us virtually nothing about ψ, and hence should be treated as an ancillary statistic.
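The three likelihoods named in the caption are linked by the factorization L = L1 × L2 (conditional times marginal). The sketch below checks this numerically. It assumes two independent binomial samples and takes ψ as the log odds ratio of group 1 versus group 2; for simplicity it works with (π1, π2) rather than the caption's (ψ, λ*) parametrization. The function names, the table (7 of 10 versus 2 of 10) and the parameter values are hypothetical.

```python
from math import comb, exp, log

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def joint_likelihood(pi1, pi2, y1, n1, y2, n2):
    """L: likelihood of the full table (y1, y2)."""
    return binom_pmf(y1, n1, pi1) * binom_pmf(y2, n2, pi2)

def marginal_likelihood(pi1, pi2, y_plus, n1, n2):
    """L2: likelihood when only the success total y+ is observed."""
    lo, hi = max(0, y_plus - n2), min(n1, y_plus)
    return sum(joint_likelihood(pi1, pi2, u, n1, y_plus - u, n2)
               for u in range(lo, hi + 1))

def conditional_likelihood(psi, y1, y_plus, n1, n2):
    """L1: noncentral hypergeometric likelihood of y1 given y+.

    Depends on the parameters only through the log odds ratio psi.
    """
    lo, hi = max(0, y_plus - n2), min(n1, y_plus)

    def weight(u):
        return comb(n1, u) * comb(n2, y_plus - u) * exp(psi * u)

    return weight(y1) / sum(weight(u) for u in range(lo, hi + 1))

# Hypothetical balanced table and parameter values.
y1, n1, y2, n2 = 7, 10, 2, 10
pi1, pi2 = 0.6, 0.3
psi = log(pi1 * (1 - pi2) / (pi2 * (1 - pi1)))  # log odds ratio

L = joint_likelihood(pi1, pi2, y1, n1, y2, n2)
L1 = conditional_likelihood(psi, y1, y1 + y2, n1, n2)
L2 = marginal_likelihood(pi1, pi2, y1 + y2, n1, n2)
print(L, L1 * L2)  # the two numbers should agree up to rounding
```

Conditioning on y+ discards the factor L2; the caption's point is that for balanced samples L2 varies so little with ψ that almost nothing is lost.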


Mentions: We examined the information about ψ contained in y+ under a wide variety of scenarios, including when the sample sizes are equal, small, large and extremely unbalanced with sparse cells. Fig. 1 and Fig. 2 show examples with equal, smaller sample sizes, while Fig. 3 and Fig. 4 show examples with unequal sample sizes. In addition, in Fig. 1 and Fig. 3 the observed success rates are equal, while in Fig. 2 and Fig. 4 they are not.

