Bias in odds ratios by logistic regression modelling and sample size.

Nemes S, Jonasson JM, Genell A, Steineck G - BMC Med Res Methodol (2009)

Bottom Line: In epidemiological studies, researchers use logistic regression as an analytical tool to study the association between a binary outcome and a set of possible exposures. Logistic regression overestimates odds ratios in studies with small to moderate sample sizes. Regression coefficient estimates shift away from zero, and odds ratios away from one.


Affiliation: Division of Clinical Cancer Epidemiology, Department of Oncology, Sahlgrenska Academy, University of Gothenburg, Sweden. nemes.szilard@oc.gu.se

ABSTRACT

Background: In epidemiological studies, researchers use logistic regression as an analytical tool to study the association between a binary outcome and a set of possible exposures.

Methods: Using a simulation study, we illustrate how the analytically derived bias of odds ratios modelled by logistic regression varies as a function of the sample size.

Results: Logistic regression overestimates odds ratios in studies with small to moderate sample sizes. The bias induced by small sample size is systematic: it is a bias away from the null. Regression coefficient estimates shift away from zero, and odds ratios away from one.

Conclusion: If several small studies are pooled without consideration of the bias introduced by the inherent mathematical properties of the logistic regression model, researchers may be misled into erroneous interpretation of the results.



Figure 2: Sampling distribution of logistic regression coefficient estimates at different sample sizes.

Mentions: Table 1 summarizes the empirical bias in the estimated regression coefficients. With increasing sample size, the estimated coefficients asymptotically approach the population value (Figure 1). The fit is better for continuous variables (R² = 0.963) than for discrete ones (R² = 0.836), which translates to greater variability in logistic regression estimates for discrete variables. For both the continuous and the discrete exposure variables, the bias converges to zero as the sample size increases, but the rate of convergence differs. The sampling density is also rather skewed in smaller samples and approaches a symmetric distribution with increasing sample size (Figure 2). Skewed sampling distributions more frequently produce extreme estimates, the proportion of which decreases with increasing sample size (Figure 3).
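
The small-sample effect described above can be reproduced with a short simulation. The following is a minimal sketch and not the authors' code: it assumes a single standard-normal exposure, a true coefficient of 1 and an intercept of 0, none of which come from the article. The helper names fit_logistic and mean_bias are hypothetical. Samples in which the Newton-Raphson fit does not converge (for example under complete separation) are simply dropped, which is a simplification.

```python
import math
import random


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Maximum-likelihood fit of logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1), or None if the fit does not converge.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:
            return None             # information matrix is (nearly) singular
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(b0) > 15 or abs(b1) > 15:
            return None             # diverging estimate, likely separation
        if abs(d0) < tol and abs(d1) < tol:
            return b0, b1
    return None


def mean_bias(n, beta1=1.0, reps=2000, seed=1):
    """Average of (estimated slope - true slope) over simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Assumed data-generating model: x ~ N(0, 1), logit P(y=1) = beta1 * x.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-beta1 * xi)) else 0
             for xi in x]
        fit = fit_logistic(x, y)
        if fit is not None:
            estimates.append(fit[1])
    return sum(estimates) / len(estimates) - beta1


if __name__ == "__main__":
    for n in (30, 100, 300):
        print(n, round(mean_bias(n, reps=500), 3))
```

Under these assumptions the average estimate should lie above the true coefficient at small n and move toward it as n grows. Exponentiating the coefficient gives the corresponding odds ratio, so an upward shift in the coefficient is an odds ratio shifted away from one.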

