The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees.

Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A - Environ Health (2014)

Bottom Line: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Some spurious interactions were also found, however.

Affiliation: Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala Sweden. erik.lampa@medsci.uu.se.

ABSTRACT

Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees.
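As a rough illustration of the usual approach mentioned above, an additive model for two exposures with a single product term might be written as follows. The symbols are illustrative notation and are not taken from the article.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
```

A nonzero \beta_3 indicates a departure from additivity. With p variables there are p(p-1)/2 candidate two-way product terms (351 for 27 contaminants), before any higher-order terms are considered.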

Methods: We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data.
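The structure of such a simulated outcome can be sketched as below. This is a minimal sketch only: the coefficients, the sine curve used for the non-linear effect, the Gaussian stand-in exposures, and the use of the noise level to vary the strength of association are all assumptions made for illustration, not the authors' simulation design.

```python
import math
import random

def simulate_outcome(row, binary, rng, noise_sd=1.0):
    """Return one simulated outcome for a list of continuous exposures `row`
    and a 0/1 covariate `binary`. All coefficients are hypothetical."""
    four_way = 0.5 * row[0] * row[1] * row[2] * row[3]  # four-way interaction
    nonlinear = 1.0 * math.sin(row[4])                  # non-linear main effect
    cont_by_bin = 0.8 * row[5] * binary                 # continuous x binary interaction
    return four_way + nonlinear + cont_by_bin + rng.gauss(0.0, noise_sd)

rng = random.Random(2014)

# Stand-in exposures: the article uses real contaminant data, not reproduced here.
X = [[rng.gauss(0.0, 1.0) for _ in range(27)] for _ in range(500)]
B = [rng.randint(0, 1) for _ in range(500)]

# Assumption: a larger noise_sd gives a weaker association; four hypothetical scenarios.
scenarios = {
    sd: [simulate_outcome(x, b, rng, sd) for x, b in zip(X, B)]
    for sd in (0.5, 1.0, 2.0, 4.0)
}
```

Only 6 of the 27 stand-in exposures enter the outcome, so a search method applied to `scenarios` would have to separate the true terms from the 21 irrelevant variables.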

Results: The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set.

Conclusions: We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.

Figure 11: Interactions. Black dots represent observed values of H and boxes represent the distributions H0. Small tick marks represent values of the distribution below or above the 5th and 95th percentiles respectively.
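The caption does not define H. Assuming it denotes the pairwise interaction statistic of Friedman and Popescu, it can be written in terms of centered partial dependence functions, here denoted \hat F (notation chosen for this sketch):

```latex
H_{jk}^2 = \frac{\sum_{i=1}^{N} \left[ \hat F_{jk}(x_{ij}, x_{ik}) - \hat F_j(x_{ij}) - \hat F_k(x_{ik}) \right]^2}{\sum_{i=1}^{N} \hat F_{jk}^2(x_{ij}, x_{ik})}
```

Under this reading, H is close to 0 when the joint partial dependence on x_j and x_k equals the sum of the two individual partial dependences, and an observed H is judged against H0, presumably a reference distribution of H obtained when no interaction is present.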

Mentions: The maximum bootstrap-validated R² was 0.19 and was achieved with an ensemble consisting of 6,500 depth-6 CARTs. Using the one-SE rule, an ensemble consisting of 6,250 depth-3 CARTs produced a bootstrap-validated R² of 0.18. The maximum R² resulting from an ensemble of CARTs restricted to d = 1 was 0.17, suggesting that if interaction effects are present in the data they are not very influential. Figure 10 shows the ten most important predictors of serum bilirubin levels. No predictor clearly stood out from the rest, but height was the most important, followed by BPA, triglycerides, Al and Co. Figure 11 shows the total interaction strength (top left panel), two-way interactions with BPA (top right panel), two-way interactions with PCB 126 (bottom left panel) and two-way interactions with Zn (bottom right panel) for the 10 most important predictors. BPA seems to interact with height (7/10) and PCB 126 (8/10), PCB 126 seems to interact with BPA (7/10) and Zn (8/10), and Zn seems to interact with PCB 126 and Co (stability 7/10 and 2/10 respectively). When assessing the total interaction strength, neither height nor Co seemed to be involved in any interactions (Figure 11, top left panel), and we focus on the interaction involving BPA and PCB 126 in this example.
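The one-SE rule used above can be sketched as follows: pick the least complex model whose validated score lies within one standard error of the best score. This is a minimal sketch under stated assumptions. The function name `one_se_choice` is hypothetical, tree depth is taken as the measure of complexity, and the standard error of 0.015 is a placeholder because the excerpt reports none.

```python
def one_se_choice(candidates):
    """candidates: list of (complexity, score, se) tuples; higher score is better.
    Returns the least complex candidate whose score is within one SE
    of the best score."""
    best = max(candidates, key=lambda c: c[1])
    threshold = best[1] - best[2]
    eligible = [c for c in candidates if c[1] >= threshold]
    return min(eligible, key=lambda c: c[0])

# (tree depth, bootstrap-validated R2, SE). The R2 values are those quoted in
# the text; the SE of 0.015 is a made-up placeholder for illustration only.
fits = [(1, 0.17, 0.015), (3, 0.18, 0.015), (6, 0.19, 0.015)]

chosen = one_se_choice(fits)
print(chosen[0])  # depth of the selected ensemble
```

With the placeholder SE, the threshold is 0.175, so the depth-3 ensemble qualifies and the depth-1 ensemble does not, which matches the choice described in the text. A different SE would change the outcome.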

