The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees.

Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A - Environ Health (2014)

Bottom Line: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Some spurious interactions were also found, however.


Affiliation: Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden. erik.lampa@medsci.uu.se.

ABSTRACT

Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees.

Methods: We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data.

Results: The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set.

Conclusions: We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.
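
The simulation design described under Methods can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the exposure matrix is random rather than real contaminant data, the functional forms and column choices are hypothetical, and SNR is assumed here to mean the variance of the signal divided by the variance of the noise. Only the number of contaminants (27) and the four SNR values come from the article.

```python
import numpy as np

def simulate_outcome(X, sex, snr, rng):
    """Hypothetical outcome with one four-way interaction, one U-shaped
    effect and one continuous-by-binary interaction, plus Gaussian noise.

    Assumption: SNR = var(signal) / var(noise)."""
    four_way = X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3]  # four-way interaction
    u_shape = X[:, 4] ** 2                            # non-linear (U-shaped) effect
    by_sex = X[:, 5] * sex                            # continuous x binary interaction
    signal = four_way + u_shape + by_sex
    noise_sd = np.sqrt(signal.var() / snr)            # scale noise to hit the target SNR
    noise = rng.normal(0.0, noise_sd, size=signal.shape)
    return signal + noise, signal

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 27))      # stand-in for the 27 contaminants
sex = rng.integers(0, 2, size=n)  # binary variable

# One simulated outcome per scenario (SNR values taken from the article).
outcomes = {snr: simulate_outcome(X, sex, snr, rng) for snr in (2, 1, 0.5, 0.1)}
```

Lower SNR values add more noise relative to the same signal, which is why the weakest scenario is the hardest one in which to recover the interactions.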


Figure 3: Interactions for SNR = 1. Black dots represent observed values of H, and boxes represent the derived distributions H0. Small tick marks represent values of the distribution below or above the 5th and 95th percentiles respectively.

Mentions: Figures 3 and 4 show the same for SNR = 1 and SNR = 0.5 as Figure 2 does for SNR = 2. The top left panels of Figures 3 and 4 show the total interaction strengths, and it is clear that the correct interacting variables have been identified. The effect of the narrow distributions is apparent in the lower left panel of Figure 4. A spurious three-way interaction involving p-p’-DDE, Cd and PCB 169 could be seen, although the observed value of H is small. This interaction was less stable (6/10) than the interactions between p-p’-DDE, Cd and PCB 170 (9/10) and between p-p’-DDE, Cd and MMP (9/10), and PCB 169 was not judged to interact with any other variable (Figure 4, top left panel). The correct four-way interactions were identified, however (Figures 3 and 4, lower right panels). The other identified interactions were stable for both SNR = 1 and SNR = 0.5 (stability ranged between 8/10 and 10/10).

The top left panel of Figure 5 shows the strengths of the total interaction effects when SNR = 0.1. Only p-p’-DDE and BPA seem to be involved in interactions, and neither the correct two-way interactions (top right panel) nor the correct three-way interactions (bottom panels) were identified. The p-p’-DDE–Pb and p-p’-DDE–PCB 126 interactions were not stable (4/10 and 3/10 respectively) in the split-sample validation, and neither was the spurious three-way interaction p-p’-DDE–Pb–PCB 126 (Figure 5, bottom panels, stability 2/10).

Figure 6 shows interaction strengths for the two-way interactions with sex for SNR = 2, 1 and 0.5. BPA is clearly interacting with sex in each of the three scenarios (stability 10/10). We did not include SNR = 0.1 in Figure 6 as sex was not found among the ten most important variables. Partial dependences on BPA conditioned on sex are seen in Figure 7 with SNR = 2 (top left panel), SNR = 1 (top right panel) and SNR = 0.5 (bottom left panel).

The non-linear dependence on OCDD is captured well, as is shown in Figure 8, although the U-shape is not as clear for SNR = 0.1 as it is for the other SNRs.
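
To make the quantity H in the passage above more concrete, the following Python sketch computes a pairwise interaction strength from centered partial dependence functions. It assumes that H refers to Friedman and Popescu's H statistic, which this excerpt does not state explicitly. The helper names (`partial_dependence`, `h_statistic`) and the toy `predict` function are hypothetical; `predict` stands in for a fitted boosted regression tree model, and the null distributions H0 and the split-sample stability counts are not reproduced.

```python
import numpy as np

def partial_dependence(predict, X, cols):
    """Centered partial dependence on the columns in `cols`,
    evaluated at every observed row of X."""
    pd_vals = np.empty(len(X))
    for i in range(len(X)):
        X_mod = X.copy()
        X_mod[:, cols] = X[i, cols]         # fix the chosen columns at row i's values
        pd_vals[i] = predict(X_mod).mean()  # average over the remaining variables
    return pd_vals - pd_vals.mean()

def h_statistic(predict, X, j, k):
    """Pairwise H: share of the joint partial dependence on (j, k)
    that is not explained by the two one-variable partial dependences."""
    pd_jk = partial_dependence(predict, X, [j, k])
    pd_j = partial_dependence(predict, X, [j])
    pd_k = partial_dependence(predict, X, [k])
    h2 = np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)
    return np.sqrt(h2)

# Toy stand-in for a fitted model: columns 0 and 1 interact, column 2 is additive.
def predict(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

h01 = h_statistic(predict, X, 0, 1)  # interacting pair: large H
h02 = h_statistic(predict, X, 0, 2)  # purely additive pair: H close to zero
```

In this toy case the interacting pair gives a value near 1 and the additive pair a value near 0. With a real fitted model, small nonzero values can also arise by chance, which is the reason for comparing each observed H against a reference distribution such as H0 in the figures.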

