Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests.

Glaser B, Nikolov I, Chubb D, Hamshere ML, Segurado R, Moskvina V, Holmans P - BMC Proc (2007)

Bottom Line: The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels. Within a hypothetical two-stage design, pairwise LR analysis of all markers with significant RF single importance would have reduced the number of possible combinations in our small data set by 61%, whereas joint importance measures would have been less efficient for marker pair reduction. This suggests that RF single importance measures, which are able to detect a wide range of interaction effects and are computationally very efficient, might be exploited as a pre-screening tool for larger association studies. Follow-up analysis, such as by LR, is required since RFs do not indicate high-risk genotype combinations.


Affiliation: Biostatistics and Bioinformatics Unit, and Department of Psychological Medicine, Cardiff University, School of Medicine, Heath Park, Cardiff, Wales, CF14 4XN, UK. B.Glaser@Bristol.ac.uk

ABSTRACT
Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects between 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects, and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), which showed strong correlations (r~0.98) between their importance measures and those of RFs grown on 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels. Within a hypothetical two-stage design, pairwise LR analysis of all markers with significant RF single importance would have reduced the number of possible combinations in our small data set by 61%, whereas joint importance measures would have been less efficient for marker pair reduction. This suggests that RF single importance measures, which are able to detect a wide range of interaction effects and are computationally very efficient, might be exploited as a pre-screening tool for larger association studies. Follow-up analysis, such as by LR, is required since RFs do not indicate high-risk genotype combinations.
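
The saving from a two-stage design can be illustrated with simple pair counting. The sketch below is not the authors' calculation: the function names are made up for illustration, and the number of markers passing the first stage (`n_selected`) is hypothetical, since the abstract does not report it. The abstract also does not say exactly which set of pairs the 61% figure was computed over, so the sketch makes no attempt to reproduce that value.

```python
from math import comb


def n_pairs(n_markers):
    """Number of unordered marker pairs among n_markers markers."""
    return comb(n_markers, 2)


def pair_reduction(n_markers, n_selected):
    """Fraction of marker pairs that no longer need pairwise testing
    when only n_selected of the n_markers markers are carried forward
    (n_selected is a hypothetical first-stage result)."""
    return 1 - comb(n_selected, 2) / comb(n_markers, 2)


if __name__ == "__main__":
    # 20 markers, as in the abstract, give 190 possible pairs.
    print(n_pairs(20))
    # Hypothetical first-stage outcomes, for illustration only.
    for k in (8, 10, 12, 14):
        print(k, round(pair_reduction(20, k), 3))
```

Because the number of pairs grows roughly with the square of the number of markers, even a modest single-marker filter removes a large share of the pairwise tests, which is the property that would make such a pre-screen attractive in larger studies.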



Figure 1: Performance of RFs grown with different numbers of trees. Performance of single (A) and joint (B) importance measures for RFs grown on 50, 100, 200, 300, 400, 500, 1000, and 5000 trees, respectively (median-replaced data). r AvImp-5000, Correlation of AvImps of each RF compared to RFs based on 5000 trees; r Zscore-5000, Correlation of Z-scores of each RF compared to RFs based on 5000 trees; r AvImp-Zscore, Correlation between AvImp and Z-score for each RF; %sig p, Percentage of significant AvImps. All RF features apart from the tree number were held constant.

Mentions: The predictive importance of single markers [3] and marker pairs [5] was measured by Z-scores and significance levels. Z-scores can be inferred from AvImp measures through division by their associated standard error (SE), which is determined by the AvImp measure and the number of trees grown [3]. AvImp measures increase with the number of trees and their rank becomes stable, provided the number of trees is sufficiently large, as observed by Lunetta et al. [4]. The associated Z-scores and their significance, however, continue to increase with the number of trees even after the RF has reached stability (see Figure 1). We investigated the correlation between AvImp measures for RFs with 5000 trees and RFs with 50, 100, 200, 300, 400, 500, and 1000 trees, respectively (median-replaced data), and, in the same way, the correlation between Z-scores (Spearman rank correlation). We observed high correlations (r AvImp-5000 ≥ 0.98, r Zscore-5000 ≥ 0.96) for RFs with about 500 trees or more, for both single and joint AvImp measures (see Figure 1), implying a) a stable rank among their most important markers and b) that significant markers identified by these RFs remain significant when analyzed using RFs grown on 5000 trees. We therefore selected stable but small RFs (500 trees), which give AvImp ranks similar to those of RFs grown on 5000 trees.
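
The relationship between AvImp, Z-scores, and tree number can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the software used in the study. It assumes the conventional definition in which AvImp is the mean of per-tree importance values and the SE is the standard deviation across trees divided by the square root of the number of trees. The simulated per-tree importances, the effect sizes, and all function names are hypothetical.

```python
import math
import random
import statistics


def z_score(per_tree_importance):
    """AvImp divided by its SE (assumed: SD across trees / sqrt(n_trees))."""
    n_trees = len(per_tree_importance)
    av_imp = statistics.fmean(per_tree_importance)
    se = statistics.stdev(per_tree_importance) / math.sqrt(n_trees)
    return av_imp / se


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_markers=20, n_trees=5000, seed=0):
    """Hypothetical per-tree importances: one noisy series per marker."""
    rng = random.Random(seed)
    true_imp = [0.02 * m for m in range(n_markers)]  # made-up effect sizes
    return [[rng.gauss(mu, 1.0) for _ in range(n_trees)] for mu in true_imp]


if __name__ == "__main__":
    forest = simulate()
    z_full = [z_score(trees) for trees in forest]
    for n in (50, 500, 1000):
        z_sub = [z_score(trees[:n]) for trees in forest]
        # Rank agreement with the 5000-tree forest, and the largest Z-score.
        print(n, round(spearman(z_sub, z_full), 3), round(max(z_sub), 2))
```

Under this assumed definition the SE shrinks with the square root of the tree number, so Z-scores keep growing as trees are added even when the ranking of markers has stopped changing. That is consistent with the behaviour described above and suggests why rank correlations, rather than raw significance levels, are the more natural stability check.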

