Classification of rheumatoid arthritis status with candidate gene and genome-wide single-nucleotide polymorphisms using random forests.

Sun YV, Cai Z, Desai K, Lawrance R, Leff R, Jawaid A, Kardia SL, Yang H - BMC Proc (2007)

Bottom Line: Several genes were consistently identified as weakly associated with RA without a significant interaction or combinatorial effect with other candidate genes. However, using the top 500 SNPs, ranked by the importance score, from the genome-wide linkage panel of 5742 SNPs, we were able to accurately predict RA patients and normal subjects with sensitivity of approximately 90% and specificity of approximately 80%, which was confirmed by five-fold cross-validation. However, in a complete training-testing framework, replication of genetic predictors was less satisfactory; thus, further evaluation of existing methodology and development of new methods are warranted.


Affiliation: Department of Epidemiology, School of Public Health, University of Michigan, 611 Church Street #244, Ann Arbor, Michigan 48104, USA. yansun@umich.edu

ABSTRACT
Using the North American Rheumatoid Arthritis Consortium (NARAC) candidate gene and genome-wide single-nucleotide polymorphism (SNP) data sets, we applied regression methods and tree-based random forests to identify genetic associations with rheumatoid arthritis (RA) and to predict RA disease status. Several genes were consistently identified as weakly associated with RA without a significant interaction or combinatorial effect with other candidate genes. Using random forests, the tested candidate gene SNPs were not sufficient to predict RA patients and normal subjects with high accuracy. However, using the top 500 SNPs, ranked by the importance score, from the genome-wide linkage panel of 5742 SNPs, we were able to accurately predict RA patients and normal subjects with sensitivity of approximately 90% and specificity of approximately 80%, which was confirmed by five-fold cross-validation. However, in a complete training-testing framework, replication of genetic predictors was less satisfactory; thus, further evaluation of existing methodology and development of new methods are warranted.
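For readers less familiar with the two accuracy measures quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from predicted and true labels. The function name and the toy labels are illustrative assumptions: the labels are made up so that the result mirrors the approximate 90% and 80% figures, and they are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of true cases predicted as cases
    specificity = TN / (TN + FP): fraction of true controls predicted as controls
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical, NOT study data): 10 cases (1) and 10 controls (0).
y_true = [1] * 10 + [0] * 10
# 9 of 10 cases are predicted correctly, 8 of 10 controls are predicted correctly.
y_pred = [1] * 9 + [0] * 1 + [0] * 8 + [1] * 2

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a five-fold cross-validation, the same calculation would typically be applied to the held-out predictions of each fold, or to the predictions pooled over all five folds.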


Figure 1: Two distinct clusters of RA patients in a multidimensional scaling (MDS) plot. The X-axis and the Y-axis represent the dimensions with the two largest eigenvalues generated by the MDS algorithm.

Mentions: Figure 1 shows the separation of the two clusters, for which a cluster validity algorithm was employed to determine statistical significance. The empirical distribution of a cluster validity statistic, the within-to-between ratio, was computed from 10,000 permutations of the partition vector. The test statistic was defined as the ratio of the average within-cluster distance to the average between-cluster distance for a partition vector. The separation of the clusters (subgroups A and B) shown in Figure 1 was calculated to be significant at a p-value of 0.01. These phenotypic clusters are correlated with a set of clinical measures, such as the ARA criteria score [7], which reflect disease severity.
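The permutation test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the one-sided comparison (smaller ratios mean tighter clusters), the add-one p-value correction, and the one-dimensional toy points are all hypothetical choices made for the example.

```python
import random
from itertools import combinations


def within_between_ratio(dist, labels):
    """Average within-cluster distance divided by average between-cluster distance."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return (sum(within) / len(within)) / (sum(between) / len(between))


def permutation_p_value(dist, labels, n_perm=10_000, seed=0):
    """Compare the observed ratio with ratios from shuffled partition vectors."""
    rng = random.Random(seed)
    observed = within_between_ratio(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the partition vector, keeping cluster sizes
        if within_between_ratio(dist, shuffled) <= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
labels = ["A"] * 4 + ["B"] * 4
dist = [[abs(a - b) for b in points] for a in points]

# Fewer permutations than the 10,000 quoted in the text, to keep the demo fast.
observed, p_value = permutation_p_value(dist, labels, n_perm=2000)
print(f"within-to-between ratio={observed:.4f}, permutation p-value={p_value:.4f}")
```

A small ratio indicates compact, well-separated clusters, so the p-value here is the share of shuffled partitions whose ratio is at least as small as the observed one.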

