Statistical significance of variables driving systematic variation in high-dimensional data.
Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of principal components (PCs). We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. An R software package, called jackstraw, is available on CRAN.
Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Mentions: We carried out analogous analyses on 15 more simulation scenarios, detailed in Fig. 4. We used all possible combinations of the following: (1) either dichotomous or sinusoidal functions for L; (2) the parameters were simulated from either a Bernoulli or Uniform distribution; (3) m = 1000 or m = 5000 variables; and (4) the proportion of true variables set to either or . The proposed method was applied with , and to study the impact of the choice of the number of synthetic variables. For each scenario, we applied the joint criterion double KS evaluation (Supplementary Fig. S3), using 500 simulated data sets. The conventional F test method consistently produced anti-conservative P values, while the proposed method yielded accurately distributed P values (Fig. 6).
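To make the notion of "accurately distributed" versus "anti-conservative" P values concrete, the following is a minimal sketch of one way to check null P values against the Uniform(0, 1) distribution. It computes a single one-sample Kolmogorov-Smirnov (KS) statistic in plain Python; it is not the joint criterion double KS evaluation used in the paper, whose details are given in the supplement and not reproduced here. The function names `ks_uniform_statistic` and `fraction_below` are hypothetical, and the "anti-conservative" P values are produced by an arbitrary illustrative transformation (squaring uniform draws), not by the F test or by any simulation scenario from the paper.

```python
import random


def ks_uniform_statistic(p_values):
    """One-sample KS statistic of p_values against Uniform(0, 1).

    This is the largest vertical gap between the empirical CDF of the
    P values and the line y = x. Small values indicate that the P values
    are close to uniformly distributed.
    """
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one P value")
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x, while the
        # Uniform(0, 1) CDF at x equals x. Check both sides of the jump.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def fraction_below(p_values, alpha=0.05):
    """Fraction of P values at or below alpha (about alpha if uniform)."""
    return sum(p <= alpha for p in p_values) / len(p_values)


rng = random.Random(0)

# Well-behaved null P values: draws from Uniform(0, 1).
uniform_p = [rng.random() for _ in range(2000)]

# Illustrative anti-conservative null P values (an assumption made for
# this sketch): squaring pushes values toward 0, so too many look small.
skewed_p = [u ** 2 for u in uniform_p]

d_uniform = ks_uniform_statistic(uniform_p)
d_skewed = ks_uniform_statistic(skewed_p)

print("KS statistic, uniform P values:", round(d_uniform, 3))
print("KS statistic, skewed P values: ", round(d_skewed, 3))
print("fraction <= 0.05, uniform:", fraction_below(uniform_p))
print("fraction <= 0.05, skewed: ", fraction_below(skewed_p))
```

Under this sketch, anti-conservative null P values show up both as a large KS statistic and as an excess of P values below a nominal threshold, which is the kind of behavior the passage attributes to the conventional F test.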