Statistical significance of variables driving systematic variation in high-dimensional data.
Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of principal components (PCs). We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. An R software package, called jackstraw, is available on CRAN.
Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Mentions: We constructed 16 simulation scenarios representing a wide range of signal and noise configurations (Fig. 4), with 500 independent studies simulated from each. Let us first consider one of the simpler scenarios in detail. Model (1) is used to generate the data. In this particular scenario, we have $m = 1000$, $n = 20$, $r = 1$ and $L = \sqrt{\frac{n-1}{n}}\,(1,1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)$, a dichotomous mean shift resembling differential expression between the first 10 observations and the second 10 observations. (The factor $\sqrt{(n-1)/n}$ gives $L$ unit variance.) For 95% of the variables, we set $b_i = 0$, implying they are null variables; we parameterize this proportion by $\pi_0 = 0.95$. The other 50 non-null variables were simulated such that $b_i \sim \text{Uniform}(0,1)$. The noise terms are simulated as $e_{ij} \sim \text{Normal}(0,1)$. The data for variable $i$ are thus simulated according to $y_i = b_i L + e_i$.
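The scenario described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' simulation code and not part of the jackstraw R package: it uses only the Python standard library, and the function name `simulate_study`, the `seed` argument, and the choice to place the null variables first in the row order are all assumptions made for the example.

```python
import math
import random
import statistics


def simulate_study(m=1000, n=20, pi0=0.95, seed=0):
    """Simulate one study with rows y_i = b_i * L + e_i (hypothetical helper)."""
    rng = random.Random(seed)
    half = n // 2

    # Dichotomous latent variable: +1 for the first half of the observations,
    # -1 for the second half, scaled by sqrt((n - 1) / n) so that its sample
    # variance equals 1.
    scale = math.sqrt((n - 1) / n)
    L = [scale] * half + [-scale] * (n - half)

    # Number of null variables (b_i = 0). Putting them first is an assumption;
    # the ordering of variables is arbitrary.
    m0 = round(pi0 * m)
    b = [0.0] * m0 + [rng.uniform(0.0, 1.0) for _ in range(m - m0)]

    # Each entry is signal plus independent Normal(0, 1) noise.
    Y = [[b_i * l_j + rng.gauss(0.0, 1.0) for l_j in L] for b_i in b]
    return Y, L, b


Y, L, b = simulate_study()
print(len(Y), len(Y[0]))       # dimensions m x n
print(statistics.variance(L))  # sample variance of L, approximately 1
```

Repeating the call with 500 different seeds would give the 500 independent studies for this one scenario; the other scenarios would vary the parameters and the form of L.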