Statistical significance of variables driving systematic variation in high-dimensional data.
Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of principal components (PCs). We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. An R software package, called jackstraw, is available on CRAN.
Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Mentions: Let’s now consider a concrete example of z, L, and the ultimate inference goal. Spellman et al. (1998) carried out a gene expression study to identify cell-cycle regulated genes of Saccharomyces cerevisiae (Fig. 2). In this experiment, the expression values of m = 5981 genes were originally measured over n = 14 time points in a culture of yeast cells whose cell cycles had been synchronized. (Note that an inspection of the 14 microarrays from Spellman et al. (1998) reveals an aberrant gene expression profile at the 300-min time point, so we removed this array from our analysis; see Supplementary Figure S2.) Here, z is the latent variable that represents the dynamic gene expression regulatory program over the yeast cell cycle, and L is the manifested influence of z on the observed scale of gene expression measurements (Fig. 1). The ordered time points themselves do not capture the underlying cell-cycle regulation, and it is therefore not clear how to model L accurately a priori. If L were directly observed, then we could identify which genes are cell-cycle regulated by performing, for each gene i, a significance test of the null hypothesis that gene i is not associated with L versus the alternative that it is.
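To make the hypothetical "if L were directly observed" scenario concrete, the following is a minimal sketch, not the authors' method. It assumes a single observed variable standing in for L, uses simulated toy data rather than the Spellman et al. measurements, and computes a permutation p-value for a per-gene F statistic. The function names, the toy genes, and the choice of a permutation test are all illustrative assumptions. It does not implement the jackstraw, which addresses the harder case where L must be estimated from the same data.

```python
import math
import random


def f_statistic(y, l):
    """F statistic comparing an intercept-only model of y with a model
    that also includes the single observed variable l."""
    n = len(y)
    y_bar = sum(y) / n
    l_bar = sum(l) / n
    sxy = sum((a - l_bar) * (b - y_bar) for a, b in zip(l, y))
    sxx = sum((a - l_bar) ** 2 for a in l)
    syy = sum((b - y_bar) ** 2 for b in y)
    rss_full = syy - sxy ** 2 / sxx  # residual sum of squares with l included
    return (syy - rss_full) / (rss_full / (n - 2))


def permutation_pvalue(y, l, n_perm=2000, seed=0):
    """Permutation p-value for association between y and l.
    Shuffling l breaks any association while keeping both marginals."""
    rng = random.Random(seed)
    observed = f_statistic(y, l)
    l_perm = list(l)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(l_perm)
        if f_statistic(y, l_perm) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Toy data (assumption): 13 time points and one periodic stand-in for L.
rng = random.Random(1)
l_obs = [math.sin(2 * math.pi * t / 13) for t in range(13)]
y_regulated = [2.0 * v + rng.gauss(0, 0.1) for v in l_obs]  # depends on l_obs
y_null = [rng.gauss(0, 1) for _ in l_obs]                   # pure noise

p_regulated = permutation_pvalue(y_regulated, l_obs)
p_null = permutation_pvalue(y_null, l_obs)
print("regulated gene p-value:", p_regulated)
print("null gene p-value:", p_null)
```

In a real analysis this test would be repeated for each of the m genes. The sketch only works because `l_obs` is supplied independently of the expression data, which is exactly what is unavailable when L is latent.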