Statistical significance of variables driving systematic variation in high-dimensional data.
Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs. The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables. Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype.
Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Mentions: Let’s now consider a concrete example of this setting and the ultimate inference goal. Spellman et al. (1998) carried out a gene expression study to identify cell-cycle regulated genes of Saccharomyces cerevisiae (Fig. 2). In this experiment, expression values of m = 5981 genes were originally measured at n = 14 time points in a culture of yeast cells whose cell cycles had been synchronized. (An inspection of the 14 microarrays from Spellman et al. (1998) reveals an aberrant gene expression profile at the 300-min time point, so we removed this array from our analysis; see Supplementary Figure S2.) Here, z is the latent variable that represents the dynamic gene expression regulatory program over the yeast cell cycle, and L is the manifested influence of z on the observed scale of gene expression measurements (Fig. 1). The ordered time points themselves do not capture the underlying cell-cycle regulation, so it is not clear how to accurately model L a priori. If L were directly observed, then we could identify which genes are cell-cycle regulated by performing, for each gene i, a significance test of the null hypothesis that gene i is not associated with L versus the alternative that it is.
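To make the per-gene test in the hypothetical "L is observed" case concrete, here is a minimal Python sketch. It is not the jackstraw procedure and not the authors' code. It assumes a single observed covariate standing in for L, a simple linear model with an intercept, and a permutation null in place of a parametric F reference distribution. The names `f_statistic`, `permutation_pvalue`, `L_obs`, `gene_signal`, and `gene_null`, as well as the toy data, are illustrative assumptions.

```python
import math
import random


def f_statistic(y, x):
    """F statistic for testing slope = 0 in the regression y = a + b*x + error."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_reg = sxy ** 2 / sxx   # variation in y explained by x
    ss_res = syy - ss_reg     # residual variation (assumed > 0, i.e. no perfect fit)
    return ss_reg / (ss_res / (n - 2))


def permutation_pvalue(y, x, n_perm=999, rng=None):
    """Permutation p-value: share of shuffled-x statistics >= the observed one."""
    if rng is None:
        rng = random.Random(0)
    observed = f_statistic(y, x)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between x and y
        if f_statistic(y, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): 13 arrays, one gene driven by L, one pure noise.
rng = random.Random(1)
n_arrays = 13
L_obs = [math.sin(2 * math.pi * t / n_arrays) for t in range(n_arrays)]
gene_signal = [2.0 * l + rng.gauss(0, 0.1) for l in L_obs]
gene_null = [rng.gauss(0, 0.1) for _ in range(n_arrays)]

p_signal = permutation_pvalue(gene_signal, L_obs, rng=rng)
p_null = permutation_pvalue(gene_null, L_obs, rng=rng)
print(p_signal, p_null)
```

In practice L is latent and has to be estimated from the same data, for example by PCs, so a naive test like this one no longer applies directly. That is the situation the jackstraw is designed to handle.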