Statistical significance of variables driving systematic variation in high-dimensional data.
Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs. The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables. Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype.
Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Mentions: We have developed a resampling method (Fig. 3) to obtain accurate statistical significance measures of the associations between observed variables and their PCs, accounting for the over-fitting that arises because the PCs are computed from the same set of observed variables. The proposed algorithm replaces a small number s of observed variables with independently permuted ‘synthetic’ variables, while preserving the overall systematic variation in the data. Note that the jackstraw disrupts the systematic variation among the randomly chosen s rows by applying independently generated permutation mappings. We denote the new matrix, with the s synthetic variables replacing their original values, as Y*. This is simply the original matrix Y with s of its rows replaced by independently permuted versions. On each permutation dataset Y*, we calculate association statistics for each synthetic variable, exactly as was done on the original data. We carry this out B times, effectively creating B sets of permutation statistics. The association statistics calculated on Y are then compared to the association statistics calculated on only the s synthetic rows of Y* to obtain statistical significance measures.
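The resampling procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the text only says "association statistics", so the F-statistic from regressing each variable on the top r PCs is an assumed choice, as are the function names (`top_pcs`, `f_stats`, `jackstraw`), the default values of `r`, `s`, and `B`, and the add-one form of the empirical p-value.

```python
import numpy as np

def top_pcs(Y, r):
    """Top r PCs (right singular vectors) of the row-centered matrix Y, shape (n, r)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Vt[:r].T

def f_stats(Y, pcs):
    """Assumed association statistic: per-row F-statistic comparing a
    regression on the PCs (with intercept) against an intercept-only model."""
    n, r = pcs.shape
    X = np.column_stack([np.ones(n), pcs])
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    rss1 = ((Y.T - X @ beta) ** 2).sum(axis=0)          # full model
    rss0 = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # null model
    return ((rss0 - rss1) / r) / (rss1 / (n - r - 1))

def jackstraw(Y, r=1, s=10, B=100, seed=0):
    """Return observed statistics and empirical p-values for each row of Y."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    observed = f_stats(Y, top_pcs(Y, r))
    null = np.empty((B, s))
    for b in range(B):
        # Pick s rows and replace each with an independently permuted copy.
        rows = rng.choice(m, size=s, replace=False)
        Y_star = Y.copy()
        for i in rows:
            Y_star[i] = rng.permutation(Y[i])
        # Recompute PCs on Y*, then keep statistics for the synthetic rows only.
        null[b] = f_stats(Y_star[rows], top_pcs(Y_star, r))
    null_sorted = np.sort(null.ravel())
    # Count null statistics >= each observed statistic (add-one correction is an assumption).
    n_ge = null_sorted.size - np.searchsorted(null_sorted, observed, side="left")
    pvals = (1 + n_ge) / (1 + null_sorted.size)
    return observed, pvals
```

Because only s rows are permuted in each round, the PCs of Y* stay close to those of Y, so the synthetic rows reflect how strongly a variable with no true association still appears associated with PCs that it helped to estimate.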