Limits...
Statistical significance of variables driving systematic variation in high-dimensional data.

Chung NC, Storey JD - Bioinformatics (2014)

Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs.The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables.Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype.

View Article: PubMed Central - PubMed

Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.

Show MeSH

Related in: MedlinePlus

QQ-plots of double KS test P values from 16 simulation scenarios versus the Uniform(0,1) distribution. For each of 500 independent studies per scenario, we tested for deviation of  P values from Uniform(0,1), resulting in 500 KS test P values for each scenario. An individual point in the QQ-plot represents a double KS test P value for one scenario, comparing its 500 KS test P values to Uniform(0,1). On the left panel, the systematic downward displacement of 16 black points indicates an anti-conservative bias of the conventional F-test. In contrast, the proposed method produces  P values that are not anti-conservative. On the right panel, a set of 16 points are below the diagonal red line if the joint  distribution deviates from the Uniform(0,1) distribution. The proposed method adjusts for over-fitting of PCA and produces accurate estimates of association significance
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4325543&req=5

btu674-F6: QQ-plots of double KS test P values from 16 simulation scenarios versus the Uniform(0,1) distribution. For each of 500 independent studies per scenario, we tested for deviation of P values from Uniform(0,1), resulting in 500 KS test P values for each scenario. An individual point in the QQ-plot represents a double KS test P value for one scenario, comparing its 500 KS test P values to Uniform(0,1). On the left panel, the systematic downward displacement of 16 black points indicates an anti-conservative bias of the conventional F-test. In contrast, the proposed method produces P values that are not anti-conservative. On the right panel, a set of 16 points are below the diagonal red line if the joint distribution deviates from the Uniform(0,1) distribution. The proposed method adjusts for over-fitting of PCA and produces accurate estimates of association significance

Mentions: We carried out analogous analyses on 15 more simulation scenarios, detailed in Fig. 4. We used all possible combinations of the following: (1) either dichotomous or sinusoidal functions for L; (2) the parameters were simulated from either a Bernoulli or Uniform distribution; (3) m = 1000 or m = 5000 variables; and (4) the proportion of true variables set to either or . The proposed method was applied with , and to study the impact of the choice of the number of synthetic variables. For each scenario, we applied the joint criterion double KS evaluation (Supplementary Fig. S3), using 500 simulated data sets. The conventional F test method consistently produced anti-conservative P values, while the proposed method yielded accurately distributed P values (Fig. 6).Fig. 6.


Statistical significance of variables driving systematic variation in high-dimensional data.

Chung NC, Storey JD - Bioinformatics (2014)

QQ-plots of double KS test P values from 16 simulation scenarios versus the Uniform(0,1) distribution. For each of 500 independent studies per scenario, we tested for deviation of  P values from Uniform(0,1), resulting in 500 KS test P values for each scenario. An individual point in the QQ-plot represents a double KS test P value for one scenario, comparing its 500 KS test P values to Uniform(0,1). On the left panel, the systematic downward displacement of 16 black points indicates an anti-conservative bias of the conventional F-test. In contrast, the proposed method produces  P values that are not anti-conservative. On the right panel, a set of 16 points are below the diagonal red line if the joint  distribution deviates from the Uniform(0,1) distribution. The proposed method adjusts for over-fitting of PCA and produces accurate estimates of association significance
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4325543&req=5

btu674-F6: QQ-plots of double KS test P values from 16 simulation scenarios versus the Uniform(0,1) distribution. For each of 500 independent studies per scenario, we tested for deviation of P values from Uniform(0,1), resulting in 500 KS test P values for each scenario. An individual point in the QQ-plot represents a double KS test P value for one scenario, comparing its 500 KS test P values to Uniform(0,1). On the left panel, the systematic downward displacement of 16 black points indicates an anti-conservative bias of the conventional F-test. In contrast, the proposed method produces P values that are not anti-conservative. On the right panel, a set of 16 points are below the diagonal red line if the joint distribution deviates from the Uniform(0,1) distribution. The proposed method adjusts for over-fitting of PCA and produces accurate estimates of association significance
Mentions: We carried out analogous analyses on 15 more simulation scenarios, detailed in Fig. 4. We used all possible combinations of the following: (1) either dichotomous or sinusoidal functions for L; (2) the parameters were simulated from either a Bernoulli or Uniform distribution; (3) m = 1000 or m = 5000 variables; and (4) the proportion of true variables set to either or . The proposed method was applied with , and to study the impact of the choice of the number of synthetic variables. For each scenario, we applied the joint criterion double KS evaluation (Supplementary Fig. S3), using 500 simulated data sets. The conventional F test method consistently produced anti-conservative P values, while the proposed method yielded accurately distributed P values (Fig. 6).Fig. 6.

Bottom Line: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs.The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables.Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype.

View Article: PubMed Central - PubMed

Affiliation: Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.

Show MeSH
Related in: MedlinePlus