StickWRLD as an Interactive Visual Pre-Filter for Canceromics-Centric Expression Quantitative Trait Locus Data.

Rumpf RW, Wolock SL, Ray WC - Cancer Inform (2014)

Bottom Line: One of the significant impediments introduced by such burgeoning data is the difficulty in knowing what features to include or exclude from statistical models. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD allows the user to identify significant correlations and disregard potential correlations that do not meet those same criteria, effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene-SNP correlations that would have gone undetected using more traditional approaches due to issues of low penetrance.


Affiliation: The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA.

ABSTRACT
As datasets increase in complexity, so does the time required to analyze them, both computationally and by human domain experts. One of the significant impediments introduced by such burgeoning data is the difficulty of knowing which features to include in or exclude from statistical models. Simple tables of summary statistics rarely provide an adequate picture of the patterns and details of a dataset to enable researchers to make well-informed decisions about the adequacy of the models they are constructing. We have developed a tool, StickWRLD, that allows the user to visually browse through their data, displaying all possible correlations. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD lets the user identify significant correlations and disregard potential correlations that do not meet those criteria, effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In this study, we applied StickWRLD to a semi-synthetic dataset constructed from two published human datasets. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene-SNP correlations that would have gone undetected by more traditional approaches because of low penetrance.
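The abstract describes retaining only those correlations that pass both a significance threshold P and a residual threshold r. The sketch below illustrates that kind of two-parameter filter over categorical columns (for example, SNP genotypes or discretized expression states). It is not StickWRLD's implementation. The residual definition ((observed - expected) / N under independence), the 1-df chi-squared test on a 2x2 presence/absence table, and the names `pair_stats` and `filter_correlations` are all assumptions made for illustration. The default thresholds mirror the P = 0.05 and r = 0.1 values quoted for Figure 1.

```python
import math
from itertools import combinations


def pair_stats(col_a, col_b, state_a, state_b):
    """Residual and P value for state_a (in col_a) co-occurring with state_b (in col_b).

    Assumed definitions (illustrative, not StickWRLD's):
      residual = (observed - expected) / N, expected under independence
      P        = 1-df chi-squared test on the 2x2 presence/absence table
    """
    n = len(col_a)
    both = only_a = only_b = 0
    for x, y in zip(col_a, col_b):
        in_a, in_b = x == state_a, y == state_b
        if in_a and in_b:
            both += 1
        elif in_a:
            only_a += 1
        elif in_b:
            only_b += 1
    neither = n - both - only_a - only_b
    n_a, n_b = both + only_a, both + only_b
    residual = (both - n_a * n_b / n) / n
    denom = n_a * (n - n_a) * n_b * (n - n_b)
    if denom == 0:  # a state present in every sample carries no information
        return residual, 1.0
    chi2 = n * (both * neither - only_a * only_b) ** 2 / denom
    # Survival function of chi-squared with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return residual, math.erfc(math.sqrt(chi2 / 2))


def filter_correlations(columns, p_max=0.05, r_min=0.1):
    """Keep state pairs across column pairs with P < p_max and |residual| >= r_min."""
    kept = []
    for name_a, name_b in combinations(columns, 2):
        col_a, col_b = columns[name_a], columns[name_b]
        for sa in sorted(set(col_a)):
            for sb in sorted(set(col_b)):
                r, p = pair_stats(col_a, col_b, sa, sb)
                if p < p_max and abs(r) >= r_min:
                    kept.append((name_a, sa, name_b, sb, round(r, 3), p))
    return kept


# Toy data (hypothetical): two perfectly linked SNPs and one unrelated gene.
toy = {
    "snp1": ["A"] * 10 + ["G"] * 10,
    "snp2": ["C"] * 10 + ["T"] * 10,
    "gene": ["no change", "up"] * 10,
}
kept = filter_correlations(toy)
for row in kept:
    print(row)
```

In the toy data the two SNP columns are perfectly linked, while the gene column is independent of both, so only SNP-to-SNP state pairs survive the filter. Under a scheme like this, lowering r or raising P would admit more, weaker pairs.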



Figure 1. Initial view of the dataset in StickWRLD using the default settings for P and the residual (r). The fifteen columns in the foreground represent the genes in the dataset (the other columns represent SNPs). In each gene column, the predominant green sphere indicates that "no change" in expression (as opposed to up- or down-regulation) is the most common state. At these settings, the only visible correlations are SNP-to-SNP relationships.
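The caption treats each gene as a categorical column with up-regulated, down-regulated and no-change states, and the dominant sphere marks the most common state. Below is one hypothetical way to derive such states from continuous data and find the modal state. The log2 fold-change input, the +/-1 threshold, and the function names are assumptions. The paper's actual discretization is not described here.

```python
from collections import Counter


def expression_state(log2_fc, threshold=1.0):
    """Map a log2 fold change to a categorical state (threshold is an assumption)."""
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "no change"


def modal_state(log2_fcs, threshold=1.0):
    """Return the most common state for one gene and the fraction of samples in it."""
    counts = Counter(expression_state(x, threshold) for x in log2_fcs)
    state, count = counts.most_common(1)[0]
    return state, count / len(log2_fcs)


# Hypothetical log2 fold changes for one gene across six samples.
state, fraction = modal_state([0.1, -0.3, 1.5, 0.0, -2.0, 0.4])
print(state, round(fraction, 2))
```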

Figure 1 displays the initial view of the dataset when loaded into StickWRLD using the defaults of P = 0.05 and r = 0.1. Although several correlations are visible at these defaults, all of them are between SNPs, indicating that the correlated SNPs co-occur at a significant frequency. For example, rs4783754 is correlated with both rs9922615 and rs9924505, suggesting that specific alleles of rs4783754 tend to be co-inherited with specific alleles of rs9922615 and rs9924505. Similarly, CHR16:67319752 and CHR16:67381383 are correlated with one another. Such correlation is entirely expected for proximal SNPs, between which recombination is infrequent. However, it can reveal interesting distantly interacting loci when the SNPs are far enough apart that the probability of recombination approaches 50%. None of the patterns observed here, however, showed any correlation with changes in expression of the genes in the dataset.
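Haldane's map function makes concrete the expectation that nearby SNPs stay correlated while distant ones approach a 50% recombination probability. For map distance d in Morgans, it gives the recombination fraction theta = 0.5 * (1 - exp(-2d)). This standard genetics model is swapped in for illustration and is not a method from the paper. Converting the CHR16 coordinates to map distance uses an assumed average of 1 cM per Mb. That figure is only a coarse genome-wide approximation, and local recombination rates vary widely. The function names are hypothetical.

```python
import math


def haldane_theta(d_morgans):
    """Haldane's map function: recombination fraction for map distance d (Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def approx_theta(pos1_bp, pos2_bp, cm_per_mb=1.0):
    """Rough recombination fraction between two positions on the same chromosome.

    cm_per_mb=1.0 is an assumed genome-wide average; local rates vary widely.
    """
    d_morgans = abs(pos2_bp - pos1_bp) / 1e6 * cm_per_mb / 100.0
    return haldane_theta(d_morgans)


# Positions taken from the SNP labels CHR16:67319752 and CHR16:67381383.
theta_chr16 = approx_theta(67319752, 67381383)
print(f"{theta_chr16:.6f}")         # small: recombination between them is rare
print(f"{haldane_theta(5.0):.6f}")  # large map distance: close to 0.5
```

Under the 1 cM/Mb assumption, the two CHR16 SNPs, about 62 kb apart, have a recombination fraction of roughly 0.0006. This is consistent with their strong co-inheritance. Very large map distances, by contrast, drive the fraction toward 0.5.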