StickWRLD as an Interactive Visual Pre-Filter for Canceromics-Centric Expression Quantitative Trait Locus Data.

Rumpf RW, Wolock SL, Ray WC - Cancer Inform (2014)

Bottom Line: One of the significant impediments introduced by such burgeoning data is the difficulty in knowing what features to include or exclude from statistical models. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD allows the user to identify significant correlations and disregard potential correlations that do not meet those same criteria - effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene-SNP correlations that would have gone undetected using more traditional approaches due to issues of low penetrance.

Affiliation: The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA.

ABSTRACT
As datasets increase in complexity, the time required for analysis (both computational and human domain-expert) increases. One of the significant impediments introduced by such burgeoning data is the difficulty in knowing what features to include or exclude from statistical models. Simple tables of summary statistics rarely provide an adequate picture of the patterns and details of the dataset to enable researchers to make well-informed decisions about the adequacy of the models they are constructing. We have developed a tool, StickWRLD, which allows the user to visually browse through their data, displaying all possible correlations. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD allows the user to identify significant correlations and disregard potential correlations that do not meet those same criteria - effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In this study, we applied StickWRLD to a semi-synthetic dataset constructed from two published human datasets. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene-SNP correlations that would have gone undetected using more traditional approaches due to issues of low penetrance.

© Copyright Policy - open-access

f2-cin-suppl.3-2014-063: Tuning the residual to a lower value reveals numerous additional correlations, including one between a gene (CHD1) and a SNP (16:67369626). The correlation of interest (gene to SNP) is easily seen by its sudden appearance when the residual is modified. In a more traditional analysis, this “signal” would easily be overwhelmed by the amount of “noise”. While the SNP to SNP relationships are of interest, they are not the primary concern in this analysis, and the ability to rapidly isolate the signal from the noise visually allowed us to quickly determine which relationships to focus on.

Mentions: Tuning the residual down by increments reveals additional correlations – again all SNP to SNP – until the residual is reduced to 0.05 (Fig. 2). Here, we see our first significant correlation (with P = 0.05) between expression levels of a gene and an SNP – specifically, CDH1 and rs35255374. Dialing the residual down to 0.025 reveals three additional gene to SNP relationships: CHD1 to 16:67369626; PCDH1 to 16:67374748; and CDH22 to rs35255374 (Fig. 3).
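
The stepwise tuning described above can be illustrated with a small sketch. This is not StickWRLD's implementation. It assumes, for illustration only, that the residual of a pair of binary features is the observed joint frequency minus the frequency expected under independence, and it substitutes a Pearson chi-square test (1 degree of freedom) for whatever significance test the tool actually uses. The column names (`snp_A`, `snp_B`, `gene_X`), the toy data, and the helper functions are all hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def pair_stats(xs, ys):
    """Return (residual, p) for two equal-length binary (0/1) columns.

    Assumed definition: residual = observed joint frequency of (1, 1)
    minus the product of the two marginal frequencies.
    """
    n = len(xs)
    both = sum(1 for x, y in zip(xs, ys) if x and y)
    px, py = sum(xs) / n, sum(ys) / n
    residual = both / n - px * py
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:  # a constant column carries no association
        return residual, 1.0
    chi2 = n * residual ** 2 / denom  # Pearson chi-square for a 2x2 table
    p = erfc(sqrt(chi2 / 2))          # upper tail of chi-square with 1 df
    return residual, p


def retained(columns, r_min, p_max):
    """Pairs whose |residual| and P both pass the retention thresholds."""
    keep = set()
    for (name_a, a), (name_b, b) in combinations(sorted(columns.items()), 2):
        r, p = pair_stats(a, b)
        if abs(r) >= r_min and p <= p_max:
            keep.add((name_a, name_b))
    return keep


def sweep(columns, r_values, p_max=0.05):
    """Lower the residual threshold stepwise; list newly revealed pairs."""
    seen, steps = set(), []
    for r_min in sorted(r_values, reverse=True):
        now = retained(columns, r_min, p_max)
        steps.append((r_min, sorted(now - seen)))
        seen |= now
    return steps


# Hypothetical toy data: two tightly linked SNPs and one weakly linked gene.
columns = {
    "snp_A": [1] * 20 + [0] * 20,
    "snp_B": [1] * 20 + [0] * 20,
    "gene_X": [1] * 14 + [0] * 6 + [1] * 6 + [0] * 14,  # high-expression flag
}

steps = sweep(columns, [0.2, 0.05])
for r_min, new_pairs in steps:
    print(r_min, new_pairs)
```

In this toy dataset the SNP to SNP pair passes the stricter residual threshold, while the two gene to SNP pairs appear only once the threshold is lowered, mirroring the pattern described in the text.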

