Inferring pathway dysregulation in cancers from multiple types of omic data.

MacNeil SM, Johnson WE, Li DY, Piccolo SR, Bild AH - Genome Med (2015)



Affiliation: Department of Oncological Sciences, University of Utah, Salt Lake City, UT USA ; Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT USA.

ABSTRACT
Although in some cases individual genomic aberrations may drive disease development in isolation, a complex interplay among multiple aberrations is common. Accordingly, we developed Gene Set Omic Analysis (GSOA), a bioinformatics tool that can evaluate multiple types and combinations of omic data at the pathway level. GSOA uses machine learning to identify dysregulated pathways and improves upon other methods because of its ability to decipher complex, multigene patterns. We compare GSOA to alternative methods and demonstrate its ability to identify pathways known to play a role in various cancer phenotypes. Software implementing the GSOA method is freely available from https://bitbucket.org/srp33/gsoa.



Fig. 2. Results of cross-algorithm comparisons on simulated data. We compared GSOA against other methods using simulated data that contained interdependence among variables. For various FDR thresholds, we calculated the proportion of simulated gene sets containing signal genes that were considered significant and the proportion of gene sets containing only random data that were considered insignificant. Panels a-c show results for balanced data (50/50 sample split); panels d-f show results for unbalanced data (90/10 sample split). See also Additional file 1: Fig. S4.
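The per-threshold proportions described in this caption can be sketched as follows. This is a minimal illustration, not the GSOA evaluation code: the function name `threshold_sweep` is hypothetical, it assumes a gene set is called significant when its FDR is at or below the threshold, and the toy FDR values are invented for illustration rather than taken from the study.

```python
from typing import List, Sequence, Tuple


def threshold_sweep(
    fdr_values: Sequence[float],
    has_signal: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each FDR threshold.

    Assumption: a gene set is called significant when its FDR <= threshold.
    sensitivity = fraction of signal gene sets called significant
    specificity = fraction of random-only gene sets called insignificant
    """
    if len(fdr_values) != len(has_signal):
        raise ValueError("fdr_values and has_signal must have equal length")
    n_signal = sum(1 for s in has_signal if s)
    n_null = len(has_signal) - n_signal
    results = []
    for t in thresholds:
        # True positives: signal gene sets at or below the threshold.
        tp = sum(1 for q, s in zip(fdr_values, has_signal) if s and q <= t)
        # True negatives: random-only gene sets above the threshold.
        tn = sum(1 for q, s in zip(fdr_values, has_signal) if not s and q > t)
        sensitivity = tp / n_signal if n_signal else float("nan")
        specificity = tn / n_null if n_null else float("nan")
        results.append((t, sensitivity, specificity))
    return results


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    fdr = [0.01, 0.04, 0.15, 0.30, 0.02, 0.50, 0.90, 0.18]
    signal = [True, True, True, True, False, False, False, False]
    for t, sens, spec in threshold_sweep(fdr, signal, [0.05, 0.20]):
        print(f"FDR <= {t:.2f}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On the toy data, loosening the threshold from 0.05 to 0.20 raises sensitivity from 0.50 to 0.75 but lowers specificity from 0.75 to 0.50, which is the kind of trade-off the figure panels track across thresholds.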

Using the simulated data, we evaluated the balance between sensitivity and specificity for each method. In this context, sensitivity refers to an algorithm's ability to identify as significant the gene sets that contained signal genes, and specificity refers to its ability to correctly classify as insignificant any gene set that contained no signal genes. We used the Matthews correlation coefficient (MCC) to quantify the balance between sensitivity and specificity [34]. For each gene set, the predictor was the FDR value that each algorithm assigned to that gene set. Across all of the FDR thresholds that we tested, GSOA attained considerably higher MCC values than the competing methods (Fig. 2a). In particular, at relatively stringent FDR thresholds, such as those typically used in analyzing omic data, GSOA was much more sensitive than the other methods (Fig. 2b) and attained similar levels of specificity (Fig. 2c). For example, at an FDR threshold of 0.05, GSOA produced 243 (26 %) more true positives than GSAA, the best competing method (Additional file 1: Table S1A). GSOA produced 11 false positives (1 % of all signal gene sets), only three more than GSAA. At an FDR threshold of 0.20, GSOA and GAGE attained the same MCC value: GSOA produced 150 more true positives than GAGE, whereas GAGE produced 123 fewer false positives (Additional file 1: Table S1B).
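The MCC comparison described above combines all four cells of the confusion matrix: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it at a single FDR threshold. It is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, significance is assumed to mean FDR at or below the threshold, returning 0 when a marginal count is empty is a common convention rather than something the source specifies, and the toy data are invented.

```python
import math
from typing import Sequence


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (an assumption here): report 0 when a row or column is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_at_threshold(
    fdr_values: Sequence[float], has_signal: Sequence[bool], threshold: float
) -> float:
    """Call each gene set significant if FDR <= threshold, then score with MCC."""
    tp = fp = tn = fn = 0
    for q, signal in zip(fdr_values, has_signal):
        called = q <= threshold
        if signal and called:
            tp += 1  # signal gene set correctly called significant
        elif signal:
            fn += 1  # signal gene set missed
        elif called:
            fp += 1  # random-only gene set wrongly called significant
        else:
            tn += 1  # random-only gene set correctly left insignificant
    return mcc(tp, fp, tn, fn)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    fdr = [0.01, 0.04, 0.15, 0.30, 0.02, 0.50, 0.90, 0.18]
    signal = [True, True, True, True, False, False, False, False]
    print(round(mcc_at_threshold(fdr, signal, 0.05), 3))  # prints 0.258
```

Because MCC weighs true positives and false positives together, two methods can reach the same score by different routes, one by finding more signal gene sets and the other by making fewer false calls, as GSOA and GAGE did at an FDR threshold of 0.20.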
