Inferring pathway dysregulation in cancers from multiple types of omic data.

MacNeil SM, Johnson WE, Li DY, Piccolo SR, Bild AH - Genome Med (2015)

Affiliation: Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA; Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT, USA.

ABSTRACT
Although in some cases individual genomic aberrations may drive disease development in isolation, a complex interplay among multiple aberrations is common. Accordingly, we developed Gene Set Omic Analysis (GSOA), a bioinformatics tool that can evaluate multiple types and combinations of omic data at the pathway level. GSOA uses machine learning to identify dysregulated pathways and improves upon other methods because of its ability to decipher complex, multigene patterns. We compare GSOA to alternative methods and demonstrate its ability to identify pathways known to play a role in various cancer phenotypes. Software implementing the GSOA method is freely available from https://bitbucket.org/srp33/gsoa.

Fig. 1. High-level description of the GSOA methodology. After mapping input data to gene sets, GSOA uses the support vector machine (SVM) algorithm to assess how accurately samples from the two classes can be classified. Gene sets for which relatively high classification accuracy is attained are considered the most likely to play a role in the biological question of interest.
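The per-gene-set scoring in Fig. 1 can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that are not taken from the GSOA code: a linear SVM trained by Pegasos-style subgradient descent stands in for whatever SVM implementation and kernel GSOA actually uses, classification accuracy is estimated by stratified k-fold cross-validation, and the function names (`train_linear_svm`, `gene_set_accuracy`, `rank_gene_sets`) are hypothetical. GSOA's own scoring metric and significance assessment may differ.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM with Pegasos-style subgradient descent.

    Assumption: each feature vector in X ends with a constant 1.0, so the
    bias is learned as an ordinary weight. Labels in y must be -1 or +1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink every weight toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin move w.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def stratified_folds(labels, k):
    """Deal the samples of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    seen = {}
    for i, lab in enumerate(labels):
        folds[seen.get(lab, 0) % k].append(i)
        seen[lab] = seen.get(lab, 0) + 1
    return folds


def gene_set_accuracy(data, labels, genes, k=4, lam=0.01, epochs=50):
    """Cross-validated accuracy of a linear SVM restricted to one gene set.

    data maps gene -> list of rows (a gene may have several rows, e.g. one
    per omic data type); each row holds one value per sample. labels holds
    one class name per sample, in the same sample order.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are required")
    y = [1 if lab == classes[1] else -1 for lab in labels]
    rows = [row for g in genes if g in data for row in data[g]]
    if not rows:
        return None  # none of this set's genes were measured
    # One feature vector per sample, plus a constant 1.0 for the bias.
    X = [[row[s] for row in rows] + [1.0] for s in range(len(labels))]
    correct = 0
    for test in stratified_folds(labels, k):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], lam, epochs)
        correct += sum(predict(w, X[i]) == y[i] for i in test)
    return correct / len(X)


def rank_gene_sets(data, labels, gene_sets, **kwargs):
    """Score every gene set and sort from highest to lowest accuracy."""
    scores = {name: gene_set_accuracy(data, labels, genes, **kwargs)
              for name, genes in gene_sets.items()}
    scored = [(name, acc) for name, acc in scores.items() if acc is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Pegasos is used here only to keep the sketch free of third-party packages; a real analysis would more likely call a library SVM (for example, scikit-learn's `SVC`) and might report a threshold-free metric rather than raw accuracy.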

The GSOA code implementation is freely available at [28]. A schematic overview of the GSOA method is shown in Fig. 1. GSOA requires three inputs: (1) a data file containing omic measurements for each sample; (2) a data file indicating the condition or phenotype status of each sample; and (3) a file that specifies which genes map to which gene sets.

Data file #1 uses a simple matrix format in which columns represent samples and rows represent genomic features. It should also contain a header row with an identifier for each sample, and each subsequent row should start with the gene name. A gene may be listed on multiple rows, for example when an omic-profiling technology produces multiple values per gene. When multiple types of omic data are available for the same samples, multiple data files can be specified using wildcards.

Data file #2 contains two columns: the first value in each row is a sample identifier (which must match the identifiers in data file #1 exactly), and the second value indicates the class (for example, condition or phenotype status) that the sample represents.

Data file #3 should be in Gene Matrix Transposed (GMT) format, as used in the Molecular Signatures Database [29]. The first value in each row is the gene set name, the second value is a descriptor, and the remaining tab-separated values are the genes in that gene set.

Data files #2 and #3 should contain no header row, and all files should use tab characters as delimiters. Our software implementation of GSOA provides examples of each of these file types.
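The three input formats described above can be read with a short parser built on Python's standard `csv` and `glob` modules. This is a hypothetical sketch, not GSOA's own loader, and it assumes that every measurement is numeric (no missing-value handling), that the header of data file #1 may or may not include a cell above the gene-name column (both layouts are accepted), and that files matched by a wildcard pattern are merged by gene, with each row reordered to the sample order of data file #2.

```python
import csv
import glob


def read_class_file(path):
    """Data file #2: two tab-separated columns (sample ID, class), no header."""
    classes = {}
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            classes[fields[0]] = fields[1]
    return classes


def read_gmt(path):
    """Data file #3: GMT rows of set name, descriptor, then member genes."""
    gene_sets = {}
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if len(fields) < 3:
                continue
            name, genes = fields[0], fields[2:]  # fields[1] is the descriptor
            gene_sets[name] = [g for g in genes if g]  # drop trailing empties
    return gene_sets


def read_omic_files(pattern, samples):
    """Data file #1, possibly several files matched by a wildcard pattern.

    Returns gene -> list of rows; each row holds one float per sample, in the
    order given by `samples`. A gene may contribute several rows, either within
    one file or across files holding different omic data types.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    data = {}
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            header = next(reader)
            for fields in reader:
                if not fields:
                    continue
                gene, values = fields[0], fields[1:]
                # Tolerate a header that also labels the gene-name column.
                ids = header[1:] if len(header) == len(fields) else header
                by_sample = dict(zip(ids, values))
                data.setdefault(gene, []).append(
                    [float(by_sample[s]) for s in samples])
    return data
```

Because rows are looked up by sample identifier, `read_omic_files` raises a `KeyError` when a sample listed in data file #2 is absent from one of the omic files, which enforces the requirement that identifiers correspond exactly.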
