Text mining in cancer gene and pathway prioritization.

Luo Y, Riedlinger G, Szolovits P - Cancer Inform (2014)

Bottom Line: Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.


Affiliation: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.

ABSTRACT
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.


Figure 1: The Omic hierarchy on the left, biological networks on the right, and their interactions. TF stands for transcription factor. The figure shows some typical network interaction scenarios, such as: a signaling network activates transcription factors for a regulatory network; transcription factor complexes that control a regulatory network may be formed through protein interactions (eg, binding); a metabolic network may produce energy (through catabolism) and amino acids (through anabolism) that are necessary for other functional networks; and enzymes that catalyze many metabolic networks are in fact proteins and are produced and regulated by other biological networks. Note that regulatory networks often have participants from multiple levels of the Omic hierarchy.

To understand the pathophysiological mechanisms of cancer, it is insufficient to investigate only at the level of single-nucleotide polymorphisms (SNPs) or copy number variations (CNVs), as in genome-wide association studies (GWAS). This is partly due to the lack of statistical power that plagues GWAS [1], which stems from the need to correct for multiple testing and makes it difficult to implicate causative genes. In addition, cancers that involve complex and epistatic genetic traits cannot be adequately explained by additive genetic models. Moreover, cancer phenotypes necessarily involve the full scope of Omics, including genomics, transcriptomics, epigenomics, proteomics, and metabolomics, as well as their interaction and correlation with phenotypes. Rather than being studied in isolation, the levels of the Omic hierarchy need to be considered simultaneously and interactively, because genes, RNAs, proteins, and epigenetic factors interplay in the form of signaling networks, metabolic networks, regulatory networks, etc. Biological networks further interact and exchange information with each other, as shown in Figure 1.
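
The cost of multiple-testing correction mentioned above can be made concrete with a small sketch. It assumes the simplest correction, Bonferroni, in which the per-test threshold is the family-wise error rate divided by the number of tests; the review does not say which correction it has in mind, and the function names, variant IDs, and p-values below are hypothetical and purely illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def surviving_hits(p_values: dict[str, float], alpha: float, n_tests: int) -> list[str]:
    """Return the IDs (sorted) whose p-value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(variant for variant, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    # Hypothetical p-values for illustration only; not taken from any study.
    toy = {"snp_a": 1e-9, "snp_b": 3e-6, "snp_c": 0.01}

    # With only three tests, every variant passes the corrected threshold.
    print(surviving_hits(toy, alpha=0.05, n_tests=3))

    # With roughly a million tests, as in a genome-wide scan, the threshold
    # shrinks by the same factor and only the strongest signal remains.
    print(bonferroni_threshold(0.05, 1_000_000))
    print(surviving_hits(toy, alpha=0.05, n_tests=1_000_000))
```

Under this assumption, a variant whose association would look convincing in a small candidate study can fail to reach significance in a genome-wide scan, which is one way to read the loss of power described above.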

