Text mining in cancer gene and pathway prioritization.

Luo Y, Riedlinger G, Szolovits P - Cancer Inform (2014)

Bottom Line: Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

Affiliation: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.

ABSTRACT
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.


Figure 2: (A) The network representation for the example sentence: “More recent data have suggested that targeting mutations in BRAF, AKT1, ERBB2 and PIK3CA and fusions that involve ROS1 and RET may also be successful”. (B) and (C) are two sub-networks of (A).
Mentions: Automatically annotating biomedical text with semantic information is an active research area in medical Natural Language Processing (NLP). Existing methods extract named entities such as genes and proteins, as well as the relations among them (see the related work section of ref. 124 for an overview). Previous gene prioritization methods have used these methods to identify gene and protein names when constructing gene and protein networks, but have made much less use of the extracted relations. This is partly because most NLP-based semantic relation extraction tools identify only binary relations, which are too coarse for gene prioritization tasks and therefore offer little information beyond simple co-occurrence statistics. Recently, Luo et al. (ref. 124) proposed an algorithm that translates text into a network representation, in which the nodes may be nominal concepts such as genes and proteins or relational concepts such as a verb specifying an interaction, and the edges are syntactic dependency links. Figure 2 gives an example sentence-to-network translation: it shows the network representation for a sentence from the first paragraph of an example paper (ref. 125), along with two of its sub-networks.
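To make the contrast between a sentence network and plain co-occurrence concrete, the following is a minimal sketch in Python. It is not the algorithm of Luo et al.: the edge list is hand-written for illustration, the dependency labels are assumed rather than produced by a parser, and the result is not claimed to match the network drawn in Figure 2. The helper names (`build_graph`, `subnetwork`, `genes_in`, `cooccurring_pairs`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-written dependency-style edges (head, label, dependent)
# for the example sentence. These are illustrative assumptions, not parser
# output and not the network shown in Figure 2.
EDGES = [
    ("successful", "csubj", "targeting"),
    ("targeting", "dobj", "mutations"),
    ("targeting", "dobj", "fusions"),
    ("mutations", "prep_in", "BRAF"),
    ("mutations", "prep_in", "AKT1"),
    ("mutations", "prep_in", "ERBB2"),
    ("mutations", "prep_in", "PIK3CA"),
    ("fusions", "rcmod", "involve"),
    ("involve", "dobj", "ROS1"),
    ("involve", "dobj", "RET"),
]

# Nominal concepts of interest (the gene names in the sentence).
GENES = {"BRAF", "AKT1", "ERBB2", "PIK3CA", "ROS1", "RET"}


def build_graph(edges):
    """Return an adjacency map: head -> list of (label, dependent)."""
    graph = defaultdict(list)
    for head, label, dep in edges:
        graph[head].append((label, dep))
    return graph


def subnetwork(graph, root):
    """Collect all edges reachable from `root` along head -> dependent links."""
    found, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        for label, dep in graph.get(node, []):
            found.append((node, label, dep))
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return found


def genes_in(edges):
    """Sorted gene nodes that appear as dependents in a list of edges."""
    return sorted(dep for _, _, dep in edges if dep in GENES)


def cooccurring_pairs(genes):
    """Sentence-level co-occurrence: every unordered gene pair, no relation attached."""
    ordered = sorted(genes)
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]


graph = build_graph(EDGES)
mutation_net = subnetwork(graph, "mutations")  # sub-network rooted at "mutations"
fusion_net = subnetwork(graph, "fusions")      # sub-network rooted at "fusions"

print(genes_in(mutation_net))
print(genes_in(fusion_net))
print(len(cooccurring_pairs(GENES)))
```

Under these assumed edges, the two sub-networks separate the genes attached to "mutations" from those reached through "fusions" and "involve", whereas co-occurrence alone yields every pair of the six genes with no indication of how they are related.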

