A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

Xiang Z, Qin T, Qin ZS, He Y - BMC Syst Biol (2013)

Bottom Line: Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining system that effectively predicts implicit gene-gene interaction relationships and networks in a genome-wide scope.


ABSTRACT

Background: The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level.

Results: The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists.
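The three-matrix pipeline described in the Results can be illustrated with a minimal sketch. The abstract does not give the normalization or name the dissimilarity function selected by the ROC analysis, so the row-sum normalization and the cosine dissimilarity below are assumptions chosen only for illustration, and the toy matrices and function names are hypothetical.

```python
import numpy as np

def gene_mesh_matrix(gene_article, article_mesh):
    """Build a normalized gene-MeSH matrix from two binary matrices.

    gene_article[g, a] = 1 if article a mentions gene g.
    article_mesh[a, t] = 1 if article a is indexed with MeSH term t.
    Row-sum normalization is an assumption, not the paper's stated method.
    """
    counts = (gene_article @ article_mesh).astype(float)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Genes with no indexed articles keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def cosine_dissimilarity(gene_mesh):
    """Pairwise gene-gene dissimilarity (1 - cosine similarity); an assumed choice."""
    norms = np.linalg.norm(gene_mesh, axis=1, keepdims=True)
    unit = np.divide(gene_mesh, norms, out=np.zeros_like(gene_mesh), where=norms > 0)
    return 1.0 - np.clip(unit @ unit.T, 0.0, 1.0)

# Toy data: 3 genes x 4 articles, and 4 articles x 3 MeSH terms.
gene_article = np.array([[1, 1, 0, 0],
                         [0, 1, 1, 0],
                         [0, 0, 0, 1]])
article_mesh = np.array([[1, 0, 0],
                         [1, 1, 0],
                         [0, 1, 0],
                         [0, 0, 1]])

gm = gene_mesh_matrix(gene_article, article_mesh)
gg = cosine_dissimilarity(gm)
print(np.round(gg, 3))
```

In this toy case the first two genes share MeSH terms and receive a low dissimilarity, while the third gene shares none and sits at the maximum distance from both. Ranking gene pairs by such a score is one way to surface candidate implicit relationships.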

Conclusions: The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining system that effectively predicts implicit gene-gene interaction relationships and networks in a genome-wide scope.



Figure 6: Analysis of the term “Neutrophil Activation” from the GenoMesh MeSHBrowse website. After browsing the MeSH hierarchical tree from “Phenomena and Processes” → “Immune System Phenomena” → “Immune System Processes” → “Neutrophil Activation”, 23 E. coli genes were found to be associated with the MeSH term “Neutrophil Activation”. The related genes and gene pairs are listed next to the hierarchical tree. A network of these 23 E. coli genes is also generated automatically (note: the network image is generated only if there are fewer than 100 genes). Gray edges represent interactions reported in the literature, and red edges represent predicted interactions. The GenoMesh annotation of the gene pair ytjC and yjhR is shown when a user moves the mouse cursor over the red edge linking these two genes. Clicking this edge opens a detailed analysis of the gene pair (not shown).

Mentions: The MeSH terms are laid out in a hierarchical tree structure. Different MeSH terms are associated with 0, 1, or many genes. Therefore, it is possible to lay out the MeSH hierarchical structure and display the genes and gene network associated with any specific MeSH term. Based on this strategy, we have developed a MeSHBrowse tool (http://genomesh.hegroup.org/meshbrowse/). For example, 23 E. coli genes have been found to be associated with the MeSH term “Neutrophil Activation” with a specific MeSH hierarchy (Figure 6). These 23 genes form the nodes of a gene network which includes the gene-gene associations with known literature reports (grey-colored edges) and predicted implicit gene-gene associations (red-colored edges).
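The term-centered network described here can be sketched as follows. This is an illustration under stated assumptions, not the MeSHBrowse implementation: the function name, the dissimilarity threshold of 0.5, the placeholder gene names, and the toy scores are all hypothetical. Only the fewer-than-100-genes limit and the known/predicted edge distinction come from the text.

```python
from itertools import combinations

# From the figure caption: the network is drawn only for fewer than 100 genes.
MAX_NETWORK_GENES = 100

def build_term_network(term, term_to_genes, co_cited_pairs, dissimilarity, threshold=0.5):
    """Return (genes, edges) for one MeSH term.

    co_cited_pairs: set of alphabetically sorted gene pairs reported together
    in the literature. dissimilarity: dict mapping sorted gene pairs to a score.
    threshold is an assumed cutoff for calling a pair "predicted".
    """
    genes = sorted(term_to_genes.get(term, ()))
    if len(genes) >= MAX_NETWORK_GENES:
        return genes, None  # too many genes to draw a network
    edges = {}
    for pair in combinations(genes, 2):  # pairs come out alphabetically sorted
        if pair in co_cited_pairs:
            edges[pair] = "known"        # gray edge in the figure
        elif dissimilarity.get(pair, 1.0) < threshold:
            edges[pair] = "predicted"    # red edge in the figure
    return genes, edges

# Toy inputs with placeholder gene names (not real data).
term_to_genes = {"Neutrophil Activation": {"geneA", "geneB", "geneC"}}
co_cited_pairs = {("geneA", "geneB")}
dissimilarity = {("geneA", "geneC"): 0.2, ("geneB", "geneC"): 0.9}

genes, edges = build_term_network(
    "Neutrophil Activation", term_to_genes, co_cited_pairs, dissimilarity
)
print(genes)
print(edges)
```

Keeping known and predicted edges under separate labels mirrors the gray and red coloring, so a renderer can style them differently without recomputing the scores.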

