Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system.

Silva DG, Schönbach C, Brusic V, Socha LA, Nagashima T, Petrovsky N - BMC Genomics (2004)

Bottom Line: These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders. Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature.


Affiliation: Medical Informatics Centre, University of Canberra, ACT 2601 Australia. diego.silva@anu.edu.au

ABSTRACT

Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern.

Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%) disorders.

Conclusions: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.


Figure 1: Flow chart of method used for the identification of "pathologs". Obtained from the FANTOM2 dataset, "similar to" clones were analysed using a manual (left) and a semi-automated approach (right) to identify "patholog" genes. HDR clones: clones with Human Disease Relationship.

Mentions: We identified 182 candidate pathologs from amongst 2,578 FANTOM2 mouse "similar to" cDNA transcripts (Figure 1). Each of the transcripts representing these targets shows 70–85% identity over more than 70% of its length to a known human disease-related gene or protein found by sequence similarity comparisons (see Methods). Of these, 146 were identified by the manual approach and 133 by the semi-automated approach, with 97 (53.3%) of the targets being detected by both methods. The manual approach uniquely detected 49 (26.9%) human disease-related gene targets and the semi-automated approach uniquely detected 36 (19.8%) targets.
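The counts reported above can be cross-checked with simple set arithmetic. The following is a minimal sketch, not part of the original study: it takes only the three figures given in the text (146 manual, 133 semi-automated, 97 shared) and derives the method-specific counts, the total, and the percentages. All variable and function names are illustrative assumptions.

```python
# Counts taken from the text; names are hypothetical, chosen for illustration.
manual_total = 146      # targets found by the manual approach
semi_auto_total = 133   # targets found by the semi-automated approach
both = 97               # targets found by both approaches

# Targets unique to each approach.
manual_only = manual_total - both
semi_auto_only = semi_auto_total - both

# Inclusion-exclusion: size of the union of the two result sets.
union = manual_total + semi_auto_total - both


def pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(f"manual only:         {manual_only} ({pct(manual_only, union)}%)")
print(f"semi-automated only: {semi_auto_only} ({pct(semi_auto_only, union)}%)")
print(f"both methods:        {both} ({pct(both, union)}%)")
print(f"total candidates:    {union}")
```

Under these assumptions the derived values agree with the reported ones: 49 and 36 method-specific targets, and 182 candidates in total.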

