DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection.

Chen TW, Wu TH, Ng WV, Lin WC - BMC Bioinformatics (2010)

Bottom Line: Starting from domain information, DODO first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in a considerably shorter time than the traditional reciprocal best hits method, and the time saving becomes more significant when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases.


Affiliation: Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.

ABSTRACT

Background: Orthologs are genes derived from the same ancestral gene locus through speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions, so ortholog identification is useful for annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships among all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. An efficient ortholog detection method that can handle a large number of distantly related genomes is therefore desirable.

Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), was created on the basis of domain architectures. Because domain composition is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in a considerably shorter time than the traditional reciprocal best hits method, and the time saving becomes more significant when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases.

Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm.
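The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the DODO implementation: the abstract does not state how orthologs are chosen inside a group, so this sketch assumes a reciprocal best hit restricted to proteins that share a domain architecture. The function names, the toy genomes, the domain IDs, and the use of difflib's similarity ratio as a stand-in for a real alignment score are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def group_by_architecture(proteins):
    """Step 1: map each domain architecture (ordered tuple of domain IDs)
    to the IDs of the proteins that carry it."""
    groups = defaultdict(list)
    for pid, (domains, _seq) in proteins.items():
        groups[tuple(domains)].append(pid)
    return groups


def similarity(seq_a, seq_b):
    # Assumption: a crude string-similarity ratio stands in for a real
    # sequence alignment score.
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def best_hit(query, candidates, query_genome, candidate_genome):
    """Return the candidate protein most similar to the query protein."""
    query_seq = query_genome[query][1]
    return max(candidates, key=lambda c: similarity(query_seq, candidate_genome[c][1]))


def orthologs_within_groups(genome_a, genome_b):
    """Step 2 (assumed): reciprocal best hits, searched only among proteins
    that share the same domain architecture."""
    groups_a = group_by_architecture(genome_a)
    groups_b = group_by_architecture(genome_b)
    pairs = []
    for arch in groups_a.keys() & groups_b.keys():
        for pa in groups_a[arch]:
            pb = best_hit(pa, groups_b[arch], genome_a, genome_b)
            if best_hit(pb, groups_a[arch], genome_b, genome_a) == pa:
                pairs.append((pa, pb))
    return sorted(pairs)


# Hypothetical toy genomes: protein ID -> (domain architecture, sequence).
genome_a = {
    "a1": (["PF1", "PF2"], "MKTAYIAKQR"),
    "a2": (["PF3"], "GGSLLPWE"),
}
genome_b = {
    "b1": (["PF1", "PF2"], "MKTAYLAKQR"),
    "b2": (["PF1", "PF2"], "QQQQQQQQQQ"),
    "b3": (["PF4"], "GGSLLPWE"),
}

print(orthologs_within_groups(genome_a, genome_b))
```

In this sketch the comparison for a1 is limited to the two proteins of genome_b that share its architecture, rather than the whole genome, which is the kind of reduced search space the abstract refers to. Proteins a2 and b3 are never compared because their architectures differ, even though their toy sequences are identical.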


Figure 5: Choosing more than one anchor genome can rescue missing ortholog groups. This cartoon illustrates three different ortholog group distributions in four species, A, B, C and D. The four rectangles outlined in gray stand for four different genomes. Protein sequences and domains are shown as lines and rectangles, respectively. There are three different ortholog groups in total: group 1 exists in all genomes, group 2 is specific to clade 2, and group 3 has undergone a gene loss event in genome A. When species A in clade 1 is chosen as the anchor genome, DODO reports only group 1, and both group 2 and group 3 are missed. These missing ortholog groups can be identified if multiple genomes are chosen as anchor genomes in the DODO pipeline.

Mentions: The results also show that DODO is useful for ortholog detection between distantly related genomes. For a database containing multiple genomes, and specifically multiple distantly related genomes, a single anchor genome may not be sufficient to detect all ortholog groups. Some clade-specific genes have essentially no orthologs in the genomes of other clades, so a clade-specific ortholog group can only be detected when a genome within that clade is chosen as an anchor genome. For those genes, the ortholog relationship can be rescued by setting more than one anchor genome. In the example shown in Figure 5, the clade 2 specific ortholog group, group 2, could be rescued by choosing a genome in clade 2 (genome C or genome D) as an extra anchor genome. As shown in Figure 5, this strategy could also be useful in the event of gene loss in the anchor genome.
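The anchor genome argument can be made concrete with a small Python sketch of the Figure 5 scenario. It is a toy model of anchor coverage and not DODO code: it assumes only that an ortholog group is reported when at least one anchor genome holds a member of it. The names FIG5_GROUPS and detectable_groups are hypothetical, and the presence of group 3 in genomes B, C and D is inferred from the caption, which states only that the gene was lost in genome A.

```python
# Which genomes hold a member of each ortholog group in the Figure 5 cartoon.
# Assumption: group 3 is present in B, C and D (the caption states only that
# it was lost in genome A).
FIG5_GROUPS = {
    "group1": {"A", "B", "C", "D"},  # present in all genomes
    "group2": {"C", "D"},            # specific to clade 2
    "group3": {"B", "C", "D"},       # lost in genome A
}


def detectable_groups(group_members, anchors):
    """Return the groups that have a member in at least one anchor genome."""
    anchor_set = set(anchors)
    return sorted(
        name for name, genomes in group_members.items() if genomes & anchor_set
    )


# A single anchor in clade 1 misses the clade 2 specific group and the
# group whose member was lost in the anchor genome.
print(detectable_groups(FIG5_GROUPS, ["A"]))       # ['group1']

# Adding one anchor from clade 2 recovers both missing groups.
print(detectable_groups(FIG5_GROUPS, ["A", "C"]))  # ['group1', 'group2', 'group3']
```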

