DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection.

Chen TW, Wu TH, Ng WV, Lin WC - BMC Bioinformatics (2010)

Bottom Line: Starting from domain information, DODO first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in considerably less time than the traditional reciprocal best hits method, and the advantage becomes more pronounced when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases.


Affiliation: Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.

ABSTRACT

Background: Orthologs are genes derived from the same ancestral gene locus after a speciation event. Orthologous proteins usually have similar sequences and perform comparable biological functions, so ortholog identification is useful for annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. An efficient ortholog detection method that can handle a large number of distantly related genomes is therefore desirable.

Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), was created on the basis of domain architectures. Because domain composition is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures, and it then identifies orthologs within those groups at much reduced complexity. DODO is shown to detect orthologs between two genomes in considerably less time than the traditional reciprocal best hits method, and the advantage becomes more pronounced when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases.

Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm.
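
The two steps described in the Results can be illustrated with a minimal Python sketch. It is not the DODO implementation: the protein identifiers, domain names, and similarity scores below are hypothetical toy data, and reciprocal best hits within each group is assumed here as a stand-in for the within-group ortholog identification, which the abstract does not specify in detail. The sketch also counts pairwise comparisons to show why grouping first reduces the work.

```python
from collections import defaultdict

# Hypothetical toy input: protein id -> (genome, ordered domain architecture).
PROTEINS = {
    "hsa_p1": ("human", ("SH3", "SH2", "Pkinase")),
    "hsa_p2": ("human", ("SH3", "SH2", "Pkinase")),
    "hsa_p3": ("human", ("Homeobox",)),
    "mmu_p1": ("mouse", ("SH3", "SH2", "Pkinase")),
    "mmu_p2": ("mouse", ("SH3", "SH2", "Pkinase")),
    "mmu_p3": ("mouse", ("Homeobox",)),
}

# Hypothetical similarity scores standing in for real alignment scores.
SCORES = {
    ("hsa_p1", "mmu_p1"): 950, ("hsa_p1", "mmu_p2"): 400,
    ("hsa_p2", "mmu_p1"): 420, ("hsa_p2", "mmu_p2"): 910,
    ("hsa_p3", "mmu_p3"): 300,
}


def score(a, b):
    """Symmetric lookup of the toy similarity score (0 if unknown)."""
    return SCORES.get((a, b), SCORES.get((b, a), 0))


def group_by_architecture(proteins):
    """Step 1: bucket proteins by domain architecture, then by genome."""
    groups = defaultdict(lambda: defaultdict(list))
    for pid, (genome, arch) in proteins.items():
        groups[arch][genome].append(pid)
    return groups


def reciprocal_best_hits(left, right):
    """Pairs (a, b) where each protein is the other's best-scoring hit."""
    best_lr = {a: max(right, key=lambda b: score(a, b)) for a in left}
    best_rl = {b: max(left, key=lambda a: score(a, b)) for b in right}
    return sorted((a, b) for a, b in best_lr.items() if best_rl[b] == a)


def detect_orthologs(proteins, g1, g2):
    """Step 2: search for orthologs only inside each architecture group."""
    pairs = []
    for by_genome in group_by_architecture(proteins).values():
        left, right = by_genome.get(g1, []), by_genome.get(g2, [])
        if left and right:
            pairs.extend(reciprocal_best_hits(left, right))
    return sorted(pairs)


def comparison_counts(proteins, g1, g2):
    """Pairwise comparisons needed: (all-vs-all, within-group only)."""
    n1 = sum(1 for genome, _ in proteins.values() if genome == g1)
    n2 = sum(1 for genome, _ in proteins.values() if genome == g2)
    grouped = sum(
        len(by_genome.get(g1, [])) * len(by_genome.get(g2, []))
        for by_genome in group_by_architecture(proteins).values()
    )
    return n1 * n2, grouped


if __name__ == "__main__":
    print(detect_orthologs(PROTEINS, "human", "mouse"))
    print(comparison_counts(PROTEINS, "human", "mouse"))
```

On this toy input, restricting the search to architecture groups needs 5 pairwise comparisons instead of 9, and the gap widens as the number of proteins and distinct architectures grows.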


Figure 1: Species closeness and gene length of the ortholog groups identified with DODO. Compared with the HomoloGene database, the ortholog groups identified with DODO fall into two sets: one (n = 8507) has the same classification as HomoloGene, and the other (n = 9695) has a different classification. (A) The closeness of each ortholog group in the two sets was calculated according to taxonomic similarity as described in NCBI. Closeness in the same-classification set was significantly higher than in the different-classification set (Wilcoxon test, p-value < 2.2e-16). This result shows that some of the ortholog groups identified with DODO contain putative orthologs from distantly related species. (B) The average gene length was calculated for each ortholog group in both sets. The same-classification set had a significantly longer average gene length than the different-classification set (Wilcoxon test, p-value = 8.93e-10). This implies that DODO found some ortholog groups composed of shorter sequences. Such shorter sequences may contain insufficient information, so their orthologous relationships could not be found by the conventional RBH ortholog detection method.

Mentions: Since a previous study of domain rearrangement showed that most domain fusion events happened only once in protein evolutionary history [19], proteins that share the same domain architecture and are grouped by DODO but not by the HomoloGene database may be putative orthologs. We speculated that these putative orthologs cannot be detected from primary sequences alone because of short sequence length or low sequence similarity. Further statistical analysis indicated that those ortholog groups were composed of significantly shorter sequences and of more distantly related species, as shown in Figure 1. Such orthologs may be rescued by considering their domain information. This is consistent with DODO's assumption that domains are more conserved than primary sequences, and that taking them into consideration may increase the sensitivity of ortholog detection.
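
The gene-length comparison in Figure 1B can be sketched as follows. The source says only "Wilcoxon test"; this sketch assumes the unpaired rank-sum (Mann-Whitney) variant, since the two sets contain different groups. The gene lengths are hypothetical toy values, not data from the study, and only the U statistic is computed, not a p-value.

```python
from statistics import mean


def mean_lengths(groups):
    """Average gene length for each ortholog group (one list per group)."""
    return [mean(lengths) for lengths in groups]


def rank_sum_u(x, y):
    """Mann-Whitney U statistic of sample x versus y, using midranks for ties."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    rank_sum_x = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical gene lengths (bp) per ortholog group in each set.
SAME_SET = [[1200, 1500], [900, 1100], [2000, 1800]]
DIFFERENT_SET = [[300, 500], [700, 900], [400, 600]]

if __name__ == "__main__":
    u = rank_sum_u(mean_lengths(SAME_SET), mean_lengths(DIFFERENT_SET))
    print(u)
```

U ranges from 0 to len(x) * len(y); a value near the upper bound means the groups in the first set tend to have longer average gene lengths than those in the second. A statistics package would be needed to turn U into a p-value like those reported in the caption.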

