Limits...
An approach of orthology detection from homologous sequences under minimum evolution.

Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H - Nucleic Acids Res. (2008)

Bottom Line: For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision.Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree.Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

View Article: PubMed Central - PubMed

Affiliation: Department of Agricultural Biotechnology, Laboratory of Bioinformatics and Population Genetics, Seoul National University, Seoul 151-742, Korea.

ABSTRACT
In the field of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Due to the slight sequence dissimilarity between orthologs and paralogs, it is prone to regarding paralogs as orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

Show MeSH
Real running time of Mestortho according to the number of sequences for each dataset. To plot the graph, 21 EXProt and 90 KOG alignments were used. The remaining five KOG alignments with sequence sizes of 67, 76, 49, 60 and 71 had running times of 26301, 18200, 274, 539 and 4540 s, respectively. The equation of the fit curve is y = 0.002x2.365 (R2 = 0.798).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2553584&req=5

Figure 3: Real running time of Mestortho according to the number of sequences for each dataset. To plot the graph, 21 EXProt and 90 KOG alignments were used. The remaining five KOG alignments with sequence sizes of 67, 76, 49, 60 and 71 had running times of 26301, 18200, 274, 539 and 4540 s, respectively. The equation of the fit curve is y = 0.002x2.365 (R2 = 0.798).

Mentions: We assessed the running time of Mestortho in the analysis of the 116 total datasets examined in this study. The running time of each dataset was plotted in Figure 3. The nonlinear curve of y = 0.002x2.365, where y is running time and x is the number of sequences in a given dataset, fitted well to the running times of 116 datasets with the smallest R2-value of 0.798. Under the big O concept, most datasets had a computational complexity of the polynomial big O (N > 1 and S > 2), although 13 and 4 datasets showed constant (N = 1) and quadratic (N > 1 and S = 2) time complexity, respectively. On a computer with a 2.66 GHz processor, each dataset with less than 20 000 possible combinations of putative orthologs was completely analyzed by Mestortho within 60 s. Five KOG datasets with more than 20 000 possible combinations (e.g. 279 936 – 107) took 274, 539, 4540, 18 200 and 26 301 s to complete the Mestortho processes. As a guide for the approximate running time of Mestortho for any input dataset, we developed the module ‘time estimator’, which calculates the number of all possible combinations of putative orthologs and subsequently estimates the approximate running time (http://snugenome.snu.ac.kr/Mestortho/).Figure 3.


An approach of orthology detection from homologous sequences under minimum evolution.

Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H - Nucleic Acids Res. (2008)

Real running time of Mestortho according to the number of sequences for each dataset. To plot the graph, 21 EXProt and 90 KOG alignments were used. The remaining five KOG alignments with sequence sizes of 67, 76, 49, 60 and 71 had running times of 26301, 18200, 274, 539 and 4540 s, respectively. The equation of the fit curve is y = 0.002x2.365 (R2 = 0.798).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2553584&req=5

Figure 3: Real running time of Mestortho according to the number of sequences for each dataset. To plot the graph, 21 EXProt and 90 KOG alignments were used. The remaining five KOG alignments with sequence sizes of 67, 76, 49, 60 and 71 had running times of 26301, 18200, 274, 539 and 4540 s, respectively. The equation of the fit curve is y = 0.002x2.365 (R2 = 0.798).
Mentions: We assessed the running time of Mestortho in the analysis of the 116 total datasets examined in this study. The running time of each dataset was plotted in Figure 3. The nonlinear curve of y = 0.002x2.365, where y is running time and x is the number of sequences in a given dataset, fitted well to the running times of 116 datasets with the smallest R2-value of 0.798. Under the big O concept, most datasets had a computational complexity of the polynomial big O (N > 1 and S > 2), although 13 and 4 datasets showed constant (N = 1) and quadratic (N > 1 and S = 2) time complexity, respectively. On a computer with a 2.66 GHz processor, each dataset with less than 20 000 possible combinations of putative orthologs was completely analyzed by Mestortho within 60 s. Five KOG datasets with more than 20 000 possible combinations (e.g. 279 936 – 107) took 274, 539, 4540, 18 200 and 26 301 s to complete the Mestortho processes. As a guide for the approximate running time of Mestortho for any input dataset, we developed the module ‘time estimator’, which calculates the number of all possible combinations of putative orthologs and subsequently estimates the approximate running time (http://snugenome.snu.ac.kr/Mestortho/).Figure 3.

Bottom Line: For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision.Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree.Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

View Article: PubMed Central - PubMed

Affiliation: Department of Agricultural Biotechnology, Laboratory of Bioinformatics and Population Genetics, Seoul National University, Seoul 151-742, Korea.

ABSTRACT
In the field of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Due to the slight sequence dissimilarity between orthologs and paralogs, it is prone to regarding paralogs as orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

Show MeSH