An approach of orthology detection from homologous sequences under minimum evolution.

Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H - Nucleic Acids Res. (2008)

Bottom Line: For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.


Affiliation: Department of Agricultural Biotechnology, Laboratory of Bioinformatics and Population Genetics, Seoul National University, Seoul 151-742, Korea.

ABSTRACT
In phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Because orthologs and paralogs differ only slightly in sequence, paralogs are easily mistaken for orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.
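As a rough illustration of the evolutionary cost mentioned in the abstract, the sketch below computes the total branch length of a neighbor-joining tree from a pairwise distance matrix. It assumes the cost is the plain sum of NJ branch lengths, which is the usual minimum-evolution tree length; the paper's exact scoring is not given on this page. The function name `nj_tree_length` and the toy distance matrix are hypothetical.

```python
def nj_tree_length(dist):
    """Sum of branch lengths of the NJ tree built from a symmetric distance matrix.

    Assumption: this sum stands in for the "evolutionary cost" of a sequence set.
    """
    n = len(dist)
    if n < 2:
        return 0.0
    # Store distances as nested dicts keyed by node id so nodes can be merged.
    d = {i: {j: float(dist[i][j]) for j in range(n) if j != i} for i in range(n)}
    total = 0.0
    next_id = n
    while len(d) > 2:
        m = len(d)
        r = {i: sum(row.values()) for i, row in d.items()}
        nodes = sorted(d)
        best = None
        # Standard NJ criterion: join the pair minimising Q(i, j).
        for pos, i in enumerate(nodes):
            for j in nodes[pos + 1:]:
                q = (m - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        dij = d[i][j]
        # The two branches created by joining i and j always sum to d(i, j).
        total += dij
        # Distances from the new internal node to every remaining node.
        merged = {k: (d[i][k] + d[j][k] - dij) / 2 for k in d if k != i and k != j}
        del d[i], d[j]
        for k, dist_k in merged.items():
            del d[k][i], d[k][j]
            d[k][next_id] = dist_k
        d[next_id] = merged
        next_id += 1
    a, b = d  # the two remaining nodes are joined by the last branch
    return total + d[a][b]


# Toy additive matrix: leaves A, B (branches 1, 2) and C, D (branches 3, 4)
# separated by an internal branch of length 5, so the true tree length is 15.
toy = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(nj_tree_length(toy))  # 15.0
```

Under the assumption stated in the abstract, a candidate set that includes a paralog would be expected to give a larger value than a set of orthologs of the same size.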

© Copyright Policy - creative-commons

Figure 5: The confidence interval of True datasets at a one-tailed 95% significance level. Among 116 datasets (KOG+EXProt), 33 alignments were used to calculate the confidence interval. For each of the 33 datasets, the black dot indicates the MES of False sequences, while the shaded rectangle indicates the MES range of the one-tailed 95% confidence interval of True sequences. The number in parentheses is the number of False sequences in each dataset.

Mentions: There were always more True sequences than False sequences in the 116 datasets analyzed. Since phylogenetic reconstruction requires more than two sequences, we examined the number of False sequences for each dataset. As a result, 33 out of the 116 datasets had more than two False sequences. For each of these 33 datasets, the confidence intervals of the MES of True sequences were determined with 100 random subsets under a one-tailed 95% significance level, and were plotted together with the MES of False sequences (Figure 5). In every case, the MES of False sequences was significantly higher than that of True sequences.
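The resampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each random subset of True sequences has the same size as the False set, and that the one-tailed 95% bound is taken as an empirical percentile of the subset scores. The names `one_tailed_upper_bound` and `score_fn` are hypothetical; `score_fn` stands for whatever function returns the MES of a list of sequences.

```python
import math
import random


def one_tailed_upper_bound(true_items, subset_size, score_fn,
                           n_subsets=100, level=0.95, seed=0):
    """Empirical one-tailed upper bound of score_fn over random subsets.

    Assumptions: subsets are drawn without replacement from the True
    sequences, and the bound is the score at the `level` quantile.
    """
    rng = random.Random(seed)
    scores = sorted(
        score_fn(rng.sample(true_items, subset_size)) for _ in range(n_subsets)
    )
    # Index of the smallest score with at least `level` of subsets at or below it.
    idx = min(math.ceil(level * n_subsets), n_subsets) - 1
    return scores[idx]


# Toy usage: integers stand in for sequences and sum() stands in for the MES.
toy_true = list(range(10))
bound = one_tailed_upper_bound(toy_true, subset_size=3, score_fn=sum)
false_score = 30  # hypothetical score of the False set
print(false_score > bound)  # True, since no 3-item subset of 0..9 sums above 24
```

A False set whose score lies above the returned bound falls outside the one-tailed range of the True subsets, which is the comparison the figure plots for each dataset.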

