An approach of orthology detection from homologous sequences under minimum evolution.

Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H - Nucleic Acids Res. (2008)

Bottom Line: For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.


Affiliation: Department of Agricultural Biotechnology, Laboratory of Bioinformatics and Population Genetics, Seoul National University, Seoul 151-742, Korea.

ABSTRACT
In phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Because orthologs and paralogs differ only slightly in sequence, paralogs are easily mistaken for orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods can show elevated false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.
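
The selection principle described in the abstract can be sketched in Python. This is an illustrative sketch under stated assumptions, not the Mestortho implementation: it assumes the evolutionary cost of a candidate set is the total branch length of its NJ tree, it enumerates candidate sets by brute force (the reference plus one sequence per other species), and the function names (nj_tree_length, pick_orthologs), sequence labels and distances are all invented for the example.

```python
from itertools import combinations, product


def nj_tree_length(names, dist):
    """Total branch length of the neighbor-joining tree built on `names`.

    `dist` maps frozenset({a, b}) to the pairwise distance of a and b.
    """
    nodes = list(names)
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(nodes, 2)}
    total, next_id = 0.0, 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums used by the NJ selection criterion Q.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # The two new branch lengths always sum to d(i, j).
        total += dij
        u = ("node", next_id)  # hypothetical label for the new internal node
        next_id += 1
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    if len(nodes) == 2:
        total += d[frozenset(nodes)]
    return total


def pick_orthologs(reference, homologs_by_species, dist):
    """Return the cheapest candidate set (one sequence per species) and its cost."""
    best_set, best_score = None, float("inf")
    for picks in product(*homologs_by_species.values()):
        score = nj_tree_length((reference,) + picks, dist)
        if score < best_score:
            best_set, best_score = picks, score
    return best_set, best_score


# Toy data (assumption): copies "a" and "b" arose by duplication before the
# split of species A, B and C, so any a-to-b distance is large (1.3).
seqs = ["aA", "aB", "aC", "bA", "bB", "bC"]
within = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.3,
    frozenset(("B", "C")): 0.3,
}
dist = {}
for x, y in combinations(seqs, 2):
    same_copy = x[0] == y[0]
    dist[frozenset((x, y))] = within[frozenset((x[1], y[1]))] if same_copy else 1.3

best_set, best_score = pick_orthologs("aA", {"B": ["aB", "bB"], "C": ["aC", "bC"]}, dist)
print(best_set, round(best_score, 3))
```

In this toy dataset every candidate set that mixes the two copies has to pay for the long branch separating them, which is why the all-ortholog set comes out cheapest. Brute-force enumeration grows quickly with the number of species and homologs, so it is only suitable for small illustrations like this one.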


Figure 6: Two examples of errors when running Mestortho. (a and b) In the phylogenetic trees, A, B and C indicate species, and α and β denote two descendants after gene duplication. In each of the trees, red branches indicate the lineages detected as orthologous sequences by Mestortho. (c) The phylogenetic tree of (b) is obtained from an EXProt dataset (EC 1.15.1.1) corresponding to the model (b). The monophyletic sequence group is marked by a triangle. The branches of genes detected as orthologs by Mestortho are indicated in bold. Among taxa of the phylogenetic tree, True sequences are also marked in bold and red. The asterisk symbol indicates the reference sequence of the EXProt dataset. The dotted line shows the paralogous relationships between sequences.

Mentions: Several methods are available for detecting orthologs among homologous sequences. Unfortunately, all of them, including Mestortho, produce different false positive and negative rates depending on the algorithm used (9). We assume that orthologs of a reference sequence have different functional constraints than other orthologous groups in a given homologous sequence dataset. If an orthologous group with the reference sequence has evolutionary constraints stronger than or similar to other orthologous groups in a given dataset, Mestortho will probably detect the sequence members of the orthologous group more accurately because the evolutionary cost of the ancestral divergence between the two groups (α5+β5 in Figure 1b) is sufficient to yield a difference in their MESs. In our simulation, the MES confidence interval of 33 orthologous groups showed that True orthologous groups including reference sequences evolved slower than False groups (Figure 5), indicating that our assumption for orthology detection is reliable. However, there are some exceptional cases that lead to incorrect orthology detection. First, if an orthologous group of a reference sequence includes a pseudogene-like sequence (αB; Figure 6a) that evolved faster than other sequences in a given dataset, Mestortho would detect the paralog of the pseudogene-like sequence as an ortholog (βB; Figure 6a). Similarly, if a gene was missed in a species (αB; Figure 6a), the false ortholog would be detected as an ortholog (βB; Figure 6a). Second, if an ortholog of the reference (αB; Figure 6b) is more closely clustered to the paralogs of the reference than the other true orthologs (αA; Figure 6b), our approach would detect a false positive as an orthologous sequence (Figure 6b). For example, in the EXProt dataset of superoxide dismutase (EC 1.15.1.1), the paralogous gene of P. aeruginosa, instead of the true ortholog of the species, was identified as being orthologous to the reference sequence, due to its closer clustering with the reference sequence of E. coli (Figure 6c).
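
The first failure case (a fast-evolving, pseudogene-like true ortholog) can be illustrated with a small worked example in Python. This is a toy sketch with invented branch lengths, not data from the paper. It relies on one fact that holds for any three sequences: the only unrooted tree is a star, and its total length is half the sum of the three pairwise distances. The function names and the labels alpha_B and beta_B are hypothetical.

```python
def three_taxon_length(d_xy, d_xz, d_yz):
    """Total length of the unique (star) tree joining three sequences."""
    return (d_xy + d_xz + d_yz) / 2


def chosen_for_species_b(fast_branch):
    """Which species-B copy gives the cheaper set with alpha_A and alpha_C.

    Assumed toy tree: alpha_A and alpha_B attach to a shared node (alpha_A by a
    branch of 0.1, alpha_B by `fast_branch`); that node is 0.05 from the root of
    the alpha clade, and alpha_C is 0.15 from it. Any alpha-to-beta distance is
    1.3, reflecting the old duplication.
    """
    d_aA_aC = 0.1 + 0.05 + 0.15
    d_aA_aB = 0.1 + fast_branch
    d_aB_aC = fast_branch + 0.05 + 0.15
    d_cross = 1.3

    cost_with_ortholog = three_taxon_length(d_aA_aB, d_aA_aC, d_aB_aC)
    cost_with_paralog = three_taxon_length(d_cross, d_aA_aC, d_cross)
    return "alpha_B" if cost_with_ortholog <= cost_with_paralog else "beta_B"


# A normally evolving ortholog versus a very fast (pseudogene-like) one.
print(chosen_for_species_b(0.1))
print(chosen_for_species_b(2.0))
```

With these numbers the set containing the true ortholog costs 0.3 plus the length of its terminal branch, while the set containing the paralog costs 1.45, so the paralog is only preferred once the fast branch exceeds 1.15. The threshold depends entirely on the assumed distances.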


