An approach of orthology detection from homologous sequences under minimum evolution.

Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H - Nucleic Acids Res. (2008)

Bottom Line: For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.


Affiliation: Department of Agricultural Biotechnology, Laboratory of Bioinformatics and Population Genetics, Seoul National University, Seoul 151-742, Korea.

ABSTRACT
In the field of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Because orthologs and paralogs can differ only slightly in sequence, paralogs are easily mistaken for orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementation, each of these methods can show elevated false-negative or false-positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but the calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect species-tree and gene-tree topologies. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.
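The minimum evolution criterion scores a tree by its total branch length. For reference, the textbook neighbor-joining recurrences and the resulting tree length are summarized below. The abstract does not state the distance model or NJ variant Mestortho uses, so this is a generic formulation, not the program's exact computation. Here n is the number of nodes still to be joined, d_ij is the current distance between nodes i and j, and u is the new node created by joining i and j; joins are repeated until two nodes remain, which are connected by a branch equal to their distance, and T is the finished tree.

```latex
\begin{align*}
R_i &= \sum_{k \ne i} d_{ik}, &
Q_{ij} &= (n-2)\,d_{ij} - R_i - R_j \quad \text{(join the pair with the smallest } Q_{ij}\text{)},\\
b_{iu} &= \frac{d_{ij}}{2} + \frac{R_i - R_j}{2(n-2)}, &
b_{ju} &= d_{ij} - b_{iu},\\
d_{uk} &= \frac{d_{ik} + d_{jk} - d_{ij}}{2}, &
\mathrm{MES} &= \sum_{e \in T} b_e .
\end{align*}
```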


Figure 2: Conceptual representation of the detection of orthologs in a given alignment. (a) In the input alignment, each sequence name consists of a sequence identifier and species information: the letters before the underscore denote the sequence identifier, and those after it abbreviate the scientific name of the species. (b) The upper box contains sequences with one occurrence per species; the lower box contains paralogous sequences with more than one occurrence per species. (c) All possible combinations of the lower-box sequences in which each species is represented by exactly one of its sequences. (d) Datasets formed by merging the upper-box sequences of (b) into each combination in (c). (e) Phylogenetic trees reconstructed from the merged datasets. (f) Calculation of minimum evolution scores for the resulting trees. (g) Selection of the smallest minimum evolution score. (h) Determination of orthologous sequences.
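Panels (a) to (d) can be sketched in Python as follows. This is an illustrative sketch, not Mestortho's code: the helper names `species_of` and `candidate_datasets` are hypothetical, names are assumed to follow the `<identifier>_<species>` convention of panel (a) with the species code after the last underscore, and the merging of zero-distance sequences is omitted.

```python
from collections import defaultdict
from itertools import product


def species_of(name):
    # Assumed naming convention (panel a): "<identifier>_<species>".
    # rsplit keeps identifiers that themselves contain underscores intact.
    return name.rsplit("_", 1)[1]


def candidate_datasets(names, reference):
    """Build the merged datasets of panels (b)-(d). Hypothetical helper."""
    by_species = defaultdict(list)
    for name in names:
        by_species[species_of(name)].append(name)

    # Group 1 (upper box): species represented by exactly one sequence.
    group1 = [seqs[0] for seqs in by_species.values() if len(seqs) == 1]
    # Group 2 (lower box): species with more than one sequence.
    group2 = [seqs for seqs in by_species.values() if len(seqs) > 1]

    datasets = []
    # Panel (c): one sequence per multi-copy species, in every combination.
    for combo in product(*group2):
        # If the reference is a multi-copy sequence, keep only the
        # combinations that contain it.
        if reference not in group1 and reference not in combo:
            continue
        # Panel (d): merge the single-copy sequences into the combination.
        datasets.append(group1 + list(combo))
    return datasets
```

For example, with names `a1_Hs`, `b1_Mm`, `c1_Dm`, `c2_Dm`, `d1_Sc`, `d2_Sc` and reference `c1_Dm`, the four Dm/Sc combinations are reduced to the two that contain `c1_Dm`, each merged with `a1_Hs` and `b1_Mm`. Because the number of combinations is the product of the copy numbers per species, collapsing zero-distance sequences (as the program does) can shrink this search considerably.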

Mentions: To implement the algorithm, we developed a novel program called Mestortho (minimum evolution score to orthology). For a given multiple sequence alignment, the program automatically treats multiple sequences from the same species as paralogous. For every dataset generated from all possible combinations of candidate orthologs, the program computes a minimum evolution score (MES) by reconstructing an NJ tree and then calculating its sum of branch lengths (SBL). The sequence set with the smallest MES is identified as a reliable orthologous cluster. The program requires a multiple sequence alignment in which each sequence name consists of the sequence identifier and species information (Figure 2a). In general, paralogous relationships among homologous sequences arise when there is more than one orthologous cluster (Figure 1b). The program therefore requires a user-defined reference sequence to determine which orthologous cluster should be detected (shown in bold in Figure 2a). Given an alignment, the sequences are classified into two groups (Figure 2b): group 1 consists of sequences with one occurrence per species, and group 2 consists of sequences with more than one occurrence per species. For group 2, exhaustive combinatorial sets with one sequence per species are created (Figure 2c). If the reference sequence belongs to group 2, only the datasets containing the reference sequence are retained for further analysis. In addition, sequences separated by a genetic distance of zero are treated as a single sequence to reduce the number of combinations. The group 1 sequences are then merged with each dataset obtained from group 2 (Figure 2d). For each merged dataset, an NJ tree is reconstructed and its MES is calculated (Figure 2e–g). Finally, the merged dataset with the smallest MES is chosen as the set of orthologs (Figure 2h).
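The scoring and selection steps (Figure 2e–h) can be sketched as below. This assumes the MES is the plain sum of NJ branch lengths computed from a precomputed pairwise distance table, stored here as a dict keyed by `frozenset` pairs (a layout chosen only for this sketch). The excerpt does not say how Mestortho estimates distances or handles negative NJ branch lengths, so negative lengths are simply summed. `nj_tree_length` and `select_min_evolution` are hypothetical names.

```python
def nj_tree_length(names, dist):
    """Sum of branch lengths of a neighbor-joining tree (the assumed MES)."""
    nodes = list(names)
    # Ordered-pair distance lookup for the nodes currently in play.
    d = {(a, b): dist[frozenset((a, b))] for a in nodes for b in nodes if a != b}
    total, counter = 0.0, 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q = (n-2)d_ij - R_i - R_j.
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]],
        )
        b_i = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        b_j = d[(i, j)] - b_i
        total += b_i + b_j
        u = ("internal", counter)  # tuple label cannot clash with string names
        counter += 1
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    if len(nodes) == 2:
        # The last two nodes are joined by a single branch.
        total += d[(nodes[0], nodes[1])]
    return total


def select_min_evolution(datasets, dist):
    """Return (dataset, score) for the dataset with the smallest tree length."""
    scored = [(nj_tree_length(ds, dist), ds) for ds in datasets]
    score, best = min(scored, key=lambda pair: pair[0])
    return best, score
```

As a sanity check, on distances that fit the unrooted tree ((A:1,B:2):3,(C:4,D:5)) exactly, `nj_tree_length` recovers the branch lengths and returns 15.0. Recomputing the NJ tree per candidate, rather than scoring a fixed topology, mirrors the statement that the criterion does not depend on any one NJ topology.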

