The human phylome.

Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T - Genome Biol. (2007)


Affiliation: Bioinformatics Department, Centro de Investigación Príncipe Felipe, Autopista del Saler, 46013 Valencia, Spain.

ABSTRACT

Background: Phylogenomic analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use for large genomes remains a technical challenge. Using phylomes also requires the development of new methods to help interpret them.

Results: We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing, and maximum likelihood and Bayesian methods. Although differences from alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with Laurasiatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplication before the vertebrate radiation. Using a novel algorithm that is independent of a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes.

Conclusion: Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages. The pipeline implemented here can be easily adapted for use in other organisms.


Figure 4: Benchmarking comparison of different orthology inference algorithms. The reference set used in the benchmark of Hulsen et al. [82] is taken as a gold standard to compute the number of true positives (TP), false positives (FP) and false negatives (FN) yielded by each method. For each method, the sensitivity (S = TP/(TP + FN)) and the positive predictive value (P = TP/(TP + FP)) are computed. Methods described in [82] are labeled BBH (best reciprocal hits), MCL (OrthoMCL), ZIH (Z-score 1-hundred), INP (Inparanoid), PGT (phylogeny-based algorithm used in [95]) and KOG (clusters of eukaryotic orthologous groups). 'Phylome' denotes the results of our pipeline and algorithm, and Ensbl the orthology relationships predicted by the Ensembl database.

Mentions: We compared our predictions with those of other algorithms using a recent reference dataset comprising 67 human-mouse and 45 human-worm orthologous pairs from five multi-gene families [82]. Given the size of these families and their intricate evolutionary histories, this reference set should be considered a highly stringent test. For each method we computed the sensitivity, which measures coverage of the reference set (the number of true positives over the sum of true positives and false negatives), and the positive predictive value, which is the proportion of orthology predictions that are correct (the number of true positives over the sum of true positives and false positives). The benchmark showed only narrow differences in sensitivity among methods (Figure 4). All methods predict only about half (40% to 66%) of the orthologous pairs in the reference set. Our method scores second best, with 61.6% sensitivity, compared with 66.1% for the clusters of eukaryotic orthologous groups (KOG) method; Ensembl reaches a coverage of 55.57%. As we noted before, this low coverage reflects the inherent difficulty of the reference set, in which manual orthology assignments took into account domain organization analysis and other sources of information.
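The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' benchmarking code: the function names `normalize` and `benchmark` and the gene identifiers are hypothetical, ortholog pairs are assumed to be unordered, and we assume that only predictions whose two genes both appear in the reference set can be judged (the exact scoring rules used in [82] may differ).

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each pair in sorted order so (a, b) and (b, a) count as the same pair."""
    return {(min(a, b), max(a, b)) for a, b in pairs}


def benchmark(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Dict[str, float]:
    """Score predicted ortholog pairs against a gold-standard reference set."""
    ref = normalize(reference)
    # Assumption: only predictions whose two genes both occur in the reference
    # set can be judged, so pairs involving other genes are ignored.
    genes = {g for pair in ref for g in pair}
    pred = {p for p in normalize(predicted) if p[0] in genes and p[1] in genes}

    tp = len(pred & ref)  # predicted and present in the reference
    fp = len(pred - ref)  # predicted but absent from the reference
    fn = len(ref - pred)  # present in the reference but not predicted
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        "S": tp / (tp + fn) if tp + fn else 0.0,  # sensitivity
        "P": tp / (tp + fp) if tp + fp else 0.0,  # positive predictive value
    }


# Toy example with hypothetical gene identifiers (not data from the paper).
reference = [("H1", "M1"), ("H2", "M2"), ("H3", "M3"), ("H4", "W4")]
predicted = [("M1", "H1"), ("H2", "M2"), ("H2", "M3"), ("H9", "M9")]
print(benchmark(predicted, reference))
# {'TP': 2, 'FP': 1, 'FN': 2, 'S': 0.5, 'P': 0.6666666666666666}
```

Filtering out pairs such as ("H9", "M9") keeps a method from being penalized for predictions the gold standard cannot assess; under a different scoring convention only that filtering line would need to change.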
