SynBlast: assisting the analysis of conserved synteny information.

Lehmann J, Stadler PF, Prohaska SJ - BMC Bioinformatics (2008)

Bottom Line: This situation can be improved in many cases by including conserved synteny information. The pipeline is intended as a tool to aid high-quality manual annotation, in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples.


Affiliation: Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany. joe@bioinf.uni-leipzig.de

ABSTRACT

Motivation: In recent years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation exist, but they often fail to identify individual members of large gene families or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information.

Results: Here we present the SynBlast pipeline, which is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them according to the available evidence for homology. The pipeline is intended as a tool to aid high-quality manual annotation, in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples.
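
The first step described above, collecting the genomic region around a focal reference gene, can be illustrated with a minimal Python sketch. It is not taken from the SynBlast package (which is written in Perl): the `Gene` record, the `reference_window` function, the flank width and all coordinates are hypothetical and serve only to show the idea of a symmetric window around the focal gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; not SynBlast's own data model."""
    name: str
    chrom: str
    start: int
    end: int


def reference_window(genes, focal_name, flank):
    """Return genes overlapping [focal.start - flank, focal.end + flank]
    on the focal gene's chromosome, sorted by start position."""
    focal = next(g for g in genes if g.name == focal_name)
    lo, hi = focal.start - flank, focal.end + flank
    selected = [
        g for g in genes
        if g.chrom == focal.chrom and g.end >= lo and g.start <= hi
    ]
    return sorted(selected, key=lambda g: g.start)


# Made-up coordinates, for illustration only.
genes = [
    Gene("focal", "7", 1_000_000, 1_003_000),
    Gene("neighbor", "7", 800_000, 805_000),
    Gene("distant", "7", 2_000_000, 2_004_000),
    Gene("other_chrom", "12", 1_000_000, 1_002_000),
]
window = reference_window(genes, "focal", flank=500_000)
print([g.name for g in window])
```

The genes selected this way would then serve as queries against the target genomes; how the real pipeline performs that search and scores the hits is not specified in this abstract.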

Software: The SynBlast package written in Perl is available under the GNU General Public License at http://www.bioinf.uni-leipzig.de/Software/SynBlast/.

Figure 4: Overview of pipeline results for vertebrate Hox clusters. SynBlast results and manually extracted orthologous cluster positions and identities are listed for selected vertebrate species. Unless otherwise indicated, positions correspond to the intervals of the assigned BLAST hits, from the Hox1 hit to the Hox13/Evx hit, in the gene order alignment. Cluster orientation is given relative to the human reference clusters, which are HOXA9_ENSG00000078399_5e5; HOXB9_ENSG00000170689_2e5; HOXC9_ENSG00000180806_3e5; HOXD9_ENSG00000128709_5e5. Unassigned loci from the reference may be due to overlaps of chained HSPs. A '*' indicates loci that are absent in agreement with the literature [45]. Data are from Ensembl release 42 (Dec 2006).

Mentions: We used the four human Hox clusters as reference and searched the vertebrate target species with SynBlast. We consider here a diverse set of vertebrate genomes that contains both tetrapods (with 4 paralogous Hox clusters) and teleosts (with 8 paralogons). The cluster locations, gene inventories, and SynBlast scores are listed in Figure 4. For genomes with complete assemblies, the correct assignment of cluster orthology and of Hox gene identity is straightforward by visual inspection of the SynBlast cluster alignments; see Table 1 for an example. Here, both the gene order alignment score and the logRatioSum score are suitable for assigning cluster identity to the target loci in the zebrafish genome. However, the logRatioSum score clearly outperforms the gene order alignment score in the case of the Danio Bb cluster. In combination, the two scores provide the best means to rank orthologous loci at the top. The zebrafish Zv7 assembly contains two inparalog copies, DrCa1 and DrCa2, of the zebrafish HoxCa cluster. This is, however, certainly an assembly artifact and contradicts all of the existing literature; see e.g. [34] and the references therein. SynBlast correctly retrieves both copies with comparable scores.
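
The excerpt states that the two scores work best in combination but does not say how they are combined. As a purely illustrative sketch, the Python code below ranks candidate loci by the sum of their ranks under each score. The rank-sum rule, the function names, the assumption that higher values are better for both scores, and all numbers are hypothetical and are not taken from SynBlast.

```python
def rank_positions(scores):
    """Map each locus to its rank (1 = best), assuming higher is better.
    Tied scores receive consecutive ranks in input order."""
    ordered = sorted(scores, key=lambda locus: scores[locus], reverse=True)
    return {locus: i + 1 for i, locus in enumerate(ordered)}


def combined_ranking(gene_order_scores, log_ratio_sums):
    """Order loci by the sum of their two ranks (smaller is better).
    Both dicts must share the same keys; ties break by locus name."""
    r1 = rank_positions(gene_order_scores)
    r2 = rank_positions(log_ratio_sums)
    return sorted(r1, key=lambda locus: (r1[locus] + r2[locus], locus))


# Made-up scores for four candidate loci, for illustration only.
gene_order = {"A": 130.0, "B": 125.0, "C": 90.0, "D": 60.0}
log_ratio = {"A": 3.0, "B": 9.0, "C": 4.0, "D": 1.0}
print(combined_ranking(gene_order, log_ratio))
```

In this toy input, locus B is second by gene order score but first by the second score, so it comes out on top of the combined list. A rank-based combination is chosen here only because it avoids having to put the two scores on a common scale.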

