SynBlast: assisting the analysis of conserved synteny information.

Lehmann J, Stadler PF, Prohaska SJ - BMC Bioinformatics (2008)



Affiliation: Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany. joe@bioinf.uni-leipzig.de

ABSTRACT

Motivation: In recent years, more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is accelerating rapidly. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist, but they often fail to identify individual members of large gene families or to distinguish missing data from traceable gene losses. In many cases, this situation can be improved by including conserved synteny information.

Results: Here we present the SynBlast pipeline, which is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidate homologous regions from a collection of target genomes and ranks these candidates according to the available evidence for homology. The pipeline is intended as a tool to aid high-quality manual annotation, in particular in cases where automatic procedures fail. We demonstrate how SynBlast can be applied to retrieve orthologous and paralogous clusters, using the vertebrate Hox and ParaHox clusters as examples.
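As a rough illustration of the retrieve-and-rank idea, the Python sketch below groups similarity hits of the reference flanking genes by their position in a target genome and ranks the resulting candidate regions by how many distinct query genes each contains. It is a simplified stand-in, not SynBlast's actual procedure (SynBlast is written in Perl and ranks hits with scores such as the gene order alignment score and the logRatioSum score). The `Hit` and `Region` types, the functions `candidate_regions` and `rank_regions`, the `max_gap` threshold, and the toy gene and chromosome names are all hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, List, NamedTuple


class Hit(NamedTuple):
    query_gene: str  # flanking gene of the reference (query) region
    chrom: str       # target chromosome or scaffold
    pos: int         # hit position on the target, in bp


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    genes: FrozenSet[str]  # distinct query genes with a hit in this region


def _make_region(chrom: str, chits: List[Hit]) -> Region:
    return Region(chrom, chits[0].pos, chits[-1].pos,
                  frozenset(h.query_gene for h in chits))


def candidate_regions(hits: List[Hit], max_gap: int = 500_000) -> List[Region]:
    """Split the hits on each chromosome into regions wherever two consecutive
    hits lie more than `max_gap` bp apart (hypothetical criterion)."""
    by_chrom = defaultdict(list)
    for h in hits:
        by_chrom[h.chrom].append(h)
    regions = []
    for chrom, chits in by_chrom.items():
        chits.sort(key=lambda h: h.pos)
        current = [chits[0]]
        for h in chits[1:]:
            if h.pos - current[-1].pos <= max_gap:
                current.append(h)
            else:
                regions.append(_make_region(chrom, current))
                current = [h]
        regions.append(_make_region(chrom, current))
    return regions


def rank_regions(hits: List[Hit], max_gap: int = 500_000) -> List[Region]:
    """Rank candidate regions by the number of distinct query genes they contain."""
    regions = candidate_regions(hits, max_gap)
    return sorted(regions, key=lambda r: (-len(r.genes), r.chrom, r.start))


# Toy example with made-up gene and chromosome names.
hits = [Hit("g1", "chrA", 100), Hit("g2", "chrA", 50_000),
        Hit("g3", "chrA", 90_000), Hit("g1", "chrB", 10),
        Hit("g4", "chrC", 1_000)]
for region in rank_regions(hits):
    print(region.chrom, region.start, region.end, sorted(region.genes))
```

Counting distinct query genes rather than raw hits keeps tandem copies of a single flanking gene from inflating a region's rank; a realistic ranking would also take gene order and hit significance into account.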

Software: The SynBlast package, written in Perl, is available under the GNU General Public License at http://www.bioinf.uni-leipzig.de/Software/SynBlast/.


Figure 6: ParaHox example application. SynBlast was used to determine the four pairs of paralogous regions that arose from the four gnathostome ParaHox regions through the fish-specific genome duplication. We show alignment dot-plots of the four query regions against the zebrafish genome (Zv7, Ensembl release 46, Aug 2007) for the high-ranking hits, ranked by the gene order alignment score, with the logRatioSum score given in brackets. Parameters for the synteny filtering step were N = 1, L = 2. See text for more details.
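The exact definition of the gene order alignment score is not reproduced in this excerpt. One plausible way to score how well a candidate region preserves the reference gene order is a global (Needleman-Wunsch) alignment of the two ordered gene lists. The sketch below assumes that homologous genes share a label and uses arbitrary match, mismatch, and gap values; the function names and scores are hypothetical, it should not be read as SynBlast's actual scoring scheme, and it models neither the logRatioSum score nor the filter parameters N and L.

```python
from typing import Sequence


def gene_order_score(query: Sequence[str], target: Sequence[str],
                     match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two gene orders.

    Genes are compared by label, so homologous genes must share a label.
    """
    n, m = len(query), len(target)
    # dp[i][j] = best score for aligning query[:i] with target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # align the two genes
                           dp[i - 1][j] + gap,    # query gene missing in target
                           dp[i][j - 1] + gap)    # extra gene in target
    return dp[n][m]


def best_orientation_score(query: Sequence[str], target: Sequence[str], **kw) -> int:
    """Score the target in both orientations, since a block may be inverted."""
    return max(gene_order_score(query, target, **kw),
               gene_order_score(query, list(target)[::-1], **kw))


query = ["g1", "g2", "g3", "g4", "g5"]
print(best_orientation_score(query, ["g1", "g2", "g4", "g5"]))        # prints 7
print(best_orientation_score(query, ["g5", "g4", "g3", "g2", "g1"]))  # prints 10
```

Scoring both orientations matters because a conserved block can be inverted in the target genome. A global alignment also penalizes every unmatched gene, so a semi-global or local variant may suit partially recovered regions better.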

Mentions: One copy of ParaHoxA retained 13 of the 24 genes flanking the Cdx2 locus, even though Cdx2 itself was evidently lost. This is a case where gene loss can reliably be distinguished from missing data on the basis of well-conserved synteny information (see Figure 6). The second copy retained only 5 of the 24 flanking genes. In line with the analyses of [41,42], we observe that the Cdx2 gene has been lost from both copies. We also observe that one of the two ParaHoxA copies, DrA2 (Chr. 5), contains the only copy of Gsh1, while the only copy of Pdx1 is located in DrA1 (Chr. 24). This information independently confirms the assignment of the two zebrafish ParaHoxA paralogs to the ancestral A cluster. SynBlast also reports additional syntenic regions in the zebrafish genome that contain homologs of some of the genes of the HsA query. These are located on chromosomes 7, 14, 20, and 21 and can be assumed to be orthologs of the ParaHox B, C, and D clusters. To confirm this assumption, we also use the remaining three human ParaHox regions as queries.
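The counts above (13 of 24 and 5 of 24 flanking genes) suggest a simple way to quantify this reasoning. The sketch below is a hypothetical heuristic, not SynBlast's documented procedure: it computes the fraction of reference flanking genes recovered in a target region and checks whether an absent focal gene is bracketed by retained neighbors on both sides, the situation in which an absence is more plausibly a loss than missing data. The gene names, the split of flanking genes on either side of the focal gene, and the threshold `k` are illustrative assumptions.

```python
from typing import Sequence, Set, Tuple


def retention(flanking: Sequence[str], found: Set[str]) -> Tuple[int, float]:
    """Number and fraction of reference flanking genes recovered in a target region."""
    kept = sum(1 for g in flanking if g in found)
    return kept, kept / len(flanking)


def absence_is_bracketed(region: Sequence[str], focal: str,
                         found: Set[str], k: int = 1) -> bool:
    """True if `focal` is absent from the target region but at least `k`
    retained genes lie on each side of its reference position.

    Hypothetical heuristic: bracketing suggests a gene loss rather than
    missing sequence, but it cannot rule out an assembly gap at the locus.
    """
    i = list(region).index(focal)
    left = sum(1 for g in region[:i] if g in found)
    right = sum(1 for g in region[i + 1:] if g in found)
    return focal not in found and left >= k and right >= k


# Toy region: 12 made-up flanking genes on each side of a focal gene "F".
flanking = [f"L{i}" for i in range(12)] + [f"R{i}" for i in range(12)]
region = flanking[:12] + ["F"] + flanking[12:]
copy1 = set(flanking[6:19])   # 13 flanking genes retained, F absent
copy2 = set(flanking[10:15])  # 5 flanking genes retained, F absent
for found in (copy1, copy2):
    kept, frac = retention(flanking, found)
    print(kept, round(frac, 3), absence_is_bracketed(region, "F", found))
```

A retention fraction alone does not distinguish loss from missing data; the bracketing check is what ties the absence to a specific, otherwise well-covered position in the conserved block.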

