Evolution of C2H2-zinc finger genes revisited.

Thomas JH, Emerson RO - BMC Evol. Biol. (2009)

Bottom Line: One of the main conclusions from the paper is that there is a dramatic rate of gene duplication and gene loss, including the surprising result that 118 human ZF genes are absent in chimpanzee. This discrepancy appears to result from the fact that the SCAN domain did indeed arise before the KRAB domain but is present only in non-ZF genes until a much later date. In addition, we present evidence that provides a more parsimonious explanation for the large proportion of mammalian KRAB-ZF genes without a SCAN domain.


Affiliation: Department of Genome Sciences, University of Washington, Seattle, 98195, USA. jht@u.washington.edu

ABSTRACT

Background: A recent study by Tadepally et al. describes the clustering of zinc finger (ZF) genes in the human genome and traces their evolutionary history among several placental mammals with complete or draft genome sequences. One of the main conclusions from the paper is that there is a dramatic rate of gene duplication and gene loss, including the surprising result that 118 human ZF genes are absent in chimpanzee. The authors also present evidence concerning the ancestral order in which the ZF-associated KRAB and SCAN domains were recruited to ZF proteins.

Results: Based on our analysis of two of the largest human ZF gene clusters, we find that nearly all of the human genes have plausible orthologs in chimpanzee. The one exception may be a result of the incomplete sequence coverage in the draft chimpanzee genome. The discrepancy in gene content analysis may result from the authors' dependence on the preliminary NCBI gene prediction set for chimpanzee, which appears to either fail to predict or to mispredict many chimpanzee ZF genes. Similar problems may affect the authors' interpretation of the more divergent dog, mouse, and rat ZF gene complements. In addition, we present evidence that the KRAB domain was recruited to ZF genes before the SCAN domain, rather than the reverse as the authors suggest. This discrepancy appears to result from the fact that the SCAN domain did indeed arise before the KRAB domain but is present only in non-ZF genes until a much later date.

Conclusion: When comparing gene content among species, especially when using draft genome assemblies, dependence on preliminary gene prediction sets can be seriously misleading. In such studies, genic sequences must be identified in a manner that is as independent as possible of prediction sets. In addition, we present evidence that provides a more parsimonious explanation for the large proportion of mammalian KRAB-ZF genes without a SCAN domain.


Figure 1: The top panel shows a maximum-likelihood tree for all the human proteins (green) encoded in cluster 19.12 and their putative chimpanzee orthologs (blue). Black circles on branches indicate aLRT branch support of 0.95 or higher. The groups from Figure 5 of Tadepally et al. [2] correspond to the leftmost seven pairs of proteins (group III), the rightmost single pair of proteins (group I), and the rest of the tree (group II). Only the ZF exon regions were used in constructing the tree (see Methods). The genome position of each ZF exon is given in the FASTA name. The lower left panel shows a UCSC browser image for 28 kb around chimpanzee ZNF610, one of the genes reported as absent from chimpanzee by Tadepally et al. [2]. The track "ZNF Related Domains" shows genomic domain matches from our added track (Additional file 2); the track "Gap Locations" shows the absence of known sequence gaps in this region; the RefSeq track shows the standard UCSC alignment of human ZNF610 to the chimpanzee genome; the Ensembl track shows an Ensembl gene prediction for the chimpanzee ZNF610 ortholog; the Human Net track shows that the entire region is syntenic to human chromosome 19. The lower right panel shows the chimpanzee protein aligned to its human ZNF610 ortholog.

Mentions: Tadepally et al. [2] report that 118 of 510 human ZF genes are missing in the chimpanzee genome assembly and that many others are pseudogenes. Given the generally high conservation between human and chimpanzee, this is an extremely surprising result [5]. We investigated this pattern for two of the large ZF gene clusters on chromosome 19, including the gene cluster analyzed in detail in their Figure 5. Using their nomenclature, these are cluster 19.6 (28 human genes, 9 reported as missing in chimpanzee and another 5 reported as pseudogenes) and cluster 19.12 (43 human genes, 14 reported as missing in chimpanzee and another 10 reported as pseudogenes). According to our analysis, nearly all of the human proteins in these clusters contain a KRAB domain (see Methods and Additional file 1). The KRAB-containing subset of the ZF superfamily has been carefully hand-curated in the human genome [1]. Nearly all KZNF (KRAB-zinc finger) genes in humans have a similar exon structure: the entire set of ZF domains is encoded on a single long 3' exon and the KRAB domain is split among one or more short 5' exons [1]. Typically, 80–90% of the final protein is encoded by the ZF exon. We took advantage of this fact to identify putative chimpanzee genes based on a simple tblastn method, which works independently of gene predictions. Combining results from the two gene clusters, we identified probable chimpanzee orthologs for 69 of 70 human KZNF genes (the slight change in human gene count, from 71 to 70, results from current database predictions of 5 new ZF genes and retirement of 6 ZF genes as probable pseudogenes, see Additional file 1). It is possible that the single missing gene is a result of incomplete sequence coverage in the current chimpanzee genome assembly. Our results for cluster 19.12 are summarized in Figure 1 and tabular results for both clusters are given in Additional file 1, including predicted domain content and genome positions of the putative chimpanzee genes.
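The prediction-independent search described above can be illustrated with a minimal sketch. It assumes tblastn was run with tabular output (-outfmt 6, default column order) using human ZF exon proteins as queries against the chimpanzee assembly. The E-value cutoff, the function names, and the example records are hypothetical and are not taken from the authors' Methods.

```python
from collections import namedtuple

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Convert one tab-separated BLAST line into a typed Hit record."""
    p = line.rstrip("\n").split("\t")
    return Hit(p[0], p[1], float(p[2]), *map(int, p[3:10]),
               float(p[10]), float(p[11]))


def best_hits(lines, max_evalue=1e-20):
    """Keep the highest-bitscore hit per query sequence.

    max_evalue is an illustrative threshold (an assumption), not the
    cutoff used in the paper.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_hit(line)
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best


def strand(hit):
    """tblastn reports sstart > send for hits on the minus strand."""
    return "+" if hit.sstart <= hit.send else "-"


# Invented records for illustration only (not real coordinates).
example = [
    "zf_exon_A\tchrN\t98.5\t400\t6\t0\t1\t400\t10000\t11199\t1e-150\t800.0",
    "zf_exon_A\tchrN\t71.0\t380\t110\t2\t5\t384\t52000\t50861\t1e-90\t450.0",
    "zf_exon_B\tchrN\t40.0\t50\t30\t0\t1\t50\t900\t1049\t0.5\t30.0",
]
hits = best_hits(example)
```

Taking the single best hit per query is a simplification. In a cluster of recent tandem duplicates, a reciprocal search or a tree-based assignment, as in Figure 1, would be a safer way to separate orthologs from close paralogs.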
Candidate orthologs for nearly all the human genes were also found in the orangutan genome but were not analyzed in detail (data not shown). The 69 putative chimpanzee orthologs were identified based only on their ZF-encoding exon. To test whether these exons are plausibly part of full KRAB-containing ZF genes, we used tblastn to search for ORFs that encode the most conserved part of the KRAB domain. Of the 69 putative chimpanzee ZF exons, 64 had a potential KRAB-encoding exon within 13 kb upstream (mean 7.0 kb). Most of the chimpanzee predictions that lacked a nearby KRAB exon correspond to human genes that are also predicted to lack a KRAB domain (Additional file 1). We also derived full gene models for 11 of the chimpanzee genes from cluster 19.12, nine of which are reported as missing in chimpanzee in Tadepally et al. [2]. All 11 predictions aligned with their human counterparts with only a few amino acid changes; one is shown in Figure 1. Finally, Ensembl chimpanzee gene predictions exist for 14 of the genes from cluster 19.12 reported as missing or defective in Tadepally et al. [2]. Though we remain uncertain about how many of the 69 putative chimpanzee genes from these two gene clusters will prove to be functional, our results are in much better accord with the high degree of overall similarity of the human and chimpanzee genomes. At the very least, sequence corresponding to no more than one of the 70 human genes is entirely missing in chimpanzee. We conclude that a perfect one-to-one correspondence of ZF genes between human and chimpanzee remains possible.
To facilitate viewing potential ZF genes in the chimpanzee genome, we conducted a whole-genome profile search and compiled the results as a BED-format text file that can be loaded into the UCSC genome browser (see Methods and Additional file 2). The positions of all genomic matches to common ZF-associated domains (KRAB, SCAN, ZF, SET, and BTB) appear in this track regardless of prediction status. With this track displayed in full, inspection of the chimpanzee genome in the regions corresponding to human ZF gene clusters reveals multiple potential ZF genes that are currently unpredicted and unannotated.
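Two steps in this passage lend themselves to a short sketch: checking for a KRAB-encoding hit upstream of a ZF exon, and writing domain matches as BED lines for a browser track. The code below is an illustration under stated assumptions, not the authors' pipeline. Intervals are taken to be 0-based and half-open (the BED convention), the 13 kb figure reported above is used as a hard cutoff (an assumption), and all names and coordinates are hypothetical.

```python
def upstream_distance(zf, krab):
    """Gap in bp between a KRAB hit and a ZF exon when the KRAB hit lies
    upstream (5') of the exon on the same chromosome and strand.

    Each interval is (chrom, start, end, strand), 0-based half-open.
    Returns None if the KRAB hit is not upstream of the ZF exon.
    """
    zchrom, zstart, zend, zstrand = zf
    kchrom, kstart, kend, kstrand = krab
    if zchrom != kchrom or zstrand != kstrand:
        return None
    if zstrand == "+":
        gap = zstart - kend   # KRAB must end before the ZF exon starts
    else:
        gap = kstart - zend   # minus strand: upstream = higher coordinates
    return gap if gap >= 0 else None


def nearest_upstream_krab(zf, krab_hits, max_gap=13_000):
    """Smallest upstream gap within max_gap, or None if no hit qualifies.

    max_gap mirrors the 13 kb range reported in the text; treating it as
    a cutoff is an assumption of this sketch.
    """
    gaps = [d for d in (upstream_distance(zf, k) for k in krab_hits)
            if d is not None and d <= max_gap]
    return min(gaps) if gaps else None


def bed_line(chrom, start, end, name, strand, score=0):
    """Format one six-column BED record (tab-separated)."""
    return f"{chrom}\t{start}\t{end}\t{name}\t{score}\t{strand}"


# Invented intervals for illustration only.
zf_exon = ("chrN", 20000, 21200, "+")
krab_hits = [
    ("chrN", 13000, 13120, "+"),   # 6,880 bp upstream: qualifies
    ("chrN", 25000, 25120, "+"),   # downstream: ignored
    ("chrN", 1000, 1120, "+"),     # 18,880 bp upstream: beyond cutoff
]
gap = nearest_upstream_krab(zf_exon, krab_hits)
track_line = bed_line("chrN", 13000, 13120, "KRAB", "+")
```

Because the check is strand-aware, a KRAB hit at higher coordinates counts as upstream only when the ZF exon lies on the minus strand.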

