Evidence of convergent evolution in humans and macaques supports an adaptive role for copy number variation of the β-defensin-2 gene.
Bottom Line: Remarkably, we found that the structure of the CNV is different between primates, with distinct mutational origins and CNV boundaries defined by retroviral long terminal repeat elements. In addition, the rhesus macaque gene has been subject to divergent positive selection at the amino acid level following its initial duplication event between 3 and 9.5 Ma, suggesting adaptation of this gene as the macaque successfully colonized novel environments outside Africa. Therefore, the molecular phenotype of β-defensin-2 CNV has undergone convergent evolution, and this gene shows evidence of adaptation at the amino acid level in rhesus macaques.
Affiliation: Department of Genetics, University of Leicester, United Kingdom.
Mentions: In parallel, we isolated BAC clones from a rhesus macaque genomic library using two sequences as probes (chr8:8068951–8069850; chr8:8072076–8073004) both in vitro by probing arrayed-BAC filters and in silico by database searching. We characterized six BACs containing the DEFB2L region using a combination of BAC end sequencing, PCR analysis, and FISH (supplementary table S3, Supplementary Material online), confirming that DEFB107–DEFB2L were arranged as predicted in the rheMac2 genome assembly (supplementary table S2, Supplementary Material online), and that DEFB2L mapped uniquely to distal 8p (fig. 2). FISH also confirmed that the BACs mapping to this region contain many dispersed repeats, and that at interphase DEFB2L is at the edge of the chromosome 8 domain. The full Sanger sequence of two further BACs from the same library (CHORI-250) was available (BAC 243E20 accession AC191454.4, BAC 65I2 accession AC193549.4), and analysis of these sequences showed two copies of the DEFB2L gene on a 20-kb tandem duplication (supplementary fig. S2, Supplementary Material online), with a distal boundary of the duplication consistent with the CNV boundary identified from aCGH data. The proximal boundary of the duplication, identified in 243E20, is not assembled on chromosome 8 in the rheMac2 assembly, illustrating the limitations of using a whole-genome shotgun assembly for array CGH design and the importance of BAC sequencing in assembling complex genomes. The two full-length copies on 243E20 share 91.3% identity at the nucleotide level, and we estimate that the duplication event occurred at around the same time as the divergence of the macaque lineage from baboon (Elango et al. 2009), about 10 Ma (Raaum et al. 2005). However, directly dating the origin of the duplication is hampered by the lack of convincing orthologous sequences for the entire 20-kb repeat from other Old World monkeys, and by the sequence-homogenizing influence of gene conversion.
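The nucleotide identity quoted for the two copies is a pairwise measure over aligned sequence. The sketch below shows one way such a value can be computed. It assumes the two sequences are already aligned and that columns with a gap in either sequence are skipped; this is one common convention, and the source does not state which was used. The function name and the toy sequences are hypothetical and are not data from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: columns where either sequence has a gap are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip gapped columns
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical, not from the BAC sequences).
toy_a = "ACGTACGT-ACG"
toy_b = "ACGTTCGTAACG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because gene conversion homogenizes paralogs, an identity computed this way can understate the true age of a duplication, which is the caveat raised above.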
Fortuitously, in BAC 243E20, the two full-length paralogs are distinguished by an AluYRa1 SINE insertion on the proximal copy and an L1PA5 LINE insertion on the distal copy. The AluYRa1 subfamily has been estimated to be approximately 9.5 Myr old (Han et al. 2007; Liu et al. 2009), putting the earliest origin of the duplication at that point. By designing primers matching the sequence flanking these insertions, we can identify the presence of the duplication using PCR (fig. 3). Analysis of other Old World monkeys with known divergence times strongly suggests that two paralogous copies distinguished by the LINE and SINE insertions arose after the divergence of the lineage leading to Macaca sylvanus (∼4 Ma), but before the divergence of the lineage leading to M. fascicularis (∼3 Ma). This suggests that the duplication could be as recent as 3 Ma (fig. 3). In M. mulatta, paralogs without the integration of the LINE or SINE are seen in some individuals, confirming that the integrations occurred after the initial increase in copy number.
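The dating argument above is a bracketing exercise, and it can be sketched as follows. The divergence times and the AluYRa1 age are taken from the text. The presence/absence flags are an encoding of the statement that the marked paralogs arose after the M. sylvanus split and before the M. fascicularis split, and the class and function names are hypothetical. The sketch also assumes that no lineage lost the paralogs secondarily.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    lineage: str
    divergence_ma: float       # split from the M. mulatta lineage, in Ma
    has_marked_paralogs: bool  # LINE/SINE-distinguished copies detected by PCR


def bracket_origin(observations):
    """Return (youngest, oldest) bounds in Ma for the origin of the marked paralogs.

    A lineage that shares the paralogs implies they arose before its split;
    a lineage that lacks them implies they arose after its split
    (assuming no secondary loss).
    """
    shared = [o.divergence_ma for o in observations if o.has_marked_paralogs]
    absent = [o.divergence_ma for o in observations if not o.has_marked_paralogs]
    if not shared or not absent:
        raise ValueError("need at least one lineage with and one without the paralogs")
    youngest = max(shared)
    oldest = min(absent)
    if youngest > oldest:
        raise ValueError("observations are inconsistent with a single origin")
    return youngest, oldest


observations = [
    Observation("Macaca sylvanus", 4.0, False),
    Observation("Macaca fascicularis", 3.0, True),
]
ALU_YRA1_AGE_MA = 9.5  # estimated subfamily age quoted in the text

youngest, oldest = bracket_origin(observations)
print(f"Marked paralogs arose between {youngest} and {oldest} Ma")
print(f"Duplication itself: between {youngest} and {ALU_YRA1_AGE_MA} Ma")
```

The second interval is wider because the insertions occurred after the initial increase in copy number, so the duplication can predate the marked paralogs.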