Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22.

Volfovsky N, Oleksyk TK, Cruz KC, Truelove AL, Stephens RM, Smith MW - BMC Genomics (2009)

Bottom Line: In this study, we evaluated distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.


Affiliation: Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick, National Cancer Institute at Frederick, Frederick, MD 21702, USA. natalia@ncifcrf.gov

ABSTRACT

Background: Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.

Results: Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After the exclusion of those in repetitive DNA, 1,429 (23%) of the indels remained. This group was characterized according to local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were in proximity to known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection is a factor in limiting their persistence. A subset of indels from coding regions was experimentally validated and their impacts were predicted based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans.

Conclusion: Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.
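
The three-class scheme described in the Results (unique, exact, approximate) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the ungapped position-by-position comparison, and the 80% identity cutoff (`min_identity`) are all assumptions chosen for illustration. The abstract states only that copies are sought in the surrounding (10 Kb) region.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify_indel(indel_seq, flanks, min_identity=0.8):
    """Assign an indel to one of three classes from its local context.

    indel_seq    -- the inserted/deleted sequence
    flanks       -- sequences surrounding the indel (e.g. upstream and
                    downstream), with the indel itself excluded
    min_identity -- hypothetical cutoff for an "approximate" copy
    """
    n = len(indel_seq)
    # Exact class: at least one identical copy in the surrounding region.
    if any(indel_seq in flank for flank in flanks):
        return "exact"
    # Approximate class: a similar but not identical copy nearby
    # (ungapped sliding-window comparison; a simplification).
    for flank in flanks:
        for i in range(len(flank) - n + 1):
            if identity(indel_seq, flank[i:i + n]) >= min_identity:
                return "approximate"
    # Unique class: no additional copy found locally.
    return "unique"
```

A real analysis would more likely use a gapped local aligner than a sliding window, since similar copies may themselves differ by small insertions or deletions.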


Figure 4: Distribution of Indel Length Among the Three Core Classes. (A) Approximate indels have the largest length, followed by exact, and then unique (Table S2.1A, p < .0001). (B) Approximate and unique indels are shorter than expected (Table S1.1A, p < .0001 (see Additional File 1)). The distribution of exact indels, both here and in Fig. S3, appears jagged due to the lower sample size (n = 168) in this class compared to the other two classes, approximate and unique.

Mentions: A comparison of indel lengths among the three core classes and across gene elements was performed (Fig. 4 and Table S2.1A (see Additional file 1)). Approximate indels, making up the largest core class, were shorter in the observed than in the resampled dataset (47 vs. 52 bp on average, p = 0.0003; Table S2.1A (see Additional file 1)). Unique and exact indels were similar in size, and both contained indels that were on average smaller than their counterparts in the approximate class (p < 0.0001, Table S2.1A and Fig. S1 (see Additional file 1)). Observed indels located within genes were the shortest (26 bp), shorter than expected from the resampled distribution (34 bp on average, p < 0.0001; Table S2.1A (see Additional file 1)). Approximate indels showed the greatest difference in size within gene elements (p < 0.0001, Table S2.1A (see Additional file 1)), while their presence in other categories, such as introns and intergenic elements, had no effect on length (p = 0.2–0.7, Table S2.1A (see Additional file 1)). Unique indels were shorter than expected overall, but the most significant differences between observed and expected length were found among those unique indels located outside genes, rather than within gene elements (p = 0.007–0.009, Table S2.1A (see Additional file 1)).
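
The comparisons above contrast observed mean indel lengths with those from a resampled dataset. The excerpt does not describe the resampling procedure, so the following is only a generic sketch of one common way to compare two groups of lengths, a two-sided permutation test on the difference in means. The function name, the number of permutations, and the fixed seed are assumptions for illustration, not the authors' method.

```python
import random


def permutation_test_mean(group_a, group_b, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in mean length.

    group_a, group_b -- lists of indel lengths (bp)
    n_perm           -- number of random relabelings (assumed value)
    seed             -- fixed seed so the sketch is reproducible
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_perm):
        # Shuffle the pooled lengths and split them into two new groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_perm + 1)
```

Indel length distributions are typically skewed, which is one reason a resampling approach can be preferable to a test that assumes normality.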

