Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22.

Volfovsky N, Oleksyk TK, Cruz KC, Truelove AL, Stephens RM, Smith MW - BMC Genomics (2009)

Bottom Line: In this study, we evaluated distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.


Affiliation: Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick, National Cancer Institute at Frederick, Frederick, MD 21702, USA. natalia@ncifcrf.gov

ABSTRACT

Background: Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.

Results: Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After excluding those in repetitive DNA, 1,429 indels (23%) remained. This group was characterized according to its local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were in proximity to known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection is a factor in limiting their persistence. A subset of indels from coding regions was experimentally validated, and their impacts were predicted based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans.

Conclusion: Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind speciation events separating the great ape lineages.
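
The three core classes described in the Results can be illustrated with a minimal sketch. It assumes the indel sequence and its flanking region (for example, 10 Kb, with the indel itself excluded) are available as plain strings. The function name `classify_core_indel`, the ungapped sliding-window comparison, and the `min_identity` threshold of 0.9 are all hypothetical choices made for illustration; the abstract does not state how "similar but not identical" copies were detected.

```python
def classify_core_indel(indel_seq, flank_seq, min_identity=0.9):
    """Assign an indel to a core class from its local flanking sequence.

    indel_seq    -- the inserted/deleted sequence
    flank_seq    -- surrounding sequence with the indel itself excluded
    min_identity -- hypothetical cutoff for a "similar" copy (an assumption,
                    not a value taken from the paper)
    """
    indel = indel_seq.upper()
    flank = flank_seq.upper()
    n = len(indel)
    if n == 0:
        raise ValueError("indel sequence must not be empty")

    # Class (ii): at least one exact copy nearby.
    if indel in flank:
        return "exact"

    # Class (iii): a similar but not identical copy nearby.
    # Ungapped comparison of every window of the same length as the indel.
    for start in range(len(flank) - n + 1):
        window = flank[start:start + n]
        matches = sum(a == b for a, b in zip(indel, window))
        if matches / n >= min_identity:
            return "approximate"

    # Class (i): no additional copy found in the flanking region.
    return "unique"


if __name__ == "__main__":
    indel = "ACGTACGTACGTACGTACGT"
    print(classify_core_indel(indel, "GGGG" + indel + "GGGG"))
    print(classify_core_indel(indel, "GGGG" + indel[:-1] + "G" + "GGGG"))
    print(classify_core_indel(indel, "T" * 40))
```

A real pipeline would more likely use a gapped aligner for the approximate class; the ungapped window is used here only to keep the sketch self-contained.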

Figure 2: Distribution of Indels Among Different Genome and Core Indel Classes. Frequencies of the three core indel classes (approximate, exact, and unique) are contrasted in the observed and resampled datasets (orange bars). There is an excess of observed approximate and exact indels, and a shortage of unique indels, compared to the expected values for chromosome 22 (LR χ², d.f. = 2, χ² = 916, p < 0.0001). Colours within the bars representing observed data indicate the relative frequency of the three genome classes (chromosome-multiple, chromosome-unique, and genome-unique). The distribution of core indel types among the genome classes is not random, with the majority represented by genome-unique indels (LR χ², d.f. = 4, χ² = 23.28, p = 0.0001; Table S2.1A, see Additional file 1).

Mentions: Indels from the three core classes (approximate, exact, or unique) were not represented equally among the three genome classes (genome-unique, chromosome-unique, or chromosome-multiple) (LR χ² = 23.28, 4 d.f., p = 0.0001; Table S2.1A (see Additional file 1)). We identified a significant excess of approximate and exact indels at the expense of unique indels in the observed dataset (Likelihood Ratio (LR) χ² = 916, 2 d.f., p < 0.0001; Table S1 (see Additional file 1) and Fig. 2). The majority of genome-unique indels had copies locally and were classified as approximate core indels (56.3%), reflecting their locally repetitive nature (Table S1 (see Additional file 1)). Most chromosome-unique indels were also found in the approximate core class (75.7%). At the same time, the unique core indels (those with no copies in the 10 Kb flanking sequence) were better represented among the genome-unique indels (no copies elsewhere in the genome) (31.7%) than among the chromosome-unique indels (copies unique to the chromosome) (14.6%) (Table S1, Fig. S2 (see Additional file 1)).
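
The reported p-values can be checked against the quoted test statistics and degrees of freedom. The sketch below assumes only that the likelihood-ratio statistics are referred to a chi-square distribution, and uses the closed-form upper tail that exists for even degrees of freedom, so no statistics library is needed. The function name `chi2_sf_even_df` is introduced here for illustration and is not from the paper.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the tail has the closed form
        exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistics quoted in the text above.
p_classes = chi2_sf_even_df(23.28, 4)  # core classes vs. genome classes
p_excess = chi2_sf_even_df(916, 2)     # observed vs. resampled core classes

if __name__ == "__main__":
    print(f"chi2 = 23.28, 4 d.f.: p = {p_classes:.2e}")
    print(f"chi2 = 916,   2 d.f.: p < 0.0001 is {p_excess < 0.0001}")
```

Under this assumption the first value comes out at roughly 0.00011, consistent with the reported p = 0.0001, and the second is far below 0.0001.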

