Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins

Source: PubMed Central (PMC5120217)

ABSTRACT

Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
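Estimating dN/dS between two repeat copies means comparing them as aligned codon sequences. The passage does not name the estimator that was used, so the Python sketch below is only an illustration. It implements the classic Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. The function names are hypothetical. Two choices are assumptions of this sketch: changes to a stop codon count as nonsynonymous when tallying sites, and mutational pathways that pass through intermediate stop codons are excluded when tallying differences.

```python
import math
from itertools import permutations

# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: at each position, the fraction of the
    three possible single-base changes that keep the amino acid (changes to a
    stop codon count as nonsynonymous, an assumption of this sketch)."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over every order of single-base steps that avoids intermediate stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(diff):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; nan when p >= 0.75 (saturation)."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return float("nan")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """dN/dS for two aligned, gap-free coding sequences of equal length."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    if any(CODE[c] == "*" for c in codons1 + codons2):
        raise ValueError("stop codons are not allowed")
    # Sites are averaged over the two sequences; nonsynonymous sites are the rest.
    S = sum(synonymous_sites(a) + synonymous_sites(b)
            for a, b in zip(codons1, codons2)) / 2
    N = len(seq1) - S
    if S == 0 or N == 0:
        return float("nan")
    Sd = Nd = 0.0
    for a, b in zip(codons1, codons2):
        s, n = codon_differences(a, b)
        Sd += s
        Nd += n
    dS, dN = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    if dS == 0 or math.isnan(dS) or math.isnan(dN):
        return float("nan")  # undefined: no synonymous change, or saturation
    return dN / dS
```

For example, `dn_ds("TTTCTG", "TTCCTG")` returns 0.0, because the only difference (TTT to TTC) is synonymous. Identical sequences give `nan`, since dS = 0 leaves the ratio undefined. Counting methods of this kind ignore transition/transversion bias and codon usage, which likelihood-based estimators model explicitly.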

Figure 3. Repeats in the human proteome. (a) Distribution of the most frequent interval (MFI), which represents the period (repeat length). (b) Distribution of the number of repeats per protein. The inset shows the distribution of recurrence types (FT = fully tandem; PT = partially tandem, that is, at least two repeats recur in tandem; NT = non-tandem); see examples in Supplementary Fig. 1. (c) Spearman correlation between the physical and evolutionary distances of repeats within a protein. (d) Distribution of dN/dS ratios of all valid pairwise comparisons within proteins across the proteome, on a log scale (n = 319K valid comparisons out of 510K possible). Black curve: identified repeats (median = 0.52); red curve: shuffled repeats (median = 0.99). Bin width 0.01. (e) Distribution of the per-protein mean dN/dS ratio over all valid pairwise comparisons, <dN/dS>, across the proteome, for real repeats (black) and shuffled repeats (red); bin width 0.1, linear scale. The inset plots <dN/dS> of real repeats against that of the corresponding shuffled repeats, with very short repeats (≤10 AA) superimposed in cyan. (f) Relationship between <dN/dS> and the error on the mean; the inset shows the relative error.
© Copyright Policy - open-access
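Several panels of Fig. 3 summarize each protein with a few simple statistics. The Python sketch below shows one plausible way to compute them. The definitions are assumptions of this sketch, not the authors' code. The MFI is taken as the most common distance between consecutive repeat start positions. Repeats, given as hypothetical (start, end) residue intervals, count as tandem when the next one starts where the previous one ends, within a tolerance. Spearman's coefficient is computed as the Pearson correlation of average ranks. The "error on the mean" is taken to be the standard error.

```python
import math
from collections import Counter
from statistics import mean, stdev


def most_frequent_interval(starts):
    """MFI: the most common distance between consecutive repeat start positions."""
    starts = sorted(starts)
    if len(starts) < 2:
        raise ValueError("need at least two repeats")
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return Counter(gaps).most_common(1)[0][0]


def recurrence_type(repeats, tolerance=0):
    """Classify (start, end) repeat intervals, end exclusive, as 'FT' (every
    neighboring pair adjacent), 'PT' (at least one adjacent pair) or 'NT'
    (none adjacent). 'Adjacent' means the gap between them is <= tolerance."""
    repeats = sorted(repeats)
    adjacent = [nxt[0] - cur[1] <= tolerance
                for cur, nxt in zip(repeats, repeats[1:])]
    if adjacent and all(adjacent):
        return "FT"
    if any(adjacent):
        return "PT"
    return "NT"


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mean_and_sem(ratios):
    """Per-protein <dN/dS> and its standard error (needs >= 2 valid pairs)."""
    return mean(ratios), stdev(ratios) / math.sqrt(len(ratios))
```

For panel (c), `x` could hold the physical distances between repeat pairs (for example, differences in start position) and `y` their evolutionary distances (for example, pairwise divergence). Both pairings are illustrative choices.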

Mentions: The statistics of the repeats and their evolutionary characterization across the human proteome are summarized in Fig. 3 (Supplementary Data 1). The distribution of repeat lengths (Fig. 3a) shows clear peaks at MFI = 28 AA, found in 37% of the proteins (398 of 1,081), all of which are Zn-finger proteins, and at 105 AA, associated with protocadherin repeats. Other dominant families are keratin, collagen, and ankyrin repeats. GO enrichment analysis with GOrilla (ref. 48) shows that proteins containing repeats are enriched in the functional categories DNA/RNA binding, transcription, regulation, extracellular organization, and various metabolic and biosynthetic processes (Supplementary Data 2).
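The 37% figure is simple arithmetic: 398 / 1,081 ≈ 0.368. The enrichment statement can be illustrated with a one-sided hypergeometric test. It asks how likely it is to see at least k term-annotated proteins among n repeat-containing proteins drawn from a background of N proteins, K of which carry the term. GOrilla itself is built on the minimum-hypergeometric (mHG) statistic over ranked lists. This Python sketch is therefore a simplified stand-in, not its implementation, and the function name is hypothetical. It also omits the multiple-testing correction that a scan over all GO terms requires.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background proteins; K: background proteins annotated with the GO term;
    n: repeat-containing proteins; k: repeat-containing proteins with the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Share of proteins in the 28 AA Zn-finger peak, as quoted in the text.
print(f"{398 / 1081:.0%}")  # prints 37%
```

Exact integer binomials (`math.comb`) keep the tail sum free of rounding error until the final division, at the cost of speed for very large backgrounds.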

