Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins

View Article: PubMed Central - PubMed

ABSTRACT

Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.



Figure 5: Selection in the horizontal and vertical evolution of repeats in sets of orthologous proteins. (a) Schematic illustration of two aligned orthologous proteins from species A and species B, containing 4 and 3 repeats (marked as blue segments), respectively. Both horizontal and vertical selection measures are based on all orthologous repeats (indicated by the dashed box). Horizontal selection (R-intra) is given by the average across species of <dN/dS> (Supplementary Fig. 4), where <dN/dS> is estimated twice: once as the average dN/dS of all repeat pairs within a protein/species, and second as the average dN/dS of all consecutive pairs of repeats. Vertical selection (R-inter) is estimated, once globally, by the dN/dS of merged segments of orthologous repeats, and locally (unit-based), by the average dN/dS of all orthologous repeat pairs. (b) Distribution of dN/dS for the horizontal evolution of repeats in repeat (R) containing proteins (n=798), followed by the distribution of dN/dS of complete proteins: non-repetitive (NR) proteins (n=10,086), and repetitive proteins, with (entire) or without (−R) the repetitive part, followed by dN/dS distributions for the vertical evolution of repeats. (c) The medians of the distributions in b for all pairs of organisms within each quartet or pair of species. At the top, the numbers of orthologous repetitive proteins relative to the number of all orthologous proteins in each quartet or pair are shown. For the complete dN/dS distributions of all species pairs, see Supplementary Fig. 8.
© Copyright Policy - open-access
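
The two horizontal averages described in the figure caption can be illustrated with a minimal Python sketch. It assumes the pairwise dN/dS values have already been estimated elsewhere and that each protein's repeat list is already restricted to the orthologous repeats; the function names, the data layout and the numbers are hypothetical placeholders, not the authors' code or data.

```python
from itertools import combinations
from statistics import mean


def mean_all_pairs(n_repeats, dnds):
    """<dN/dS> as the average over every pair of repeats in one protein."""
    return mean(dnds[pair] for pair in combinations(range(n_repeats), 2))


def mean_consecutive_pairs(n_repeats, dnds):
    """<dN/dS> as the average over adjacent repeat pairs (i, i + 1) only."""
    return mean(dnds[(i, i + 1)] for i in range(n_repeats - 1))


def r_intra(proteins, pair_mean):
    """Average the within-protein <dN/dS> across species."""
    return mean(pair_mean(n, dnds) for n, dnds in proteins)


# Placeholder inputs: (number of repeats, {(i, j): dN/dS}) for one protein
# in each of two species. The values are invented for illustration.
species_a = (3, {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 0.75})
species_b = (3, {(0, 1): 0.25, (0, 2): 0.75, (1, 2): 0.5})

r_intra_all = r_intra([species_a, species_b], mean_all_pairs)
r_intra_consecutive = r_intra([species_a, species_b], mean_consecutive_pairs)
print(r_intra_all, r_intra_consecutive)
```

Keeping the pair-selection rule as a separate function makes it easy to compare the all-pairs and consecutive-pairs estimates on the same input.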

Mentions: Next, we similarly analyse a set of organisms from several diverse major taxa (Fig. 4). As expected, the number of repetitive proteins drops significantly from vertebrates to invertebrates to plants to unicellular organisms (Fig. 4a). There are both evident similarities and differences in the distributions of the period lengths (Fig. 4b). Zn-fingers are ubiquitous in vertebrates, but not in other organisms. In fish, there is a large family of uncharacterized repetitive proteins with a period of 58 AA. The dominant periods in the fly (Drosophila melanogaster) are 6 AA (n=62), associated with a variety of functions and some uncharacterized proteins (n=37), and 18 AA (n=27), mostly in histone H1. In plants (Arabidopsis thaliana), the dominant period is 24 AA (n=57), associated with diverse functions. In yeast (Saccharomyces cerevisiae), the dominant period is 12 AA, associated with helicases; in addition, there is a clear enrichment of cell wall proteins (25 of 106 among repetitive proteins compared with 215 of 5,917 in the entire proteome) with various period lengths. Despite these notable differences between the functional repertoires of repetitive proteins, both the horizontal propagation of repeats within proteins implied by the correlation between physical and evolutionary distances (Fig. 4d) and the much lower <dN/dS> values for the horizontal evolution of repeats compared with shuffled repeats (Fig. 5e) are universal phenomena.
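
The yeast cell wall enrichment quoted above (25 of 106 versus 215 of 5,917) corresponds to about 24% versus about 3.6%, roughly a 6.5-fold difference. The sketch below reproduces that arithmetic and adds a one-sided hypergeometric tail probability. The choice of test is an assumption made here for illustration (the excerpt does not say how enrichment was assessed), and it further assumes that the 106 repetitive proteins are a subset of the 5,917-protein proteome; the function names are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Ratio of the fraction in the subset (k/n) to the fraction overall (K/N)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K belong to the category of interest."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    # Exact integer counts; Python's true division converts the ratio to a float.
    return favourable / total


# Counts taken from the text: cell wall proteins among repetitive proteins
# (25 of 106) and in the entire yeast proteome (215 of 5,917).
fold = fold_enrichment(25, 106, 215, 5917)
p_value = hypergeom_upper_tail(25, 106, 215, 5917)
print(f"fold enrichment: {fold:.2f}, upper-tail probability: {p_value:.3g}")
```

The tail is summed with exact integer binomial coefficients, so no statistics library is needed and there is no floating-point underflow in the intermediate terms.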

