Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins


ABSTRACT

Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.


Figure 2: Example of analysis of a single protein, the 894-amino-acid (AA) human zinc-finger protein PRDM9. The 13 tandem repeats, each 28 AA long, are identified by the algorithm at the end of the protein and are numbered (ID) by their location on the protein. Underlined letters correspond to the zinc fingers annotated in SwissProt (the first finger starts 7 AA before the beginning of the first repeat). The order in which the method accumulates the repeats (Order) reveals clusters of identical repeats: 1 (ID=3,6,7), 2 (ID=4,5) and 3 (ID=8,11). Black coloured repeats represent the seed identified in the second step, and red coloured repeats are those identified by the PPM-based predictor in the third step. Repeats are highly similar, with high IC, as shown by the sequence logo. A maximum-likelihood tree of the repeats is shown in the right panel, where the repeat IDs are given on the y axis. The plot beneath the tree shows the positive correlation between evolutionary distance (in substitutions per site) and physical distance (in amino acids), obtained by comparing all repeat pairs; the red line represents a linear regression fit (Spearman correlation=0.53, P value=6.7e−7). The mean dN/dS over all pair comparisons (n=78) is <dN/dS>=2.7146±0.23, that is, significant positive selection.
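The comparison of physical and evolutionary distance described in the caption can be illustrated with a minimal sketch. It assumes that the physical distance between two repeats is the difference of their start positions, and it takes the evolutionary distances as given (in the paper they come from the maximum-likelihood repeat tree). The function names `average_ranks`, `spearman` and `physical_distances` are hypothetical and are not taken from the paper; the rank correlation is written out with the standard library only.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def physical_distances(starts):
    """Pairwise distances (in amino acids) between repeat start positions.

    Assumption: distance is measured between start positions; pairs are
    listed in the same order as itertools.combinations.
    """
    return [abs(a - b) for a, b in combinations(starts, 2)]
```

To use it, pass the physical distances and the tree-derived evolutionary distances for the same pairs, in the same pair order, to `spearman`. For 13 repeats this gives 78 pairs, matching the n=78 quoted in the caption.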

Figure 2 exemplifies the analysis of a single protein, the human PRDM9 Zn-finger DNA-binding protein, which contains 13 tandem repeats. Additional examples are provided in Supplementary Figs 1 and 2, emphasizing that repeats can form diverse patterns and do not always recur in perfect tandem. The PRDM9 protein binds to double-stranded DNA breaks and promotes meiotic recombination in humans and mice, and is the only mammalian gene so far shown to play a distinct role in speciation [44,45]. Rapid evolution of PRDM9 has been demonstrated, including lineage-specific expansion of the Zn-fingers and positive selection in DNA-contacting positions [46,47]. With the sequences of repeats at hand, we represent their evolutionary relationships by a maximum-likelihood repeat tree (see Methods), from which the evolutionary distances between repeats can be estimated and compared with the respective physical distances. Furthermore, treating repeats as paralogous elements, we estimate their pairwise dN/dS ratios (the ratio of non-synonymous to synonymous substitution rates) by comparing the coding sequences for each pair of repeats (Methods). The mean over all pairs, <dN/dS>, yields a stable measure of the horizontal evolution of repeats within a protein, as indicated by the small error on <dN/dS>. In the case of PRDM9, <dN/dS>=2.7±0.2, which is unequivocal evidence of positive selection in the horizontal evolution of the Zn-finger repeats, in agreement with previous findings [47]. We next apply this analysis to all 1081 repeat-containing proteins identified in SwissProt (Fig. 1b).
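The averaging step in the paragraph above can be sketched as follows. This is only an illustration: it assumes the per-pair dN/dS values have already been estimated from the coding sequences, and that the quoted error is the standard error of the mean, which the text does not state. The function names `repeat_pairs` and `mean_with_sem` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def repeat_pairs(n_repeats):
    """All unordered pairs of repeat IDs, numbered from 1 as in the figure."""
    return list(combinations(range(1, n_repeats + 1), 2))


def mean_with_sem(values):
    """Mean of per-pair ratios and its standard error (assumed error measure).

    Needs at least two values, because the sample standard deviation
    is undefined for a single observation.
    """
    values = list(values)
    return mean(values), stdev(values) / sqrt(len(values))


# 13 repeats give 13 * 12 / 2 = 78 pairwise comparisons.
print(len(repeat_pairs(13)))
```

Because each repeat takes part in several pairs, the pairwise values are not independent, so a standard error computed this way is best read as a rough indication of stability rather than a formal confidence bound.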

