The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

Ferrada E - PLoS Comput. Biol. (2014)

Bottom Line: I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins.


Affiliation: Santa Fe Institute, Santa Fe, New Mexico, United States of America.

ABSTRACT
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.

Figure 11 (pcbi-1003946-g011): Fraction of new phenotypes across k-neighborhoods at distance d. For each of the 245 potentials analysed in this study (Table S1 in Text S1), I draw 1,000 random non-degenerate sequences and for each pair of sequences (), calculate ,  and , at constant k and variable distances d. I average  values according to their type of potential (I-VI) (color code, see Fig. 3 and Table 1). (A) k = 2. (B) k = 4. Error bars represent one standard deviation from the mean. Grey dashed lines illustrate the overlapping threshold: .

Mentions: Figure 11 presents and as a function of distance for potentials type I-VI. At very short distances (even with overlapping neighborhoods), shows 50 to 70% of unique phenotypes (Fig. 11A). As expected, at short and larger , decreases as a function of (Fig. 11B). In the case of 2-neighborhoods, the fraction of unique phenotypes increases rapidly with distance and, at short , there are only slight differences between types of potentials. At the overlapping threshold of (, dashed line Fig. 11), approximately 85 to 95% of phenotypes are unique to pairs of neighborhoods. At larger distances, however, differ considerably across potentials. For instance, at , potentials type I access 2-neighborhoods with 100% new phenotypes, whereas distant 2-neighborhoods of potentials types II, III, IV and VI share 10 to 15% of their phenotypes. This trend intensifies in the case of potentials type V, which reach values similar to those of 2-neighborhoods at short distances (Fig. 11A).
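
The comparison of phenotypes across two k-neighborhoods can be illustrated with a minimal Python sketch. It is not the procedure of the paper: the exact quantities are not recoverable from this excerpt, so the sketch assumes that a k-neighborhood is the set of binary sequences within Hamming distance k of a reference sequence, and that the fraction of unique phenotypes is the share of phenotypes found in only one of the two neighborhoods. The function names and `toy_phenotype` (which simply counts the 1s in a sequence, as a stand-in for the structure a sequence folds into) are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def k_neighborhood(seq, k):
    """All binary sequences within Hamming distance <= k of seq (seq included)."""
    neighbors = set()
    for r in range(k + 1):
        for positions in combinations(range(len(seq)), r):
            mutant = list(seq)
            for p in positions:
                # A point mutation in a binary alphabet flips the letter.
                mutant[p] = "1" if mutant[p] == "0" else "0"
            neighbors.add("".join(mutant))
    return neighbors


def toy_phenotype(seq):
    """Hypothetical stand-in for a folded structure: the number of 1s."""
    return seq.count("1")


def fraction_unique(seq_a, seq_b, k, phenotype=toy_phenotype):
    """Assumed definition: phenotypes seen in only one of the two
    k-neighborhoods, divided by all phenotypes seen in either."""
    pa = {phenotype(s) for s in k_neighborhood(seq_a, k)}
    pb = {phenotype(s) for s in k_neighborhood(seq_b, k)}
    return len(pa ^ pb) / len(pa | pb)


if __name__ == "__main__":
    a, b = "000000", "000011"
    print(hamming(a, b), fraction_unique(a, b, k=2))
```

In a real analysis the toy map would be replaced by the structure predicted under a given potential, and the fraction would be averaged over many random sequence pairs at each distance d.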

