The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

Ferrada E - PLoS Comput. Biol. (2014)

Bottom Line: I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins.


Affiliation: Santa Fe Institute, Santa Fe, New Mexico, United States of America.

ABSTRACT
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
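
The idea that a binary potential can be typed by its proportion of attractive, neutral and repulsive terms can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the article: a binary potential is represented as a hypothetical triple (e_AA, e_AB, e_BB), negative energies are treated as attractive, and the full grid of values quoted in the Figure 4 caption is enumerated rather than the sample used in the study. The function names are illustrative only.

```python
from collections import Counter
from itertools import product

# Grid of interaction values quoted in the Figure 4 caption: -1.00, -0.75, ..., 1.00.
GRID = [x / 4 for x in range(-4, 5)]


def sign_profile(potential):
    """Return (attractive, neutral, repulsive) counts for one potential.

    Assumption: negative energies are attractive, zero is neutral,
    positive energies are repulsive.
    """
    attractive = sum(1 for e in potential if e < 0)
    neutral = sum(1 for e in potential if e == 0)
    repulsive = sum(1 for e in potential if e > 0)
    return attractive, neutral, repulsive


def enumerate_potentials(grid=GRID):
    """List every (e_AA, e_AB, e_BB) triple on the grid (hypothetical layout)."""
    return list(product(grid, repeat=3))


potentials = enumerate_potentials()
# How many potentials on the full grid share each sign profile.
profiles = Counter(sign_profile(p) for p in potentials)

# The standard HP model (H-H contact = -1, all other contacts = 0)
# has one attractive and two neutral terms.
hp_profile = sign_profile((-1.0, 0.0, 0.0))
```

Grouping potentials by such a profile is only a coarse stand-in for the types of potentials the abstract refers to; the article's own classification is given in its Table 1 and Figure 3.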

Hierarchical clustering of phenotype spaces generated by the sequence-structure maps of binary potentials. Potentials are sampled by considering the values −1.00, −0.75, −0.50, −0.25, 0.00, 0.25, 0.50, 0.75, 1.00 (see main text and Table S1 in Text S1). Hierarchical clustering was carried out using a Jaccard-index similarity measure on phenotype space and the group-average method. The values of each potential are specified on a color scale at the branches' tips, with one of them specified by the outermost value. Branches are colored according to the 7 different potentials described in Fig. 3 (see also main text and Table 1). Green and blue stacked bars following the color-coded potentials correspond to non-degeneracy and encodability, respectively. Boxplots, in black, represent the distribution of foldability values over non-degenerate genotypes for each map. Canonical potentials are the HP and AB models and their shifted versions (Fig. 2). They are highlighted with red dots.


Mentions: Figure 4 presents a hierarchical clustering of phenotype space based on a pairwise similarity measure, for all possible pair combinations of binary potentials (indices running from 1 to 245). Here I arbitrarily choose to focus on the phenotype-space measure; however, similar conclusions arise from the analysis of the Jaccard index on genotype space (Figure S1). Each tip of the tree represents an independent sequence-structure map. Maps that cluster closely in this tree have similar sets of accessible phenotypes, that is, similarity values close to 1.0. The values that compose each potential are specified on a color scale at the branches' tips, with one of them specified by the outermost value. Branches are colored according to the potential, as described above (Table 1, Figure 3). Green and blue stacked bars following the color-coded potentials correspond to non-degeneracy and encodability values, respectively. Boxplots, in black, represent the distribution of foldability over all non-degenerate sequences of each map.
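
The clustering described here can be sketched in plain Python under stated assumptions. Each sequence-structure map is reduced to the set of phenotypes it makes accessible, similarity between two maps is the Jaccard index of those sets, and clusters are merged by the group-average rule (mean pairwise similarity between members). The map names, the phenotype labels and the function names are hypothetical; this is not the article's code or data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_average_clustering(phenotype_sets):
    """Agglomerate maps by group-average similarity.

    phenotype_sets: dict mapping a map name to its set of accessible phenotypes.
    Returns the merges in order, each as (cluster_a, cluster_b, mean_similarity).
    """
    names = list(phenotype_sets)
    # Pairwise Jaccard similarity between every two maps.
    sim = {
        frozenset((x, y)): jaccard(phenotype_sets[x], phenotype_sets[y])
        for x, y in combinations(names, 2)
    }

    def average(c1, c2):
        # Group-average linkage: mean similarity over all cross-cluster pairs.
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the highest mean similarity.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy data: three maps and the phenotypes each one reaches.
toy_maps = {
    "map_1": {"s1", "s2", "s3"},
    "map_2": {"s1", "s2", "s3", "s4"},
    "map_3": {"s7", "s8"},
}
merges = group_average_clustering(toy_maps)
```

In the toy data, map_1 and map_2 share three of four phenotypes and merge first, while map_3 shares none and joins last. For a real analysis over a few hundred maps, a library routine for average-linkage clustering would be the more practical choice than this quadratic search.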

