Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.

Teraguchi S, Patil A, Standley DM - BMC Bioinformatics (2010)

Bottom Line: These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). This is most likely a result of biophysical restraints that have yet to be elucidated.


Affiliation: Laboratory of Host Defense, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. teraguch@ifrec.osaka-u.ac.jp

ABSTRACT

Background: In order to characterize mammalian intrinsically disordered domains (IDDs) we examined the patterns in their amino acid abundance as well as overrepresented local sequence motifs. We considered IDDs from mouse proteins associated with innate immune responses as well as a set of generic human genes. These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. IDDs were then clustered by amino acid abundance, and further analyzed in terms of co-occurrence of clusters with functionally characterized Pfam domains.

Results: Overall, IDDs were very different from randomly generated sequences. The deviation from random distributions was at least as great as that for ordered domains, for which the deviation can be rationalized in terms of strong evolutionary pressure for structure and function. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). Local sequence motifs that were over-represented in the innate immune set consisted mostly of low complexity fragments, primarily characterized by amino acid repeats, and could not be assigned an obvious functional role.

Conclusions: Our results suggest that IDDs are constrained within a narrow subset of possible sequences. This is most likely a result of biophysical restraints that have yet to be elucidated. More detailed examination of the functional relationship between the IDDs and associated Pfam domains is one possible avenue of investigation.



Figure 1: Observed frequency of amino acid histogram similarity scores. The similarity score is scaled from 0 to 20 for convenience (i.e., 100% identical histograms would have a score of 20). Native refers to actual protein sequences and random to artificially generated sequences with the same overall amino acid composition and length distribution as native sequences. A) Data are shown for non-immune random, immune random, non-immune IDD, and immune IDD sets. B) Data are shown for random ordered and native ordered sequences.

Mentions: Using the Gaussian-based similarity score, we carried out an all-against-all comparison of IDDs in the immune and non-immune sets. For each of the IDD sets, we also constructed a randomized sequence set with identical overall amino acid composition, sequence number and domain length frequency by shuffling the residues in the original native sequence set, as described in Methods. We then constructed a histogram of the similarities within each of the resulting four sets by binning the calculated similarity scores into 50 equal-sized windows. As Figure 1A illustrates, the random distributions are skewed toward the high end of the similarity spectrum, while native IDDs are much more diverse. Thus, sequences within either native IDD set are much less similar to one another than sequences within the random-immune and random-non-immune sets. This shows clearly that IDDs are not constructed randomly from a pool of disorder-promoting amino acids. As a comparison, we performed the same calculations on a set of ordered protein sequences extracted from a representative set of structured domains. As Figure 1B illustrates, the ordered domains also differ from each other much more than random sequences do, even when the length distribution and overall composition are held constant. However, the overall similarity (as indicated by the peak in the distribution) is much lower in the disordered set than in the ordered set.
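The procedure described above (shuffle, score all pairs, bin into 50 windows) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. The exact form of the Gaussian-based similarity score is given in the paper's Methods and is not reproduced here, so the `similarity` function and its `sigma` parameter are assumptions, chosen only so that identical amino acid histograms score 20. The pooled shuffle in `shuffle_set` is one reading of "shuffling the residues in the original native sequence set"; all function names and the toy sequences are hypothetical.

```python
import math
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Amino acid histogram: fraction of each standard residue in seq."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def similarity(p, q, sigma=0.1):
    """ASSUMED score, not the paper's formula: one Gaussian term per
    amino acid, so identical histograms give 20 and the range is (0, 20]."""
    return sum(math.exp(-((a - b) / sigma) ** 2) for a, b in zip(p, q))


def shuffle_set(seqs, rng):
    """Pool every residue in the set, shuffle, and re-cut to the original
    lengths. Overall composition, sequence count and length distribution
    are preserved; per-sequence composition is not."""
    pool = [residue for s in seqs for residue in s]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for s in seqs:
        shuffled.append("".join(pool[start:start + len(s)]))
        start += len(s)
    return shuffled


def score_histogram(seqs, bins=50, max_score=20.0):
    """All-against-all similarity within one set, binned into equal-sized
    windows and normalised to observed frequencies."""
    comps = [composition(s) for s in seqs]
    counts = [0] * bins
    for p, q in combinations(comps, 2):
        score = similarity(p, q)
        index = min(int(score / max_score * bins), bins - 1)
        counts[index] += 1
    total = sum(counts) or 1  # avoid division by zero for sets of size < 2
    return [c / total for c in counts]


# Toy input standing in for a native IDD set (hypothetical sequences).
native = ["SSSSPPPPGG", "EEEEKKKKEEKK", "QQQQQQNNNN"]
randomized = shuffle_set(native, random.Random(0))
native_hist = score_histogram(native)
random_hist = score_histogram(randomized)
```

Comparing `native_hist` with `random_hist` for each set mirrors the comparison plotted in Figure 1A; with real data, the same two calls would be repeated for the immune, non-immune and ordered sets.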

