Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.

Teraguchi S, Patil A, Standley DM - BMC Bioinformatics (2010)

Bottom Line: These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). This is most likely a result of biophysical restraints that have yet to be elucidated.

Affiliation: Laboratory of Host Defense, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. teraguch@ifrec.osaka-u.ac.jp

ABSTRACT

Background: To characterize mammalian intrinsically disordered domains (IDDs), we examined patterns in their amino acid abundance as well as over-represented local sequence motifs. We considered IDDs from mouse proteins associated with innate immune responses, as well as from a set of generic human genes. These sets were compared with artificially generated random sequences having the same overall amino acid abundance and length distributions. IDDs were then clustered by amino acid abundance and further analyzed for co-occurrence of clusters with functionally characterized Pfam domains.
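The abstract describes random control sets matched to the real ones in overall amino acid abundance and length distribution, and a clustering step based on amino acid abundance, but does not give the procedure. The following is a minimal Python sketch of one way to build such controls, assuming that pooling all residues, shuffling them, and cutting them back to the original lengths is acceptable (sampling residues from the pooled frequencies would fit the description equally well). `composition` computes the 20-dimensional abundance vector one might feed to a clustering step; the clustering algorithm itself is not stated and is not shown. All names are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in seq (a 20-dimensional vector).

    Hypothetical helper: a vector like this is one possible input for
    clustering domains by amino acid abundance.
    """
    counts = Counter(seq)
    n = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]

def random_like(seqs, seed=0):
    """Build a random control set for seqs.

    Assumption (not stated in the source): residues from all sequences are
    pooled, shuffled, and cut back into pieces of the original lengths. The
    result has exactly the same overall amino acid abundance and the same
    length distribution as seqs, while any local sequence order is destroyed.
    """
    rng = random.Random(seed)
    pool = list("".join(seqs))
    rng.shuffle(pool)
    out, start = [], 0
    for s in seqs:
        out.append("".join(pool[start:start + len(s)]))
        start += len(s)
    return out

if __name__ == "__main__":
    toy = ["MSSPPQQ", "AAAKKE", "GSGSGSGS"]  # made-up toy sequences
    print(random_like(toy, seed=1))
```

Because the residues are pooled before shuffling, the control set matches the overall composition exactly, although an individual random sequence need not match the composition of the real sequence whose length it inherits.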

Results: Overall, IDDs were very different from randomly generated sequences. The deviation from random distributions was at least as great as that for ordered domains, for which the deviation can be rationalized in terms of strong evolutionary pressure for structure and function. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). Local sequence motifs that were over-represented in the innate immune set consisted mostly of low-complexity fragments, characterized primarily by amino acid repeats, and could not be assigned an obvious functional role.
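The abstract reports p < 0.01 for certain domain-cluster co-occurrences but does not name the statistical test. One common choice for this kind of count data is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 table of "has domain" versus "in cluster"). The sketch below assumes that test; the function and argument names are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(n_total, n_domain, n_cluster, n_both):
    """One-sided hypergeometric p-value for over-represented co-occurrence.

    Assumption: a hypergeometric null model; the test actually used is not
    stated in the abstract.

    n_total:   proteins considered
    n_domain:  proteins carrying a given Pfam domain
    n_cluster: proteins whose IDD falls in a given composition cluster
    n_both:    proteins with both
    Returns P(X >= n_both) when n_cluster proteins are drawn at random
    without replacement from n_total.
    """
    tail = sum(
        comb(n_domain, k) * comb(n_total - n_domain, n_cluster - k)
        for k in range(n_both, min(n_domain, n_cluster) + 1)
    )
    return tail / comb(n_total, n_cluster)

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(cooccurrence_pvalue(n_total=10, n_domain=5, n_cluster=5, n_both=5))
```

With SciPy installed, the same tail probability should be available as `scipy.stats.hypergeom.sf(n_both - 1, n_total, n_domain, n_cluster)`.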

Conclusions: Our results suggest that IDDs are constrained within a narrow subset of possible sequences. This is most likely a result of biophysical restraints that have yet to be elucidated. More detailed examination of the functional relationship between the IDDs and associated Pfam domains is one possible avenue of investigation.

Figure 3: Sequence motif analysis. All possible fragments of length 2-5 were enumerated, and their observed and expected frequencies were computed. The x-axis represents the natural log of the ratio of the observed to the expected frequency. The y-axis shows the histogram of these values for six sets: immune IDDs, non-immune IDDs, ordered domains, random immune IDDs, random non-immune IDDs, and random ordered domains. Panels A-D show motifs of length 2-5, respectively. The random sequences produced zero counts for motifs of length 5.
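The caption does not spell out how expected frequencies were computed. A minimal sketch, assuming positions are independent so that the expected count of a fragment is the number of length-k windows times the product of its residues' pooled frequencies, is shown below; `fragment_counts`, `log_ratios`, and `histogram` are hypothetical names, the bin width is arbitrary, and the 10-count cutoff is the one described in the Mentions paragraph.

```python
import math
from collections import Counter

def fragment_counts(seqs, k):
    """Count every overlapping length-k fragment, pooled over all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log_ratios(seqs, k, min_count=10):
    """ln(observed / expected) for each length-k fragment seen >= min_count times.

    Assumption: expected count = (number of length-k windows) x (product of
    pooled single-residue frequencies), i.e. residues are treated as independent.
    """
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    freq = {aa: n / total for aa, n in residues.items()}
    windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    ratios = {}
    for frag, observed in fragment_counts(seqs, k).items():
        if observed < min_count:
            continue  # too rare to interpret statistically
        expected = windows * math.prod(freq[aa] for aa in frag)
        ratios[frag] = math.log(observed / expected)
    return ratios

def histogram(values, width=0.5):
    """Bin ln-ratios into fixed-width bins; keys are the left bin edges."""
    return Counter(math.floor(v / width) * width for v in values)

if __name__ == "__main__":
    toy = ["AB" * 11]  # made-up toy sequence
    print(histogram(log_ratios(toy, 2).values()))
```

Like the model described in the text, this pools counts across sequences, so repeated occurrences within one protein are not distinguished from occurrences in different proteins.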

Mentions: Because fragments that occur very rarely cannot be interpreted statistically, we discarded any motif with fewer than 10 counts. Figure 3 displays histograms of the natural log of the observed-to-expected ratios. The observed frequencies for IDDs deviate further from the expected values than those for the randomly generated sets, indicating that IDDs tend to contain particular motifs. In principle, the distribution should be centered at 0, where the observed frequency equals the expected value. This is indeed the case for doublets and triplets, where almost every possible motif is observed many times. For quartets and quintets, however, the center is shifted to the right, most likely because motifs with fewer than 10 counts were discarded: when the expected count of a motif is small, only motifs observed well above expectation survive the cutoff, so the left side of the distribution is truncated. In none of the random sets was any quintet observed more than 10 times. (Note that this simple model does not distinguish between multiple occurrences of a motif in the same protein sequence and occurrences in different sequences.)
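The rightward shift for quartets and quintets can be made concrete with a short bound. It assumes, as a hypothetical model not stated in the text, that the expected count $E_m$ of a length-$L$ motif $m$ is the number of length-$L$ windows $W_L$ times the product of the pooled single-residue frequencies $p_a$:

$$E_m = W_L \prod_{i=1}^{L} p_{m_i}, \qquad W_L = \sum_{s} \left(|s| - L + 1\right)$$

Every motif that survives the cutoff has an observed count $O_m \ge 10$, so

$$\ln\frac{O_m}{E_m} \;\ge\; \ln\frac{10}{E_m} \;>\; 0 \qquad \text{whenever } E_m < 10.$$

As a hypothetical illustration, take $W_L \approx 10^5$ windows and a uniform composition ($p_a = 1/20$). A doublet then has $E_m = 10^5/20^2 = 250$, far above the cutoff, so nearly every doublet is retained and its histogram stays centered near 0. A quintet has $E_m = 10^5/20^5 \approx 0.031$, so any retained quintet satisfies $\ln(O_m/E_m) \ge \ln 320 \approx 5.8$, and in a random set a count of 10 is very unlikely in the first place.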


Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.

Teraguchi S, Patil A, Standley DM - BMC Bioinformatics (2010)

Sequence motif analysis. All possible fragments of length 2-5 were enumerated and their observed and expected frequencies were computed. The x-axis represents the natural log of the ratio of the observed to the expected frequency. The y-axis is the histogram of these values in 6 different sets: Immune IDDs, non-immune IDDs, ordered domains, random immune IDDs, random non-immune IDDs, and random ordered domains. Panels A-D illustrate motifs of length 2-5, respectively. Random sequences produced zero counts for the motifs of length 5.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2957690&req=5

Figure 3: Sequence motif analysis. All possible fragments of length 2-5 were enumerated and their observed and expected frequencies were computed. The x-axis represents the natural log of the ratio of the observed to the expected frequency. The y-axis is the histogram of these values in 6 different sets: Immune IDDs, non-immune IDDs, ordered domains, random immune IDDs, random non-immune IDDs, and random ordered domains. Panels A-D illustrate motifs of length 2-5, respectively. Random sequences produced zero counts for the motifs of length 5.
Mentions: Since fragments with very rare occurrence could not be interpreted statistically, we discarded any motifs with less than 10 counts. Figure 3 displays the histogram of the distribution of the natural log of the ratios. It shows that the deviations of observed frequencies for IDDs from the expected values are larger than that for randomly generated sets, indicating that IDDs tend to have some particular motifs. In principle, the center of the distribution is 0, where the observed frequency equals the expectation value. This is actually the case for doublet and triplet fragments where almost every sequence motif is observed a number of times. However, for the quartet and quintet, the center is shifted to the right, most likely due to the fact that motifs with less than 10 counts were discarded. For all the random sets, there was no Quintet with observed frequency greater than 10. (Note that this simple model does not distinguish between multiple motifs found in the same protein sequence and those found in different sequences.)

Bottom Line: These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions.The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01).This is most likely a result of biophysical restraints that have yet to be elucidated.

View Article: PubMed Central - HTML - PubMed

Affiliation: Laboratory of Host Defense, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. teraguch@ifrec.osaka-u.ac.jp

ABSTRACT

Background: In order to characterize mammalian intrinsically disordered domains (IDDs) we examined the patterns in their amino acid abundance as well as overrepresented local sequence motifs. We considered IDDs from mouse proteins associated with innate immune responses as well as a set of generic human genes. These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. IDDs were then clustered by amino acid abundance, and further analyzed in terms of co-occurrence of clusters with functionally characterized Pfam domains.

Results: Overall, IDDs were very different from randomly generated sequences. The deviation from random distributions was at least as great as that for ordered domains, for which the deviation can be rationalized in terms of strong evolutionary pressure for structure and function. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). Local sequence motifs that were over-represented in the innate immune set consisted mostly of low complexity fragments, primarily characterized by amino acid repeats, and could not be assigned an obvious functional role.

Conclusions: Our results suggest that IDDs are constrained within a narrow subset of possible sequences. This is most likely a result of biophysical restraints that have yet to be elucidated. More detailed examination of the functional relationship between the IDDs and associated Pfam domains is one possible avenue of investigation.

Show MeSH
Related in: MedlinePlus