Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.

Teraguchi S, Patil A, Standley DM - BMC Bioinformatics (2010)

Bottom Line: These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). This is most likely a result of biophysical restraints that have yet to be elucidated.


Affiliation: Laboratory of Host Defense, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. teraguch@ifrec.osaka-u.ac.jp

ABSTRACT

Background: In order to characterize mammalian intrinsically disordered domains (IDDs) we examined the patterns in their amino acid abundance as well as overrepresented local sequence motifs. We considered IDDs from mouse proteins associated with innate immune responses as well as a set of generic human genes. These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. IDDs were then clustered by amino acid abundance, and further analyzed in terms of co-occurrence of clusters with functionally characterized Pfam domains.

Results: Overall, IDDs were very different from randomly generated sequences. The deviation from random distributions was at least as great as that for ordered domains, for which the deviation can be rationalized in terms of strong evolutionary pressure for structure and function. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). Local sequence motifs that were over-represented in the innate immune set consisted mostly of low complexity fragments, primarily characterized by amino acid repeats, and could not be assigned an obvious functional role.

Conclusions: Our results suggest that IDDs are constrained within a narrow subset of possible sequences. This is most likely a result of biophysical restraints that have yet to be elucidated. More detailed examination of the functional relationship between the IDDs and associated Pfam domains is one possible avenue of investigation.
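The random comparison set described in the Background can be illustrated with a minimal Python sketch. It assumes that residues are drawn independently from the pooled amino acid abundance of the input set and that each random sequence copies the length of one input sequence; the abstract does not specify the authors' actual procedure, and the function names `composition` and `random_null_set` are hypothetical.

```python
import random
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fractional abundance of each standard amino acid in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def random_null_set(sequences, rng):
    """Build one random sequence per input sequence.

    Assumption (not from the source): residues are sampled independently,
    weighted by the pooled abundance over all input sequences, and the
    length of each input sequence is preserved.
    """
    pooled = Counter("".join(sequences))
    residues = list(AMINO_ACIDS)
    weights = [pooled[a] for a in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=len(s)))
        for s in sequences
    ]


if __name__ == "__main__":
    # Toy input sequences, invented purely for illustration.
    idds = ["SSPPGGSSEE", "EEKKSSPPGG", "GGSSPP"]
    null = random_null_set(idds, random.Random(0))
    for real, fake in zip(idds, null):
        print(real, fake, len(real) == len(fake))
```

Comparing `composition` vectors of individual real sequences against those of the random set is one simple way to see how far real sequences deviate from this kind of null model.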



Figure 6: Sequence identity within ordered folds. The figure was constructed by picking 10 query domains at random, and calculating the sequence identity of all similar folds to the query as returned by the SeSAW structural alignment server [18].

Mentions: In this study we compared IDDs at both the overall amino acid composition level and the local sequence motif level. These two levels of comparison span a wide range, and yet we observe similar trends at both extremes. Namely, individual IDD sequences are very different from artificially constructed sequences picked naively. This, in turn, might imply that there is strong selective pressure on IDDs, just as there is on ordered domains; however, direct evidence for this interpretation is beyond the scope of the current study. In the case of ordered domains, we can understand such pressure in terms of structural and functional requirements. The resulting distribution of ordered protein sequences is a trade-off between genetic drift, which tends toward randomization, and biochemical function, which tends to limit the observed amino acid sequences to a small subset of the possible random combinations. If we examine the distribution of sequence identities within a given fold, for example, we usually see two peaks (Figure 6). One small peak is near 100% and contains the close family members. The other peak is broader and covers the "twilight zone" region from 0-30%. It is thus not unreasonable to hypothesize that a similar trade-off occurs for IDDs, and that the pressure in this case is due to the need for IDDs to be metastable, only becoming ordered upon binding a target protein. Understanding the exact role of specific IDDs will help to refine the interpretation of their compositional diversity.
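The identity distribution behind Figure 6 can be approximated with a short Python sketch. It assumes the hit sequences have already been aligned to the query (equal-length strings with `-` for gaps) and defines identity as matches over columns where neither sequence has a gap. The source obtained its alignments from the SeSAW server and does not state the exact identity definition, so both the definition and the function names here are illustrative assumptions.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences.

    Assumption: identity is counted over columns where neither
    sequence has a gap character ('-').
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(x == y for x, y in aligned)
    return 100.0 * matches / len(aligned)


def identity_histogram(query, hits, bin_width=10):
    """Count hits per identity bin; bin_width is assumed to divide 100."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for hit in hits:
        pid = percent_identity(query, hit)
        # 100% identity falls into the last bin rather than a bin of its own.
        idx = min(int(pid // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    query = "ACDEFGHIKL"
    hits = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWL", "AWWWWWWWWW"]
    print(identity_histogram(query, hits))
```

With real structural alignments, a histogram of this kind is where one would look for the two peaks described above: a small one near 100% and a broad one in the 0-30% range.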

