Intrinsically disordered domains deviate significantly from random sequences in mammalian proteins.

Teraguchi S, Patil A, Standley DM - BMC Bioinformatics (2010)

Bottom Line: These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). This is most likely a result of biophysical restraints that have yet to be elucidated.

Affiliation: Laboratory of Host Defense, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. teraguch@ifrec.osaka-u.ac.jp

ABSTRACT

Background: To characterize mammalian intrinsically disordered domains (IDDs), we examined the patterns in their amino acid abundance as well as over-represented local sequence motifs. We considered IDDs from mouse proteins associated with innate immune responses as well as a set of generic human genes. These sets were compared with artificially generated random sequences with the same overall amino acid abundance and length distributions. IDDs were then clustered by amino acid abundance and further analyzed in terms of the co-occurrence of clusters with functionally characterized Pfam domains.

Results: Overall, IDDs were very different from randomly generated sequences. The deviation from random distributions was at least as great as that for ordered domains, for which the deviation can be rationalized in terms of strong evolutionary pressure for structure and function. The co-occurrence of certain Pfam domains with specific IDD clusters was found to be significant (p-value < 0.01). Local sequence motifs that were over-represented in the innate immune set consisted mostly of low complexity fragments, primarily characterized by amino acid repeats, and could not be assigned an obvious functional role.

Conclusions: Our results suggest that IDDs are constrained within a narrow subset of possible sequences. This is most likely a result of biophysical restraints that have yet to be elucidated. More detailed examination of the functional relationship between the IDDs and associated Pfam domains is one possible avenue of investigation.
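
The random control described in the Background can be sketched as follows. This is a minimal illustration, assuming that residues are drawn independently from the pooled amino acid frequencies and that each random sequence copies the length of one real sequence; the abstract does not specify the authors' exact procedure. The function names and the toy sequences are hypothetical.

```python
import random
from collections import Counter


def overall_abundance(sequences):
    """Return the pooled amino acid frequencies over all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def random_sequences_like(sequences, seed=0):
    """Generate one random sequence per input sequence.

    Assumption: residues are sampled independently using the pooled
    abundance, and each random sequence has the same length as the
    real sequence it stands in for, so the length distribution is
    preserved exactly.
    """
    rng = random.Random(seed)
    abundance = overall_abundance(sequences)
    residues = sorted(abundance)
    weights = [abundance[r] for r in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=len(seq)))
        for seq in sequences
    ]


if __name__ == "__main__":
    # Hypothetical toy fragments, not taken from the paper's data sets.
    toy_idds = ["SSPPGGSS", "EEKKEE", "QQQQPQQQQQ"]
    for real, rand in zip(toy_idds, random_sequences_like(toy_idds)):
        print(real, rand)
```

Matching lengths one-to-one is the simplest way to reproduce the length distribution; sampling lengths from a fitted distribution would be an alternative.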


Figure 5: Background distribution of Pfam domain co-occurrence. Instead of using the Gaussian similarity score to match clusters in the innate immune set and the generic disordered set, we inserted a random matching function. The resulting distribution clearly indicates that the number of co-occurring Pfam domains identified by Gaussian similarity is highly significant.

Mentions: To estimate the significance of the observed number of shared Pfam domains identified by the histogram similarity score, we replaced the maximization of the Gaussian similarity score (used to match clusters in the innate immune set with clusters in the non-immune set) with a random pairing of clusters. Because there are many possible combinations of pairs, we repeated the random pairing 9000 times and obtained a background distribution of Pfam domain co-occurrence (Figure 5). The maximum value in this exercise was 47, corresponding to a p-value of 0.01, based on direct integration of the frequency distribution. Therefore, we can say with a high degree of confidence that the observed co-occurrence of 51 Pfam domains is not due to chance (p-value << 0.01), and thus there is a bias for similar IDDs to be associated with specific Pfam domains.
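
The random-pairing test described above can be sketched as a small permutation procedure. This is an illustration under stated assumptions, not the authors' implementation: each cluster is represented as a set of Pfam domain labels, the co-occurrence count is taken to be the number of distinct domains shared by paired clusters, and the p-value uses the common add-one correction rather than the direct integration mentioned in the text. All function names and the toy clusters are hypothetical.

```python
import random


def shared_domain_count(clusters_a, clusters_b, pairing):
    """Count distinct domains shared between paired clusters.

    pairing[i] is the index of the cluster in clusters_b that is
    matched to clusters_a[i].
    """
    shared = set()
    for i, j in enumerate(pairing):
        shared |= clusters_a[i] & clusters_b[j]
    return len(shared)


def random_pairing_background(clusters_a, clusters_b, n_trials=9000, seed=0):
    """Build a background distribution by pairing clusters at random."""
    if len(clusters_b) < len(clusters_a):
        raise ValueError("clusters_b must have at least as many clusters as clusters_a")
    rng = random.Random(seed)
    indices = list(range(len(clusters_b)))
    background = []
    for _ in range(n_trials):
        rng.shuffle(indices)  # a random one-to-one pairing
        pairing = indices[: len(clusters_a)]
        background.append(shared_domain_count(clusters_a, clusters_b, pairing))
    return background


def empirical_p_value(background, observed):
    """Fraction of random trials scoring at least the observed count.

    Assumption: the add-one correction keeps the estimate above zero
    when no random trial reaches the observed value.
    """
    hits = sum(1 for value in background if value >= observed)
    return (hits + 1) / (len(background) + 1)


if __name__ == "__main__":
    # Hypothetical toy clusters with placeholder domain labels.
    immune = [{"domA", "domB"}, {"domC"}, {"domD"}]
    generic = [{"domA", "domB"}, {"domC"}, {"domD"}]
    observed = shared_domain_count(immune, generic, [0, 1, 2])
    background = random_pairing_background(immune, generic)
    print(observed, max(background), empirical_p_value(background, observed))
```

With this correction, an observed count that exceeds every one of 9000 random trials yields an estimate of 1/9001, which is the smallest value the procedure can report at that number of trials.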

