Decomposing the space of protein quaternary structures with the interface fragment pair library.

Xie ZR, Chen J, Zhao Y, Wu Y - BMC Bioinformatics (2015)

Bottom Line: After structure-based clustering, we found that more than 90% of these interface fragment pairs can be represented by a limited number of highly abundant motifs. Our study therefore presents supporting evidence that the space of protein quaternary structures can be represented by combinations of a small set of secondary-structure-based packing patterns at binding interfaces. Finally, after future improvements such as adding sequence profiles, we expect this new library to be useful for predicting the structures of unknown protein-protein interactions.


Affiliation: Department of Systems and Computational Biology, Albert Einstein College of Medicine of Yeshiva University, 1300 Morris Park Avenue, Bronx, NY, 10461, USA. Zhong-Ru.Xie@einstein.yu.edu.

ABSTRACT

Background: The physical interactions between proteins constitute the basis of protein quaternary structures and dominate many biological processes in living cells. Deciphering the structural features of interacting proteins is essential to understanding their cellular functions. As in the space of protein tertiary structures, where discrete patterns are clearly observed at the fold or sub-fold motif level, the space of protein quaternary structures has been found to be highly degenerate because of the packing of compact secondary structure elements at interfaces. It is therefore necessary to further decompose the protein quaternary structural space into a more local representation.

Results: Here we constructed an interface fragment pair library from the current structure database of protein complexes. After structure-based clustering, we found that more than 90% of these interface fragment pairs can be represented by a limited number of highly abundant motifs. These motifs were then used to guide complex assembly. A large-scale benchmark test shows that native-like binding modes are highly likely to be present in the structural ensemble of modeled protein complexes built with the library.

Conclusions: Our study therefore presents supporting evidence that the space of protein quaternary structures can be represented by combinations of a small set of secondary-structure-based packing patterns at binding interfaces. Finally, after future improvements such as adding sequence profiles, we expect this new library to be useful for predicting the structures of unknown protein-protein interactions.


Figure 3: We clustered all 153127 interface fragment pairs in the 3did database using different RMSD cutoff values. The resulting numbers of clusters are plotted in (a). Based on these statistics, a cutoff of 4 Å was chosen for the following studies, which results in a total of 2135 clusters. We counted the number of fragment pairs in each of these 2135 clusters and ranked the clusters in decreasing order of size (b). Fragment pairs are not uniformly distributed across clusters: a small number of clusters are highly abundant. Therefore, only clusters with more than 20 members were considered, leading to a library of 459 highly representative entries. These 459 clusters cover more than 90% of the interface fragment pairs in the whole database.
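The size-based filtering described in the caption can be illustrated with a minimal sketch. The function name `abundant_clusters` and the toy cluster sizes are hypothetical and are not taken from the paper; only the "more than 20 members" threshold comes from the text.

```python
def abundant_clusters(sizes, min_members=20):
    """Rank cluster sizes in decreasing order, keep clusters with MORE than
    `min_members` members, and report the fraction of all items they cover."""
    ranked = sorted(sizes, reverse=True)
    kept = [s for s in ranked if s > min_members]
    coverage = sum(kept) / sum(ranked) if ranked else 0.0
    return ranked, kept, coverage


# Toy cluster sizes (made up for illustration, not the paper's data).
toy_sizes = [3, 500, 1, 21, 120, 20, 45, 7, 2, 1]
ranked, kept, coverage = abundant_clusters(toy_sizes)
print(kept, round(coverage, 3))
```

In this toy case a handful of large clusters account for most of the items, which mirrors the qualitative behavior the caption reports for the 459 retained clusters.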

Mentions: We collected 153127 interface fragment pairs from 3did, a database of interacting protein domains. These interface fragment pairs belong to 4960 interacting domains, each of which is listed under one specific 3did ID. The interactions are formed either between different proteins, as homo-dimers or hetero-dimers, or between different domains of the same protein. The 153127 pairs were classified by structural similarity, measured as the RMSD of Cα atoms between the two fragment pairs being compared. In this study, the RMSD cutoff was set empirically, so the classification results depend on the choice of this cutoff. To test the dependence of the clustering results on RMSD systematically, we varied the cutoff from 1 Å to 50 Å. For each cutoff value, clustering was carried out over all 153127 pairs in the database. Figure 3a gives the resulting number of clusters for each RMSD cutoff. The plot shows that the total number of clusters decreases quickly as the cutoff increases. When the cutoff is 1 Å, there are 153127 clusters, that is, every interface fragment pair forms its own cluster. This gives the minimal resolution for structural differences between interface fragment pairs. When the cutoff is 4 Å, the number of clusters drops to 2135. Finally, when the cutoff is increased to 50 Å, there is only one cluster: all pairs fall into the same group, which suggests that clustering loses sensitivity at large cutoff values. Based on these statistics, a cutoff of 4 Å was chosen for the following studies.
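The cutoff scan described above can be sketched as follows. This is an illustration only: the text does not state which clustering algorithm was used, so a simple greedy "leader" clustering is assumed here, and the RMSD is computed on coordinates that are assumed to be already superimposed. The names `rmsd`, `leader_cluster`, and `shifted`, and the toy coordinates, are all hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.
    Assumption: the two structures are already superimposed."""
    if len(a) != len(b):
        raise ValueError("coordinate lists must have the same length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def leader_cluster(items, cutoff):
    """Greedy leader clustering (an assumed stand-in, not the paper's method):
    each item joins the first cluster whose representative lies within
    `cutoff`; otherwise it starts a new cluster."""
    clusters = []  # list of (representative, members)
    for item in items:
        for rep, members in clusters:
            if rmsd(rep, item) <= cutoff:
                members.append(item)
                break
        else:
            clusters.append((item, [item]))
    return clusters


def shifted(coords, dx):
    """Return a copy of `coords` translated by `dx` along x (toy data helper)."""
    return [(x + dx, y, z) for x, y, z in coords]


# Five toy "fragments": rigid copies of one 3-atom trace shifted along x,
# so the RMSD between any two equals the difference of their shifts.
base = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy = [shifted(base, dx) for dx in (0.0, 0.5, 3.0, 3.4, 20.0)]

# Scan several cutoffs and record the number of clusters for each.
counts = {c: len(leader_cluster(toy, c)) for c in (0.1, 1.0, 4.0, 50.0)}
print(counts)
```

On this toy set the smallest cutoff leaves every fragment in its own cluster and the largest merges everything into one, which is the same qualitative trend as the scan from 1 Å to 50 Å reported in the text.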

