Defining structural and evolutionary modules in proteins: a community detection approach to explore sub-domain architecture.

Hleap JS, Susko E, Blouin C - BMC Struct. Biol. (2013)

Bottom Line: A domain compartmentalization can be found and described in correlation space. Most previous attempts focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservation. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.


Affiliation: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, B3H 4R2, Canada. jshleap@dal.ca.

ABSTRACT

Background: Assessing protein modularity is important for understanding protein evolution. Still, the question of whether a sub-domain modular architecture exists remains open. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by finding the partition that maximizes the modularity score. In the second step, each cluster is tested for significance. Significant clusters are referred to as modules. Evolutionary modules are identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present a methodology that identifies sub-domain architecture in a robust, biologically meaningful, and statistically supported way.

Results: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when the correlation between landmarks within a module is low. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the Niemann-Pick C1 (NPC1) protein N-terminal domain. The α-amylase contains an (α/β)8 barrel (TIM barrel) with the polysaccharide cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in intracellular lipid metabolism, coordinating sterol trafficking. The NPC1 N-terminus is the first luminal domain, which binds to cholesterol and its oxygenated derivatives. Our inferred dynamic modules in NPC1 are also shown to match functional components of the protein related to the NPC1 disease.

Conclusions: A domain compartmentalization can be found and described in correlation space. To our knowledge, no other method attempts to identify sub-domain architecture from the correlation among residues. Most previous attempts focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservation. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in α-amylase, and sterol opening-defining modules and disease-related residues in NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.
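The first step named in the Background, choosing the partition that maximizes the modularity score, can be illustrated with a minimal sketch. It assumes Newman's weighted modularity and uses absolute correlations between landmarks as edge weights; both choices, the toy weights, and the names `modularity` and `best_partition` are illustrative assumptions, not the authors' implementation. The significance test of the second step is not reproduced here.

```python
from itertools import combinations


def modularity(weights, partition):
    """Newman modularity Q of a weighted, undirected graph.

    weights   -- dict: frozenset({i, j}) -> non-negative edge weight
                 (assumed here to be |correlation| between landmarks i and j)
    partition -- dict: node -> cluster label
    """
    nodes = list(partition)
    # Node strength = sum of the weights of all edges touching the node.
    strength = {n: 0.0 for n in nodes}
    for edge, w in weights.items():
        i, j = tuple(edge)
        strength[i] += w
        strength[j] += w
    total = sum(weights.values())  # total edge weight m
    if total == 0:
        return 0.0
    q = 0.0
    for label in set(partition.values()):
        members = [n for n in nodes if partition[n] == label]
        # Weight of edges that fall inside this cluster.
        inside = sum(weights.get(frozenset(pair), 0.0)
                     for pair in combinations(members, 2))
        degree = sum(strength[n] for n in members)
        # Observed within-cluster fraction minus its random expectation.
        q += inside / total - (degree / (2.0 * total)) ** 2
    return q


def best_partition(weights, candidates):
    """Return the candidate partition with the highest modularity."""
    return max(candidates, key=lambda p: modularity(weights, p))


# Hypothetical toy data: landmarks 0-1 and 2-3 are strongly correlated
# within pairs (0.9) and weakly correlated across pairs (0.1).
toy_weights = {
    frozenset((0, 1)): 0.9, frozenset((2, 3)): 0.9,
    frozenset((0, 2)): 0.1, frozenset((0, 3)): 0.1,
    frozenset((1, 2)): 0.1, frozenset((1, 3)): 0.1,
}
two_blocks = {0: "a", 1: "a", 2: "b", 3: "b"}
mixed = {0: "a", 2: "a", 1: "b", 3: "b"}
single = {0: "a", 1: "a", 2: "a", 3: "a"}

chosen = best_partition(toy_weights, [two_blocks, mixed, single])
print(chosen == two_blocks, round(modularity(toy_weights, two_blocks), 4))
```

The sketch only scores a handful of candidate partitions. A practical community detection run would search the space of partitions with a heuristic optimizer rather than enumerate candidates by hand.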


Figure 1: Behavior of the estimated PVP in different sample sizes. PVP values (y axis) against intracorrelations (x axis) for the simulated data. Simulations were run with 85 observations (A), 1000 observations (B), and 5000 observations (C). The star represents the true cluster, the cross represents the grouped singletons, and the rest of the markers represent other singleton clusters recovered as false positives (and therefore low PVP values).

Mentions: Here PVP_C is the estimated PVP, which should be distinguished from the true PVP that arises when the estimated r_ij in equation 4 are replaced by the actual ρ_ij. PVP_C provides qualitative information that helps interpret the results given the sample size used. Figure 1 shows the behavior of the PVP across the intracorrelations evaluated for 85 (Figure 1A), 1000 (Figure 1B), and 5000 (Figure 1C) observations. Even in simulated data, PVP deviates from the possible values of 0.0 and 1.0 when the number of observations is small.
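One reason estimates built from r_ij drift when observations are few is that a sample correlation is itself noisy at small n. The sketch below does not compute the PVP (equation 4 is not reproduced in this record); it only shows, under the standard Fisher z approximation, how the 95% confidence interval around a sample correlation narrows for the three sample sizes used in Figure 1. The value r = 0.3 and the function name `fisher_interval` are hypothetical choices for illustration.

```python
import math


def fisher_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transform: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3) for n observations.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical sample correlation; sample sizes taken from Figure 1.
r_sample = 0.3
widths = {}
for n_obs in (85, 1000, 5000):
    low, high = fisher_interval(r_sample, n_obs)
    widths[n_obs] = high - low
    print(f"n={n_obs:5d}  interval=({low:.3f}, {high:.3f})  width={widths[n_obs]:.3f}")
```

With 85 observations the interval is several times wider than with 5000, which is consistent with the estimated PVP being less decisive in panel A than in panels B and C.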

