Unmasking determinants of specificity in the human kinome.
Here, we systematically discover several DoS and experimentally validate three of them, named the αC1, αC3, and APE-7 residues. We demonstrate that DoS form sparse networks of non-conserved residues spanning distant regions. Our results reveal a likely role for inter-residue allostery in specificity and an evolutionary decoupling of kinase activity and specificity, which appear loaded on independent groups of residues.
Affiliation: Department of Systems Biology, Technical University of Denmark, 2800 Lyngby, Denmark. Electronic address: firstname.lastname@example.org.
- Protein Kinases/chemistry*/metabolism*
- Computational Biology
- Models, Molecular
- Substrate Specificity
- src Homology Domains
© Copyright Policy
- CC BY
Figure S3. Related to Figure 3.
(A and B) The DoS Identified by KINspect Cannot Be Explained Simply by Conservation (A) or Alignment Gaps (B). To rule out the possibility that we are simply identifying the most conserved positions in our alignment, or that gaps in our alignment substantially bias our results, we plotted alignment column entropy (A) and gap fraction (B) against the position score. These results confirm that neither conservation (i.e., lower entropy) nor alignment gaps directly explain our findings, demonstrating the robustness of our method to such potential artifacts.
(C and D) Randomized or Uniform Versions of Our Sequence-Specificity Sets Do Not Result in Optimized Convergent Results. To confirm that our results do not arise from intrinsic properties of our method and are not uncoupled from our data, we produced two control sets: one with all specificity profiles set to the same uniform matrix (Uniform set) and a second in which the linkage between kinase and specificity profile was randomized (Randomized set). Neither control set leads to an optimization process (i.e., a decrease in fitness landscape terms) similar to the one observed for the actual KINspect set, represented in blue. Note that the uniform set does not effectively represent a predictive challenge for the method, which explains why the fitness remains at 0.0 for all iterations. In addition to this marked decrease in optimization potential, and unlike the actual set, the two control sets do not lead to convergent masks either (i.e., the dissimilarity between the masks remains high throughout the optimization process), as observed in (D).
(E) Enrichment in Previously Reported DoS. To investigate whether the specificity mask identified previously described determinants of specificity, we curated from the literature a number of determinants identified by different means and in different species (Table S1).
Next, we compared the KINspect scores obtained by this group of previously described determinants (top) and by all other residues (bottom) at the beginning of the evaluation run (left, before optimization) and after KINspect was optimized (right). Markedly different distribution trajectories can be observed between the two groups: most residues tend toward zero at the bottom, while a much larger fraction of residues previously identified as determinants score higher at the top, illustrating an enrichment of previously reported DoS (one-sided Fisher’s exact test, p = 8.4 × 10^−7). Interestingly, several additional DoS were identified by KINspect (bottom right), and some of the reported DoS did not obtain a high KINspect score (some of which had been reported in non-human species).
(F) Comparison to Previous Methods. Whereas a global comparison to previous methods would be unfeasible due to the highly limited coverage of human kinases in those methods, we were able to employ the gold standard set used in the DREAM challenge on peptide specificity (Ellis and Kobe, 2011). Although KINspect performed similarly poorly on CaMKK2 (a kinase with very distinct specificity), we could confirm in this limited test set that KINspect outperforms previous structure-based methods (Brinkworth et al., 2003; Ellis and Kobe, 2011) in its ability to predict PSSMs that are close to the experimentally observed ones (p = 2.20 × 10^−47, 4.58 × 10^−27, and 3.08 × 10^−4 for MELK, BIKE, and CaMKK2, respectively).
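The conservation and gap controls described for panels A and B rest on two per-column quantities, alignment column entropy and gap fraction. The following is a minimal sketch of how such quantities might be computed, not the authors' implementation: the function names, the toy alignment, and the choice to exclude gaps from the entropy calculation are all assumptions made for illustration.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap symbol in the alignment

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column.

    Assumption: gap characters are ignored when computing entropy.
    """
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def gap_fraction(column):
    """Fraction of sequences that carry a gap in this column."""
    return column.count(GAP) / len(column)

def column_stats(alignment):
    """Return (entropy, gap fraction) for every column of an alignment."""
    columns = ["".join(col) for col in zip(*alignment)]
    return [(column_entropy(c), gap_fraction(c)) for c in columns]

# Hypothetical toy alignment: four sequences, five columns.
toy_alignment = ["GKG-F", "GKGSF", "GRG-Y", "GKGTW"]
stats = column_stats(toy_alignment)

for index, (entropy, gaps) in enumerate(stats):
    print(f"column {index}: entropy={entropy:.3f} bits, gap fraction={gaps:.2f}")
```

Plotting each column's entropy or gap fraction against its position score would then show whether high-scoring positions are simply the most conserved or the most gapped ones.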
Since our method contains stochastic aspects (such as the starting set of random masks and the generation of new masks by mutation and cross-over), one initial question that must be addressed is whether the method is robust to this stochasticity, i.e., whether one would obtain similar results if the process were started from arbitrary initial conditions and evaluated independently several times. To this end, we compared the fitness evolution of ten independent KINspect evaluations and found highly comparable fitness trajectories, as well as increasing similarity between the best-performing masks at each generation (Figure S2; Data S1, S2, and S3). Moreover, we confirmed that the results are not simply due to trivial technical factors, such as residue conservation or alignment gaps (Figure S3), and that similar results could not be obtained using uniform or randomized sets (Figure S3). Taken together, these results demonstrate that KINspect is robust to arbitrary initial conditions and converges to a limited set of highly similar solutions (specificity masks, Figure S3).
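The robustness check described above can be illustrated with a toy evolutionary search that starts from random masks, generates new masks by mutation and cross-over, and is repeated from several independent seeds. This is only a sketch under stated assumptions and does not reproduce KINspect: the mask encoding (one weight in [0, 1] per alignment position), the toy fitness function (lower is better, measured against a hypothetical set of "true" positions), and all parameter values are invented for illustration.

```python
import random

N_POSITIONS = 30         # hypothetical number of alignment positions
TOY_DOS = {3, 11, 27}    # hypothetical "true" determinants used by the toy fitness

def random_mask(rng):
    """A mask is assumed here to be one weight in [0, 1] per position."""
    return [rng.random() for _ in range(N_POSITIONS)]

def fitness(mask):
    """Toy stand-in for the real fitness; lower is better."""
    target = [1.0 if i in TOY_DOS else 0.0 for i in range(N_POSITIONS)]
    return sum((w - t) ** 2 for w, t in zip(mask, target)) / N_POSITIONS

def crossover(a, b, rng):
    """Single-point cross-over between two parent masks."""
    cut = rng.randrange(1, N_POSITIONS)
    return a[:cut] + b[cut:]

def mutate(mask, rng, rate=0.1):
    """Replace each weight with a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else w for w in mask]

def evolve(seed, pop_size=40, generations=150):
    """Run one independent evaluation; return the best mask and its fitness history."""
    rng = random.Random(seed)
    population = [random_mask(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 4]  # elitism: the best masks survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    population.sort(key=fitness)
    return population[0], history

def dissimilarity(a, b):
    """Mean absolute difference between two masks."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Three independent evaluations from different random starting conditions.
runs = [evolve(seed) for seed in range(3)]
for seed, (best, history) in enumerate(runs):
    print(f"seed {seed}: fitness {history[0]:.3f} -> {history[-1]:.3f}")
print("dissimilarity between best masks of seeds 0 and 1:",
      round(dissimilarity(runs[0][0], runs[1][0]), 3))
```

Comparing the fitness histories and the pairwise dissimilarity of the best masks across seeds mirrors, in miniature, the comparison of independent evaluations reported in Figure S2.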