Novel Bayesian classification models for predicting compounds blocking hERG potassium channels.

Liu LL, Lu J, Lu Y, Zheng MY, Luo XM, Zhu WL, Jiang HL, Chen KX - Acta Pharmacol. Sin. (2014)

Bottom Line: The models were internally validated with the training set of compounds, and then applied to the test set for validation. Doddareddy's experimentally validated dataset with 60 compounds was used for external test set validation. A Bayesian classification model considering the effects of four molecular properties (Mw, PPSA, ALogP and pKa_basic) as well as extended-connectivity fingerprints (ECFP_14) exhibited a global accuracy (91%), parameter sensitivity (90%) and specificity (92%) in the test set validation, and a global accuracy (58%), parameter sensitivity (61%) and specificity (57%) in the external test set validation.


Affiliation: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.

ABSTRACT

Aim: A large number of drug-induced long QT syndromes are ascribed to blockage of hERG potassium channels. The aim of this study was to construct novel computational models to predict compounds blocking hERG channels.

Methods: Doddareddy's hERG blockage data containing 2644 compounds were used, divided into training (2389) and test (255) sets. Laplacian-corrected Bayesian classification models were constructed using Discovery Studio. The models were internally validated with the training set of compounds, and then applied to the test set for validation. Doddareddy's experimentally validated dataset with 60 compounds was used for external test set validation.

Results: A Bayesian classification model considering the effects of four molecular properties (Mw, PPSA, ALogP and pKa_basic) as well as extended-connectivity fingerprints (ECFP_14) achieved a global accuracy of 91%, a sensitivity of 90% and a specificity of 92% in the test set validation, and a global accuracy of 58%, a sensitivity of 61% and a specificity of 57% in the external test set validation.

Conclusion: The novel model is better than those in the literature for predicting compounds blocking hERG channels, and can be used for large-scale prediction.
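
The global accuracy, sensitivity and specificity quoted in the Results are standard confusion-matrix statistics. The sketch below shows how they are usually computed, assuming blockers are treated as the positive class. The counts in the example are hypothetical, chosen only to illustrate the arithmetic; they are not the study's actual confusion matrix, which is not given in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary blocker / nonblocker classifier (blocker = positive)."""
    tp: int  # blockers predicted as blockers
    fn: int  # blockers predicted as nonblockers
    tn: int  # nonblockers predicted as nonblockers
    fp: int  # nonblockers predicted as blockers


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return global accuracy, sensitivity and specificity as fractions."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,       # all correct / all compounds
        "sensitivity": c.tp / (c.tp + c.fn),     # correct blockers / all blockers
        "specificity": c.tn / (c.tn + c.fp),     # correct nonblockers / all nonblockers
    }


# Hypothetical counts for illustration only (NOT taken from the paper).
example = ConfusionCounts(tp=90, fn=10, tn=92, fp=8)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting sensitivity and specificity separately matters here because a model can reach a high global accuracy on an imbalanced set while still misclassifying most of the minority class.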


Figure 4A: The good features generated in the model. In the annotation of each fragment, the G labels (good features) are bin IDs; the string of numbers shows the bin value defining the fingerprint feature; the second line shows how often the feature occurred in “good” samples and in the entire data set. The Bayesian score shows the final contribution of the feature to the model prediction.

Mentions: During Laplacian-modified Bayesian modeling, the following statistics were calculated for each feature: the number of molecules in which the feature occurred, the number of those molecules that were blockers, and a measure of how much this differs from the overall hit rate (the ratio that would be expected if the feature occurred randomly across blockers and nonblockers). The Bayesian score represents the final contribution of a feature to the model prediction. It takes into account the total number of occurrences of the feature, so that more weight is placed on features that occur often and little weight on those with very few occurrences. The 20 fingerprint features that made the most positive contribution to the model and the 20 that made the most negative contribution were generated as good and bad features, respectively. These fingerprint features are shown in Figure 4. The top 10 good features have a Bayesian score of 0.636 and match 37 compounds, all of which are blockers (Figure 4A). These molecules are a series of 1,2,4-triazol-3-yl-thiopropyl-tetrahydrobenzazepines, which are potent and selective dopamine D3 receptor antagonists. It has been reported that D3 receptor antagonists tend to exhibit hERG toxicity [53]. Good features 11–20 come from sertindole analogues, Wombat-PK database molecules and h5-HT2A receptor antagonists, which are mostly strong hERG blockers [48,54,55]. The good features can be used as structural alerts for hERG toxicity. The top 20 bad fingerprint features are listed in Figure 4B: all of these features only occurred in the nonblockers. The prediction results show that prediction of the nonblockers improved significantly when fingerprints were added to the descriptors, so these features may play an important role in identifying nonblockers.
For example, a series of sertindole analogues was included in our data set [50], in which the most inactive analogue (compound 5) contains the fragment B13 (Figure 5B), which occurred 104 times in the whole data set but only 2 times in the blockers. As seen in Figure 6, the two molecules (compounds 4 and 5) differ from each other at only one group, but their hERG channel IC50 values show a 100-fold difference. Thus, it can be hypothesized that the presence of the B13 fragment helps the molecule avoid binding to the hERG channel. These fragments may affect the hERG binding affinity of compounds by changing their overall molecular shape and other physicochemical properties. Combined with the good features, the bad features may act as regulatory factors for the structural alerts listed in Figure 4A, which increases the overall accuracy of hERG toxicity prediction.
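
The per-feature statistics described above can be made concrete with the form of the Laplacian-corrected estimator commonly given in the literature: for a feature present in N molecules, A of which are blockers, and an overall blocker rate P, the feature weight is ln((A + 1) / (N·P + 1)). The sketch below assumes that form; the text does not spell out the exact formula used by Discovery Studio, and the base rate of 0.5 is a hypothetical placeholder, not the study's actual blocker fraction. The function name is illustrative.

```python
import math


def laplacian_corrected_score(n_feature: int, n_feature_active: int,
                              base_rate: float) -> float:
    """Weight of one fingerprint feature in a Laplacian-corrected Bayesian model.

    n_feature        -- molecules containing the feature (N)
    n_feature_active -- of those, molecules that are blockers (A)
    base_rate        -- overall fraction of blockers in the training set (P)
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    if not 0 <= n_feature_active <= n_feature:
        raise ValueError("n_feature_active must be between 0 and n_feature")
    # N * P is the number of blockers expected if the feature were unrelated
    # to activity; adding 1 to both terms pulls rare features toward zero.
    return math.log((n_feature_active + 1) / (n_feature * base_rate + 1))


ASSUMED_BASE_RATE = 0.5  # hypothetical; the real blocker fraction is not stated here

# Counts quoted in the text: a good feature seen in 37 molecules, all blockers,
# and fragment B13, seen 104 times but only twice in blockers.
score_good = laplacian_corrected_score(37, 37, ASSUMED_BASE_RATE)
score_b13 = laplacian_corrected_score(104, 2, ASSUMED_BASE_RATE)
# A feature seen once, in a blocker: positive, but weighted far less.
score_rare = laplacian_corrected_score(1, 1, ASSUMED_BASE_RATE)

print(f"good feature: {score_good:+.3f}")
print(f"B13 fragment: {score_b13:+.3f}")
print(f"rare feature: {score_rare:+.3f}")
```

Under this form, a molecule's overall score is the sum of the weights of the features it contains, which is why a strongly negative fragment such as B13 can offset the structural alerts contributed by good features.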

