A chemogenomics view on protein-ligand spaces.

Strömbergsson H, Kleywegt GJ - BMC Bioinformatics (2009)

Bottom Line: The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.

Affiliation: Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden. helena.strombergsson@lcb.uu.se

ABSTRACT

Background: Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.

Results: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the space of approved drugs and their targets. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors were computed for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets.
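
The modelling steps described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each complex is represented by concatenating its protein and ligand descriptor vectors, that the descriptors are autoscaled before PCA, and that overlap is measured as the fraction of complexes whose nearest neighbour in principal-component space belongs to the other dataset. The function names, the random descriptor values and the choice of three components are all hypothetical and serve only as an illustration.

```python
import numpy as np

def autoscale(X):
    """Centre each column and scale it to unit variance (constant columns become zero)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    return (X - mean) / std

def pca_scores(X, n_components):
    """Project the autoscaled data onto its first principal components (via SVD)."""
    Z = autoscale(X)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def cross_dataset_nn_fraction(scores, labels):
    """Fraction of points whose nearest neighbour (Euclidean) has a different dataset label.

    This overlap measure is an assumption made for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = dist.argmin(axis=1)
    return float(np.mean(labels[nearest] != labels))

# Synthetic stand-ins for the two datasets (20 complexes each).
rng = np.random.default_rng(0)
protein_desc = rng.normal(size=(40, 6))   # hypothetical sequence-based descriptors
ligand_desc = rng.normal(size=(40, 5))    # hypothetical ligand descriptors
complexes = np.hstack([protein_desc, ligand_desc])
dataset = np.array([0] * 20 + [1] * 20)   # 0 = first dataset, 1 = second dataset

scores = pca_scores(complexes, n_components=3)
overlap = cross_dataset_nn_fraction(scores, dataset)
print(scores.shape, overlap)
```

A value near 0 would indicate that the two datasets occupy separate regions of the model, whereas larger values would indicate that their complexes are intermixed.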

Conclusion: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.

Figure 3: DrugBank cross-interaction study. The percentage of captured cross interactions is plotted against the number of checked neighbours. The blue data series was computed from the protein-ligand PCA model and the red series was computed from the protein PCA model.

Mentions: More than half of the drugs in DrugBank (59%) interact with more than one drug target. To investigate whether our modelling approach is able to detect known drug target cross interactions, the nearest neighbours (NNs) of each DrugBank complex were analysed. For each DrugBank complex whose ligand has at least one known cross interaction, the 25 NNs were computed from all ten extracted components in the protein-ligand PCA model. Figure 3 plots the percentage of complexes for which at least one known drug target was found among the NNs against the number of checked NNs. The figure shows that the protein-ligand PCA model is much better at capturing known protein cross interactions than the PCA model based only on protein descriptors. Our PCA modelling approach thus captures a large fraction of the known cross interactions, which suggests that the model will also be able to capture as yet unknown cross interactions in any protein-ligand interaction dataset.
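
The cross-interaction analysis can be sketched as follows. This is one reading of the procedure, not the authors' code: a query complex counts as captured at k checked neighbours if the protein of any of its k nearest complexes is another known target of the query's ligand. The function name, the Euclidean distance and the toy data are assumptions introduced for illustration.

```python
import numpy as np

def capture_curve(scores, ligand_ids, target_ids, max_k=25):
    """Percentage of query complexes with a known cross interaction among their k NNs.

    Returns an array of length max_k; entry k-1 is the percentage for k checked neighbours.
    Only complexes whose ligand has at least one other known target are used as queries.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)

    # Collect the set of known targets for every ligand.
    targets_of = {}
    for lig, tgt in zip(ligand_ids, target_ids):
        targets_of.setdefault(lig, set()).add(tgt)

    # Pairwise Euclidean distances in principal-component space.
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude the query itself
    order = np.argsort(dist, axis=1, kind="stable")[:, :max_k]

    hits = np.zeros(max_k)
    n_queries = 0
    for i in range(n):
        others = targets_of[ligand_ids[i]] - {target_ids[i]}
        if not others:
            continue  # ligand has no known cross interaction
        n_queries += 1
        for rank, j in enumerate(order[i]):
            if target_ids[j] in others:
                hits[rank:] += 1  # captured at this and every larger k
                break
    if n_queries == 0:
        return hits
    return 100.0 * hits / n_queries

# Toy example with one-dimensional scores (hypothetical values).
toy_scores = [[0.0], [1.0], [-0.4], [5.0]]
toy_ligands = ["A", "A", "B", "C"]       # ligand A has two known targets
toy_targets = ["P1", "P2", "P3", "P4"]
curve = capture_curve(toy_scores, toy_ligands, toy_targets, max_k=3)
print(curve)  # [ 50. 100. 100.]
```

Running the same function on scores from a protein-only model and from a protein-ligand model would give the two series that a plot such as Figure 3 compares.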

