A chemogenomics view on protein-ligand spaces.

Strömbergsson H, Kleywegt GJ - BMC Bioinformatics (2009)

Bottom Line: The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.


Affiliation: Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden. helena.strombergsson@lcb.uu.se

ABSTRACT

Background: Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.

Results: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the space of approved drugs and their targets. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets.

Conclusion: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
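The workflow summarized in the abstract (descriptors, then PCA, then nearest neighbours in principal component space) can be illustrated with a minimal sketch. This is not the authors' implementation: the random descriptor matrices, the autoscaling step, the choice of three components, fitting the model on the merged rows, and the function names `fit_pca`, `project` and `nearest_neighbour_distances` are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on autoscaled descriptor columns (scaling is an assumption)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant descriptors unscaled
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T
    return mean, std, loadings

def project(X, model):
    """Map descriptor rows to principal component scores."""
    mean, std, loadings = model
    return ((np.asarray(X, dtype=float) - mean) / std) @ loadings

def nearest_neighbour_distances(queries, reference):
    """Euclidean distance from each query row to its closest reference row."""
    diff = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

# Hypothetical data: each row is one protein-ligand complex, described by
# concatenated protein and ligand descriptors (8 made-up columns).
rng = np.random.default_rng(0)
structural = rng.normal(size=(50, 8))            # stands in for the PDB set
drug_target = rng.normal(loc=0.5, size=(20, 8))  # stands in for the DrugBank set

# Assumption: one global model fitted on the merged datasets.
model = fit_pca(np.vstack([structural, drug_target]), n_components=3)
structural_scores = project(structural, model)
drug_scores = project(drug_target, model)

# Small distances suggest a drug-target complex lies in a region that the
# structural dataset also covers; large distances suggest little overlap.
nn = nearest_neighbour_distances(drug_scores, structural_scores)
print(nn.round(2))
```

How the distances are turned into a single overlap measure is not specified in the text above; a distribution of `nn` values, or the fraction below some threshold, would be one plausible choice.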


Figure 4: A cross interaction case study of P41594 in complex with acamprosate. The five nearest neighbours of the complex of human metabotropic glutamate receptor 5 (P41594) and acamprosate, according to our protein-ligand model. The protein name, percentage sequence identity to P41594, Tanimoto score of its ligand and acamprosate, and the nearest neighbour distance between the two complexes are reported for each neighbour.

Mentions: Acamprosate (calcium acetylaminopropane sulfonate; Campral®) is used to treat alcohol dependence [34]. Its chemical structure is similar to that of gamma-aminobutyric acid, and it is thought to act through several mechanisms affecting multiple neurotransmitter systems. Serious side effects include allergic reactions, irregular heartbeat, and low or high blood pressure, while less serious side effects include headaches, insomnia, and impotence. DrugBank lists five protein targets that interact with acamprosate (P41594, Q13255, Q14416, Q14832, Q8TCU5). The targets are all in the human glutamate receptor family, and protein 3D structures are not available for any of them. The five nearest neighbours of the complex of glutamate receptor 5 (P41594) and acamprosate were computed from all extracted components of the protein-ligand PCA model. Figure 4 shows a ModBase [35] homology model of glutamate receptor 5 and the acamprosate ligand structure, together with information on the five nearest neighbours in the merged PDB and DrugBank datasets. The first neighbour is an acetyltransferase component of a pyruvate dehydrogenase complex (1Y8NB) in complex with its co-factor lipoic acid (LPA). The second neighbour is a porin protein from the outer membrane (1IXWC) in complex with a colicin inhibitor (OES). The third neighbouring complex is human carbonic anhydrase I (P00915, PDB code 1AZMA) and the drug Levetiracetam (DB01202), which is used to treat epilepsy. The fourth neighbour is Hepatitis A virus proteinase C (2H9HA) in complex with a peptide-based ketone inhibitor (EPQ). The last neighbour is a glutamate receptor that is a known cross interaction target, and its putative structure is shown as a homology model obtained from ModBase [35]. Interestingly, the last neighbour, not the first, is the most similar to the P41594-acamprosate complex in terms of protein sequence similarity and Tanimoto score.
This is probably due to the fact that both protein and ligand descriptors capture general features, such as molecular weight or percentage of charged amino acid residues. The major advantages of the protein descriptors are that they are easy to interpret, fast to implement, and allow for generalized comparisons of a large set of heterogeneous proteins. However, the descriptors do not reflect sequence length or any structural properties and, like many QSAR descriptors, describe each protein as a single large molecule. Therefore, a protein and its nearest neighbours will not necessarily display a high degree of sequence similarity. Similarly, the ligand descriptors selected for this study are very easy to interpret and describe properties important for drug development. They do not, however, describe any structural properties, which explains the low Tanimoto scores between acamprosate and the five neighbouring ligands. Despite the above-mentioned limitations, this example demonstrates that it is still possible to identify one of the known acamprosate cross interactions. Considering that this method captures a large part of the known DrugBank cross interactions (shown in Figure 3), it is reasonable to assume that the subset of nearest neighbours may be used in, for instance, focused screening approaches that aim to design selective drugs, or as inspiration in the search for novel drug scaffolds.
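The contrast drawn above, between closeness in descriptor space and structural similarity of the ligands, can be made concrete with a small sketch. It ranks candidate complexes by Euclidean distance to a query in principal component space and, separately, computes a Tanimoto score on ligand feature sets. The score vectors, the feature sets and the neighbour labels are invented for illustration, and the fingerprint type behind the paper's Tanimoto scores is not stated in this text, so plain sets of feature identifiers are assumed.

```python
import math

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)

def k_nearest(query, candidates, k=5):
    """Return the k candidates closest to `query` as (name, distance) pairs.

    `candidates` maps a complex name to its principal component score vector.
    """
    ranked = sorted(candidates.items(), key=lambda kv: math.dist(query, kv[1]))
    return [(name, math.dist(query, vec)) for name, vec in ranked[:k]]

# Hypothetical PC scores for a query complex and three candidate complexes.
query_scores = (0.0, 0.0)
candidate_scores = {
    "neighbour_A": (0.2, 0.1),
    "neighbour_B": (0.5, -0.4),
    "neighbour_C": (1.5, 1.0),
}

# Hypothetical ligand feature sets (e.g. indices of set fingerprint bits).
query_ligand = {1, 4, 7, 9}
candidate_ligands = {
    "neighbour_A": {2, 3, 9},         # close in PC space, few shared features
    "neighbour_B": {5, 6, 8},
    "neighbour_C": {1, 4, 7, 9, 11},  # farther away, yet structurally similar
}

for name, distance in k_nearest(query_scores, candidate_scores, k=3):
    score = tanimoto(query_ligand, candidate_ligands[name])
    print(f"{name}: distance={distance:.2f}, Tanimoto={score:.2f}")
```

In this toy data the closest complex has the lowest-scoring ligand and the farthest has the highest, which mirrors the situation described for the acamprosate neighbours: general physicochemical descriptors and substructure-based similarity measure different things.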

