A chemogenomics view on protein-ligand spaces.

Strömbergsson H, Kleywegt GJ - BMC Bioinformatics (2009)

Bottom Line: The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.


Affiliation: Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden. helena.strombergsson@lcb.uu.se

ABSTRACT

Background: Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.

Results: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the space of approved drugs and their targets. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets.

Conclusion: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
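The workflow summarized in the Results paragraph (descriptors, then PCA, then a nearest-neighbour comparison in principal-component space) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each complex is one row concatenating its protein and ligand descriptors, that descriptors are autoscaled before PCA, and that overlap is reported as the fraction of complexes whose nearest neighbour in the other dataset lies within a chosen cutoff. The function names, the random placeholder data and the cutoff are all hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Autoscale X (rows = complexes, columns = descriptors) and fit PCA via SVD."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant descriptors: avoid division by zero
    Z = (X - mean) / std
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:n_components].T

def project(X, mean, std, loadings):
    """Project complexes onto the principal components (scores)."""
    return ((X - mean) / std) @ loadings

def nn_distances(A, B):
    """For each row of A, the Euclidean distance to its nearest row in B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def overlap_fraction(scores_a, scores_b, cutoff):
    """Assumed overlap measure: share of A with a neighbour in B within cutoff."""
    return float((nn_distances(scores_a, scores_b) <= cutoff).mean())

# Placeholder data only: 6 made-up descriptors per complex.
rng = np.random.default_rng(0)
structural = rng.normal(size=(50, 6))   # stands in for the PDB dataset
drug_target = rng.normal(size=(20, 6))  # stands in for the DrugBank dataset

# Fit one global model on both datasets so the scores share a coordinate system.
mean, std, loadings = fit_pca(np.vstack([structural, drug_target]), n_components=3)
structural_scores = project(structural, mean, std, loadings)
drug_scores = project(drug_target, mean, std, loadings)

print(overlap_fraction(drug_scores, structural_scores, cutoff=1.0))
```

Fitting a single model on the stacked data is one possible design choice. Fitting on one dataset and projecting the other into it is an equally plausible reading of "global models".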


Figure 1: Number of PDB chains bound to each ligand. The number of non-redundant PDB chains is plotted for ligands 10–1000 in the structural dataset. All ligands in complex with more than 100 chains (red dotted line) were checked manually.

Mentions: It is not trivial to determine which ligands in the PDB bind non-specifically. For instance, many commonly occurring carbohydrates can bind specifically to some proteins but may also be additives from experiments. Ligands suspected to be additives, and ligands associated with more than 100 PDB entries, were scrutinized using literature searches and discussed with an expert (L. Liljas, Uppsala). Figure 1 shows that the large majority of ligands are associated with fewer than 100 non-redundant PDB chains. However, since only a small fraction of the ligands (~150 out of 6253) were investigated manually, it is likely that some non-specific ligands remain in the final PDB interaction dataset (which is based on 5481 ligands). In addition, the set of 772 removed ligands may well contain a few "true" ligands that bind specifically to their protein target. However, considering the large size of the final PDB interaction dataset (13275 complexes), we assume that the possible inclusion of a few non-specific ligands will not seriously affect the projection of the protein-ligand space.
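The screening step described above (count the non-redundant chains per ligand, then send ligands above the threshold to manual review) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the input is taken to be a list of (ligand ID, chain ID) pairs, the function names are hypothetical, and the toy records are made up. Only the threshold of 100 comes from the text.

```python
from collections import defaultdict

def chains_per_ligand(interactions):
    """Count the distinct chains each ligand is bound to.

    `interactions` is an iterable of (ligand_id, chain_id) pairs;
    repeated pairs are counted once.
    """
    chains = defaultdict(set)
    for ligand_id, chain_id in interactions:
        chains[ligand_id].add(chain_id)
    return {ligand: len(ids) for ligand, ids in chains.items()}

def flag_for_review(counts, threshold=100):
    """Return ligands bound to more than `threshold` chains, sorted by ID."""
    return sorted(ligand for ligand, n in counts.items() if n > threshold)

# Made-up records: one frequently occurring ligand and one rarer ligand.
interactions = (
    [("GOL", f"chain{i}") for i in range(150)]
    + [("ATP", f"chain{i}") for i in range(40)]
    + [("ATP", "chain0")]  # duplicate pair, ignored by the set
)

counts = chains_per_ligand(interactions)
to_review = flag_for_review(counts)
print(counts["GOL"], counts["ATP"], to_review)
```

The ligand counts quoted in the text are internally consistent: 6253 ligands minus the 772 removed leaves the 5481 ligands in the final dataset.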

