LigASite--a database of biologically relevant binding sites in proteins with known apo-structures.

Dessailly BH, Lensink MF, Orengo CA, Wodak SJ - Nucleic Acids Res. (2007)

Bottom Line: In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.


Affiliation: Center for Structural Biology and Bioinformatics, Université Libre de Bruxelles (U. L. B.), Bld du Triomphe - CP 263, 1050 Bruxelles, Belgium.

ABSTRACT
Better characterization of binding sites in proteins and the ability to accurately predict their location and energetic properties are major challenges which, if addressed, would have many valuable practical applications. Unfortunately, reliable benchmark datasets of binding sites in proteins are still sorely lacking. Here, we present LigASite ('LIGand Attachment SITE'), a gold-standard dataset of binding sites in 550 proteins of known structure. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. Both a redundant and a culled non-redundant version of the dataset are available at http://www.scmbb.ulb.ac.be/Users/benoit/LigASite. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.


Figure 3: Distribution of EC classes among proteins in (a) the non-redundant version of the LigASite dataset (redundancy removed at 25% sequence identity), and in (b) a non-redundant subset of the PDB (redundancy removed at 25% sequence identity), which consists of 5180 PDB entries. EC numbers were obtained from PDBSprotEC, a mapping of PDB entries to EC numbers via SwissProt (26).

Mentions: We used the PDBSprotEC mapping (26) to obtain the EC numbers for all 286 proteins in the non-redundant version of LigASite, and for all the proteins in a non-redundant version of the PDB [redundancy removed at 25% sequence identity using PISCES (21)] (Figure 3). Only 48 of the 286 proteins (i.e. 17%) in the non-redundant version of LigASite are non-enzymes (26). In the non-redundant version of the PDB, 39% of proteins are non-enzymes. Among enzymes, transferases and hydrolases (EC classes 2 and 3, respectively) are the most common in both LigASite and the PDB.
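The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text (286 proteins, 48 non-enzymes, and 39% non-enzymes in the PDB subset); the variable and function names are illustrative and do not come from the LigASite distribution.

```python
# Counts reported in the text for the non-redundant LigASite set.
ligasite_total = 286
ligasite_non_enzymes = 48

# Percentage of non-enzymes reported in the text for the non-redundant PDB subset.
pdb_non_enzyme_pct = 39


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Share of non-enzymes and enzymes in LigASite, derived from the raw counts.
ligasite_non_enzyme_pct = percent(ligasite_non_enzymes, ligasite_total)
ligasite_enzyme_pct = percent(ligasite_total - ligasite_non_enzymes, ligasite_total)

print(f"LigASite non-enzymes: {ligasite_non_enzyme_pct}%")
print(f"LigASite enzymes:     {ligasite_enzyme_pct}%")
print(f"PDB non-enzymes:      {pdb_non_enzyme_pct}% (as reported)")
```

By these figures, enzymes make up a larger share of LigASite than of the non-redundant PDB subset it is compared with.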

