Constructing patch-based ligand-binding pocket database for predicting function of proteins.

Sael L, Kihara D - BMC Bioinformatics (2012)

Bottom Line: Many solved tertiary structures of unknown function lack global sequence and structural similarity to proteins of known function. Patch-Surfer achieved an average enrichment factor of over 20.0 at 0.1 percent. The results did not depend on the sequence similarity of the query protein to proteins in the database, indicating that Patch-Surfer can identify correct pockets even in the absence of known homologous structures in the database.


Affiliation: Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA.

ABSTRACT

Background: Many solved tertiary structures of unknown function lack global sequence and structural similarity to proteins of known function. Functional clues for such proteins can often be obtained by predicting the small ligand molecules that bind to them.

Methods: In our previous work, we developed an alignment-free, local surface-based pocket comparison method named Patch-Surfer, which predicts ligand molecules that are likely to bind to a protein of interest. Given a query pocket in a protein, Patch-Surfer searches a database of known pockets and finds those similar to the query. Here, we have extended the database of ligand-binding pockets for Patch-Surfer to cover diverse types of binding ligands.

Results and conclusion: We selected 9393 representative pockets with 2707 different ligand types from the Protein Data Bank. We tested Patch-Surfer on the extended pocket database to predict the binding ligands of 75 non-homologous proteins, each of which binds one of seven different ligands. Patch-Surfer achieved an average enrichment factor of over 20.0 at 0.1 percent. The results did not depend on the sequence similarity of the query protein to proteins in the database, indicating that Patch-Surfer can identify correct pockets even in the absence of known homologous structures in the database.



Figure 1: Enrichment factor (EF) calculated for different percentiles. A, the average EF of 75 query pockets for different combinations of the distance threshold and the surface properties. t0.2 and t0.3 show results using threshold distances of 0.2 and 0.3, respectively; "no t" shows the result when no threshold is used. Two surface property combinations are used: all four properties (shape, hydrophobicity, electrostatic potential, and visibility) and shape information only. B, EF for each ligand type in the test dataset using the distance threshold of 0.2 and all four surface properties.
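The excerpt here does not spell out how EF is computed, so the following is only a minimal sketch assuming the definition commonly used in retrieval and virtual-screening benchmarks: the hit rate within the top fraction of the ranked database divided by the hit rate over the whole database. The function name `enrichment_factor`, the rounding of the cutoff, and the toy ranking are all illustrative assumptions, not taken from the paper.

```python
import math


def enrichment_factor(ranked_is_hit, top_fraction):
    """EF at a given top fraction of a ranked list (assumed common definition).

    ranked_is_hit: list of booleans, best-ranked pocket first; True means the
                   pocket binds the same ligand type as the query.
    top_fraction:  fraction of the database inspected, e.g. 0.001 for 0.1 percent.
    """
    n_total = len(ranked_is_hit)
    n_hits = sum(ranked_is_hit)
    if n_total == 0 or n_hits == 0:
        raise ValueError("EF is undefined for an empty list or a list with no hits")
    # Assumption: round the cutoff up and always inspect at least one entry.
    n_top = max(1, math.ceil(n_total * top_fraction))
    hits_in_top = sum(ranked_is_hit[:n_top])
    return (hits_in_top / n_top) / (n_hits / n_total)


# Toy ranking of 20 pockets with 4 true hits, 2 of them in the top 5.
toy_ranking = [True, False, True, False, False] + [False] * 13 + [True, True]
toy_ef = enrichment_factor(toy_ranking, top_fraction=0.25)
```

In this toy case the top quarter holds 2 hits in 5 entries (rate 0.4) against a background rate of 4 in 20 (0.2), so the EF is 2. Under the same assumed definition, the top 0.1 percent of a 9393-pocket database is roughly the first ten entries.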

Mentions: Patch-Surfer was run with six different settings: all four properties or only the shape information, each combined with one of three distance thresholds for matching patches: 0.2, 0.3, or no threshold on the patch distance (Eqn. 4). With the threshold value of 0.2, only similar surface patches with a distance below 0.2 are matched, whereas the no-threshold option matches the maximum number of pairs between two pockets regardless of their distance (i.e. all the patches in the smaller pocket are matched to patches in the larger pocket). The results (Figure 1A) show, first, that using all four properties gave better EF than using the shape information alone and, second, that the threshold value of 0.2 performed best among the three choices tested. The best retrieval was observed when all the patch properties and the threshold of 0.2 were used. Figure 1B shows the EF of each ligand type using Patch-Surfer with the threshold distance of 0.2 and all four properties. HEM and FAD showed very high EF values of over 30 at early ranks. Patch-Surfer performed relatively poorly for GLC. The reason is that the database contains twenty other ligands that are similar to GLC according to the Tanimoto coefficient (higher than 0.85).
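To illustrate how a distance threshold changes which patches are paired, here is a minimal sketch that uses a simple greedy one-to-one matching as a stand-in. This is not presented as Patch-Surfer's actual matching procedure or its Eqn. 4 distance; the function name `match_patches` and the toy distance matrix are hypothetical.

```python
def match_patches(dist, threshold=None):
    """Greedy one-to-one pairing of patches from two pockets (illustrative only).

    dist[i][j] is an assumed precomputed distance between patch i of the
    smaller pocket and patch j of the larger pocket.
    threshold=None corresponds to the "no t" setting: every patch of the
    smaller pocket ends up paired, however distant its partner is.
    """
    # Consider all candidate pairs from closest to farthest.
    candidates = sorted(
        (d, i, j) for i, row in enumerate(dist) for j, d in enumerate(row)
    )
    used_small, used_large, matches = set(), set(), []
    for d, i, j in candidates:
        if threshold is not None and d >= threshold:
            break  # candidates are sorted, so all remaining pairs are too far
        if i in used_small or j in used_large:
            continue  # each patch may be paired at most once
        used_small.add(i)
        used_large.add(j)
        matches.append((i, j, d))
    return matches


# Toy distances: 2 patches in the smaller pocket, 3 in the larger one.
toy_dist = [
    [0.10, 0.45, 0.60],
    [0.35, 0.25, 0.50],
]
strict = match_patches(toy_dist, threshold=0.2)   # only the 0.10 pair survives
relaxed = match_patches(toy_dist, threshold=0.3)  # the 0.25 pair is added
no_t = match_patches(toy_dist)                    # both small-pocket patches paired
```

With the toy matrix, the 0.2 threshold keeps a single close pair, while 0.3 and the no-threshold setting each pair both patches of the smaller pocket. A stricter threshold therefore trades the number of matched pairs for their similarity.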

