Structural descriptor database: a new tool for sequence-based functional site prediction.

Bernardes JS, Fernandez JH, Vasconcelos AT - BMC Bioinformatics (2008)

Bottom Line: For all evaluations, significant improvements were obtained with SDDB. SDDB performed better when reliable training data was available. Nevertheless, by using our prediction method we obtained results with precision above 70%.


Affiliation: Laboratório Nacional de Computação Científica LNCC/MTC, Quitandinha, Petrópolis, RJ, Brazil. julibinho@gmail.com

ABSTRACT

Background: The Structural Descriptor Database (SDDB) is a web-based tool that predicts the function of proteins and functional site positions based on the structural properties of related protein families. Structural alignments and functional residues of a known protein set (defined as the training set) are used to build special Hidden Markov Models (HMMs) called HMM descriptors. SDDB uses previously calculated and stored HMM descriptors for predicting active sites, binding residues, and protein function. The database integrates biologically relevant data filtered from several databases such as PDB, PDBSUM, CSA and SCOP. It accepts queries in FASTA format and predicts functional residue positions, protein-ligand interactions, and protein function, based on the SCOP database.

Results: To assess SDDB performance, we used different data sets. The Trypsin-like Serine protease data set assessed how well SDDB predicts functional sites when curated data is available. The SCOP family data set was used to analyze SDDB performance by using training data extracted from PDBSUM (binding sites) and from CSA (active sites). The ATP-binding experiment was used to compare our approach with the most current method. For all evaluations, significant improvements were obtained with SDDB.

Conclusion: SDDB performed better when reliable training data was available. SDDB worked better at predicting active sites than binding sites because the former are more conserved than the latter. Nevertheless, by using our prediction method we obtained results with precision above 70%.


Figure 4: Mapping functional residues to HMM states. Each column in the globin alignment maps to either a match or an insert state, including the columns that represent functional sites. The columns labeled As represent active site positions, whereas the columns labeled Bs represent binding site positions. Mi, Ii and Di represent match, insert and delete states in the HMM architecture, respectively. In this illustration, As1 mapped to the M8 state, and the Bs1, Bs2 and Bs3 columns mapped to the M2, M13 and I15 states, respectively.

Mentions: The alignment columns containing functional sites were mapped to the HMM states, i.e., either match or insert states, as shown in figure 4. The figure shows a partial alignment of proteins for the globin family, in which the column labeled Bs1, which represents a binding site position in the alignment, mapped to the match state M2. Similarly, the column labeled As1, which represents an active site position in the alignment, mapped to the match state M8. Note that the column labeled Bs3 mapped to an insert state, I15. Hence each column in the alignment was represented by one HMM state. When a protein p is scored by an HMM, the Viterbi algorithm [41] provides both the e-value and the best path found for the scored amino acids of p through the HMM architecture. This path is the sequence of states by which the amino acids of p were recognized for p to be classified by the HMM. We knew which of the states of the HMM represented the functional sites (figure 4), so we were able to determine whether the amino acids of p were recognized by those states, and thus predict the positions of the functional sites of p. For instance, let p = (a1, a2,...,an) be a protein sequence, where ai is the amino acid in position i, and let HMMf be an HMM that represents an arbitrary family f. We are then interested in Pr[p/HMMf], which describes the probability of observing the protein p within HMMf. The Viterbi algorithm gives us both Pr[p/HMMf] and π, where π is the best path of p through the HMMf states. If π is given by M1M2I3...Mn, then a1 was recognized by the match state M1, a2 was recognized by the match state M2, a3 was recognized by the insert state I3, and so on until an was recognized by Mn. Therefore, if M2 is a state that represents a binding site, it is likely that a2 in p is a binding site residue.
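The path-to-residue step described above can be illustrated with a minimal Python sketch. It is not the SDDB implementation: the function name `predict_functional_sites`, the representation of the path as a list of state labels, the example sequence, and the example path are all assumptions made for illustration. The sketch assumes the best path π has already been computed by a Viterbi implementation, and it treats delete states as emitting no residue.

```python
def predict_functional_sites(sequence, path, functional_states):
    """Map residues of a sequence to the functional HMM states that emitted them.

    sequence: protein p as a string of one-letter amino acid codes.
    path: best state path (hypothetical format), e.g. ["M1", "M2", "I3"].
    functional_states: dict of state label -> site label, e.g. {"M2": "Bs1"}.
    Returns a list of (1-based position, residue, site label) tuples.
    """
    predictions = []
    pos = 0  # index of the next residue to be consumed by an emitting state
    for state in path:
        if state.startswith("D"):
            # Delete states skip a model position without consuming a residue.
            continue
        if pos >= len(sequence):
            raise ValueError("path emits more residues than the sequence has")
        if state in functional_states:
            predictions.append((pos + 1, sequence[pos], functional_states[state]))
        pos += 1
    if pos != len(sequence):
        raise ValueError("path does not account for every residue")
    return predictions


# Hypothetical example: M2 marks a binding site, M8 an active site.
example_sequence = "MVLSPADK"
example_path = ["M1", "M2", "I3", "M4", "D5", "M6", "M7", "M8", "M9"]
example_sites = {"M2": "Bs1", "M8": "As1"}
example_result = predict_functional_sites(
    example_sequence, example_path, example_sites
)
```

In this made-up example the delete state D5 consumes no residue, so the active site state M8 is matched by the seventh residue of the sequence rather than the eighth; `example_result` is `[(2, 'V', 'Bs1'), (7, 'D', 'As1')]`.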

