A dynamic Bayesian network approach to protein secondary structure prediction.

Yao XQ, Zhu H, She ZS - BMC Bioinformatics (2008)

Bottom Line: The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. The DBN method using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues produces significantly better prediction accuracy than other HMM-type probabilistic methods.


Affiliation: State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, Peking University, Beijing 100871, China. yxq@ctb.pku.edu.cn

ABSTRACT

Background: Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).

Results: In this paper, we report a new probabilistic method for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared to other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus.
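The two evaluation ideas named in the Results, Q3 accuracy and a simple consensus of several methods, can be illustrated with a minimal sketch. Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The consensus below is a per-residue majority vote, which is an assumption: the abstract does not say how the consensus is formed. The function names `q3` and `consensus` and the toy strings are hypothetical, not taken from the paper.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictions.

    ASSUMPTION: majority voting is one possible 'simple consensus'; the
    source does not specify the rule. Ties go to the earliest-listed method.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    voted = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # Pick the first label, in method order, that reaches the top count.
        voted.append(next(label for label in column if counts[label] == top))
    return "".join(voted)


# Toy data (made up for illustration only).
observed = "HHHHCCEEEE"
method_a = "HHHCCCEEEE"
method_b = "HHHHCCCEEE"
method_c = "CHHHCCEEEC"
combined = consensus([method_a, method_b, method_c])
```

In this toy case each method errs at different positions, so the vote recovers the observed string. Real predictors usually make correlated errors, so gains from a consensus are typically smaller.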

Conclusion: The DBN method using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.


Figure 1: The influence of window sizes on the Q3 of DBN. LAA and LSS are window sizes for profile and secondary structure, respectively. The results are obtained by testing DBNsigmoid on the SD576 dataset.

Mentions: As shown in Fig. 1, Q3 improves significantly when LSS > 0 and saturates when LSS > 1, which indicates a strong short-range dependency between the profile of a residue and the secondary structure states of its neighbors. A similar phenomenon occurs for the dependency between profiles of neighboring sites. Note that the model with either LAA = 0 or LSS = 0 is a special case of the DBN, in which the distribution of the profile of each residue is independent of neighboring profiles or neighboring secondary structure states, respectively. As a result, its topology differs from that of the full DBN (LAA > 0 and LSS > 0) because the Ri or di nodes are removed (see Fig. 2(c)).
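To make the role of the two window sizes concrete, the sketch below lists which neighboring positions the profile of residue i would be conditioned on for given LAA and LSS. The layout is an assumption made only for illustration: the excerpt does not state whether the windows look backward, forward, or both, so this sketch uses the preceding LAA profiles and the states within LSS positions on either side. The function name `profile_parents` is hypothetical.

```python
def profile_parents(i: int, n: int, l_aa: int, l_ss: int) -> tuple[list[int], list[int]]:
    """Positions that the profile at residue i is conditioned on.

    ASSUMPTION (not from the source): the profile window covers the l_aa
    preceding residues, and the state window covers residue i plus l_ss
    residues on each side. Indices outside [0, n) are dropped.
    """
    if not 0 <= i < n:
        raise IndexError("residue index out of range")
    if l_aa < 0 or l_ss < 0:
        raise ValueError("window sizes must be non-negative")
    neighbor_profiles = [j for j in range(i - l_aa, i) if j >= 0]
    states = [j for j in range(i - l_ss, i + l_ss + 1) if 0 <= j < n]
    return neighbor_profiles, states


# Full model (both windows > 0) for residue 5 of a 10-residue chain.
full = profile_parents(5, 10, l_aa=2, l_ss=1)
# Special case LAA = 0 and LSS = 0: only the residue's own state remains.
reduced = profile_parents(5, 10, l_aa=0, l_ss=0)
```

With either window set to zero, the corresponding neighbor list is empty (apart from the residue's own state), which mirrors the special cases described above, where the profile no longer depends on neighboring profiles or neighboring states.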

