Predicting the effect of missense mutations on protein function: analysis with Bayesian networks.

Needham CJ, Bradford JR, Bulpitt AJ, Care MA, Westhead DR - BMC Bioinformatics (2006)

Bottom Line: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with performance comparable to that of an all-node network.


Affiliation: School of Computing, University of Leeds, Leeds, LS2 9JT, UK. chrisn@comp.leeds.ac.uk

ABSTRACT

Background: A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either of the two types of data is missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.

Results: Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved performance comparable to that of previous machine learning methods. The predictive performance of learned model structures was no better than that of a naïve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables.

Conclusion: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with performance comparable to that of an all-node network.
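
As an illustration of how a naïve Bayes classifier can still make a prediction when only some nodes are observed, the following minimal Python sketch simply drops the factors of unobserved nodes. Under the naïve Bayes independence assumption this is equivalent to marginalising those nodes out. The node names (`buried`, `conserved`) and all probabilities are invented for illustration and are not taken from the paper; a learned network structure would generally need a full inference algorithm instead.

```python
import numpy as np

# Hypothetical two-state class prior: index 0 = no effect, 1 = effect.
PRIOR = np.array([0.5, 0.5])

# Hypothetical CPTs, one per node: rows index the class, columns the node state.
CPTS = {
    "buried": np.array([[0.7, 0.3], [0.2, 0.8]]),     # stands in for a structural node
    "conserved": np.array([[0.6, 0.4], [0.3, 0.7]]),  # stands in for an evolutionary node
}


def naive_bayes_posterior(prior, cpts, evidence):
    """Return P(class | evidence) for a naive Bayes model.

    `evidence` maps node name -> observed state. Nodes that are not
    observed are left out of the dict and contribute no factor.
    """
    log_post = np.log(prior)  # new array, so the prior is not modified
    for name, state in evidence.items():
        log_post = log_post + np.log(cpts[name][:, state])
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    return post / post.sum()


# Prediction from the structural node only, then from both nodes.
structural_only = naive_bayes_posterior(PRIOR, CPTS, {"buried": 1})
both_observed = naive_bayes_posterior(PRIOR, CPTS, {"buried": 1, "conserved": 1})
```

Comparing predictions made from different subsets of observed nodes in this way is what makes it possible to ask which type of information is the better predictor.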


Figure 4: Classifier performance. Performance of naïve Bayes classifier and structure with parameters learned from incomplete data. The AUC (area under the ROC curve) is plotted against the number of nodes (n) randomly chosen to have missing data within the test examples.
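
For reference, the AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the example scores are hypothetical, and this is not necessarily how the authors computed their values.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores : predicted score for the positive class, one per example
    labels : 1 (or True) for positive examples, 0 (or False) for negative
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


# Made-up scores for two positive and two negative examples.
example_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch; a rank-based computation gives the same value more cheaply on large test sets.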

Mentions: Bayesian networks can learn model parameters from incomplete data. Here we test this tolerance by training on incomplete data: in every training example we hide n nodes, chosen randomly for each training case. We do this for both the naïve Bayes classifier and the learned structure, varying n from 0 to 14. The CPTs (conditional probability tables) are learned with the iterative EM (expectation-maximisation) algorithm, which accounts for the missing values. Figure 4 shows the results of homogeneous cross-validation when the networks are trained on incomplete data from the 'mixed' dataset and tested with all nodes observed. Because a different set of n nodes is hidden in each training case, this experiment tests the general ability of the Bayesian network to tolerate incomplete data, rather than the effect of particular nodes missing from all examples (as in the previous section).
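
The masking procedure and the EM parameter learning described above can be sketched as follows, for the naïve Bayes case only. This is a minimal illustration under stated assumptions, not the authors' implementation: node states are encoded as small integers, a hidden value is marked with the sentinel `-1`, column 0 holds the class node, and a pseudo-count `alpha` is added to every table entry. The function names `hide_nodes` and `em_naive_bayes` are hypothetical.

```python
import numpy as np

MISSING = -1  # sentinel marking a hidden node value (assumption of this sketch)


def hide_nodes(data, n, rng):
    """Copy `data` and hide n randomly chosen nodes (columns) in every row."""
    masked = data.copy()
    for row in range(data.shape[0]):
        hidden = rng.choice(data.shape[1], size=n, replace=False)
        masked[row, hidden] = MISSING
    return masked


def em_naive_bayes(data, cards, n_iter=50, alpha=1.0):
    """Fit naive Bayes CPTs by EM on data with hidden values.

    data  : (N, 1 + F) integer array, column 0 is the class node
    cards : number of states of each column
    alpha : pseudo-count added to every table entry
    Needs the class to be observed in at least some rows, otherwise the
    uniform starting point below is never left.
    """
    n_rows, n_cols = data.shape
    n_classes = cards[0]
    prior = np.full(n_classes, 1.0 / n_classes)
    cpts = [np.full((n_classes, cards[j]), 1.0 / cards[j]) for j in range(1, n_cols)]
    class_seen = data[:, 0] != MISSING

    for _ in range(n_iter):
        # E-step: posterior over the class for each row, given what is observed.
        resp = np.tile(prior, (n_rows, 1))
        for j in range(1, n_cols):
            seen = data[:, j] != MISSING
            resp[seen] *= cpts[j - 1][:, data[seen, j]].T
        resp /= resp.sum(axis=1, keepdims=True)
        # Rows with an observed class are clamped to that class.
        resp[class_seen] = np.eye(n_classes)[data[class_seen, 0]]

        # M-step: re-estimate the tables from expected counts.
        prior = (resp.sum(axis=0) + alpha) / (n_rows + alpha * n_classes)
        for j in range(1, n_cols):
            seen = data[:, j] != MISSING
            counts = np.full((n_classes, cards[j]), alpha)
            for v in range(cards[j]):
                counts[:, v] += resp[data[:, j] == v].sum(axis=0)
            # A hidden value is spread over its states using the current CPT.
            counts += resp[~seen].sum(axis=0)[:, None] * cpts[j - 1]
            cpts[j - 1] = counts / counts.sum(axis=1, keepdims=True)
    return prior, cpts
```

In an experiment of the kind described, one would call `hide_nodes` on each training fold for a given n, fit the tables with `em_naive_bayes`, and then score fully observed test cases. Because `hide_nodes` draws a fresh set of columns for every row, no single node is missing from all training cases.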

