Limits...
Predicting the effect of missense mutations on protein function: analysis with Bayesian networks.

Needham CJ, Bradford JR, Bulpitt AJ, Care MA, Westhead DR - BMC Bioinformatics (2006)

Bottom Line: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used.Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes.With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all node network.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Computing, University of Leeds, Leeds, LS2 9JT, UK. chrisn@comp.leeds.ac.uk

ABSTRACT

Background: A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either one of the two types of data are missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.

Results: Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved comparable performance with previous machine learning methods. The predictive performance of learned model structures was no better than a naïve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables.

Conclusion: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all node network.

Show MeSH

Related in: MedlinePlus

Learned Bayesian network structure . Learned Bayesian network structure  (using greedy search with BIC scoring function from mixed dataset). Key to node labels is shown in Table 1.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1586217&req=5

Figure 2: Learned Bayesian network structure . Learned Bayesian network structure (using greedy search with BIC scoring function from mixed dataset). Key to node labels is shown in Table 1.

Mentions: Using both the Bayesian and BIC scoring functions employed by the greedy search algorithm we learned structures from lac repressor and lysozyme datasets separately and the two datasets combined ('mixed'). As with the naïve Bayes classifier, we evaluated each structure using both homogeneous ten-fold and heterogeneous cross-validation. There was little significant difference in performance between the two scoring functions, or between structures learned on different datasets. The main difference was in the number of edges in the resulting DAGs. For our mixed dataset, there were 35 edges with BIC, and 48 with full Bayesian scoring. Using Occam's Razor, we prefer the simplest of equally good models, and take the Bayesian network structure learned from the mixed dataset, using the BIC scoring function, as our model structure , which is illustrated in Figure 2. With a harsher penalty for extra edges, the DAG learned using the BIC scoring function, should contain edges which are more likely to be biologically meaningful. It is important to note that the Bayesian networks with learned structure (or structure determined from conditional independence relations identified by an expert) capture the relationships between all the variables, and are not designed solely to discriminate for classification of a single variable based on the other variables. This is a significant advantage of the Bayesian network: we can infer the value of any variable(s) based on the value of any other variable, so we have constructed a model which can not only predict effect/no effect, but can infer the value of any of the attributes.


Predicting the effect of missense mutations on protein function: analysis with Bayesian networks.

Needham CJ, Bradford JR, Bulpitt AJ, Care MA, Westhead DR - BMC Bioinformatics (2006)

Learned Bayesian network structure . Learned Bayesian network structure  (using greedy search with BIC scoring function from mixed dataset). Key to node labels is shown in Table 1.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1586217&req=5

Figure 2: Learned Bayesian network structure . Learned Bayesian network structure (using greedy search with BIC scoring function from mixed dataset). Key to node labels is shown in Table 1.
Mentions: Using both the Bayesian and BIC scoring functions employed by the greedy search algorithm we learned structures from lac repressor and lysozyme datasets separately and the two datasets combined ('mixed'). As with the naïve Bayes classifier, we evaluated each structure using both homogeneous ten-fold and heterogeneous cross-validation. There was little significant difference in performance between the two scoring functions, or between structures learned on different datasets. The main difference was in the number of edges in the resulting DAGs. For our mixed dataset, there were 35 edges with BIC, and 48 with full Bayesian scoring. Using Occam's Razor, we prefer the simplest of equally good models, and take the Bayesian network structure learned from the mixed dataset, using the BIC scoring function, as our model structure , which is illustrated in Figure 2. With a harsher penalty for extra edges, the DAG learned using the BIC scoring function, should contain edges which are more likely to be biologically meaningful. It is important to note that the Bayesian networks with learned structure (or structure determined from conditional independence relations identified by an expert) capture the relationships between all the variables, and are not designed solely to discriminate for classification of a single variable based on the other variables. This is a significant advantage of the Bayesian network: we can infer the value of any variable(s) based on the value of any other variable, so we have constructed a model which can not only predict effect/no effect, but can infer the value of any of the attributes.

Bottom Line: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used.Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes.With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all node network.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Computing, University of Leeds, Leeds, LS2 9JT, UK. chrisn@comp.leeds.ac.uk

ABSTRACT

Background: A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either one of the two types of data are missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.

Results: Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved comparable performance with previous machine learning methods. The predictive performance of learned model structures was no better than a naïve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables.

Conclusion: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all node network.

Show MeSH
Related in: MedlinePlus