Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity.

Ruklisa D, Ware JS, Walsh R, Balding DJ, Cook SA - Genome Med (2015)


Affiliation: UCL Genetics Institute, London, UK.

ABSTRACT

Background: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation.

Methods: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual's disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes.
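As an illustration of how a Bayesian logistic regression model can merge several lines of evidence into one probability, the Python sketch below computes a linear score from the predictors, maps it through the logistic function, and averages the result over posterior draws of the coefficients. The predictor names (`polyphen`, `sift_damaging` taken as 1 minus the SIFT score, `grantham`, `absent`), the function name and every coefficient value are hypothetical and chosen only for illustration. They are not the fitted APPRAISE parameters, and the gene and domain effects used in the real models are omitted.

```python
import math
from typing import Dict, List, Tuple

# One posterior draw = (intercept, {predictor name: coefficient}).
Draw = Tuple[float, Dict[str, float]]


def logistic(x: float) -> float:
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_pathogenic(features: Dict[str, float], draws: List[Draw]) -> float:
    """Posterior-mean probability of pathogenicity (Monte Carlo average).

    For each posterior draw of the coefficients, combine the predictors
    into a linear score, apply the logistic function, then average.
    """
    probs = []
    for intercept, coefs in draws:
        eta = intercept + sum(coefs[name] * value for name, value in features.items())
        probs.append(logistic(eta))
    return sum(probs) / len(probs)


# Hypothetical posterior draws for illustration only (NOT APPRAISE parameters).
# "sift_damaging" is assumed to be 1 - SIFT score, so larger means more damaging.
DRAWS: List[Draw] = [
    (-2.0, {"polyphen": 2.5, "sift_damaging": 1.5, "grantham": 0.010, "absent": 1.0}),
    (-2.4, {"polyphen": 2.8, "sift_damaging": 1.2, "grantham": 0.012, "absent": 1.3}),
    (-1.7, {"polyphen": 2.2, "sift_damaging": 1.7, "grantham": 0.008, "absent": 0.8}),
]

# A hypothetical missense variant: high PolyPhen, SIFT near 0, moderate
# Grantham distance, absent from population frequency data.
variant = {"polyphen": 0.98, "sift_damaging": 0.99, "grantham": 100.0, "absent": 1.0}
print(p_pathogenic(variant, DRAWS))
```

Averaging over draws, rather than plugging in a single coefficient estimate, is what lets parameter uncertainty carry through to the reported probability.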

Results: Our models report a probability of pathogenicity rather than a binary pathogenic/benign classification, thereby capturing the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Although the probability scale is continuous and innately interpretable, threshold-based performance summaries are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and a sensitivity of 0.76 for the classification of variants known to cause long QT syndrome across the three most important genes, an accuracy sufficient to inform clinical decision-making. A web tool, APPRAISE [http://www.cardiodb.org/APPRAISE], provides access to these models and predictions.
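The threshold-based summaries quoted above (positive predictive value and sensitivity at a probability threshold of 0.9) can be computed as in the Python sketch below. It assumes a variant is called pathogenic when its probability is at least the threshold; the abstract does not say how values of exactly 0.9 are treated. The sketch also shows one standard way to fold in familial segregation evidence, assuming that evidence is summarised as a likelihood ratio and applied with Bayes' rule in odds form; the exact form used by the authors is not stated here. Function names and example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(probs: Sequence[float], is_pathogenic: Sequence[bool],
                        threshold: float = 0.9) -> Tuple[float, float]:
    """Call a variant pathogenic when P(pathogenic) >= threshold (assumed rule),
    then compare the calls with the known labels."""
    tp = fp = fn = 0
    for p, truth in zip(probs, is_pathogenic):
        called = p >= threshold
        if called and truth:
            tp += 1  # true positive
        elif called:
            fp += 1  # benign variant called pathogenic
        elif truth:
            fn += 1  # pathogenic variant missed
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


def update_with_segregation(p_prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < p_prior < 1.0:
        raise ValueError("p_prior must lie strictly between 0 and 1")
    prior_odds = p_prior / (1.0 - p_prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical toy data: model probabilities and known labels.
probs = [0.95, 0.92, 0.50, 0.99, 0.30, 0.91]
labels = [True, True, True, False, False, True]
print(ppv_and_sensitivity(probs, labels))  # (0.75, 0.75)
print(update_with_segregation(0.5, 8.0))   # posterior odds 8, i.e. 8/9
```

Working on the odds scale keeps the update transparent: a likelihood ratio of 1 leaves the model's probability unchanged, and each independent piece of segregation evidence simply multiplies the odds.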

Conclusions: Our Bayesian approach provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches and can be sufficiently accurate to inform clinical decision-making.


Figure 4: Pathogenicity prediction models for Brugada syndrome. The receiver operating characteristic curve for the full BrS model is shown alongside that for LQTS (as in Figure 3) for comparison. Sensitivity at low false positive rates could be improved by building a combined model in which some parameters were fitted jointly for the LQTS and BrS models (see text for details), to compensate for the smaller BrS training set. Joint fitting does not degrade performance of the LQTS model. BrS, Brugada syndrome; LQTS, long QT syndrome.

The results in Figure 4 show that the initial full model attains a sensitivity close to 65% at a false positive rate of 10%, rising to 75% at a false positive rate of 20%. As for LQTS, simpler models that excluded gene or domain effects performed worse and are not shown. In the magnified panel, the curve is nearly horizontal near the origin (false positive rates of 0 to 0.03) because a few benign variants have P(pathogenic) ≈ 1. Closer inspection revealed that two benign missense variants received a high probability of pathogenicity because of a specific combination of predictors: a high PolyPhen probability, a SIFT score close to 0, absence from population frequency data, and a moderate to large Grantham score. In addition, one benign radical variant in SCN5A had P(pathogenic) ≈ 1; no frequency data were available for it.
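The sensitivity figures above are operating points read off the ROC curve at fixed false positive rates. A minimal Python sketch of that lookup follows, using made-up toy scores rather than the BrS data; one benign variant is deliberately scored near 1 to reproduce the flat segment near the origin. The function name and the rule of calling a variant pathogenic when its score is at least the threshold are assumptions.

```python
from typing import Sequence


def sensitivity_at_fpr(scores: Sequence[float], is_pathogenic: Sequence[bool],
                       max_fpr: float) -> float:
    """Best sensitivity over thresholds whose false positive rate <= max_fpr."""
    pos = [s for s, t in zip(scores, is_pathogenic) if t]
    neg = [s for s, t in zip(scores, is_pathogenic) if not t]
    best = 0.0  # calling nothing pathogenic gives FPR 0 and sensitivity 0
    for thr in sorted(set(scores)):
        fpr = sum(s >= thr for s in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(s >= thr for s in pos) / len(pos)
            best = max(best, tpr)
    return best


# Toy scores (hypothetical): 5 pathogenic, 10 benign, one benign scored ~1.
pathogenic = [0.99, 0.95, 0.90, 0.70, 0.40]
benign = [0.999, 0.60, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, 0.01, 0.005]
scores = pathogenic + benign
labels = [True] * len(pathogenic) + [False] * len(benign)

for rate in (0.0, 0.1, 0.2):
    print(rate, sensitivity_at_fpr(scores, labels, rate))
```

With these toy values no threshold reaches a false positive rate of 0 while retaining any sensitivity, because the top-ranked variant is benign; tolerating that single false positive (10%) recovers 80% sensitivity.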

