Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity.

Ruklisa D, Ware JS, Walsh R, Balding DJ, Cook SA - Genome Med (2015)

Bottom Line: We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and sensitivity of 0.76 for the classification of variants known to cause long QT syndrome over the three most important genes, which represents sufficient accuracy to inform clinical decision-making.


Affiliation: UCL Genetics Institute, London, UK.

ABSTRACT

Background: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation.

Methods: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual's disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes.
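As a rough illustration of how a logistic model can fold several lines of evidence into a single probability, the sketch below adds the log of a prior odds value to a weighted sum of evidence features and maps the result through the logistic function. The feature names and coefficient values are hypothetical placeholders, not the fitted parameters of the published models, and the Bayesian treatment of the coefficients (priors and posterior inference) is omitted. The prior odds of 4 is simply an example value.

```python
import math


def probability_of_pathogenicity(prior_odds, features, coefficients):
    """Logistic model: log-odds = log(prior odds) + sum(coefficient * feature)."""
    log_odds = math.log(prior_odds)
    for name, value in features.items():
        log_odds += coefficients[name] * value
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for two illustrative lines of evidence.
coefficients = {"radical_substitution": 1.2, "conserved_site": 0.8}

# With no supporting evidence the model returns the prior probability.
baseline = probability_of_pathogenicity(
    4.0, {"radical_substitution": 0, "conserved_site": 0}, coefficients
)

# The same prior with both lines of evidence present.
supported = probability_of_pathogenicity(
    4.0, {"radical_substitution": 1, "conserved_site": 1}, coefficients
)

print(f"prior only:    {baseline:.3f}")
print(f"with evidence: {supported:.3f}")
```

Writing the prior odds as an additive term on the log-odds scale keeps each line of evidence separable, which is one way such a model can remain easy to extend with new predictors.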

Results: Our models report a probability of pathogenicity, rather than a categorisation into pathogenic or benign, which captures the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Though the probability scale is continuous, and innately interpretable, performance summaries based on thresholds are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and sensitivity of 0.76 for the classification of variants known to cause long QT syndrome over the three most important genes, which represents sufficient accuracy to inform clinical decision-making. A web tool APPRAISE [http://www.cardiodb.org/APPRAISE] provides access to these models and predictions.
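To illustrate how a model probability might be combined quantitatively with familial segregation data, the sketch below converts the probability to odds, multiplies by a segregation Bayes factor, and converts back. The factor of 2 per informative meiosis is a common simplification for a fully penetrant dominant variant and is an assumption here, not necessarily the calculation used by the authors. The function names and the starting probability of 0.6 are hypothetical.

```python
def to_odds(probability):
    """Convert a probability in (0, 1) to odds."""
    return probability / (1.0 - probability)


def to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update_with_segregation(model_probability, informative_meioses):
    """Multiply the model odds by an assumed Bayes factor of 2 per meiosis."""
    bayes_factor = 2.0 ** informative_meioses
    return to_probability(to_odds(model_probability) * bayes_factor)


# Hypothetical variant: model probability 0.6, co-segregating with
# disease through three informative meioses.
updated = update_with_segregation(0.6, 3)
print(f"probability after segregation evidence: {updated:.3f}")
```

Because the update is a single multiplication on the odds scale, the contribution of the family data stays visible separately from the model's own prediction.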

Conclusions: Our Bayesian approach provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches, and can be sufficiently accurate to inform clinical decision-making.


Figure 3: Comparison of pathogenicity prediction models for LQTS. Receiver operating characteristic curves are shown for four nested LQTS models, as well as for SIFT with and without the addition of prior odds. The inner plot shows the false positive rate from 0 to 0.1, while the axis of the outer plot spans the false positive rate from 0 to 1. See text for explanation of the models. LQTS, long QT syndrome.

Mentions: Figure 3 depicts the receiver operating characteristic (ROC) curves for this model, using predictions for 3,200 test variants from the 100 data splits, and Additional file 1: Figure S2 shows alternative representations of model performance. Ranking variants according to their probability of pathogenicity, the top 84 places are taken by pathogenic variants (53% of pathogenic variants). The PPV associated with a particular threshold for P(pathogenic) depends on the prior odds. From Table 1, prior odds were 45 for KCNQ1, 25 for KCNH2 and 4 for SCN5A, giving overall prior odds of 25 for these three genes. If we choose a threshold of P(pathogenic)>0.9, then the combined PPV for KCNQ1, KCNH2 and SCN5A is 0.999. At this threshold, 79% of pathogenic variants in KCNQ1, 82% of KCNH2 and 39% of SCN5A were identified, with a combined sensitivity of 76% across the three genes. For the other genes that much more rarely cause LQTS, sensitivity is lower (25%), but the PPV remains high (1). As the training data are relatively sparse for these genes, predictions are appropriately cautious. Assessing across all LQTS genes, the combined sensitivity is 73%, with PPV >0.999. Performance metrics at other thresholds are shown in Additional file 1: Figure S2 and Table S5.
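The dependence of PPV on the prior odds can be made concrete with a short calculation. Treating the prior odds as the ratio of pathogenic to benign variants among those being classified, PPV follows from the sensitivity and the false positive rate at the chosen threshold. The prior odds (45, 25 and 4) and the sensitivity of 0.76 are taken from the passage above; the false positive rate of 0.02 is a hypothetical value chosen only for illustration and is not reported in this passage.

```python
def positive_predictive_value(prior_odds, sensitivity, false_positive_rate):
    """PPV from prior odds (pathogenic : benign), sensitivity and FPR."""
    # Expected true and false positives per single benign variant.
    true_positives = prior_odds * sensitivity
    false_positives = false_positive_rate
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.76          # combined sensitivity quoted in the text
HYPOTHETICAL_FPR = 0.02     # assumed for illustration, not from the source

ppv_by_prior_odds = {
    odds: positive_predictive_value(odds, SENSITIVITY, HYPOTHETICAL_FPR)
    for odds in (45, 25, 4)
}

for odds, ppv in ppv_by_prior_odds.items():
    print(f"prior odds {odds:>2}: PPV = {ppv:.4f}")
```

With sensitivity and false positive rate held fixed, a lower prior odds gives a lower PPV, which is why the same probability threshold can perform differently from gene to gene.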

