Classifying Variants of Undetermined Significance in BRCA2 with protein likelihood ratios.

Karchin R, Agarwal M, Sali A, Couch F, Beattie MS - Cancer Inform (2008)

Bottom Line: Bioinformatics approaches for predicting the impact of missense variants in cancer predisposition genes have not yet found their footing in clinical practice because 1) interpreting the medical relevance of predictive scores is difficult; and 2) the relationship between bioinformatics "predictors" (sequence conservation, protein structure) and cancer susceptibility is not understood. Protein likelihood ratios are computed for 229 unclassified variants found in individuals from high-risk breast/ovarian cancer families. Preliminary results indicate that our method is less likely to make false positive errors than other bioinformatics methods, which were designed to predict the impact of missense mutations in general.

Affiliation: Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA. karchin@karchinlab.org

ABSTRACT

Background: Missense (amino-acid changing) variants found in cancer predisposition genes often create difficulties when clinically interpreting genetic testing results. Although bioinformatics has developed approaches to predicting the impact of these variants, many of these approaches have not been readily applicable in the clinical setting. Bioinformatics approaches for predicting the impact of these variants have not yet found their footing in clinical practice because 1) interpreting the medical relevance of predictive scores is difficult; 2) the relationship between bioinformatics "predictors" (sequence conservation, protein structure) and cancer susceptibility is not understood.

Methodology/principal findings: We present a computational method that produces a probabilistic likelihood ratio predictive of whether a missense variant impairs protein function. We apply the method to a tumor suppressor gene, BRCA2, whose loss of function is important to cancer susceptibility. Protein likelihood ratios are computed for 229 unclassified variants found in individuals from high-risk breast/ovarian cancer families. We map the variants onto a protein structure model, and suggest that a cluster of predicted deleterious variants in the BRCA2 OB1 domain may destabilize BRCA2 and a protein binding partner, the small acidic protein DSS1. We compare our predictions with variant "re-classifications" provided by Myriad Genetics, a biotechnology company that holds the patent on BRCA2 genetic testing in the U.S., and with classifications made by an established medical genetics model [1]. Our approach uses bioinformatics data that is independent of these genetics-based classifications and yet shows significant agreement with them. Preliminary results indicate that our method is less likely to make false positive errors than other bioinformatics methods, which were designed to predict the impact of missense mutations in general.

Conclusions/significance: Missense mutations are the most common disease-producing genetic variants. We present a fast, scalable bioinformatics method that integrates information about protein sequence, conservation, and structure in a likelihood ratio that can be integrated with medical genetics likelihood ratios. The protein likelihood ratio, together with medical genetics likelihood ratios, can be used by clinicians and counselors to communicate the relevance of a VUS to the individual who has that VUS. The approach described here is generalizable to regions of any tumor suppressor gene that have been structurally determined by X-ray crystallography or for which a protein homology model can be built.
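
The integration of a protein likelihood ratio with medical genetics likelihood ratios can be illustrated with a minimal Python sketch. It assumes the evidence sources are independent, so that their likelihood ratios multiply into the prior odds. The function names and the numeric values in the usage note are hypothetical and are not taken from the article.

```python
def combined_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio.

    Assumption: the evidence sources are mutually independent, which is
    what makes a simple product of likelihood ratios valid.
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        if lr < 0:
            raise ValueError("likelihood ratios must be non-negative")
        odds *= lr
    return odds


def odds_to_probability(odds):
    """Convert odds in favor of an event into a probability."""
    return odds / (1.0 + odds)
```

For example, with hypothetical prior odds of 1.0, a protein likelihood ratio of 10.0, and a genetics likelihood ratio of 10.0, `combined_odds(1.0, [10.0, 10.0])` gives odds of 100:1 in favor of the variant being deleterious.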

Figure 3a, 3b. Protein Likelihood Ratios for 223 BIC VUS in the C-terminal DNA binding domains of BRCA2. Protein likelihood ratios are shown on a log10 scale with classifications of Deleterious, Neutral, or Not Predicted. Variants are classified as Neutral when Protein Likelihood Ratio <= 0.61 (blue dotted line at −0.21 on the log scale) and Deleterious when Protein Likelihood Ratio >= 6.8 (red dotted line at 0.8 on the log scale). Icons shown above each variant indicate the Protein Likelihood Ratio classification (red, blue, and white circles), the Myriad Genetics classification (red and blue M’s), functional data from a Homology Directed Repair Assay [42] (Supplementary Table 2, red and blue test tubes), and classifications from the Integrated Likelihood model [1, 8] at High Stringency (Deleterious classification requires odds of 1000:1) and Low Stringency (Deleterious classification requires odds of 100:1).
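
The classification rule in the caption can be written as a minimal Python sketch. Only the cutoffs 0.61 and 6.8 come from the caption; the function names and label strings are hypothetical.

```python
import math

NEUTRAL_MAX = 0.61      # caption: Neutral when ratio <= 0.61
DELETERIOUS_MIN = 6.8   # caption: Deleterious when ratio >= 6.8


def classify_plr(ratio):
    """Label a protein likelihood ratio using the caption's thresholds."""
    if ratio < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    if ratio >= DELETERIOUS_MIN:
        return "Deleterious"
    if ratio <= NEUTRAL_MAX:
        return "Neutral"
    return "Not Predicted"  # between the two thresholds


def log10_ratio(ratio):
    """Position of a (strictly positive) ratio on the figure's log10 axis."""
    return math.log10(ratio)
```

On the log10 axis the cutoffs fall at about −0.21 and 0.83, which the caption reports as −0.21 and 0.8.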

Mentions: Incorporating our method into the combined odds of causality model, which has gained wide acceptance in the genetic epidemiology community [1,8], requires the likelihood ratio P(S | D)/P(S | N) for each variant of interest, where S is the discriminant score, D denotes a deleterious variant, and N denotes a neutral variant. Standard machine learning methods can yield posterior probabilities of the form P(D | S) and P(N | S), and thus posterior likelihood ratios P(D | S)/P(N | S). If the prior probabilities that a variant is deleterious or neutral were known, we could infer the desired likelihood ratio from the posterior ratio using Bayes’ Rule, since P(D | S)/P(N | S) = [P(S | D)/P(S | N)] × [P(D)/P(N)]. However, these priors are not currently known. Here we use an alternative method to transform discriminant scores into our desired likelihood ratios. We first express the distribution of discriminant scores for deleterious TP53 missense changes as a parameterized probability distribution of known functional form P(S | D, θD) that quantifies the probability of seeing a particular discriminant score S when the mutant induces loss of function. Likewise, we express the distribution of neutral scores in a known functional form P(S | N, θN). The protein likelihood ratio is then calculated as P(S | D, θD)/P(S | N, θN), yielding an odds ratio in favor of loss of function. Histograms of “deleterious” and “neutral” TP53 discriminant scores (Fig. 1) suggest that the scores are distributed as Generalized Extreme Value (GEV) distributions. We use maximum likelihood to fit GEV parameters for deleterious and neutral mutants using the ismev R package [18]. This approach yields GEV parameters for deleterious mutants (θD) of −1.5 (location), 0.66 (scale), and 0.015 (shape), and GEV parameters for neutral mutants (θN) of 0.7 (location), 0.78 (scale), and −0.51 (shape). We assign thresholds for prediction confidence based on available data from medical genetics studies.
Confident predictions are those whose likelihood ratios are either 1) larger than the smallest likelihood ratio above 1.0 among variants that have been reclassified as “Deleterious” or “Suspected Deleterious” by Myriad Genetics or shown to have an Integrated Likelihood Ratio > 1,000; or 2) smaller than the largest likelihood ratio below 1.0 among variants that have been reclassified as neutral or “Polymorphism” by Myriad (Fig. 3a, 3b, Supplementary Table 1). Predictions for VUS that lie between the thresholds are not considered reliable. These thresholds can be modified as new information becomes available.
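
The ratio of the two fitted densities can be sketched in Python as follows. This is an illustration, not the authors' code: it assumes the reported shape values follow the convention in which the GEV distribution function is exp(−[1 + ξ(s − μ)/σ]^(−1/ξ)), and the function names are hypothetical. Only the six parameter values come from the text.

```python
import math

# GEV parameters reported in the text, as (location, scale, shape).
THETA_D = (-1.5, 0.66, 0.015)   # deleterious mutants
THETA_N = (0.7, 0.78, -0.51)    # neutral mutants


def gev_pdf(s, loc, scale, shape):
    """GEV density at score s (assumed shape convention: see lead-in)."""
    z = (s - loc) / scale
    if abs(shape) < 1e-12:
        log_t = -z                      # Gumbel limit as shape -> 0
    else:
        base = 1.0 + shape * z
        if base <= 0.0:
            return 0.0                  # s lies outside the support
        log_t = -math.log(base) / shape
    if log_t > 700.0:
        return 0.0                      # density underflows to zero
    t = math.exp(log_t)
    return math.exp((shape + 1.0) * log_t - t) / scale


def protein_likelihood_ratio(s):
    """P(S | D, theta_D) / P(S | N, theta_N) for discriminant score s."""
    num = gev_pdf(s, *THETA_D)
    den = gev_pdf(s, *THETA_N)
    if den == 0.0:
        # The neutral density vanishes outside its support.
        return math.inf if num > 0.0 else math.nan
    return num / den
```

Under these assumptions, a score at the deleterious location (−1.5) gives a ratio well above 1, and a score at the neutral location (0.7) gives a ratio below 1. Because the neutral fit has a negative shape, its density is zero above a finite upper endpoint, so the sketch returns an infinite ratio there; a practical implementation would need a policy for such scores.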

