Estimating Exceptionally Rare Germline and Somatic Mutation Frequencies via Next Generation Sequencing.

Eboreime J, Choi SK, Yoon SR, Arnheim N, Calabrese P - PLoS ONE (2016)

Bottom Line: These rates far exceed the well-documented human genome average frequency per base pair (~10^-8), suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments.


Affiliation: Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089-2910, United States of America.

ABSTRACT
We used targeted next generation deep-sequencing (Safe Sequencing System, SSS) to measure ultra-rare de novo mutation frequencies in the human male germline by attaching a unique identifier code to each target DNA molecule. Segments from three different human genes (FGFR3, MECP2 and PTPN11) were studied. Regardless of the gene segment, the particular testis donor or the 73 different testis pieces used, the frequencies for any one of the six different mutation types were consistent. Averaging over the C>T/G>A and G>T/C>A mutation types, the background mutation frequency was 2.6×10^-5 per base pair, while for the four other mutation types the average background frequency was lower at 1.5×10^-6 per base pair. These rates far exceed the well-documented human genome average frequency per base pair (~10^-8), suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. Finally, we looked at a previously studied disease mutation in the PTPN11 gene and could easily distinguish true mutations from the SSS background. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments.


Fig 3 (pone.0158340.g003): PTPN11 mutation frequencies color-coded by reference base. Each dot represents the average of 9 different libraries from three testis pieces. Red indicates an A or T base, blue indicates a C or G (non-CpG) base, and green indicates a CpG. The mutation frequency is the sum of all mutations at that site; e.g., if a site is a C, the mutation frequency is the sum of the C>A, C>G, and C>T frequencies at that site. The 95% confidence interval for each position is also shown. The data for one base pair (at position 80; c.922) have not been included since it has a much greater mutation frequency; for an explanation, see below.
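The per-site frequency described in the caption (the sum of all substitutions away from the reference base) can be illustrated with a minimal Python sketch. The function name `site_mutation_frequency` and the counts are hypothetical, chosen only for illustration; they are not data from the paper, and the sketch does not reproduce the authors' UID pipeline or their confidence-interval calculation.

```python
from typing import Dict


def site_mutation_frequency(ref_base: str, base_counts: Dict[str, int]) -> float:
    """Return the summed frequency of all non-reference bases at one site.

    base_counts maps each observed base to the number of molecules
    (e.g. UID families) carrying that base at the site.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no molecules observed at this site")
    # For a reference C this adds the C>A, C>G and C>T counts together.
    mutant = sum(n for base, n in base_counts.items() if base != ref_base)
    return mutant / total


# Hypothetical counts for a site whose reference base is C (not real data).
example_counts = {"C": 199_984, "A": 4, "G": 1, "T": 11}
freq = site_mutation_frequency("C", example_counts)
print(f"mutation frequency at site: {freq:.1e}")
```

Summing over the three possible substitutions gives one value per position, which is what each dot in the figure represents after averaging over libraries.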

Mentions: We carried out UID analysis on the PTPN11 data. We only considered those base pairs that were not part of the primers, since the estimates at primer sequences would also include the errors introduced during primer synthesis [1]. The average mutation frequency (over all 93 bp and the 3 testis pieces, nine libraries total) at these base pairs was surprisingly high at 3.7×10^-5 per site. However, the mutation frequency strongly depended on the identity of the base. For a base pair where the reference was an A or T the average mutation frequency was 8.7×10^-6, while for base pairs where the reference was a C or G the average was 8.0×10^-5. We observed a similar difference based on the identity of the base when we tested this assay on blood DNA, indicating that this difference is not an artifact of studying germline cells (S1 Table). When we further analyzed data obtained from somatic DNA [1] (kindly provided by Isaac Kinde), we similarly observed a difference (3-fold) in the region of the CTNNB1 gene these researchers studied; this region did not contain any CpG sites, perhaps contributing to the smaller difference. Fig 3 demonstrates this discrepancy for PTPN11. Further investigation revealed that the C>T/G>A mutations and G>T/C>A mutations were responsible for this higher average.
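To make the size of these differences concrete, the short Python sketch below computes two ratios from the averages quoted above and in the abstract. The variable names are illustrative, and the genome-wide figure is the approximate ~10^-8 per base pair value cited in the abstract, so the second ratio is only an order-of-magnitude comparison.

```python
# Average PTPN11 mutation frequencies per site, as reported in the text.
at_freq = 8.7e-6       # reference base A or T
cg_freq = 8.0e-5       # reference base C or G
overall_freq = 3.7e-5  # all 93 non-primer base pairs

# Approximate human genome average per base pair quoted in the abstract.
genome_avg = 1e-8

# How much higher the C/G background is than the A/T background.
fold_cg_over_at = cg_freq / at_freq

# How far the observed background exceeds the genome-wide average.
fold_overall_over_genome = overall_freq / genome_avg

print(f"C/G vs A/T: {fold_cg_over_at:.1f}-fold")
print(f"overall vs genome average: {fold_overall_over_genome:.0f}-fold")
```

The C/G background is roughly 9-fold higher than the A/T background, compared with the 3-fold difference seen in the CpG-free CTNNB1 region, and the overall background is several thousand times the genome-wide average.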

