Computational DNA hole spectroscopy: A new tool to predict mutation hotspots, critical base pairs, and disease 'driver' mutations.

Villagrán MY, Miller JH - Sci Rep (2015)

Bottom Line: Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands.


Affiliation: Department of Physics & Texas Center for Superconductivity, University of Houston, Houston, Texas 77204-5005, USA.

ABSTRACT
We report on a new technique, computational DNA hole spectroscopy, which creates spectra of electron hole probabilities vs. nucleotide position. A hole is a site of positive charge created when an electron is removed. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies of mitochondrial DNA reveal a correlation between L-strand hole spectrum peaks and spikes in the human mutation spectrum. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with disease-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential disease 'driver' mutations. Such integration of DNA hole and variance spectra could ultimately prove invaluable for pinpointing critical regions of the vast non-protein-coding genome. An observed asymmetry in correlations, between the spectrum of human mtDNA variations and the L- and H-strand hole spectra, is attributed to asymmetric DNA replication processes that occur for the leading and lagging strands.
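The idea of combining hole spectra with variant data can be illustrated with a minimal sketch. It assumes a normalized hole value and a variant frequency are already available per nucleotide position. The function name `classify_hole_peaks`, both thresholds, and the toy positions and values are hypothetical and are not taken from the paper.

```python
def classify_hole_peaks(hole_n, variant_freq, peak_threshold=2.0, freq_threshold=0.01):
    """Label positions whose hole value is at or above peak_threshold.

    hole_n:       dict mapping position -> normalized hole value (assumed input)
    variant_freq: dict mapping position -> variant frequency (missing means 0.0)
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    labels = {}
    for pos, n in hole_n.items():
        if n < peak_threshold:
            continue  # not a hole peak under this illustrative cutoff
        freq = variant_freq.get(pos, 0.0)
        if freq >= freq_threshold:
            labels[pos] = "mutation hotspot candidate"
        else:
            labels[pos] = "possible critical site"
    return labels


# Toy data with invented positions and values, for illustration only.
hole_n = {101: 5.2, 102: 0.8, 103: 3.1}
variant_freq = {101: 0.04, 102: 0.02}
labels = classify_hole_peaks(hole_n, variant_freq)
print(labels)
```

In this sketch a hole peak with a high variant frequency is treated as a hotspot candidate, while a hole peak with little or no observed variation is only flagged for closer inspection, for example against disease annotations or amino acid conservation.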



Figure 2: H- and L-strand hole spectra and mtDNA mutation spectrum. Blue: H-strand hole probabilities vs. nucleotide position. Green: L-strand hole probabilities. N = P/Pave is the number of holes at each site, where P is the computed hole probability and Pave is the average hole probability. Red-orange: GenBank (GB) frequencies of human mtDNA variants. H-strand peaks at 3415 and 3895 are both at the 1st codon position for conserved prolines (P) [33]. L-strand hole peaks at positions 3437, 3665, and 3915 are discussed in the text and also labeled in Fig. 3.
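The normalization N = P/Pave from the caption can be sketched as follows. This is a minimal illustration that assumes the average is taken over all sites of both strands; the helper name `normalize_hole_probabilities` and the probability values are placeholders, not data from the paper.

```python
def normalize_hole_probabilities(h_strand, l_strand):
    """Return (N_h, N_l) with N = P / P_ave, where P_ave is the mean over both strands."""
    all_p = list(h_strand) + list(l_strand)
    if not all_p:
        raise ValueError("need at least one site")
    p_ave = sum(all_p) / len(all_p)
    if p_ave == 0:
        raise ValueError("average hole probability is zero")
    n_h = [p / p_ave for p in h_strand]
    n_l = [p / p_ave for p in l_strand]
    return n_h, n_l


# Placeholder probabilities for three sites on each strand (invented values).
h = [0.30, 0.10, 0.10]
l = [0.20, 0.20, 0.10]
n_h, n_l = normalize_hole_probabilities(h, l)

# A uniform distribution gives N = 1 at every site.
u_h, u_l = normalize_hole_probabilities([0.25, 0.25], [0.25, 0.25])
```

With this definition the mean of N over all sites is one, so values well above one mark sites where holes are more likely than average.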


Mentions: Figure 2 shows ND1 hole spectra for the H-strand (blue) and L-strand (green), and the GB frequency (red-orange) vs. nucleotide position, depicted for the L-strand by convention [34]. In Fig. 2, the hole spectra are normalized such that N would be one hole per base if the holes were uniformly distributed among both strands (Methods). The ten largest H-strand hole peaks (blue) dominate the two spectra and correspond to guanine triplets, quadruplets, or quintuplets. Due to the repeated cytosines, complementary to guanine, on the reference strand (L-strand), all ten of these peaks correspond to encoded prolines. The two H-strand peaks at 3415 and 3895 correspond to prolines completely conserved among at least 24 species of bacteria and eukaryotes [33], for which any non-synonymous mutation would likely be deleterious or lethal; the GB frequency is zero at both of these sites. Most of the largest GB frequency peaks are synonymous mutations, which are benign but provide no advantage. A notable exception is the mutation T4216C, which encodes Y304H (tyrosine → histidine) and has been reported as a possible high-altitude adaptation among Sherpas [30]. Several of the L-strand hole peaks, on cytosines, correlate with the large H-strand peaks due to holes hopping from H-strand guanines. However, the L-strand peaks are often shifted and/or enhanced by one or more guanines directly on the L-strand. The three largest L-strand peaks, at 3437, 3665, and 3915, result from L-strand guanine triplets and a quadruplet (3915 peak). They all correlate with spikes or clusters in variant frequency, as can be better seen in the magnified L-strand plot of Fig. 3. The large L-strand (green) hole peak at 3915, for example, is one of several that engulf entire clusters of mutation spectrum peaks.
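The link between guanine runs and hole peaks suggests a simple way to locate candidate motifs in a sequence. The sketch below only finds runs of a given base on a reference (L-strand) sequence; it does not compute hole probabilities, which in the paper come from a separate calculation. The function name `find_runs`, the minimum run length of three, and the toy sequence are assumptions made for illustration.

```python
import re


def find_runs(seq, base, min_len=3):
    """Return (start, length) for each maximal run of `base` at least min_len long.

    Positions are 0-based offsets into `seq`.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Invented toy sequence standing in for a stretch of the L-strand (reference strand).
l_strand = "ATCCCCTAGGGTACCCGG"

# Cytosine runs on the L-strand pair with guanine runs on the H-strand.
h_strand_g_runs = find_runs(l_strand, "C")

# Guanine runs located directly on the L-strand.
l_strand_g_runs = find_runs(l_strand, "G")

print(h_strand_g_runs)
print(l_strand_g_runs)
```

Because CCN codons encode proline, an in-frame cytosine run on the reference strand corresponds to an encoded proline, which is consistent with the H-strand peaks described above.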

