Detecting rare structural variation in evolving microbial populations from new sequence junctions using breseq.

Deatherage DE, Traverse CC, Wolf LN, Barrick JE - Front Genet (2015)

Bottom Line: Overall, SV accounted for ~25% of the genetic diversity found in these samples. We also found, unexpectedly, that mutations in two genes that rose to prominence at these early time points always went extinct in the long term. We anticipate that this functionality of breseq will be useful for providing a more complete picture of genome dynamics during evolution experiments with haploid microorganisms.

Affiliation: Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Center for Computational Biology and Bioinformatics, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA.

ABSTRACT
New mutations leading to structural variation (SV) in genomes (in the form of mobile element insertions, large deletions, gene duplications, and other chromosomal rearrangements) can play a key role in microbial evolution. Yet, SV is considerably more difficult to predict from short-read genome resequencing data than single-nucleotide substitutions and indels (SN), so it is not yet routinely identified in studies that profile population-level genetic diversity over time in evolution experiments. We implemented an algorithm for detecting polymorphic SV as part of the breseq computational pipeline. This procedure examines split-read alignments, in which the two ends of a single sequencing read match disjoint locations in the reference genome, in order to detect structural variants and estimate their frequencies within a sample. We tested our algorithm using simulated Escherichia coli data and then applied it to 500- and 1000-generation population samples from the Lenski E. coli long-term evolution experiment (LTEE). Knowledge of genes that are targets of selection in the LTEE and mutations present in previously analyzed clonal isolates allowed us to evaluate the accuracy of our procedure. Overall, SV accounted for ~25% of the genetic diversity found in these samples. By profiling rare SV, we were able to identify many cases where alternative mutations in key genes transiently competed within a single population. We also found, unexpectedly, that mutations in two genes that rose to prominence at these early time points always went extinct in the long term. Because it is not limited by the base-calling error rate of the sequencing technology, our approach for identifying rare SV in whole-population samples may have a lower detection limit than similar predictions of SNs in these data sets. We anticipate that this functionality of breseq will be useful for providing a more complete picture of genome dynamics during evolution experiments with haploid microorganisms.


Figure 8: Population-wide genetic diversity early in the LTEE. Symbols indicate whether a mutation is a single-base substitution, insertion, or deletion (SN); or structural variant (SV). Colors indicate mutations in key genes, as discussed in the main text. The “Fix” column shows which alleles were present in clones sampled from these populations much later in the experiment (≥20,000 generations), presumably because they eventually swept to fixation and were present in 100% of the surviving population. Fixation data was only available for eight of the populations.

Mentions: Tracking the early dynamics of all alleles in these populations reveals new information about how different combinations and orderings of mutations competed early in the LTEE (Figure 8). For eight of the populations, we were able to determine the long-term fates of these early alleles as extinction or fixation from whole-genome sequencing data from later clonal isolates (Materials and Methods). With respect to evaluating the performance of the SV prediction algorithm, we did not see any obvious cases of false-positive predictions that were incompatible with the expected asexual evolutionary dynamics. Mutations in diverged lineages are unable to recombine into the same genetic background, so they should be consistent with perfectly nested sets of mutations. Thus, a false-positive prediction would be evident, for example, if one mutation were present at 50% frequency at both 500 and 1000 generations while a different mutation increased from 10% to 90% frequency. We also do not see any glaring omissions in the mutations predicted. For example, there are not any populations completely lacking high-frequency mutations early in the evolution experiment, which might have indicated that there were missing mutation predictions (i.e., false-negatives). Therefore, we believe that the polymorphic SV prediction procedure implemented in breseq accurately reveals all, or at least most, of the dynamics of important mutations in the LTEE population samples.
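
The nested-sets argument above can be turned into a simple pairwise consistency check. The sketch below is illustrative only and is not part of breseq; the function names, the `tol` parameter, and the example trajectories are all assumptions introduced here. It assumes each mutation arose once in a strictly asexual population, so that any two mutations are either nested (every carrier of one also carries the other) or sit in disjoint lineages, and that this relationship is the same at every time point. Nested mutations must keep the same frequency ordering over time, and disjoint mutations can never sum to more than 100%.

```python
from itertools import combinations


def pair_is_compatible(freqs_a, freqs_b, tol=0.0):
    """Return True if two frequency trajectories fit one fixed asexual relationship.

    freqs_a and freqs_b hold frequencies (0 to 1) at the same time points.
    tol is a hypothetical allowance for measurement error in the estimates.
    """
    pairs = list(zip(freqs_a, freqs_b))
    # A arose on a background that already carried B: A never exceeds B.
    a_within_b = all(a <= b + tol for a, b in pairs)
    # B arose on a background that already carried A: B never exceeds A.
    b_within_a = all(b <= a + tol for a, b in pairs)
    # A and B are in separate lineages: together they never exceed 100%.
    disjoint = all(a + b <= 1.0 + tol for a, b in pairs)
    return a_within_b or b_within_a or disjoint


def incompatible_pairs(trajectories, tol=0.0):
    """List mutation pairs whose trajectories fit none of the three relationships."""
    return [
        (m1, m2)
        for m1, m2 in combinations(sorted(trajectories), 2)
        if not pair_is_compatible(trajectories[m1], trajectories[m2], tol)
    ]


# The example from the text: one mutation at 50% at both 500 and 1000
# generations, another rising from 10% to 90%.
example = {"mutA": [0.5, 0.5], "mutB": [0.1, 0.9]}
flagged = incompatible_pairs(example)
```

For the example from the text, `flagged` contains the pair `("mutA", "mutB")`, since the second mutation cannot overtake the first while also exceeding the 50% of the population that lacks it. Passing every pairwise check is necessary but not sufficient for a full set of trajectories to be consistent with a single genealogy, so a check like this can only flag suspicious predictions, not confirm correct ones.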

