Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing.

Jiang X, Jiang H, Li C, Wang S, Mi Z, An X, Chen J, Tong Y - Virol. J. (2011)

Bottom Line: The literature indicates that T4-like phage genomes have permuted terminal sequences, which are generated by a DNA terminase in a sequence-independent manner. Genomic DNA of T4-like bacteriophage IME08 was subjected to high-throughput sequencing, and reads occurring at extraordinarily high frequencies were analyzed. We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA/G around the breakpoint of the high-frequency reads suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence and the other end generated randomly.


Affiliation: Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China.

ABSTRACT

Background: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpesviruses. The literature indicates that T4-like phage genomes have permuted terminal sequences, which are generated by a DNA terminase in a sequence-independent manner.

Methods: Genomic DNA of T4-like bacteriophage IME08 was subjected to high-throughput sequencing, and reads occurring at extraordinarily high frequencies were analyzed.

Results: We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA/G around the breakpoint of the high-frequency reads suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence and the other end generated randomly. The sequence-preferred cleavage may produce sticky ends, although each end is packaged with a different efficiency.

Conclusions: This study illustrates how high-throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.
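The Results report a consensus sequence TTGGA/G around the breakpoint of the high-frequency reads, and termini that start with G or A. A minimal sketch of how such a check could be run, assuming an assembled genome held as an uppercase ACGT string and a list of hypothetical 0-based breakpoint coordinates (the first base of each high-frequency read). The function names, the toy genome and the `window` size are illustrative assumptions, not the authors' pipeline; the code only tests whether TTGG[AG] lies near each breakpoint on either strand and tallies the first base at each breakpoint, without assuming exactly where within the motif the cut falls.

```python
import re
from collections import Counter

# Consensus reported in the abstract: TTGG followed by A or G ("TTGGA/G").
MOTIF = re.compile(r"TTGG[AG]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_near(genome: str, pos: int, window: int = 5) -> bool:
    """True if a full TTGG[AG] lies within `window` bases of `pos` on either strand.

    `pos` is a hypothetical 0-based breakpoint (first base of an HFS read);
    `window` is an illustrative choice, not a value from the paper.
    """
    region = genome[max(0, pos - window):pos + window]
    return bool(MOTIF.search(region)) or bool(MOTIF.search(reverse_complement(region)))


def terminal_base_counts(genome: str, breakpoints: list[int]) -> Counter:
    """Tally the first base of the sequence starting at each breakpoint."""
    return Counter(genome[p] for p in breakpoints if 0 <= p < len(genome))


# Toy example: a made-up 23-bp "genome" containing one TTGGA and one TTGGG.
toy_genome = "ACGTTTGGAACCCCTTGGGTTTT"
toy_breakpoints = [7, 17, 11]  # hypothetical HFS start coordinates
print([motif_near(toy_genome, p) for p in toy_breakpoints])  # [True, True, False]
print(terminal_base_counts(toy_genome, toy_breakpoints))     # Counter({'G': 2, 'C': 1})
```

Scanning both strands matters because a read can map to either strand of the assembly; a real analysis would take breakpoints from read alignments rather than a hand-written list.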


Figure 2: Frequency distribution of the top 20 high frequency sequences (HFSs) in the original sequencing data files.

Mentions: Given the large volume of sequence data, it may be possible to identify which reads lie at the genome termini. If genome termini were generated randomly, no individual read sequence should occur at a very high frequency. To determine sequence frequency, the raw read files (1.fq and 2.fq) were sorted by sequence and the number of occurrences of each unique sequence was counted. Frequency statistics (Figure 1) showed that most sequences in both raw files occurred 6-22 times, with the distribution peaking at 13 occurrences. Excluding sequences that occurred only once, sequences occurring 6-22 times made up about 70% of all sequences. Sequences observed only once probably include many derived from quasi-species (carrying single nucleotide polymorphisms, SNPs) or from sequencing errors (one or more base-calling errors). In contrast to the average occurrence of 13, high-frequency sequences (HFSs) occurred up to 400 times in a single raw sequencing file (Table 1 and Figure 2). Further analysis showed that the 1.fq and 2.fq read files contain the same HFSs. These sequences are unique in that each occurs only once in the assembled genome. BLAST analysis revealed that they are homologous to unique sequences of other evolutionarily related T4-like bacteriophages (T4, JS98 and JS10). The raw sequencing data show no evidence of endogenous plasmid contamination, making it unlikely that the HFSs arise from contaminating multi-copy plasmids. Further analysis demonstrated that the paired-end mates of these HFSs occurred at normal frequencies (data not shown), indicating that the HFSs were not produced by PCR amplification prior to cluster generation in the Solexa sequencing process. Forward reads located within a few bases upstream or downstream of the HFSs in the genome also occurred at normal frequencies, which again suggests that the HFS reads were not generated sequence-independently from particular genomic regions.
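The frequency analysis described above (count each unique read, build the occurrence histogram, pick out sequences far above the typical count, then confirm they occur once in the assembly) can be sketched as follows. This is a minimal illustration under stated assumptions: uncompressed four-line FASTQ records, a `collections.Counter` in place of an explicit sort (it yields the same per-sequence counts), and a hypothetical cutoff of `fold` times the modal occurrence count. The paper reports HFSs of up to 400 occurrences against a typical 13 but states no threshold, and all function names here are illustrative.

```python
from collections import Counter
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield line.strip()


def count_reads(lines: Iterable[str]) -> Counter:
    """Occurrences of each unique read sequence (same result as sort + count)."""
    return Counter(fastq_sequences(lines))


def occurrence_histogram(counts: Counter) -> Counter:
    """Map k -> number of unique sequences seen exactly k times."""
    return Counter(counts.values())


def candidate_hfss(counts: Counter, fold: float = 10.0) -> list[tuple[str, int]]:
    """Sequences occurring more than `fold` x the modal occurrence count.

    Singletons are ignored when finding the mode, as in the text.
    `fold` is a hypothetical cutoff, not a value from the paper.
    """
    hist = occurrence_histogram(counts)
    hist.pop(1, None)
    if not hist:
        return []
    mode = max(hist, key=hist.get)
    hits = [(seq, n) for seq, n in counts.items() if n > fold * mode]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def genome_copies(genome: str, seq: str) -> int:
    """Occurrences of `seq` in the genome on both strands (overlaps counted)."""
    rc = seq.translate(COMPLEMENT)[::-1]
    total = 0
    for probe in {seq, rc}:  # a set avoids double-counting palindromes
        start = genome.find(probe)
        while start != -1:
            total += 1
            start = genome.find(probe, start + 1)
    return total


def toy_fastq(seqs: list[str]) -> list[str]:
    """Build in-memory FASTQ lines for a made-up example."""
    lines: list[str] = []
    for i, s in enumerate(seqs):
        lines += [f"@read{i}", s, "+", "I" * len(s)]
    return lines


reads = ["AAAA"] * 3 + ["CCCC"] * 3 + ["GGGG"] * 3 + ["TTGA"] * 40 + ["ACGT"]
counts = count_reads(toy_fastq(reads))
hfss = candidate_hfss(counts)
print(hfss)                                   # [('TTGA', 40)]
print(genome_copies("GGGCTTGACCCG", "TTGA"))  # 1
```

For a real run, the raw files would be streamed directly, e.g. `with open("1.fq") as fh: counts = count_reads(fh)`; holding every unique sequence in memory is acceptable for a phage-sized dataset but may need an external sort for larger ones.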

