Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis.

Tu J, Guo J, Li J, Gao S, Yao B, Lu Z - PLoS ONE (2015)

Bottom Line: The chimeric distance between the locations of adjacent parts on the chromosome followed an approximate bimodal distribution ranging from 0 to over 5,000 nt, whose peak was at about 250 to 300 nt. The overlap length of adjacent parts followed an approximate Poisson distribution and revealed a peak at 6 nt. Our work also illustrated the importance of NGS data reanalysis, not only for the improvement of data utilization efficiency, but also for more potential genomic information.


Affiliation: State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

ABSTRACT

Background: The chimeric sequences produced by phi29 DNA polymerase, referred to as chimeras, affect the performance of multiple displacement amplification (MDA) and complicate sequence data processing. Although several articles have reported chimeric sequences, only one study had focused on the structure and generation mechanism of chimeras, and it was based on merely hundreds of chimeras found in sequence data from the E. coli genome.

Method: We mined a series of Next Generation Sequencing (NGS) reads that had originally been generated for whole-genome haplotype assembly in a previous study. We built a bioinformatics pipeline based on a subsection alignment strategy to identify all chimeras in these data and visualize their structures. We then defined two statistical indexes, the chimeric distance and the overlap length, whose regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between chimera type and average insertion size, which suggests a way to decrease the proportion of wasted data during DNA library construction.
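The Method paragraph names the two indexes but does not spell out how they are computed. The sketch below shows one plausible way to derive both from a pair of adjacent aligned parts of a chimeric read. The `Segment` class, its coordinate convention, and both definitions are hypothetical illustrations, not the authors' pipeline; in particular, the chimeric distance is assumed here to be the gap between the nearest ends of the two reference intervals, ignoring strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned part of a chimeric read (hypothetical representation).

    read_start/read_end: 0-based, half-open coordinates on the read.
    ref_start/ref_end: 0-based, half-open coordinates on the chromosome.
    """
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int


def overlap_length(first: Segment, second: Segment) -> int:
    """Read bases claimed by both adjacent parts (assumed definition).

    Assumes `first` precedes `second` along the read; returns 0 when the
    two parts abut or leave a gap on the read.
    """
    return max(0, first.read_end - second.read_start)


def chimeric_distance(first: Segment, second: Segment) -> int:
    """Gap on the chromosome between two adjacent parts (assumed definition).

    Measured between the closest ends of the two reference intervals,
    in either order; 0 when the intervals touch or overlap.
    """
    if first.ref_end <= second.ref_start:
        return second.ref_start - first.ref_end
    if second.ref_end <= first.ref_start:
        return first.ref_start - second.ref_end
    return 0
```

With toy coordinates, `overlap_length(Segment(0, 60, 1000, 1060), Segment(54, 101, 1350, 1397))` returns 6 and `chimeric_distance` on the same pair returns 290; tallying these values over all chimeras would give the abundance distributions the paper describes.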

Results/conclusion: A total of 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. Of 650,430,811 read pairs, 40,259,438 (6.19%) were chimeric. The chimeric sequences consist of two or more parts that are located non-contiguously but close to one another on the chromosome. The chimeric distance between the locations of adjacent parts on the chromosome followed an approximately bimodal distribution ranging from 0 to over 5,000 nt, with a peak at about 250 to 300 nt. The overlap length of adjacent parts followed an approximately Poisson distribution with a peak at 6 nt. Moreover, a linear correlation analysis indicated that unmapped chimeras, which were classified as wasted data, could be reduced by appropriately increasing the insertion size.
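The headline figures can be cross-checked with simple arithmetic. The sketch below recomputes the chimera percentage from the two read-pair counts and the total data volume from the pair count, assuming (as an illustration only) that every pair contributes the constant 202 sequenced nt stated for sample SRX247249. The constant and function names are hypothetical.

```python
# Headline counts as reported in the abstract.
CHIMERIC_PAIRS = 40_259_438
TOTAL_PAIRS = 650_430_811

# Assumption: every read pair contributes 202 sequenced nt, the constant
# PE read length stated for sample SRX247249, applied here to all data.
BASES_PER_PAIR = 202


def chimera_percentage(chimeric: int, total: int) -> float:
    """Percentage of read pairs flagged as chimeric."""
    return 100.0 * chimeric / total


def total_gigabases(pairs: int, bases_per_pair: int = BASES_PER_PAIR) -> float:
    """Sequenced volume in Gb, using 1 Gb = 1e9 nt."""
    return pairs * bases_per_pair / 1e9


if __name__ == "__main__":
    print(f"{chimera_percentage(CHIMERIC_PAIRS, TOTAL_PAIRS):.2f}%")
    print(f"{total_gigabases(TOTAL_PAIRS):.1f} Gb")
```

Run as a script, it prints `6.19%` and `131.4 Gb`, matching the reported values, which suggests the 202 nt per pair assumption is consistent with the stated data volume.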

Significance: This study profiled phi29MDA chimeras using tens of millions of chimeric sequences and helps clarify the amplification mechanism of phi29 DNA polymerase. Our work also illustrates the importance of NGS data reanalysis, not only for improving data utilization efficiency but also for uncovering additional genomic information.


Fig 5. Relationship between the average insertion size and the ratio of insertion chimeras to pair-end chimeras across the 18 subsamples.

All the chimeras we obtained could be separated into two kinds: insertion chimeras and pair-end chimeras. Under normal circumstances, insertion chimeras can be regarded as usable data because their reads map completely to the reference. Pair-end chimeras, in contrast, tend to be classified as wasted data because they do not conform to the genome. Since chimera formation cannot be avoided when using phi29MDA, it is necessary to determine which factor directly influences the ratio of insertion chimeras to pair-end chimeras. To this end, we first hypothesized that the probability of chimerism occurring at each nucleotide of the PE reads was uniformly distributed. Because the sequenced length of the PE reads was constant (202 nt), the average insertion size should theoretically be the principal parameter. The sequence data of sample SRX247249 was divided into 18 parts according to run number, and the average insertion size and the ratio of insertion chimeras to pair-end chimeras were calculated for each part (Table A in S1 file). The results revealed a significant positive correlation (R2 = 0.9691) between the two indexes (Fig 5).
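The reasoning above can be made explicit with a simplified model; this is an assumption for illustration, not the authors' stated derivation. Suppose each chimeric fragment carries a single junction placed uniformly along a fragment of length L (the insertion size, taken here to include both reads), of which 202 nt are sequenced. A junction in the unsequenced L - 202 nt yields an insertion chimera and one inside the reads yields a pair-end chimera, so the expected ratio is (L - 202)/202: linear in L with slope 1/202 whenever L > 202. The hypothetical helper `r_squared` computes the coefficient of determination of an ordinary least-squares line, one common way of obtaining a value such as R2 = 0.9691.

```python
from typing import Sequence

READ_LENGTH_PER_PAIR = 202  # sequenced nt per PE read pair, as stated above


def expected_ratio(insert_size: float, sequenced: int = READ_LENGTH_PER_PAIR) -> float:
    """Expected insertion-chimera : pair-end-chimera ratio (uniform model).

    Assumes one junction per chimeric fragment, uniformly placed along a
    fragment of length `insert_size`. Junctions in the unsequenced middle
    give insertion chimeras; junctions inside the reads give pair-end ones.
    If the reads overlap (insert_size <= sequenced), no middle remains.
    """
    unsequenced = max(insert_size - sequenced, 0.0)
    return unsequenced / sequenced


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when x or y has zero variance")
    # For simple linear regression with an intercept, R^2 equals Pearson r^2.
    return sxy * sxy / (sxx * syy)
```

To reproduce the reported fit, the 18 per-run (insertion size, ratio) pairs from Table A in S1 file would be passed to `r_squared`; those values are not reproduced here. Real chimera formation is unlikely to be perfectly uniform, so the empirical slope need not equal 1/202.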

