Improved assembly procedure of viral RNA genomes amplified with Phi29 polymerase from new generation sequencing data

View Article: PubMed Central - PubMed

ABSTRACT

Background: New sequencing technologies have opened the way to the discovery and characterization of pathogenic viruses in clinical samples. However, these methods can require amplification of the viral RNA prior to sequencing. Among the available methods, the procedure based on Phi29 polymerase produces a large amount of amplified DNA. Its major disadvantage is that it generates a large number of chimeric sequences, which can affect the assembly step. The pre-processing method proposed in this study strongly limits the negative impact of chimeric reads so that full-length viral genomes can be obtained.

Findings: Three assembly programs (ABySS, Ray and SPAdes) were tested for their ability to correctly assemble full-length viral genomes. Although our pre-processing method improved genome assembly in all cases, only its combination with SPAdes yielded each of the tested viral genomes at full length in a single contig.

Conclusions: The proposed pipeline overcomes the drawbacks caused by chimeric reads generated during the amplification of viral RNA, which considerably improves the assembly of full-length viral genomes.

Electronic supplementary material: The online version of this article (doi:10.1186/s40659-016-0099-y) contains supplementary material, which is available to authorized users.

No MeSH data available.


Fig. 1: Main steps of reverse transcription, RNA amplification and sequencing (a) and of the viral read filtering method (b). The method is divided into three parts. The first part produces all reads in FASTA format after several filtration steps. The second part selects only the viral portion of each read using a similarity-based approach. The last part performs the assembly of the targeted sequences using different algorithms. HTS, high-throughput sequencing; cDNA, complementary DNA; ssDNA, single-stranded DNA

Mentions: Read quality was first assessed with FastQC. Mouse genome sequences were filtered out by mapping the reads to the Mus musculus Mn10 sequence using Bowtie 2.0 with the “very sensitive” option [7]. The remaining reads corresponding to viral sequences were identified with a similarity-based approach, using BLASTN and BLASTX against a defined set of target sequences available in sequence databanks (L22089, DQ294633.1 and KF680222.1). Viral reads were selected on two criteria: a minimum of 75 % identity between the read and a reference sequence, and a minimum alignment length of 60 bases, indels included. To improve the assembly quality of the viral genomes, only the region of each read matching a BLAST hit was kept (Fig. 1). In this way, any non-viral sequence joined to a viral sequence within the same read during the reverse transcription step was removed. The selected reads were assembled with different software, such as ABySS, Ray and SPAdes (versions 3.0, 3.5 and 3.6), using different k values to build the de Bruijn graph [8, 9]. All genome assemblies were evaluated with QUAST using the number of contigs obtained, the size of the largest contig, the N50 and L50 and, finally, the genome coverage obtained [10]. The proportion of reads that did not map to the generated contig(s) was determined for each data set by mapping with Bowtie 2.0, using the “very sensitive” option and “End to End” as the alignment type, in Geneious R9. Chimeric reads were identified from the tabular BLAST output, which gives the matching positions of each read against its BLAST hits. A read was considered chimeric if the alignment did not cover its entire sequence.
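
The read selection, trimming and chimera rules described above can be illustrated with a short Python sketch. It is not the authors' code. It assumes the BLAST results are in the default 12-column tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), that reads are held in a dictionary keyed by read id, and that only the longest qualifying hit per read is kept; the source states none of these details. All function and variable names are hypothetical. The length threshold is applied to the reported alignment length, which suits BLASTN-style output; BLASTX reports that length in amino acids and would need its own handling.

```python
from typing import Dict, Iterable, List, NamedTuple

MIN_IDENTITY = 75.0    # minimum percent identity, as stated in the text
MIN_ALN_LENGTH = 60    # minimum alignment length (indels included), as stated


class Hit(NamedTuple):
    read_id: str
    identity: float
    aln_length: int
    q_start: int  # 1-based, inclusive position on the read
    q_end: int    # 1-based, inclusive position on the read


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular lines (assumed default 12-column layout)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Query coordinates can be reversed (e.g. BLASTX negative frames).
        q_start, q_end = sorted((int(fields[6]), int(fields[7])))
        hits.append(Hit(fields[0], float(fields[2]), int(fields[3]),
                        q_start, q_end))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Per read, keep the longest hit passing both thresholds.

    Choosing the longest hit is an assumption of this sketch.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < MIN_IDENTITY or hit.aln_length < MIN_ALN_LENGTH:
            continue
        current = best.get(hit.read_id)
        if current is None or hit.aln_length > current.aln_length:
            best[hit.read_id] = hit
    return best


def trim_reads(reads: Dict[str, str], best: Dict[str, Hit]) -> Dict[str, str]:
    """Keep only the region of each read covered by its selected hit."""
    return {
        read_id: reads[read_id][hit.q_start - 1:hit.q_end]
        for read_id, hit in best.items()
        if read_id in reads
    }


def is_chimeric(read_length: int, hit: Hit) -> bool:
    """A read is chimeric if the alignment does not span all of it."""
    return hit.q_start > 1 or hit.q_end < read_length
```

In use, `parse_hits` would be fed the lines of the tabular BLAST file, `best_hits` would apply the 75 % and 60-base thresholds, `trim_reads` would produce the sequences passed to the assembler, and `is_chimeric` would flag reads whose alignment leaves part of the read uncovered.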

