Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool.

Mariette J, Noirot C, Klopp C - BMC Res Notes (2011)

Bottom Line: Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.


Affiliation: Plate-forme bio-informatique Genotoul, INRA, Biométrie et Intelligence Artificielle/Génétique Cellulaire, BP 52627, 31326 Castanet-Tolosan Cedex, France. Jerome.Mariette@toulouse.inra.fr.

ABSTRACT

Background: The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing platforms, permitting the sequencing of large genomes, the analysis of variations, or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias directly affects how accurately read counts measure the representation of the fragments. Other cleaning steps are usually performed on the reads before assembly or alignment.

Findings: PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. The program is free software distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality, and number of undetermined bases. It can also clean flowgram files (.sff) of paired-end sequences, generating on the one hand a file of validated paired-end reads and on the other hand a file of single reads.

Conclusions: Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.



Figure 1: Paired-end cleaning strategy. Reads with no linker (a) are retained as single reads. If multiple linkers are present in the same read (b), the read is discarded. When the linker is only partially found, meaning that the number of mismatches is lower than a threshold, only reads where the linker is located at the beginning or at the end (c) are saved as single reads; the others (d) are deleted. Reads where the entire linker is present and not too closely located to one end (e) are saved as paired-end reads. In the remaining cases, sequences are saved as single reads only if the linker is located far enough from one end (g), while the others (f) are deleted.

Mentions: The module also provides an option to filter paired-end reads: --clean-pairends. A 454 paired-end read should be composed of the sequence of one end of the DNA fragment, a linker sequence, and the sequence of the other end of the DNA fragment. Unfortunately, in some cases the linker is missing. In other cases the linker is too close to the end of the read, so the mate pair cannot be used to bridge contigs during assembly. Cleaning paired-end reads therefore relies on locating this linker. The Roche platform uses three different linkers depending on the chemistry: one for GSFLX and two for Titanium. With this option, a local similarity search between the input sequences and the 454 linkers is performed using cross_match [9]. Two output files are generated following the strategy presented in Figure 1. The first file contains all good-quality paired-end reads. The second gathers all reads in which the linker was missing, located too close to one end, or of too low sequence quality. In the last two cases the reads are clipped to keep the longest subsequence without linker. Thus, all reads from the second file can be used as single reads in the assembly.
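
The decision strategy of Figure 1 can be illustrated with a minimal Python sketch. This is not the pyrocleaner source code: the LinkerHit structure, the function names, and the min_flank and border thresholds are hypothetical, and the linker hits are assumed to have already been parsed from a similarity search such as cross_match. The reading of cases (f) and (g) is one plausible interpretation of the caption.

```python
from typing import List, NamedTuple


class LinkerHit(NamedTuple):
    """Hypothetical record for one linker match inside a read."""
    start: int      # 0-based index of the first linker base in the read
    end: int        # index one past the last linker base
    complete: bool  # True if the whole linker matched, False if only part


def classify_read(read_len: int, hits: List[LinkerHit],
                  min_flank: int = 20, border: int = 5) -> str:
    """Return "paired", "single" or "discard" for one read.

    min_flank and border are illustrative thresholds, not values
    taken from the article.
    """
    if not hits:
        return "single"              # (a) no linker found
    if len(hits) > 1:
        return "discard"             # (b) several linkers in one read
    hit = hits[0]
    left = hit.start                 # bases before the linker
    right = read_len - hit.end       # bases after the linker
    if not hit.complete:
        if left <= border or right <= border:
            return "single"          # (c) partial linker at a read end
        return "discard"             # (d) partial linker inside the read
    if left >= min_flank and right >= min_flank:
        return "paired"              # (e) both flanks are long enough
    if max(left, right) >= min_flank:
        return "single"              # (g) one usable flank remains
    return "discard"                 # (f) no usable flank


def longest_linker_free(seq: str, hit: LinkerHit) -> str:
    """Clip a read to its longest subsequence that contains no linker."""
    left, right = seq[:hit.start], seq[hit.end:]
    return left if len(left) >= len(right) else right
```

In this sketch a read classified as "single" because of a linker hit would then be passed to longest_linker_free before being written to the single-read file, which mirrors the clipping step described above.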

