GBSX: a toolkit for experimental design and demultiplexing genotyping by sequencing experiments.

Herten K, Hestand MS, Vermeesch JR, Van Houdt JK - BMC Bioinformatics (2015)

Bottom Line: To reduce costs, sequencing of restriction enzyme based reduced representation libraries can be utilized. Here we demonstrate the usability of the GBSX toolkit and show improved in-line barcode demultiplexing and trimming performance compared to existing tools. GBSX provides an easy-to-use suite of tools for the design and demultiplexing of GBS experiments.


Affiliation: Center for Human Genetics, KU Leuven, Herestraat 49, Leuven, 3000, Belgium. koen.herten@med.kuleuven.be.

ABSTRACT

Background: Massively parallel sequencing is a powerful tool for variant discovery and genotyping. To reduce costs, sequencing of restriction enzyme based reduced representation libraries can be utilized. This technology is generally referred to as Genotyping By Sequencing (GBS). Specific bioinformatic tools are needed for GBS experimental design and initial data processing.

Results: GBSX is a package that assists in selecting the appropriate enzyme and in designing compatible in-line barcodes. After sequencing, it performs optimized demultiplexing using these barcodes to create one fastq file per barcode, which can easily be plugged into existing variant analysis pipelines. Here we demonstrate the usability of the GBSX toolkit and show improved in-line barcode demultiplexing and trimming performance compared to existing tools.

Conclusions: GBSX provides an easy-to-use suite of tools for the design and demultiplexing of GBS experiments.


Figure 2: The importance of a good barcode design. This image shows two barcodes, each completed with the restriction enzyme recognition site of ApeKI. If these barcodes are used to demultiplex GBS or RAD data generated with the ApeKI enzyme, the software will assign reads to the correct barcode (sample) because of the Hamming distance between the barcodes. When the barcodes are used with another enzyme, or with no enzyme, the distance between them changes. This could result in reads being assigned to the wrong sample.

Mentions: The barcode generator designs random self-correcting barcodes based on Hamming codes, as described by Bystrykh et al. [16]. Generated barcodes vary in length and have an equal representation of the different nucleotides at every position, to mitigate possible problems caused by low library diversity. Given the restriction enzyme of choice and the desired number of barcodes, the algorithm creates a random barcode set. Our implementation uses Hamming (15,11) codes (11 data bits and 4 parity bits). During the design process, shorter barcodes are extended with a polyA sequence to comply with the Hamming code; for experimental use the polyA sequences are removed, so the self-correcting nature of the Hamming code is retained. The barcodes differ by a Hamming distance of at least three, and up to one substitution error can be corrected. Two additional constraints apply: barcodes cannot contain restriction enzyme recognition sites, and a shorter barcode cannot be identical to a partial sequence of a longer barcode. In addition, the combination of the shortest barcode and the restriction enzyme recognition site must have a Hamming distance of 3 or more from the start of every other combination of barcode and recognition site.
Figure 2 illustrates two barcodes designed with a restriction site. Because of the design constraints, these two barcodes have a Hamming distance of 4. If the same barcodes were used without the enzyme, or designed without the constraints, reads from both barcodes would be demultiplexed as the shorter barcode (Hamming distance of 1). Using another restriction enzyme could likewise introduce a smaller Hamming distance, resulting in the same misassignment. Hence, the designed barcodes should only be used in combination with the corresponding restriction enzyme; using a different restriction enzyme, or none, can result in incorrect demultiplexing and hence incorrect results (Figure 2).
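
The distance constraints described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GBSX implementation: the barcodes `ACCA` and `ACCTGATT` are hypothetical (they are not the ones shown in Figure 2), `CAGC` is assumed to be the part of the ApeKI site that follows the barcode in a read, `GATC` stands in for some other enzyme, and "partial sequence" is interpreted here as "substring". The function names are likewise made up for this example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def prefix_distance(bc_a: str, bc_b: str, site_remnant: str) -> int:
    """Distance between barcode+remnant sequences, compared over the
    length of the shorter of the two combined sequences."""
    full_a = bc_a + site_remnant
    full_b = bc_b + site_remnant
    n = min(len(full_a), len(full_b))
    return hamming(full_a[:n], full_b[:n])


def check_barcode_set(barcodes, site_remnant, recognition_sites, min_dist=3):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    # Constraint: a barcode may not contain a recognition site.
    for bc in barcodes:
        if any(site in bc for site in recognition_sites):
            problems.append(f"{bc} contains a recognition site")
    for a, b in combinations(barcodes, 2):
        short, long_ = sorted((a, b), key=len)
        # Constraint (interpreted as substring): a shorter barcode may not
        # be identical to part of a longer one.
        if len(short) < len(long_) and short in long_:
            problems.append(f"{short} is part of {long_}")
        # Constraint: barcode+site combinations must stay far enough apart.
        d = prefix_distance(a, b, site_remnant)
        if d < min_dist:
            problems.append(f"{a}/{b}: distance {d} < {min_dist}")
    return problems


# Hypothetical barcodes and assumed site remnants (see lead-in).
barcodes = ["ACCA", "ACCTGATT"]
apeki_sites = ["GCAGC", "GCTGC"]  # GCWGC with W = A or T

print(prefix_distance("ACCA", "ACCTGATT", "CAGC"))  # with assumed ApeKI remnant: 4
print(prefix_distance("ACCA", "ACCTGATT", ""))      # no enzyme: 1
print(prefix_distance("ACCA", "ACCTGATT", "GATC"))  # other hypothetical remnant: 2
print(check_barcode_set(barcodes, "CAGC", apeki_sites))  # []
```

With the assumed ApeKI remnant the pair passes the distance threshold of 3, while the same pair falls below it with no remnant or with the other hypothetical remnant, which mirrors why a barcode set designed for one enzyme should not be reused with another.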

