Targeted sequencing of large genomic regions with CATCH-Seq.

Day K, Song J, Absher D - PLoS ONE (2014)

Bottom Line: Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.


Affiliation: HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America.

ABSTRACT
Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we describe the development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates, which enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and to resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost-effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.


pone-0111756-g003: Capture efficiency in a sample representing the median coverage among all sequenced samples, shown by the percent of total targeted bases covered at particular coverage depths in a chromosome 11 target. (A, B) Percent of targeted bases covered using various thresholds of repeat masking by (A) size or (B) Smith-Waterman (SW) scores. (C) Percent of targeted bases covered based on masking of percent GC content extremes. (A–C) Upper panels show coverage by CATCH-Seq within a sample that showed median coverage among all samples used in the capture. (D–F) Lower panels show coverage within the corresponding captured region, under the same repeat masking or percent GC content thresholds, for the same number of merged reads as analyzed for CATCH-Seq, taken from 15 individuals sequenced for the 1000 Genomes Project (merged WGS).

Mentions: Blocking repetitive sites is crucial for solution hybridization-based capture systems, as the inclusion of repetitive sequences in a capture probe set can lead to contamination of the final sequence reads with off-target repeats [5]. For CATCH-Seq, blocking of repeats is essential because many of the probes synthesized from BAC templates contain repeat regions. Given our weaker coverage of repeats, we were interested in how both the level of repeat divergence and repeat size influence enrichment and uniformity of coverage within target template regions. Typical commercial platforms avoid synthesizing probes within repeat regions and usually consider uniformity of coverage only within non-repetitive sites. We were interested in the uniformity of coverage of both repetitive and non-repetitive sequences, as repeats represent a considerable proportion of the contiguous regions we targeted. We specifically analyzed a region on chromosome 11 that is one target within a composite capture of ten targets, and selected the sample that represented median coverage among all of the samples we sequenced (Figure 2, Table 1). To understand the influence of repeat structures on target capture uniformity, we compared the base coverage across all targeted bases within our chromosome 11 site after repeat masking the target with increasing threshold values of repeat length or Smith-Waterman (SW) score (Table 1). Varying the repeat masking thresholds gave us an indication of the proportion of our captured sequences that mapped uniquely to repetitive versus non-repetitive regions within the target site. Masking repeats shorter than 250 bp in the target coverage calculations did not proportionally alter coverage rates compared with the total unmasked target, suggesting that small repeats were covered effectively.
Masking repeats between 250 bp and 500 bp in size increased our relative coverage rate, and the effect was similar to that of masking all repeats shorter than 500 bp (Figure 3A). Masking all repeats, regardless of size, demonstrated that non-repetitive sequences represented the majority of our capture and indicated that our blocking approach was highly effective. Repeat masking by SW score produced a trend similar to that seen for repeat size: capture of repeats with scores above 600 became less efficient (Figure 3B). Extremes in GC content are also known to influence coverage of targeted bases in solution hybridization-based exon capture platforms [6]. We therefore also masked coverage by GC content extremes, in 400 bp intervals containing high (>65%) or low (<35%) GC percentages (Figure 2, Table 1). Masking stretches of extreme GC percentage did not alter the relative coverage rate (Figure 3C).
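
The masking comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`mask_repeats`, `mask_gc_extremes`, `percent_covered`), the half-open interval convention, the use of non-overlapping 400 bp windows, and all of the toy data are assumptions made for illustration only. The sketch removes masked positions from the target and then reports the percent of the remaining bases covered at or above a chosen depth.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical repeat record: (start, end, sw_score), 0-based, half-open.
Repeat = Tuple[int, int, int]


def mask_repeats(repeats: Iterable[Repeat], min_len: int = 0,
                 max_len: float = float("inf"),
                 min_score: int = 0) -> Set[int]:
    """Positions inside repeats with min_len <= length < max_len
    and SW score >= min_score."""
    masked: Set[int] = set()
    for start, end, score in repeats:
        if min_len <= end - start < max_len and score >= min_score:
            masked.update(range(start, end))
    return masked


def mask_gc_extremes(seq: str, window: int = 400,
                     low: float = 0.35, high: float = 0.65) -> Set[int]:
    """Positions in non-overlapping windows whose GC fraction is
    below `low` or above `high` (window layout is an assumption)."""
    masked: Set[int] = set()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if gc < low or gc > high:
            masked.update(range(start, start + len(chunk)))
    return masked


def percent_covered(depth: List[int], masked: Set[int],
                    min_depth: int) -> float:
    """Percent of unmasked target bases with depth >= min_depth."""
    kept = [d for i, d in enumerate(depth) if i not in masked]
    if not kept:
        return 0.0
    return 100.0 * sum(d >= min_depth for d in kept) / len(kept)


# Toy 2,000 bp target (invented data): a poorly covered 400 bp repeat
# and a well covered 100 bp repeat.
depth = [30] * 600 + [5] * 400 + [30] * 1000
repeats = [(100, 200, 300), (600, 1000, 900)]
seq = "ACGT" * 100 + "GC" * 200 + "AT" * 200 + "ACGT" * 200

for label, masked in [
    ("unmasked", set()),
    ("repeats < 250 bp", mask_repeats(repeats, max_len=250)),
    ("repeats 250-500 bp", mask_repeats(repeats, min_len=250, max_len=500)),
    ("all repeats", mask_repeats(repeats)),
    ("GC extremes", mask_gc_extremes(seq)),
]:
    print(f"{label}: {percent_covered(depth, masked, 20):.1f}% at >=20x")
```

In this toy target, masking the short repeat barely changes the covered fraction, while masking the longer, poorly covered repeat raises it. That is the kind of shift the threshold comparison is meant to expose; the numbers themselves carry no relation to the reported results.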

