Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales.

Bi K, Vanderpool D, Singhal S, Linderoth T, Moritz C, Good JM - BMC Genomics (2012)

Bottom Line: There was no decrease in coverage among chipmunk species, which showed up to 1.5% sequence divergence in coding regions. Final assemblies yielded over ten thousand orthologous loci (~3.6 Mb), and thousands of fixed and polymorphic SNPs were identified among species. Our study demonstrates the potential of a transcriptome-enabled, multiplexed exon capture method to create thousands of informative markers for population genomic and phylogenetic studies in non-model species across the tree of life.


Affiliation: Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720-3160, USA. kebi@berkeley.edu

ABSTRACT

Background: To date, exon capture has largely been restricted to species with fully sequenced genomes, which has precluded its application to lineages that lack high-quality genomic resources. We developed a novel strategy for designing array-based exon capture in chipmunks (Tamias) based on de novo transcriptome assemblies. We evaluated the performance of our approach across specimens from four chipmunk species.

Results: We selectively targeted 11,975 exons (~4 Mb) on custom capture arrays and enriched over 99% of the targets in all libraries. The percentage of aligned reads was highly consistent (24.4-29.1%) across all specimens, including when up to 20 barcoded individuals were multiplexed on a single array. Base coverage was uniform among specimens and within targets in each species library, and the performance of targets was highly reproducible among independent exon captures. There was no decrease in coverage among chipmunk species, which showed up to 1.5% sequence divergence in coding regions. We did observe a decline in capture performance for a subset of targets designed from a much more divergent ground squirrel genome (30 My); however, over 90% of these targets were still recovered. Final assemblies yielded over ten thousand orthologous loci (~3.6 Mb), and thousands of fixed and polymorphic SNPs were identified among species.

Conclusions: Our study demonstrates the potential of a transcriptome-enabled, multiplexed exon capture method to create thousands of informative markers for population genomic and phylogenetic studies in non-model species across the tree of life.
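The Results summarize capture performance with a few simple quantities: the fraction of targets recovered, the percentage of aligned reads, and base coverage. The excerpt does not say exactly how these were computed, so the Python below is only a minimal sketch under stated assumptions: per-base read depths for each target are already available (for example, parsed from a per-base depth table), a target counts as "recovered" if any base reaches a minimum depth, and "aligned reads" means reads aligning to the targeted regions. The function names `capture_summary` and `percent_on_target` are hypothetical.

```python
from statistics import mean


def capture_summary(depths_by_target, min_depth=1):
    """Summarize capture performance from per-base read depths.

    depths_by_target maps a target ID to a list of per-base depths
    (assumed input format; not specified in the source).
    Returns (sensitivity, mean_depth): sensitivity is the fraction of targets
    with at least one base at >= min_depth (an assumed definition of a
    "recovered" target); mean_depth is averaged over all target bases.
    """
    if not depths_by_target:
        raise ValueError("no targets supplied")
    recovered = sum(
        1 for depths in depths_by_target.values()
        if any(d >= min_depth for d in depths)
    )
    all_depths = [d for depths in depths_by_target.values() for d in depths]
    return recovered / len(depths_by_target), mean(all_depths)


def percent_on_target(reads_on_target, total_reads):
    """Percentage of reads aligning to targets (assumed meaning of "aligned reads")."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_on_target / total_reads


if __name__ == "__main__":
    # Toy data: three hypothetical targets, one of which captured no reads.
    toy = {"exonA": [12, 15, 14], "exonB": [0, 0, 0], "exonC": [3, 4, 5]}
    sensitivity, depth = capture_summary(toy)
    print(f"sensitivity={sensitivity:.2f}, mean depth={depth:.1f}")  # sensitivity=0.67, mean depth=5.9
    print(f"on-target={percent_on_target(2500, 10000):.1f}%")  # on-target=25.0%
```

Raising `min_depth` gives a stricter definition of a recovered target, which would lower the reported sensitivity for the same data.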

Figure 5: Capture efficiency vs. sequence divergence. Captured reads from all species libraries (Tamias alpinus, T. amoenus, T. ruficaudus, and T. striatus) derived from T. alpinus exon targets and Ictidomys tridecemlineatus genomic interval targets were combined to generate the plot. Outliers are not shown. Capture efficiency is represented by normalized base coverage. Sequence divergence between each target and its corresponding in-target assembly (x-axis) is grouped into 1% bins.
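Figure 5 pairs each target's divergence from its in-target assembly with its normalized coverage and groups targets into 1% divergence bins. Because the caption does not define the normalization, the Python sketch below assumes each target's mean coverage is divided by the library-wide mean, and it summarizes each bin by its median (the omitted outliers suggest a box plot, but that is an inference). The names `normalize_coverage` and `bin_by_divergence` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_coverage(mean_cov_by_target):
    """Scale each target's mean coverage by the library-wide average.

    This normalization is an assumption; the caption only says
    "normalized base coverage" without giving the formula.
    """
    library_mean = sum(mean_cov_by_target.values()) / len(mean_cov_by_target)
    return {t: c / library_mean for t, c in mean_cov_by_target.items()}


def bin_by_divergence(divergence_pct, norm_cov, bin_width=1.0):
    """Group targets into divergence bins and return the median coverage per bin.

    Result keys are the lower edges of each bin (e.g. 8.0 for the 8-9% bin),
    in ascending order.
    """
    bins = defaultdict(list)
    for target, div in divergence_pct.items():
        lower = math.floor(div / bin_width) * bin_width
        bins[lower].append(norm_cov[target])
    return {lower: median(vals) for lower, vals in sorted(bins.items())}


if __name__ == "__main__":
    # Hypothetical per-target values, for illustration only.
    divergence = {"t1": 0.4, "t2": 0.9, "t3": 1.2, "t4": 8.8}
    coverage = {"t1": 20.0, "t2": 18.0, "t3": 22.0, "t4": 4.0}
    print(bin_by_divergence(divergence, normalize_coverage(coverage)))
    # {0.0: 1.1875, 1.0: 1.375, 8.0: 0.25}
```

With the 1% default, a target at 8.8% divergence falls in the 8-9% bin, where the I. tridecemlineatus intervals (8.76-8.98%) would land; changing `bin_width` gives coarser or finer binning.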

Mentions: As expected, capture efficiency drops abruptly when a divergent genome is used for capture array design. Sequence divergence between the targeted I. tridecemlineatus genomic intervals (presumably largely non-coding) and the genomes of the four Tamias species ranged from 8.76 to 8.98%. Average sequence coverage across these regions showed a 3- to 4-fold decrease (Additional file 3: Figure S7), which supports the finding by Vallender [25] that coverage begins to decline rapidly once divergence exceeds 5% (Figure 5). Note that at least some of the uncaptured I. tridecemlineatus regions are likely to be completely absent from the Tamias genome. Moreover, alignments between the Tamias assemblies and the corresponding I. tridecemlineatus genomic intervals are dominated by extensive indels, which would reduce mapping efficiency around such regions and could amplify the effect of local nucleotide differences on coverage. Nevertheless, 90% of the I. tridecemlineatus intervals were still covered by reads, at a mean coverage of 3-4X. Divergence between coding regions in Tamias and U. beldingi, a close relative of I. tridecemlineatus, is around 5%, and non-coding regions are expected on average to be less conserved than protein-coding regions. Therefore, if only orthologous exons of I. tridecemlineatus had been targeted, we would reasonably expect higher capture efficiency, with sequence coverage in the range of 4 to 12X, along with higher sensitivity (>90%).
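The divergence values quoted here (for example, 8.76-8.98% between I. tridecemlineatus intervals and the Tamias genomes) are pairwise percentages computed from alignments, and the paragraph notes that these alignments are dominated by indels. The excerpt does not state how gaps were handled, so the Python sketch below makes one common choice explicit: an uncorrected p-distance that skips alignment columns containing a gap or an ambiguous base, so indels do not inflate the substitution count. The function name `p_distance` and the '-' and 'N' conventions are assumptions.

```python
def p_distance(seq_a, seq_b, skip="-N"):
    """Uncorrected pairwise divergence (%) between two aligned sequences.

    Columns where either sequence has a character in `skip` (by default a
    gap '-' or an ambiguous base 'N') are excluded, so indels are not
    counted as substitutions. Returns (percent_divergence, compared_sites).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # ignore gapped or ambiguous columns
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * diffs / compared, compared


if __name__ == "__main__":
    target = "ACGTACGTAC"
    assembly = "ACGTTCGTA-"  # one substitution and one gapped column
    pct, n_sites = p_distance(target, assembly)
    print(f"{pct:.2f}% over {n_sites} sites")  # 11.11% over 9 sites
```

Returning the number of compared sites alongside the percentage could help flag indel-rich alignments like those described above, where few positions remain alignable.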

