A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq.

Wang L, Si Y, Dedow LK, Shao Y, Liu P, Brutnell TP - PLoS ONE (2011)

Bottom Line: Our data supports novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona-fide anti-sense transcripts and to detect intron retention events. Our results demonstrate the potential of this low cost and robust method for RNA-seq library construction and data analysis.


Affiliation: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
The emergence of NextGen sequencing technology has generated much interest in the exploration of transcriptomes. Currently, Illumina Inc. (San Diego, CA) provides one of the most widely utilized sequencing platforms for gene expression analysis. While Illumina reagents and protocols perform adequately in RNA-sequencing (RNA-seq), alternative reagents and protocols promise a higher throughput at a much lower cost. We have developed a low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements. First, we designed balanced adapter sequences for multiplexing of samples; second, dUTP incorporation in second-strand synthesis was used to enforce strand specificity; third, we simplified the RNA purification, fragmentation and library size-selection steps, thus drastically reducing the time and increasing the throughput of library construction; fourth, we included an RNA spike-in control for validation and normalization purposes. To streamline informatics analysis for the community, we established a pipeline within the iPlant Collaborative. These scripts are easily customized to meet specific research needs and improve on existing informatics and statistical treatments of RNA-seq data. In particular, we apply significance tests for determining differential gene expression and intron retention events. To demonstrate the potential of both the library-construction protocol and the data-analysis pipeline, we characterized the transcriptome of the rice leaf. Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide anti-sense transcripts and intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.
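
One plausible reading of "balanced adapter sequences" is that, across the barcodes pooled in one lane, each of the four bases occurs equally often at every barcode position. The Python sketch below checks that property under this assumption. The function names and the 4-nt barcodes are hypothetical and are not the adapter sequences designed in the paper.

```python
from collections import Counter


def position_base_counts(barcodes):
    """Return one Counter of base frequencies per barcode position."""
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    # zip(*barcodes) yields the bases found at position 0, then 1, and so on.
    return [Counter(column) for column in zip(*barcodes)]


def is_balanced(barcodes):
    """True if A, C, G and T occur equally often at every position."""
    expected = len(barcodes) / 4
    return all(
        all(counts.get(base, 0) == expected for base in "ACGT")
        for counts in position_base_counts(barcodes)
    )


# Hypothetical barcode sets (illustrative only, not taken from the paper).
balanced_set = ["ACGT", "CGTA", "GTAC", "TACG"]
skewed_set = ["ACGT", "ACGA", "ACGC", "ACGG"]

print(is_balanced(balanced_set))
print(is_balanced(skewed_set))
```

A set that fails the check, such as `skewed_set`, presents the same base in every cluster during the first barcode cycles, which is the situation a balanced design is meant to avoid.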


Figure 2. Enforcing strand specificity using dUTP. dTTP is substituted with dUTP during second-strand cDNA synthesis. Y-shaped (partially complementary) adapters are ligated and the dUTP-marked strand is digested with uracil-DNA glycosylase (UDG). PCR amplification of the remaining single strand confers strand specificity.
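
Because only the first-strand cDNA survives UDG digestion, the orientation of an aligned read reveals which genomic strand was transcribed. The sketch below illustrates that inference in Python. It assumes single-end reads that are sequenced from the first-strand cDNA and are therefore the reverse complement of the RNA; whether this holds depends on the adapter design, so the `reads_are_first_strand` flag and both function names are hypothetical and not part of the authors' pipeline.

```python
def transcript_strand(read_strand, reads_are_first_strand=True):
    """Infer the genomic strand of the RNA from a read's alignment strand.

    Assumption: when reads_are_first_strand is True, the read is the
    reverse complement of the RNA, so the RNA lies on the opposite strand.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    if reads_are_first_strand:
        return "-" if read_strand == "+" else "+"
    return read_strand


def classify(read_strand, gene_strand, reads_are_first_strand=True):
    """Label a read as sense or antisense relative to an annotated gene."""
    rna_strand = transcript_strand(read_strand, reads_are_first_strand)
    return "sense" if rna_strand == gene_strand else "antisense"


# A read aligned to '-' over a '+' gene supports the annotated transcript.
print(classify("-", "+"))
# A read aligned to '+' over the same gene points to an antisense transcript.
print(classify("+", "+"))
```

Without strand information both reads would simply be counted toward the annotated gene, which is why a strand-specific library is needed to separate bona fide antisense transcription from sense expression.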

Mentions: The primary workflow of our improved RNA-seq library construction protocol does not diverge significantly from the standard Illumina library construction procedure illustrated in Figure 1. However, we have implemented a number of key improvements at the steps marked with red asterisks (Fig. 1). Most importantly, we incorporated a step that preserves the strand-specific nature of mRNA molecules. In addition, we added aRNA spike-in controls to each RNA input before fragmentation. The aRNA spike-in controls are synthesized in vitro from four distinct human cDNA sources that have no homology to plant species (e.g. maize, rice, Arabidopsis, Setaria, Brachypodium, barley, potato and tomato). The aRNA spike-in control was used to validate sequencing results and provided an alternative parameter for normalization. However, as shown in Figure S1, spike-in based normalization underperforms when compared to other methods of normalization. The principle by which strand-specific information is retained is illustrated in Figure 2 and is an adaptation of a robust technique in which the second-strand cDNA is marked with deoxyuridine triphosphate (dUTP) in place of deoxythymidine triphosphate (dTTP) [13], [21]. We also simplified the fragmentation procedure for the RNA input: instead of using a dedicated fragmentation buffer, we used reverse transcription (RT) first-strand buffer (Invitrogen, CA) directly, which eliminated the need to purify the fragmented RNA. After a 5 minute treatment at 94 °C, the average size of the RT-buffer fragmented RNA is approximately 200 bp as measured by the Agilent Bioanalyzer (Figure S2), which is the suggested size distribution for RNA-seq libraries on the Illumina platform.
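
As a rough illustration of how a spike-in control can serve as a normalization parameter, the Python sketch below rescales each library so that its total spike-in read count matches the mean across libraries. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, library names and counts are all hypothetical, and the text above reports that spike-in based normalization underperformed other methods (Figure S1).

```python
def spike_in_scale_factors(spike_totals):
    """Map library name -> factor that equalises spike-in read totals.

    spike_totals: dict of library name -> reads mapped to spike-in controls.
    """
    if not spike_totals:
        raise ValueError("at least one library is required")
    if any(total <= 0 for total in spike_totals.values()):
        raise ValueError("every library needs a positive spike-in count")
    # Use the mean spike-in total as the common reference level.
    reference = sum(spike_totals.values()) / len(spike_totals)
    return {lib: reference / total for lib, total in spike_totals.items()}


def normalize(counts, factor):
    """Apply one library's scale factor to its per-gene read counts."""
    return {gene: count * factor for gene, count in counts.items()}


# Hypothetical data: the second library was sequenced twice as deeply.
spike_totals = {"leaf_rep1": 1000, "leaf_rep2": 2000}
factors = spike_in_scale_factors(spike_totals)

rep2_counts = {"geneA": 200, "geneB": 40}
print(factors)
print(normalize(rep2_counts, factors["leaf_rep2"]))
```

The approach rests on the same amount of spike-in RNA being added to every input, so any pipetting or quantification error in that step propagates directly into the scale factors.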

