A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq.

Wang L, Si Y, Dedow LK, Shao Y, Liu P, Brutnell TP - PLoS ONE (2011)

Bottom Line: Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide anti-sense transcripts and intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.


Affiliation: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
The emergence of NextGen sequencing technology has generated much interest in the exploration of transcriptomes. Currently, Illumina Inc. (San Diego, CA) provides one of the most widely utilized sequencing platforms for gene expression analysis. While Illumina reagents and protocols perform adequately in RNA-sequencing (RNA-seq), alternative reagents and protocols promise a higher throughput at a much lower cost. We have developed a low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements. First, we designed balanced adapter sequences for multiplexing of samples; second, dUTP incorporation in second-strand synthesis was used to enforce strand specificity; third, we simplified RNA purification, fragmentation and library size-selection steps, thus drastically reducing the time and increasing the throughput of library construction; fourth, we included an RNA spike-in control for validation and normalization purposes. To streamline informatics analysis for the community, we established a pipeline within the iPlant Collaborative. These scripts are easily customized to meet specific research needs and improve on existing informatics and statistical treatments of RNA-seq data. In particular, we apply significance tests for determining differential gene expression and intron retention events. To demonstrate the potential of both the library-construction protocol and the data-analysis pipeline, we characterized the transcriptome of the rice leaf. Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide anti-sense transcripts and intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.
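The abstract does not spell out what "balanced" means for the adapter sequences. One common reading for multiplexed Illumina runs is that, across the pooled barcodes, each sequencing cycle sees a mix of all four bases. The sketch below checks that property under this assumption; the function names, the example barcodes and the `min_fraction` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter


def position_base_fractions(barcodes):
    """Return, for each barcode position, the fraction of A, C, G and T."""
    if not barcodes:
        raise ValueError("at least one barcode is required")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    fractions = []
    # zip(*barcodes) yields one tuple of bases per sequencing cycle.
    for column in zip(*barcodes):
        counts = Counter(column)
        fractions.append(
            {base: counts.get(base, 0) / len(barcodes) for base in "ACGT"}
        )
    return fractions


def is_balanced(barcodes, min_fraction=0.2):
    """True if every base reaches min_fraction at every position.

    min_fraction is an illustrative threshold, not a published value.
    """
    return all(
        min(per_base.values()) >= min_fraction
        for per_base in position_base_fractions(barcodes)
    )


# Hypothetical barcode sets, for illustration only.
balanced_set = ["ACGT", "CGTA", "GTAC", "TACG"]
skewed_set = ["AAAA", "AAAC", "AACA", "ACAA"]
print(is_balanced(balanced_set))  # every column contains A, C, G and T once
print(is_balanced(skewed_set))    # position 1 is all A
```

A check of this kind could be run on any candidate barcode pool before ordering adapters; the actual design criteria used by the authors may differ.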


Figure 4 (pone-0026426-g004): Survey of anti-sense alignments. (a) An example of read alignment showing NSS- and SS-derived data for rice gene Os07g36090. The alignment is visualized using IGV (www.broadinstitute.org/igv/). Red and blue colors designate the directionality of reads. (b) Line plot showing percentage of anti-sense reads aligned to the average rice gene body from 5′ to 3′ end. (c) Line plot showing percent T along average rice gene body from 5′ to 3′ end.

Mentions: As previously mentioned, we used a slightly modified version of the dUTP method to enforce strand specificity in our final libraries [20]. We increased the incubation time with UDG (Uracil-DNA Glycosylase) to 30 minutes to ensure complete degradation of the dUTP-containing second strand. Of the 82 million aligned reads generated using the strand-specific (SS) protocol, approximately 3.88% were anti-sense according to the most current rice genome annotation, version 6.1 [22]. This is slightly higher than the percentage of anti-sense reads detected in yeast using multiple strand-specific protocols [13]. The discrepancy may reflect a true biological difference or a technical limitation related to the maturity of the genome annotation: the yeast annotation is highly refined, enabling very accurate mapping of anti-sense reads to the gene space, whereas the rice annotation is still being improved, so some of the anti-sense reads are due to incorrectly annotated gene models. Figure 4A shows an example in which an incorrectly annotated gene model contributes to the over-estimation of anti-sense coverage. Based on sense-strand alignments, the upper gene model (Os07g36090.3) is likely incorrect. The other two gene models (Os07g36080.1, Os07g36090.1), which run in opposite directions, are supported by the aligned reads. In this case, if Os07g36090.3 is used for calculating the anti-sense alignment, a substantial number of reads would be counted as aligning to the opposite strand. This example demonstrates the advantage of a strand-specific protocol, as it is difficult to resolve the validity of gene models with only non-strand-specific (NSS) read alignments (compare top and middle panels in Fig. 4A).
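The effect of a mis-annotated gene model on the anti-sense percentage can be illustrated with a small sketch. It is not the authors' pipeline: the `Gene` and `Read` records, the toy coordinates and the counting rule (a read is assigned to every gene model whose span contains its start position) are assumptions made for illustration. The read strand is assumed to have already been converted to the strand of the originating transcript, since that conversion depends on the library chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # '+' or '-'


@dataclass(frozen=True)
class Read:
    pos: int     # 1-based leftmost aligned position
    strand: str  # inferred transcript strand, '+' or '-'


def antisense_percentage(genes, reads):
    """Percentage of read-to-gene assignments on the strand opposite the gene."""
    sense = antisense = 0
    for read in reads:
        for gene in genes:
            if gene.start <= read.pos <= gene.end:
                if read.strand == gene.strand:
                    sense += 1
                else:
                    antisense += 1
    total = sense + antisense
    return 100.0 * antisense / total if total else 0.0


# Toy data (hypothetical identifiers and coordinates).
reads = [Read(110, "+"), Read(150, "+"), Read(190, "+"), Read(120, "-")]
correct_models = [Gene("geneA.1", 100, 200, "+")]
# Same locus with an extra, wrongly oriented model added to the annotation.
with_bad_model = correct_models + [Gene("geneA.bad", 100, 200, "-")]

print(antisense_percentage(correct_models, reads))   # 25.0
print(antisense_percentage(with_bad_model, reads))   # 50.0
```

In the toy data, adding a gene model on the wrong strand doubles the apparent anti-sense fraction even though the reads are unchanged, which is the kind of annotation-driven inflation the paragraph describes.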

