A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq.

Wang L, Si Y, Dedow LK, Shao Y, Liu P, Brutnell TP - PLoS ONE (2011)

Bottom Line: Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide antisense transcripts and intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.


Affiliation: Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York, United States of America.

ABSTRACT
The emergence of NextGen sequencing technology has generated much interest in the exploration of transcriptomes. Currently, Illumina Inc. (San Diego, CA) provides one of the most widely utilized sequencing platforms for gene expression analysis. While Illumina reagents and protocols perform adequately in RNA-sequencing (RNA-seq), alternative reagents and protocols promise a higher throughput at a much lower cost. We have developed a low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements. First, we designed balanced adapter sequences for multiplexing of samples; second, dUTP incorporation in second-strand synthesis was used to enforce strand-specificity; third, we simplified the RNA purification, fragmentation and library size-selection steps, thus drastically reducing the time and increasing the throughput of library construction; fourth, we included an RNA spike-in control for validation and normalization purposes. To streamline informatics analysis for the community, we established a pipeline within the iPlant Collaborative. These scripts are easily customized to meet specific research needs and improve on existing informatics and statistical treatments of RNA-seq data. In particular, we apply significance tests for determining differential gene expression and intron retention events. To demonstrate the potential of both the library-construction protocol and the data-analysis pipeline, we characterized the transcriptome of the rice leaf. Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide antisense transcripts and intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.


pone-0026426-g005: Average multiplex read distribution. Bar plot of the read distribution among the eleven indices used for this study. The y-axis represents the average percentage of indexed reads relative to the total number of reads from each lane. The x-axis shows the index sequences. Data are averaged from six lanes, with standard error shown.
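
As a rough illustration of the quantities plotted in this figure, the following Python sketch converts per-index read counts into percentages of a lane's total reads, then averages each index across lanes and reports a standard error. The index names, the read counts, and the function names are hypothetical. The standard error is assumed here to be the sample standard deviation divided by the square root of the number of lanes; the source does not state how it was computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: (reads per index, total reads in the lane) for three lanes.
# These numbers are invented for illustration and are not from the paper.
LANES = [
    ({"AACCT": 80, "ACGTT": 100}, 1000),
    ({"AACCT": 100, "ACGTT": 100}, 1000),
    ({"AACCT": 120, "ACGTT": 100}, 1000),
]


def index_percentages(lane_counts, lane_total):
    """Percentage of a lane's total reads carried by each index."""
    return {idx: 100.0 * n / lane_total for idx, n in lane_counts.items()}


def mean_and_se(values):
    """Mean and standard error (assumed: sample SD / sqrt(n)) of a list."""
    return mean(values), stdev(values) / sqrt(len(values))


def summarize(lanes):
    """Map each index to (mean percentage, standard error) across lanes."""
    per_lane = [index_percentages(counts, total) for counts, total in lanes]
    indices = per_lane[0].keys()
    return {idx: mean_and_se([lane[idx] for lane in per_lane]) for idx in indices}


if __name__ == "__main__":
    for idx, (avg, se) in summarize(LANES).items():
        print(f"{idx}: {avg:.2f}% +/- {se:.2f}")
```

A perfectly balanced set of eleven indices would give each index about 9.1% of the indexed reads, so the height of each bar relative to that value is one informal way to read the plot.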

Mentions: One of the challenges of multiplexing samples is ensuring an even distribution of read counts across indexed libraries [30]. In the method described here, a combination of five nucleotides serves as the index, with a T as the common fifth base pair to minimize biases caused by differences in ligation efficiency. The index itself is incorporated into the adaptor, as illustrated in Figure S5. Samples multiplexed using this design can be processed using single-end or paired-end sequencing on the Illumina platform. The read output starts with the index and a T, followed by the target sequence. We tested a set of adaptors by multiplexing eleven rice leaf samples in one lane and sequenced the libraries using a total of six lanes on an Illumina GAIIx machine. The average ratio of indexed reads is shown in Figure 5. The percentage of reads derived from each of the eleven indices is relatively uniform, which is a notable improvement over an initial study in which index adaptors were used (e.g. [31]) and is comparable to Illumina's official multiplexing scheme at a lower cost. It is worth noting that the edit distance, or the number of changes needed to transform one index sequence into another, is at least two between any pair of the 11 indices. Thus, a read with one sequencing error in the first five bp will still be assigned to a unique index. This is important, as we have observed higher error rates at the 5′ and 3′ ends of reads. By including reads with one mismatch to the index, it was possible to reclaim an additional 5% of the total mappable reads (4.8 million reads).

