Limits...
RNA-seq: technical variability and sampling.

McIntyre LM, Lopiano KK, Morse AM, Amin V, Oberg AL, Young LJ, Nuzhdin SV - BMC Genomics (2011)

Bottom Line: Technical variability results in inconsistent detection of exons at low levels of coverage.This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads.We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA. mcintyre@ufl.edu

ABSTRACT

Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage, and splicing between samples as well as quantitative differences among samples are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance, and the role of sampling are examined in this manuscript.

Results: In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide and estimates of gene expression are more likely to disagree when coverage is low. Although large disagreements in the estimates of expression are observed at all levels of coverage.

Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.

Show MeSH
Experimental Design. A figure showing the design of the three experiments evaluated here. Biological replicates are separate individuals used for library construction. Technical replicates for the D. melanogaster female heads and D. simulans male heads data are a single library run on multiple lanes. For D. melanogaster c167 cell lines the exact nature of the technical replication is uncertain.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3141664&req=5

Figure 1: Experimental Design. A figure showing the design of the three experiments evaluated here. Biological replicates are separate individuals used for library construction. Technical replicates for the D. melanogaster female heads and D. simulans male heads data are a single library run on multiple lanes. For D. melanogaster c167 cell lines the exact nature of the technical replication is uncertain.

Mentions: Experiment 1: Three independent samples of D. melanogaster female heads were collected with each sample representing a unique pool of biological material. Each sample was prepared according to manufacturer's instructions and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate, runs were 36 base-pair paired end. Experiment 2: Three independent samples of D. simulans male heads were collected with each sample representing a unique pool of biological material. Each sample was prepared according to manufacturer's instructions and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate, runs were 36 base-pair paired end. (Data will be deposited to the SRA) Experiment 3: One sample of D. melanogaster cell line c167 was run on 5 lanes yielding 5 technical replicates for 1 biological replicate (from modENCODE experiment GSE17107) [16]. Data from these experiments were 36 bp reads. The experimental design for experiments 1 - 3 are outlined in Figure 1. All samples were run using Solexa/Illumina paired end procedures. The relationship of lanes (same or independent flow cells) is unknown for all three experiments. The first two experiments were conducted by Joe Dunham and Michele Arbeitman at USC and sequenced at Oregon State core facility while the third experiment was conducted by Susan Celniker at Lawrence Livermore national laboratory (SRA009944).


RNA-seq: technical variability and sampling.

McIntyre LM, Lopiano KK, Morse AM, Amin V, Oberg AL, Young LJ, Nuzhdin SV - BMC Genomics (2011)

Experimental Design. A figure showing the design of the three experiments evaluated here. Biological replicates are separate individuals used for library construction. Technical replicates for the D. melanogaster female heads and D. simulans male heads data are a single library run on multiple lanes. For D. melanogaster c167 cell lines the exact nature of the technical replication is uncertain.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3141664&req=5

Figure 1: Experimental Design. A figure showing the design of the three experiments evaluated here. Biological replicates are separate individuals used for library construction. Technical replicates for the D. melanogaster female heads and D. simulans male heads data are a single library run on multiple lanes. For D. melanogaster c167 cell lines the exact nature of the technical replication is uncertain.
Mentions: Experiment 1: Three independent samples of D. melanogaster female heads were collected with each sample representing a unique pool of biological material. Each sample was prepared according to manufacturer's instructions and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate, runs were 36 base-pair paired end. Experiment 2: Three independent samples of D. simulans male heads were collected with each sample representing a unique pool of biological material. Each sample was prepared according to manufacturer's instructions and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate, runs were 36 base-pair paired end. (Data will be deposited to the SRA) Experiment 3: One sample of D. melanogaster cell line c167 was run on 5 lanes yielding 5 technical replicates for 1 biological replicate (from modENCODE experiment GSE17107) [16]. Data from these experiments were 36 bp reads. The experimental design for experiments 1 - 3 are outlined in Figure 1. All samples were run using Solexa/Illumina paired end procedures. The relationship of lanes (same or independent flow cells) is unknown for all three experiments. The first two experiments were conducted by Joe Dunham and Michele Arbeitman at USC and sequenced at Oregon State core facility while the third experiment was conducted by Susan Celniker at Lawrence Livermore national laboratory (SRA009944).

Bottom Line: Technical variability results in inconsistent detection of exons at low levels of coverage.This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads.We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA. mcintyre@ufl.edu

ABSTRACT

Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage, and splicing between samples as well as quantitative differences among samples are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance, and the role of sampling are examined in this manuscript.

Results: In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide and estimates of gene expression are more likely to disagree when coverage is low. Although large disagreements in the estimates of expression are observed at all levels of coverage.

Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases.

Show MeSH