Using mixtures of biological samples as process controls for RNA-sequencing experiments.

Parsons J, Munro S, Pine PS, McDaniel J, Mehaffey M, Salit M - BMC Genomics (2015)

Bottom Line: We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement while identifying signals affected by interference from other sources. Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.


Affiliation: Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA. jerod.parsons@nist.gov.

ABSTRACT

Background: Genome-scale "-omics" measurements are challenging to benchmark because of the enormous variety of unique biological molecules involved. Mixtures of previously characterized samples can be used to benchmark repeatability and reproducibility, using the known component proportions as ground truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls.

Results: We apply a linear model to total RNA mixture samples in RNA-Seq experiments. The model describes the behavior of mixture expression measures and provides a context for performance benchmarking: its parameters, fit to experimental results, can be evaluated to assess the bias and variability of the measurement of a mixture. Residuals from fitting the model to experimental data can serve as a metric for evaluating the effect of an individual step in an experimental process on the linear response function and on the precision of the underlying measurement, while also identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures; for RNA-Seq, this requires knowledge of the post-enrichment 'target RNA' content of each total RNA component. We demonstrate and evaluate an experimental method suitable for genome-scale process control and lay out a method that uses spike-in controls to determine the enriched RNA content of total RNA samples.
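The linear model and its residuals can be illustrated with a minimal sketch. It assumes each pure component's expression profile is normalized (here, each column sums to 1) and that the mixture profile is a weighted sum of the pure-component profiles with no intercept. Ordinary least squares via NumPy's `np.linalg.lstsq` stands in for whatever fitting procedure the authors used, and the function name `fit_mixture` and the toy data are hypothetical. Transcripts with unusually large residuals are candidates for the interference from other sources mentioned above.

```python
import numpy as np


def fit_mixture(pure: np.ndarray, mixture: np.ndarray):
    """Fit mixture ~ pure @ weights by ordinary least squares (no intercept).

    Assumption: expression is normalized and the mixture is linear in the components.
    pure:    (n_transcripts, n_components) normalized expression of each pure component
    mixture: (n_transcripts,) normalized expression measured in the mixture
    Returns (weights, residuals, r_squared).
    """
    weights, *_ = np.linalg.lstsq(pure, mixture, rcond=None)
    fitted = pure @ weights
    residuals = mixture - fitted          # per-transcript departure from the model
    ss_res = float(residuals @ residuals)
    centered = mixture - mixture.mean()
    ss_tot = float(centered @ centered)
    return weights, residuals, 1.0 - ss_res / ss_tot


# Hypothetical toy data: 5 transcripts x 2 pure components (A, B), columns sum to 1.
pure = np.array([
    [0.40, 0.05],
    [0.30, 0.10],
    [0.15, 0.15],
    [0.10, 0.30],
    [0.05, 0.40],
])
mixture = pure @ np.array([0.25, 0.75])  # noise-free 25:75 mixture
mixture[2] += 0.05                       # simulate interference on one transcript

weights, residuals, r2 = fit_mixture(pure, mixture)
print("weights:", np.round(weights, 3), "R^2:", round(r2, 4))
print("most suspect transcript:", int(np.argmax(np.abs(residuals))))
```

In this toy case the perturbed transcript (index 2) carries the largest absolute residual, which is the kind of signal a residual-based metric is meant to surface.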

Conclusions: Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The target RNA fraction accounts for the differential selection of RNA from total RNA samples of varying composition. Spike-in controls can be used to measure this relationship between target RNA content and input total RNA. Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.
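How spike-in controls can relate target RNA content to input total RNA can be sketched under simplifying assumptions that are not taken from the paper: spike-ins are added at a fixed mass s per unit of total RNA, pass through enrichment like target RNA, and produce reads in proportion to their mass. Under those assumptions the fraction f of reads mapping to spike-ins satisfies f = s / (s + ρ), so ρ = s(1 - f)/f, where ρ is the target-RNA mass per unit of total RNA. The helper `target_rna_fraction` and the read counts are hypothetical.

```python
def target_rna_fraction(spike_reads: int, total_reads: int, spike_per_total: float) -> float:
    """Estimate rho, the target-RNA mass per unit of total RNA, from spike-in reads.

    Simplifying assumptions (not taken from the paper):
      * spike-ins were added at `spike_per_total` mass per unit of total RNA,
      * they survive enrichment with the same efficiency as target RNA,
      * reads are produced in proportion to enriched mass.
    Then f = s / (s + rho), which rearranges to rho = s * (1 - f) / f.
    """
    f = spike_reads / total_reads
    if not 0.0 < f < 1.0:
        raise ValueError("spike-in read fraction must be strictly between 0 and 1")
    return spike_per_total * (1.0 - f) / f


# Hypothetical example: same spike-in dose per unit total RNA in both samples.
rho_a = target_rna_fraction(20_000, 1_000_000, 0.001)  # 2% of reads are spike-ins
rho_b = target_rna_fraction(10_000, 1_000_000, 0.001)  # 1% of reads are spike-ins
print(f"rho_A={rho_a:.3f}  rho_B={rho_b:.3f}  rho_B/rho_A={rho_b / rho_a:.2f}")
```

Because s cancels in the ratio of two identically spiked samples, relative ρ values can be compared even when the absolute spike-in mass is uncertain.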



Fig. 5: Mixture proportion (Φ) estimates for sample A in SEQC-C and SEQC-D. The mean (black hollow circle) and standard deviation (error bars) of four individual replicates (colored) of the Φ estimate for each sample are shown. The nominal mixture proportions are shown as grey points at the center of the target. Circles centered at the nominal ratio, with radii in multiples of 0.025, are included to make the magnitude of total error easier to identify. LT and ILM tags indicate the manufacturer of the sequencer used at each lab (Life Technologies and Illumina, respectively). Deviations from the target indicate process variability, instrument bias, or errors introduced in these labs.
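The summary quantities in this figure can be computed with a short sketch. It assumes each lab's point is plotted as (Φ of sample A in mix C, Φ of sample A in mix D), that the target center is the designed (0.75, 0.25), and that "total error" means the Euclidean distance of the replicate mean from that center. The replicate values are invented for illustration, and the use of the sample standard deviation is also an assumption.

```python
import math
import statistics

NOMINAL = (0.75, 0.25)  # designed proportion of sample A in mixes C and D
RING_STEP = 0.025       # ring spacing used in Fig. 5


def summarize_lab(phi_c: list[float], phi_d: list[float]):
    """Summarize replicate Phi estimates for one lab.

    Returns (means, standard deviations, total error, ring index), where total
    error is the Euclidean distance of the replicate mean from NOMINAL (assumed
    definition) and the ring index is the smallest multiple of RING_STEP that
    encloses the mean.
    """
    means = (statistics.mean(phi_c), statistics.mean(phi_d))
    sds = (statistics.stdev(phi_c), statistics.stdev(phi_d))  # sample SD (assumed)
    total_error = math.hypot(means[0] - NOMINAL[0], means[1] - NOMINAL[1])
    ring = math.ceil(total_error / RING_STEP)
    return means, sds, total_error, ring


# Invented replicate values for a single hypothetical lab (four replicates).
means, sds, err, ring = summarize_lab([0.76, 0.77, 0.75, 0.76], [0.27, 0.26, 0.28, 0.27])
print(f"mean=({means[0]:.3f}, {means[1]:.3f})  total error={err:.4f}  ring={ring}")
```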

Mentions: To demonstrate the accuracy of this analytical framework for mixtures, the mixture proportions Φ_BLM were recalculated for the BLM mixtures BLM-1 and BLM-2. The ρ values and the sequencing expression data X_i were used to solve for Φ_BLM by linear regression to the mixture equation. Figure 3 shows that the experimentally observed counts are highly correlated (R² = 0.996) with the equation-solved counts X_i for each transcript. Figure 4 shows the Φ_BLM values that minimized the residuals for the two mixtures, for each replicate sample in each laboratory. The estimated proportions of the three components in the two mixtures are consistent with the designed 25:25:50 and 25:50:25 proportions of the two BLM mixtures. Figure 5 shows that the designed proportions of the SEQC mixtures can also be recovered with this equation across each of nine labs, returning the 75:25 and 25:75 proportions for mixes C and D, with some variability between labs. Eq. 1, which lacks a correction for the enrichment fraction, does not return the designed ratios (Additional file 4: Figure S3).
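The role of the enrichment correction can be sketched with a simplified mixture equation. This model is an assumption, not necessarily the paper's Eq. 1 or its corrected form: with normalized expression, the mixture profile is modeled as Σ_j w_j X_j, where w_j = Φ_j ρ_j / Σ_k Φ_k ρ_k and ρ_j is treated as component j's target-RNA (enrichment) fraction. Fitting w by least squares and then dividing by ρ recovers the total-RNA proportions Φ; skipping that division returns the target-RNA weights instead, which loosely mirrors why an uncorrected equation misses the designed ratios. The function name `solve_mixture_proportions` and the toy data are hypothetical.

```python
import numpy as np


def solve_mixture_proportions(pure, mixture, rho):
    """Estimate total-RNA mixing proportions Phi from normalized expression.

    Assumed model (a sketch, not necessarily the paper's equation):
        mixture_i = sum_j w_j * pure_ij,   w_j = Phi_j * rho_j / sum_k Phi_k * rho_k
    pure:    (n_transcripts, n_components) normalized expression of pure components
    mixture: (n_transcripts,) normalized expression of the mixture
    rho:     (n_components,) target-RNA fraction of each component's total RNA
    """
    weights, *_ = np.linalg.lstsq(pure, mixture, rcond=None)  # target-RNA weights w_j
    phi = weights / np.asarray(rho, dtype=float)              # undo enrichment differences
    return phi / phi.sum()                                    # proportions sum to 1


# Hypothetical three-component design, 25:25:50 by total RNA,
# where component B yields twice as much target RNA per unit total RNA.
pure = np.array([
    [0.6, 0.1, 0.1],
    [0.2, 0.6, 0.1],
    [0.1, 0.2, 0.6],
    [0.1, 0.1, 0.2],
])
phi_design = np.array([0.25, 0.25, 0.50])
rho = np.array([1.0, 2.0, 1.0])
w = phi_design * rho / np.sum(phi_design * rho)
mixture = pure @ w

print("corrected:  ", np.round(solve_mixture_proportions(pure, mixture, rho), 3))
print("uncorrected:", np.round(solve_mixture_proportions(pure, mixture, np.ones(3)), 3))
```

With the correction, this sketch returns the designed 25:25:50; with ρ set to 1 for every component, it returns the target-RNA weights 20:40:40 instead.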

