The Power Decoder Simulator for the Evaluation of Pooled shRNA Screen Performance.

Stombaugh J, Licon A, Strezoska Ž, Stahl J, Anderson SB, Banos M, van Brabant Smith A, Birmingham A, Vermeulen A - J Biomol Screen (2015)

Bottom Line: Using the negative binomial distribution, the simulator models both the relative abundance of multiple shRNAs within a single screening replicate and the biological noise between replicates for each individual shRNA. We demonstrate that this simulator can successfully model the data from an actual laboratory experiment. The Power Decoder simulator is written in R and Python and is available for download under the GNU General Public License v3.0.


Affiliation: Dharmacon, part of GE Healthcare, Lafayette, CO, USA.



Figure 2. Modeled next-generation sequencing (NGS) screen data compared with actual experimental NGS screen data. Kernel density estimate plots for the distributions of NGS counts for representative examples of normalized actual (red) and simulated (blue) T0 data generated by fitting parameters to the negative binomial distribution for (A) Screen 100_2x and (C) Screen 500_2x. Cumulative distributions of the same actual and simulated T0 count distributions for (B) Screen 100_2x and (D) Screen 500_2x.


To determine whether a negative binomial distribution (NBD) can accurately model shRNA counts, we used data from T0, in which all 5635 shRNAs are represented equally on average. For both Screen 100_2x and Screen 500_2x, mean and dispersion parameters for an NBD were obtained from the normalized means of the experimentally determined counts for each shRNA, and simulated shRNA count data were then generated by sampling from the NBD with these experimentally determined parameters. We generated Gaussian kernel density estimates (Fig. 2A, C), which can be used to estimate continuous probability density functions of discrete random variables such as count, as well as the related cumulative distributions (Fig. 2B, D), which show the cumulative probability that a random variable such as count will have a value less than or equal to any particular amount. Comparing these visualizations shows that the actual and simulated distributions are similar for both the Screen 100_2x and Screen 500_2x experiments. The negative binomial models capture the shape of the actual data, although, as expected with simulations, they are not perfect mimics of the experimental data sets (being more smoothly distributed and slightly more leftward-skewed). Given the extremely large sample sizes in each data set (from ~3,000,000 up to ~9,000,000 reads), statistical tests of whether two distributions are similar will flag even very small (and possibly inconsequential) differences as statistically significant (ref. 22). We performed a Kolmogorov-Smirnov test on each screen/model pair and, as anticipated, obtained highly significant p values for both Screen 100_2x and Screen 500_2x, indicating that the modeled and experimental distributions are detectably different.
However, the D statistic, which measures the actual magnitude of the difference between the distributions and ranges from 0 when distributions are the same to 1 when they are completely dissimilar, averaged less than 0.09 for both screens, with standard deviations for both of less than 0.007 over 900 separate simulations. This demonstrates that the difference between the models and the experimental data, although detectable, is small; for comparison, the most closely correlated replicate pairs had average D statistics (across 30 normalizations) of 0.0363 and 0.0692 for Screen 100_2x and Screen 500_2x, respectively. Simulated counts can therefore reasonably be used to represent “true” shRNA counts and employed as the basis on which to simulate the biological noise of replicates.
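
The fit-simulate-compare procedure described above can be illustrated with a minimal Python sketch. It is not the Power Decoder code: the function names `fit_negative_binomial` and `sample_negative_binomial` are hypothetical, the method-of-moments fit (variance = mean + dispersion × mean²) is an assumed stand-in for the authors' parameter estimation, and the "actual" counts are synthetic placeholders rather than the experimental T0 data.

```python
import numpy as np
from scipy import stats


def fit_negative_binomial(counts):
    """Method-of-moments estimate of (mean, dispersion).

    Assumes the parameterization var = mean + dispersion * mean**2.
    This is an illustrative choice, not necessarily the paper's fit.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)
    # Dispersion is positive only for overdispersed data; clamp to avoid
    # a zero or negative value when var <= mean.
    dispersion = max((var - mean) / mean**2, 1e-8)
    return mean, dispersion


def sample_negative_binomial(mean, dispersion, size, rng):
    """Draw counts from an NBD given its mean and dispersion."""
    n = 1.0 / dispersion   # NumPy's "number of successes" parameter
    p = n / (n + mean)     # NumPy's success probability
    return rng.negative_binomial(n, p, size=size)


rng = np.random.default_rng(0)

# Synthetic placeholder for normalized T0 counts of 5635 shRNAs
# (hypothetical mean 500 and dispersion 0.2; not experimental values).
actual = sample_negative_binomial(500.0, 0.2, 5635, rng)

# Fit parameters to the "actual" counts, then simulate a matching data set.
mean_hat, dispersion_hat = fit_negative_binomial(actual)
simulated = sample_negative_binomial(mean_hat, dispersion_hat, actual.size, rng)

# Two-sample Kolmogorov-Smirnov test: D is the largest vertical gap
# between the two empirical cumulative distributions.
result = stats.ks_2samp(actual, simulated)
d_statistic, p_value = result.statistic, result.pvalue
print(f"mean={mean_hat:.1f} dispersion={dispersion_hat:.3f} "
      f"D={d_statistic:.4f} p={p_value:.3g}")
```

Because D is an effect size rather than a significance measure, it stays interpretable at large sample sizes where the p value alone would flag almost any difference. Repeating the simulation step many times and averaging D would mirror the repeated-simulation summary reported above.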

