Quality control method for RNA-seq using single nucleotide polymorphism allele frequency.
Bottom Line: When we use transcriptome data with whole-genome single nucleotide polymorphism (SNP) variant information, the allele frequency can show the genetic composition of the cell population and/or chromosomal aberrations. Here, I show how SNPs in mRNAs can be used to evaluate RNA-seq experiments by focusing on RNA-seq data based on a recently retracted paper on stimulus-triggered acquisition of pluripotency (STAP) cells. This re-evaluation showed that observing allele frequencies could help in assessing the quality of samples during a study and with retrospective evaluation of experimental quality.
Affiliation: RIKEN Center for Integrative Medical Science (IMS-RIKEN), 1-7-22 Suehiro-Cho, Tsurumi-Ku, Yokohama, Kanagawa, 230-0045, Japan.
Mentions: The simulation in Fig. 1A was carried out with N = 50 and with β − α following a Gaussian distribution with a standard deviation of 0 (no PCR bias) or 1 (high PCR bias). The simulation indicated that the variance of the distribution was highly dependent on PCR bias and that the mode of the distribution corresponded to the composition of SNP alleles. Allele frequencies were examined in several sets of RNA-seq data from various cell types obtained from public databases, and the results agreed with the simulation (Fig. 1B). Peaks at 0 and 100% might result from homozygous SNPs in the observed cells. An artificial contamination scenario was also generated by randomly sampling RNA-seq datasets from two cell categories, pure C57BL/6 (B6) hematopoietic stem cells (HSCs) and embryonic stem cells with a mixed 129 and B6 background (129B6F1 ESCs), at various ratios. The curve shape and peak positions varied with the mixing ratio, as predicted by the mathematical simulation (Fig. 1C, gray line).
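The kind of simulation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's code. The excerpt does not define α and β, so the sketch assumes that β − α is the log2 difference in PCR amplification between the two alleles of a SNP. It also assumes that reads are sampled binomially with N = 50 per SNP, that 129B6F1 cells are heterozygous at the SNPs of interest, and that both cell types contribute equal RNA per cell. The function names `simulate_allele_frequencies` and `expected_129_fraction` are hypothetical.

```python
import random
import statistics


def expected_129_fraction(f1_share):
    """Expected 129-allele fraction when a share of the cells are 129B6F1.

    ASSUMPTION: F1 cells carry one 129 and one B6 allele, pure B6 cells carry
    no 129 allele, and every cell contributes the same amount of RNA.
    """
    return 0.5 * f1_share


def simulate_allele_frequencies(p_allele, n_reads=50, bias_sd=0.0,
                                n_snps=2000, seed=0):
    """Simulate observed allele frequencies (0..1) for n_snps SNPs.

    p_allele: true fraction of the allele in the RNA pool.
    bias_sd:  standard deviation of d = beta - alpha, ASSUMED here to be a
              log2 amplification difference between the two alleles.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_snps):
        d = rng.gauss(0.0, bias_sd)      # per-SNP PCR bias (0 when bias_sd is 0)
        w = 2.0 ** d                     # relative amplification of the allele
        p_obs = p_allele * w / (p_allele * w + (1.0 - p_allele))
        # Draw n_reads reads; each one shows the allele with probability p_obs.
        hits = sum(rng.random() < p_obs for _ in range(n_reads))
        freqs.append(hits / n_reads)
    return freqs


no_bias = simulate_allele_frequencies(0.5, bias_sd=0.0)
high_bias = simulate_allele_frequencies(0.5, bias_sd=1.0)
# 40% 129B6F1 cells mixed with 60% pure B6 cells, no PCR bias.
mixed = simulate_allele_frequencies(expected_129_fraction(0.4), bias_sd=0.0)

for label, freqs in [("no bias", no_bias), ("high bias", high_bias),
                     ("40% F1 mix", mixed)]:
    print(label, round(statistics.mean(freqs), 3),
          round(statistics.pvariance(freqs), 4))
```

Under these assumptions, the printed variance should grow with `bias_sd` while the centre of the distribution tracks `p_allele`. This is the qualitative behaviour the text attributes to Fig. 1A and Fig. 1C.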