Quality control method for RNA-seq using single nucleotide polymorphism allele frequency.

Endo TA - Genes Cells (2014)

Bottom Line: When we use transcriptome data with whole-genome single nucleotide polymorphism (SNP) variant information, the allele frequency can show the genetic composition of the cell population and/or chromosomal aberrations. Here, I show how SNPs in mRNAs can be used to evaluate RNA-seq experiments by focusing on RNA-seq data based on a recently retracted paper on stimulus-triggered acquisition of pluripotency (STAP) cells. This re-evaluation showed that observing allele frequencies could help in assessing the quality of samples during a study and with retrospective evaluation of experimental quality.

Affiliation: RIKEN Center for Integrative Medical Science (IMS-RIKEN), 1-7-22 Suehiro-Cho, Tsurumi-Ku, Yokohama, Kanagawa, 230-0045, Japan.

Allele frequency analysis of RNA-seq data. (A) Simulation of SNP allele frequencies using a modified binomial distribution. Peak position was determined by the composition of two alleles, and variance of the distribution was dependent on sd, standard deviation of simulated PCR bias. (B) SNP distributions in several cell types. ESCs (red, SRR1047502, 129B6F1 background), iPSs derived from fibroblasts (yellow, SRR1047504, 129B6F1), MEFs (blue, SRR104220, 129B6F1), normal fibroblasts (NFs; green, SRR1191170, B6 x BALB/c), cancer-associated fibroblasts (CAFs; purple, SRR1191171, B6 x BALB/c), and HSCs (gray, SRR892995, B6). The number of applied SNPs for each cell type is shown in parentheses in each box. (C) Allele frequency of HSC samples contaminated with different percentages of ESCs as shown.

Mentions: The simulation in Fig. 1A was carried out with N = 50 and with β − α following a Gaussian distribution with a standard deviation of 0 (no PCR bias) or 1 (high PCR bias). The simulation indicated that the variance of the distribution depended strongly on PCR bias and that the mode of the distribution corresponded to the composition of SNP alleles. Allele frequencies from several public RNA-seq datasets covering various cell types were examined, and the results agreed with the simulation (Fig. 1B). Peaks at 0 and 100% might result from SNPs that are homozygous in the observed cells. An artificial contamination scenario was also generated by random sampling of RNA-seq datasets from two cell categories, pure C57BL/6 (B6) hematopoietic stem cells (HSCs) and embryonic stem cells carrying both 129 and B6 alleles (129B6F1 ESCs), at various ratios. The curve shape and peak positions varied with the mixing ratio, as in the mathematical simulation (Fig. 1C, gray line).
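The behavior described above can be illustrated with a minimal Python sketch. This excerpt does not give the paper's exact "modified binomial" formula or define α and β, so the bias model here is an assumption, not the author's method: each SNP gets a Gaussian shift on the log-odds scale (standard deviation `bias_sd`) before N = 50 reads are drawn from a binomial distribution. The function names, the number of simulated SNPs, and the rule that a contaminated sample shows half the ESC fraction as non-B6 alleles (assuming heterozygous SNPs in the ESCs and equal expression per cell) are likewise illustrative assumptions.

```python
import numpy as np


def simulate_allele_frequencies(alt_fraction, n_reads=50, bias_sd=1.0,
                                n_snps=10_000, seed=0):
    """Return simulated alternative-allele frequencies (in %) for n_snps SNPs.

    alt_fraction must lie strictly between 0 and 1.
    ASSUMPTION: PCR bias is modeled as a per-SNP Gaussian shift on the
    log-odds of the true allele fraction; this stands in for the paper's
    modified binomial, which is not specified in this excerpt.
    """
    rng = np.random.default_rng(seed)
    bias = rng.normal(0.0, bias_sd, size=n_snps)           # bias_sd = 0 means no bias
    log_odds = np.log(alt_fraction / (1.0 - alt_fraction)) + bias
    p = 1.0 / (1.0 + np.exp(-log_odds))                    # biased per-SNP allele probability
    alt_reads = rng.binomial(n_reads, p)                   # reads supporting the alt allele
    return 100.0 * alt_reads / n_reads


def mixture_alt_fraction(esc_fraction):
    """Expected non-B6 allele fraction when B6 HSCs are mixed with 129B6F1 ESCs.

    ASSUMPTION: the SNP is heterozygous (129/B6) in ESCs, homozygous B6 in
    HSCs, and both cell types contribute reads in proportion to cell number.
    """
    return 0.5 * esc_fraction


if __name__ == "__main__":
    no_bias = simulate_allele_frequencies(0.5, bias_sd=0.0)
    high_bias = simulate_allele_frequencies(0.5, bias_sd=1.0)
    print("spread without bias:", no_bias.std())
    print("spread with high bias:", high_bias.std())

    # Hypothetical contamination levels, chosen only for illustration.
    for esc_fraction in (0.1, 0.2, 0.5):
        freqs = simulate_allele_frequencies(mixture_alt_fraction(esc_fraction),
                                            bias_sd=0.0)
        print(esc_fraction, freqs.mean())
```

Under these assumptions the spread of the distribution grows with `bias_sd` while its center stays at the true allele composition, and the center shifts toward 0% as the ESC fraction falls, which matches the qualitative behavior reported for Fig. 1A and Fig. 1C.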
