Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes.

González E, Joly S - BMC Res Notes (2013)

Bottom Line: All datasets were compared via DE analyses, and because all samples come from the same sequencing run, DE of genes or isoforms can be interpreted as false positives resulting from sequence attributes. The effect of paired-end vs. single-end strategy was found to have a much greater impact in terms of false positives than sequence length. In light of false positive rate results, we recommend using paired-end over single-end sequences in differential expression studies, even though the impact is less serious for differential gene expression.


Affiliation: Institut de recherche en biologie végétale, Université de Montréal, 4101 Sherbrooke E, Montréal, H1X 2B2, (QC), Canada. emmanuel.gonzalez@umontreal.ca.

ABSTRACT

Background: High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression analysis is an important downstream step that often follows de novo transcriptome assembly. While much attention has been given to bioinformatics tools for differential gene expression, little has yet been given to the impact of the sequence data itself used in these pipelines.

Results: We tested how using reads that differ from those used to assemble a de novo transcriptome (in length and pairing attributes) could affect differential expression (DE) results. To investigate this, we created artificial datasets from the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses, and because all samples come from the same sequencing run, any DE of genes or isoforms can be interpreted as a false positive resulting from sequence attributes. Whereas the false positive rate for differential gene expression does not seem to be strongly affected by sequencing strategy (max. of 3.5%), it could reach 12.2% or 28.1% for differential isoform expression, depending on the pipeline used. The paired-end vs. single-end strategy was found to have a much greater impact on false positives than sequence length.
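Because every comparison here is between samples from the same sequencing run, each gene or isoform called DE counts as a false positive, so the false positive rate reduces to the fraction of tested features that pass the significance cutoff. The Python sketch below illustrates that arithmetic under stated assumptions: raw p-values are adjusted with the Benjamini-Hochberg procedure, and a feature is called DE when its adjusted value falls below 0.05. The function names and the cutoff are hypothetical; this is not the authors' pipeline, which this excerpt does not specify.

```python
from typing import Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def false_positive_rate(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Fraction of tested features called DE at the (assumed) FDR cutoff alpha.

    Assumes no feature is truly differentially expressed, as when all
    samples come from one sequencing run, so every DE call is a false positive.
    """
    if not pvalues:
        raise ValueError("no features tested")
    called = sum(1 for q in benjamini_hochberg(pvalues) if q < alpha)
    return called / len(pvalues)
```

For example, `false_positive_rate([0.01, 0.04, 0.03, 0.5])` adjusts these four hypothetical p-values to about 0.04, 0.053, 0.053 and 0.5, so one of the four features is called DE and the rate is 0.25.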

Conclusion: In light of false positive rate results, we recommend using paired-end over single-end sequences in differential expression studies, even though the impact is less serious for differential gene expression.

Figure 4: Scatter plots of isoform (red) and gene (blue) log-transformed expression between all pairs of Salix purpurea sequence sets. The numbers indicate the Pearson correlation coefficients.

Mentions: Although the main results are given in terms of FDR, the overall pattern is the same when considering the correlation of transcript or gene counts between pairs of datasets (Figure 4). That is, scatter plots are more dispersed and correlations are lower for isoforms than for genes, and the strong effect of sequence type can also be observed (Figure 4).
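The Pearson correlations in Figure 4 compare log-transformed expression of the same features between two sequence sets. A minimal Python sketch of that computation follows; the log2(count + 1) transform and its pseudocount of 1 are assumptions, since the excerpt does not state which log transform was used, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_transform(counts: Sequence[float], pseudocount: float = 1.0) -> list[float]:
    """log2(count + pseudocount); the pseudocount avoids log(0) for unexpressed features."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

Applied to per-feature counts from two datasets, a value near 1 means both datasets measure expression alike; lower values, as described above for isoforms relative to genes, reflect greater dispersion between datasets.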

