Evaluating bias-reducing protocols for RNA sequencing library preparation.

Jackson TJ, Spriggs RV, Burgoyne NJ, Jones C, Willis AE - BMC Genomics (2014)

Bottom Line: Next-generation sequencing does not yield fully unbiased estimates for read abundance, which may impact on the conclusions that can be drawn from sequencing data. The CircLig protocol resulted in less over-representation of specific sequences than the standard protocol. Ligases that function at temperatures high enough to remove the possible influence of secondary structure on library generation may be of value, although Mth K97A is not effective in this case.


Affiliation: Medical Research Council Toxicology Unit, Lancaster Rd, Leicester LE1 9HN, UK. mzyatjj@nottingham.ac.uk.

ABSTRACT

Background: Next-generation sequencing does not yield fully unbiased estimates for read abundance, which may impact on the conclusions that can be drawn from sequencing data. The ligation step in RNA sequencing library generation is a known source of bias, motivating developments in enzyme technology and library construction protocols. We present the first comparison of the standard duplex adaptor protocol supplied by Life Technologies for use on the Ion Torrent PGM with an alternative single adaptor approach involving CircLigase (CircLig protocol). A correlation between over-representation in sequenced libraries and degree of secondary structure has been reported previously; we therefore also investigated whether bias could be reduced by ligation with an enzyme that functions at a temperature not permissive for such structure.

Results: A pool of small RNA fragments of known composition was converted into a sequencing library using one of three protocols and sequenced on an Ion Torrent PGM. The CircLig protocol resulted in less over-representation of specific sequences than the standard protocol. Over-represented sequences are more likely to be predicted to have secondary structure and to co-fold with adaptor sequences. However, use of the thermostable ligase Methanobacterium thermoautotrophicum RNA ligase K97A (Mth K97A) was not sufficient to reduce bias.

Conclusions: The single adaptor CircLigase-based approach significantly reduces, but does not eliminate, bias in Ion Torrent data. Ligases that function at temperatures high enough to remove the possible influence of secondary structure on library generation may be of value, although Mth K97A is not effective in this case.


Fig3: Over-representation in sequencing libraries. Sequencing libraries were generated from the partially degenerate RNA pool using either the standard protocol (standard), the CircLig protocol with trRnl2 K227Q (rnl2) or the CircLig protocol with Mth K97A (mth). The abundance (read density) of each unique sequence within the degenerate region was calculated as a ratio of the total read data (reads per million sequenced; RPM). The density of the 1000 most abundant sequences is presented for each library. The theoretical distribution is X ~ Binomial(3 × 10^6, 1/4^10).
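
The read-density calculation described in the caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `top_rpm`, the toy reads, and the assumption that the degenerate region has already been extracted from each read are all hypothetical.

```python
from collections import Counter

def top_rpm(degenerate_regions, top_n=1000):
    """Return (sequence, RPM) pairs for the top_n most abundant sequences.

    degenerate_regions: iterable of strings, one per sequenced read, each
    holding only the degenerate region (extraction is assumed to have been
    done upstream; that step is not shown here).
    """
    counts = Counter(degenerate_regions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return []
    scale = 1_000_000 / total_reads  # converts a raw count to reads per million
    return [(seq, n * scale) for seq, n in counts.most_common(top_n)]

# Toy input (hypothetical): four reads, one sequence observed three times.
toy_reads = ["ACGUACGUAC", "ACGUACGUAC", "ACGUACGUAC", "UUUUUUUUUU"]
ranked = top_rpm(toy_reads, top_n=2)
```

Normalising to RPM makes libraries of different sequencing depth directly comparable, which is why the caption reports density rather than raw counts.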

Mentions: Degeneracy at 10 positions yields 1,048,576 (i.e. 4^10) unique sequences at equimolar concentrations, giving a theoretical read distribution of X ~ Binomial(n, 1/4^10), where X is the number of times a particular sequence is observed in n sequenced reads. All library protocols resulted in read distributions that were over-dispersed relative to the theoretical distribution (Figure 3 and Additional file 2). Goodness-of-fit comparisons between each distribution and the theoretical distribution yielded the minimum computable p-value in the R software environment (discrete Kolmogorov-Smirnov test, p < 2.2 × 10^-16). The most abundant sequences were present at least 5 times more often than would be expected in the absence of bias. However, the most abundant sequences from the trRnl2 K227Q CircLig library were approximately half as over-dispersed as those from the standard library, and this difference reached statistical significance (discrete Kolmogorov-Smirnov test, p < 2.2 × 10^-16). Somewhat surprisingly, replacing trRnl2 K227Q with Mth K97A did not further reduce over-representation.
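
To make the theoretical baseline concrete, the sketch below computes the expected count and RPM of a single sequence under X ~ Binomial(n, 1/4^10), together with the fold over-representation of an observed count. It assumes n = 3 × 10^6 reads, the value given in the Figure 3 caption. The function names and the observed count of 15 are hypothetical, chosen only for illustration, and are not taken from the paper's data.

```python
import math

N_UNIQUE = 4 ** 10   # 1,048,576 sequences from 10 degenerate positions
P = 1 / N_UNIQUE     # chance that a read is any one given sequence (no bias)

def binomial_expectation(n_reads):
    """Mean and standard deviation of X ~ Binomial(n_reads, P)."""
    mean = n_reads * P
    sd = math.sqrt(n_reads * P * (1 - P))
    return mean, sd

def expected_rpm():
    """Unbiased reads per million for one sequence; independent of depth."""
    return 1_000_000 * P

def fold_over_representation(observed_count, n_reads):
    """Observed count divided by the count expected in the absence of bias."""
    return observed_count / (n_reads * P)

n = 3_000_000  # read depth assumed from the Figure 3 caption
mean, sd = binomial_expectation(n)
fold = fold_over_representation(15, n)  # 15 is an illustrative count only
```

Under these assumptions an unbiased library would yield roughly 2.9 reads (about 0.95 RPM) per sequence, so a sequence observed 15 times would be about 5.2-fold over-represented.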

