Summarizing specific profiles in Illumina sequencing from whole-genome amplified DNA.
Bottom Line: Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. To realize the full potential of whole-genome amplification (WGA) for revealing features of real biological interest, this article highlights the importance of recognizing additional sources of error in amplified sequence reads and discusses the potential implications for downstream analyses.
Affiliation: Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK; Faculty of Medicine, Division of Parasitology, Department of Infectious Disease, University of Miyazaki, Miyazaki 889-1692, Japan.
Mentions: One of the most important criteria for accurate variant calling and assembly from Illumina reads is even coverage of sequence data genome-wide. We first evaluated the variability in the depth of coverage of short-insert reads [32] by plotting the cumulative fraction of normalized depth of correctly paired read coverage that covers a given cumulative fraction of the genome (Fig. 2). Normalizing read coverage depth allows libraries sequenced to different depths to be compared with each other. The theoretical line (Fig. 2) indicates a perfectly uniform distribution of reads, where 100% of the genome is covered by reads at a consistent normalized depth of 1. Figure 2 shows that both replicates of the unamplified short-insert library have the closest fit to the theoretical line, suggesting the most uniform distribution of reads. The remaining samples show some level of deviation, suggesting a non-uniform distribution across the genome. Distribution plots of the long-insert libraries deviate further from the theoretical distribution than those of the short-insert libraries. This effect is more evident in the lower tail of the distribution, indicating that a greater proportion of the genome has lower coverage. Inspection of regions of lower coverage across all libraries shows that the most evident patterns are regions enriched in G homopolymer tracts and GGC motifs [33] (Supplementary Fig. S7).
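The normalization and cumulative summary described above can be sketched roughly as follows. This is a minimal illustration of one possible reading of the procedure, not the authors' actual pipeline: it assumes per-base depths are already available as a plain list, normalizes by the mean depth, and reports the fraction of the genome covered at or above each normalized threshold. The function names, thresholds, and toy depth values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize_depth(depths: Sequence[float]) -> List[float]:
    """Divide each per-base depth by the mean depth (assumed normalization)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean depth is zero; nothing to normalize")
    return [d / mean_depth for d in depths]


def fraction_covered_at_least(
    depths: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """For each threshold, return the fraction of positions whose
    normalized depth is greater than or equal to that threshold."""
    normalized = normalize_depth(depths)
    n = len(normalized)
    return [
        (t, sum(1 for d in normalized if d >= t) / n)
        for t in thresholds
    ]


# Toy data (hypothetical): a perfectly uniform library and a skewed one,
# both with the same mean depth of 30.
uniform = [30] * 10
skewed = [60, 60, 60, 30, 30, 30, 30, 0, 0, 0]
thresholds = [0.5, 1.0, 1.5]

uniform_curve = fraction_covered_at_least(uniform, thresholds)
skewed_curve = fraction_covered_at_least(skewed, thresholds)

for name, curve in (("uniform", uniform_curve), ("skewed", skewed_curve)):
    for threshold, fraction in curve:
        print(f"{name}: normalized depth >= {threshold}: {fraction:.2f} of genome")
```

Under this reading, the uniform library behaves like the theoretical line: the whole genome sits at a normalized depth of exactly 1, so the covered fraction drops from 1 to 0 as soon as the threshold exceeds 1. The skewed library departs from that step shape in both tails, even though its mean depth is identical.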