Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics.
Bottom Line: In addition to the effect of the taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The "metagenome skimming" approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics.
Affiliation: Department of Life Sciences, Natural History Museum, London, United Kingdom.
Mentions: The TruSeq libraries (Weevil, Canopy_Long, Canopy_Short) produced 17.3–23.9 M read pairs and the Nextera library (Canopy_Next) produced 7.3 M reads. Following trimming, 30% of reads were discarded in the three Canopy libraries and 5% in the Weevil library (table 1). Assembly of each of the four Illumina libraries produced between 20,000 and nearly 100,000 contigs, and numbers were only slightly lower for (noncontiguous) scaffolds (table 1). Using the same DNA pool, both TruSeq libraries resulted in more than twice the number of reads as the Nextera library, and Canopy_Long assembled almost twice as many contigs and scaffolds as Canopy_Short and over three times as many as Canopy_Next. The Weevil pool produced the largest number of scaffolds despite containing the second lowest number of reads; the long insert size and the greater homogeneity of read numbers from equimolar DNA samples may have aided the assembly. We determined intersections of library contents with pairwise alignments of the scaffolds (fig. 2A). The scaffolds of the three Canopy libraries were aligned with a stringent threshold of sequence identity >90%, E < 1e-18, and alignment length >250 bp. In total, 19,297 scaffolds were shared by at least two Canopy libraries, and the tripartite intersection showed a core of 6,940 scaffolds (11–35% of the libraries) that was consistently recovered despite the low-coverage sequencing (fig. 2A, left). We performed a similar pairwise alignment between the Weevil library and the scaffold collection of all Canopy libraries (Canopy_merged), with a slightly lower threshold (sequence identity >80%, E < 1e-18, alignment length >250 bp) to recover potential homologs among different species (fig. 2A, right). A total of 5,174 scaffolds were shared by both samples (5.8% of Weevil scaffolds; 4.7% of Canopy scaffolds), showing that thousands of similar scaffolds can also be recovered between pools of different species composition.
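The threshold-based filtering described above can be illustrated with a minimal Python sketch. It assumes the pairwise alignments are available as 12-column BLAST-style tabular output (query, subject, percent identity, alignment length, ..., E-value, bit score); the excerpt does not name the alignment tool or its output format, so this layout, the function names `filter_hits` and `shared_scaffolds`, and the toy records are all hypothetical. Only the three cutoffs come from the text.

```python
from typing import Iterable, Iterator, Set, Tuple


def filter_hits(
    lines: Iterable[str],
    min_identity: float = 90.0,   # percent identity cutoff (>90% for Canopy vs Canopy)
    max_evalue: float = 1e-18,    # E-value cutoff (E < 1e-18)
    min_length: int = 250,        # alignment length cutoff (>250 bp)
) -> Iterator[Tuple[str, str]]:
    """Yield (query, subject) scaffold pairs that pass all three thresholds.

    Assumes tab-separated, 12-column BLAST-style records; this format is an
    assumption, not something stated in the source text.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if identity > min_identity and evalue < max_evalue and length > min_length:
            yield query, subject


def shared_scaffolds(lines: Iterable[str], **thresholds) -> Tuple[Set[str], Set[str]]:
    """Return the sets of query and subject scaffolds with at least one passing hit."""
    queries: Set[str] = set()
    subjects: Set[str] = set()
    for query, subject in filter_hits(lines, **thresholds):
        queries.add(query)
        subjects.add(subject)
    return queries, subjects


# Toy records (hypothetical scaffold names and values, for illustration only).
toy_hits = [
    "w1\tc1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-40\t200",  # identity between 80 and 90
    "w2\tc2\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-40\t180",  # alignment too short
    "w3\tc3\t92.0\t400\t32\t0\t1\t400\t1\t400\t1e-10\t150",  # E-value too weak
    "w4\tc1\t91.0\t260\t23\t0\t1\t260\t1\t260\t1e-30\t190",  # passes both settings
]

# Stringent setting (within-Canopy comparison in the text).
strict_q, strict_s = shared_scaffolds(toy_hits)
# Relaxed identity (Weevil vs Canopy_merged comparison in the text).
relaxed_q, relaxed_s = shared_scaffolds(toy_hits, min_identity=80.0)
```

Counting the distinct scaffolds on each side, rather than the number of hits, matches the way the text reports shared scaffolds as a percentage of each library; whether the original analysis counted them exactly this way is not stated.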