Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics.

Linard B, Crampton-Platt A, Gillett CP, Timmermans MJ, Vogler AP - Genome Biol Evol (2015)

Bottom Line: In addition to the effect of the taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The "metagenome skimming" approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics.


Affiliation: Department of Life Sciences, Natural History Museum, London, United Kingdom.


Fig. 1. Hypothetical scenarios of scaffold formation from low-coverage DNA sequencing of specimen pools. The figure represents specimens in the superfamilies Tenebrionoidea and Curculionoidea, which are represented by the two reference genomes for Tc (Tribolium castaneum) and Dp (Dendroctonus ponderosae), and in other coleopteran superfamilies. Eight scenarios of scaffold formation (A, B, C, D, D′, D″, x, and y) are depicted along gray vertical arrows and represent the aggregation of similar DNA motifs (white boxes) into a single scaffold (red lines). The horizontal axis, from left to right, represents increasing intragenomic copy number of a locus, and the vertical axis represents increasing phylogenetic distance among taxa. The first three scenarios (A, B, C) represent single-copy motifs. A and B are phylogenetically conserved, and their presence across specimens will increase the rate of recovery. Their homology to the reference genomes depends on phylogenetic conservation and on the distance from the available reference genomes (scenario A vs. scenario B). These simple scenarios are overlain by the effects of copy number and variation among paralogs. Scenario D represents several copies of the same DNA motif that are present in different genome locations and are similar enough to be aggregated into the same scaffold. Motifs D′ and D″ are homologous but less similar and will be aggregated into two other scaffolds. The sampling probability of these motifs is increased by higher copy number and wider conservation across the specimens. The probability of generating a scaffold decreases from D through D′ to D″. Copy number information is partially lost during the scaffold aggregation process. Finally, high-copy-number genomic repeats (scenarios x, x′, and y) may produce scaffolds even if they are limited to a single genome in the mixture. Repeat x′ is aggregated into a single scaffold and can be identified by its similarity to repeat x, which is present in the closely related Dp genome. The repetitive and taxonomic nature of y cannot be deduced because no closely related reference genome is available in which a similar motif could be observed. The bottom of the figure depicts the probability that a particular kind of locus is assembled from shotgun reads derived from within and among genomes.
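The caption's point that sampling probability rises with copy number and with conservation across specimens can be illustrated with a back-of-envelope calculation. The Python sketch below is not taken from the paper. It assumes that every specimen contributes the same per-genome coverage, that copies similar enough to co-assemble simply add up, and that read depth at a position is Poisson distributed. The function names, the 0.05x coverage, and the three-read threshold are all hypothetical.

```python
import math


def pooled_locus_depth(per_genome_coverage, copy_number, n_carriers):
    """Expected read depth over a motif in the pooled library.

    Assumption: each specimen is sequenced to the same per-genome coverage,
    and all copies in all carrier specimens contribute reads to one scaffold.
    """
    return per_genome_coverage * copy_number * n_carriers


def prob_depth_at_least(mean_depth, min_reads):
    """P(X >= min_reads) for X ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below


# Hypothetical pool sequenced at 0.05x per specimen genome.
scenarios = {
    "single copy, 1 specimen": pooled_locus_depth(0.05, 1, 1),
    "single copy, conserved in 100 specimens": pooled_locus_depth(0.05, 1, 100),
    "200-copy repeat, 1 specimen": pooled_locus_depth(0.05, 200, 1),
}
for label, depth in scenarios.items():
    p = prob_depth_at_least(depth, 3)
    print(f"{label}: expected depth {depth:.2f}, P(>=3 reads) {p:.3f}")
```

Under these simplifying assumptions, a conserved single-copy motif shared by many specimens and a high-copy repeat confined to one genome both reach a depth that a single-copy, single-specimen locus does not, which is the contrast the figure draws between scenarios A/B and x/y.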
© Copyright Policy - creative-commons


Mentions: Here, we assessed what kind of genomic information can be extracted from low-coverage metagenome sequencing of two specimen pools that were originally generated to address questions about taxonomic (Gillett et al. 2014) and ecological diversity (Crampton-Platt et al. 2015). These existing analyses were performed on the mtDNA fraction of the sequence data only (“mitochondrial metagenomics”; Crampton-Platt et al. 2015), and the much greater nuclear portion of the sequence data was ignored. It is interrogated here to obtain insights into the genomic diversity of Coleoptera. High-abundance reads producing the scaffolds in metagenome skimming (MGS) are derived either from orthologous loci conserved among multiple genomes or from paralogous copies, for example, repeat elements present in high copy numbers (hcn) within a genome, but they may also arise from a combination of orthologous and paralogous sequences (fig. 1). Short shotgun reads therefore produce a mixture of assembled contigs, but their composition may be a largely random outcome of an idiosyncratic assembly process or of the chance composition of the pool of reads. As a first step toward the characterization of the metagenomes, we established whether scaffolds are encountered consistently, and at what sequencing depth, to identify the recognizable high-copy fraction obtained from pools of a particular phyletic composition. Next, we attempted to annotate the resulting scaffolds against existing databases, including collections of known repeats, and to identify potential conserved coding regions, such as gene families and tandemly repeated genes. Mapping of scaffolds against the two available reference genomes can further provide information on their intragenomic organization and their intergenomic distribution across evolutionary lineages. Conversely, the number and distribution of scaffolds mapped against full genome sequences can contribute a new approach to comparative genomics, and specifically to the analysis of the repetitive fraction, which is notoriously difficult to characterize with standard genome sequencing methods. Finally, the scaffolds may represent the associated fauna and flora, including the microbiome and potential food sources, which provide information on the wider ecosystem of which the specimens are part.
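The idea of reading intragenomic organization and intergenomic distribution from scaffold mappings can be sketched as a simple tally. The Python example below is only an illustration, not the authors' pipeline. It assumes alignment hits have already been reduced to (scaffold, reference) pairs. The function name, the scaffold identifiers, and the two-hit threshold for flagging a scaffold as multi-copy are hypothetical.

```python
from collections import Counter


def summarize_hits(hits, scaffolds, multi_copy_threshold=2):
    """Tally per-scaffold hits against the Tc and Dp reference genomes.

    hits: iterable of (scaffold_id, reference_name) pairs, one per alignment,
          with reference_name being "Tc" or "Dp" (assumed input format).
    scaffolds: all scaffold ids, so that unmapped scaffolds are reported too.
    """
    counts = Counter(hits)
    summary = {}
    for scaffold in scaffolds:
        tc = counts[(scaffold, "Tc")]
        dp = counts[(scaffold, "Dp")]
        if tc and dp:
            distribution = "both"
        elif tc:
            distribution = "Tc only"
        elif dp:
            distribution = "Dp only"
        else:
            distribution = "neither"
        summary[scaffold] = {
            "Tc_hits": tc,
            "Dp_hits": dp,
            "distribution": distribution,
            # Several hits in one genome hint at a repeated (paralogous) motif.
            "multi_copy": max(tc, dp) >= multi_copy_threshold,
        }
    return summary


# Hypothetical alignments for three scaffolds.
example_hits = [
    ("scaf1", "Tc"), ("scaf1", "Dp"),
    ("scaf2", "Dp"), ("scaf2", "Dp"), ("scaf2", "Dp"),
]
for name, row in summarize_hits(example_hits, ["scaf1", "scaf2", "scaf3"]).items():
    print(name, row)
```

In the terms of fig. 1, a scaffold hitting both references once resembles a conserved single-copy motif, one with many hits in only the Dp genome resembles a lineage-restricted repeat such as x′, and one with no hits leaves its nature undetermined, as for y.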

