Targeted recovery of novel phylogenetic diversity from next-generation sequence data.

Lynch MD, Bartram AK, Neufeld JD - ISME J (2012)

Bottom Line: We combined BLASTN network analysis, phylogenetics and targeted primer design to amplify 16S rRNA gene sequences from unique potential bacterial lineages, comprising part of the rare biosphere from a multi-million sequence data set from an Arctic tundra soil sample. Demonstrating the feasibility of the protocol developed here, three of seven recovered phylogenetic lineages represented extremely divergent taxonomic entities. A comparison to twelve next-generation data sets from additional soils suggested persistent low-abundance distributions of these novel 16S rRNA genes.


Affiliation: Department of Biology, University of Waterloo, Waterloo, ON, Canada.

ABSTRACT
Next-generation sequencing technologies have led to recognition of a so-called 'rare biosphere'. These microbial operational taxonomic units (OTUs) are defined by low relative abundance and may be specifically adapted to maintaining low population sizes. We hypothesized that mining of low-abundance next-generation 16S ribosomal RNA (rRNA) gene data would lead to the discovery of novel phylogenetic diversity, reflecting microorganisms not yet discovered by previous sampling efforts. Here, we test this hypothesis by combining molecular and bioinformatic approaches for targeted retrieval of phylogenetic novelty within rare biosphere OTUs. We combined BLASTN network analysis, phylogenetics and targeted primer design to amplify 16S rRNA gene sequences from unique potential bacterial lineages, comprising part of the rare biosphere from a multi-million sequence data set from an Arctic tundra soil sample. Demonstrating the feasibility of the protocol developed here, three of seven recovered phylogenetic lineages represented extremely divergent taxonomic entities. These divergent target sequences correspond to (a) a previously unknown lineage within the BRC1 candidate phylum, (b) a sister group to the early diverging and currently recognized monospecific Cyanobacteria Gloeobacter, a genus containing multiple plesiomorphic traits and (c) a highly divergent lineage phylogenetically resolved within mitochondria. A comparison to twelve next-generation data sets from additional soils suggested persistent low-abundance distributions of these novel 16S rRNA genes. The results demonstrate this sequence analysis and retrieval pipeline as applicable for exploring underrepresented phylogenetic novelty and recovering taxa that may represent significant steps in bacterial evolution.


Figure 1: Network analysis of Alert library sequences against SILVA SSU-Parc release 106 (Pruesse et al., 2007). Red nodes represent 97% sequence identity clusters with diameters corresponding to cluster abundance. SILVA sequences are represented by black (named) and blue (unnamed) nodes. Edges represent a BLASTN result of ⩾90% identity across ⩾80% of the V3 region. (a) A highly connected subgroup corresponding to the Bacteroidetes/Chlorobi group; (b) low-degree or unconnected nodes representing sequence clusters of potential phylogenetic novelty.

Mentions: Naïve assembly and CD-HIT clustering of approximately 12 million raw paired-end sequences derived from an Arctic tundra soil library (Bartram et al., 2011) generated close to 6.5 million assembled sequences for comparison with sequence databases. Most assembled V3-region sequence clusters had BLASTN hits within the ‘known’ threshold of ⩾90% sequence identity and ⩾80% length against SILVA SSU-Parc release 106, represented as connected nodes (Figure 1, Supplementary S1). Sequence clusters representing 97% sequence identity groups that were abundant or of known taxonomy tended to occur in highly connected subtrees (for example, Figure 1a). Unconnected nodes (for example, Figure 1b), corresponding to V3 sequence clusters that lacked BLASTN association with SILVA 16S rRNA gene sequences, had the highest potential for phylogenetic novelty and were analyzed further. A total of 558 nodes were unconnected, 512 of which successfully aligned to the bacterial 16S rRNA gene model, representing 28 203 sequences (0.44% of the full library). In phylogenetic screening of unconnected nodes, most representative sequences were distributed throughout well-defined clades with known taxonomy and were thus less likely to represent novel phylogenetic entities (Supplementary Figure S2). Eight clades, each consisting of multiple Alert OTU clusters that were notable or phylogenetically distinct from known seed sequences, were selected, and oligonucleotide primers specific to each clade were designed, primarily against the highly variable 3′ end of the V3 region (Supplementary Table S1 primers).
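
The screening step described above (keep clusters with no BLASTN hit at ⩾90% identity across ⩾80% of the V3 query) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the BLASTN results are available in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that query lengths are known. The function names, cluster identifiers and example hits are hypothetical.

```python
from typing import Dict, Iterable, Set

MIN_IDENTITY = 90.0  # percent identity threshold quoted in the text
MIN_COVERAGE = 0.80  # fraction of the V3 query length quoted in the text


def connected_clusters(blast_lines: Iterable[str],
                       query_lengths: Dict[str, int]) -> Set[str]:
    """Return query clusters with at least one hit passing both thresholds."""
    connected: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Fraction of the query covered by this alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            connected.add(qseqid)
    return connected


def unconnected_clusters(blast_lines: Iterable[str],
                         query_lengths: Dict[str, int]) -> Set[str]:
    """Return clusters with no qualifying hit (candidate novel lineages)."""
    return set(query_lengths) - connected_clusters(blast_lines, query_lengths)


# Hypothetical example: four clusters, each with a 150-nt V3 sequence.
example_lengths = {"otu1": 150, "otu2": 150, "otu3": 150, "otu4": 150}
example_hits = [
    # otu1: 98.5% identity over the full query, so it is connected.
    "otu1\tsilvaA\t98.5\t150\t2\t0\t1\t150\t10\t159\t1e-70\t270",
    # otu2: full-length hit, but identity is below 90%.
    "otu2\tsilvaB\t85.0\t150\t22\t0\t1\t150\t5\t154\t1e-40\t150",
    # otu3: high identity, but only 40% of the query is covered.
    "otu3\tsilvaC\t95.0\t60\t3\t0\t1\t60\t1\t60\t1e-20\t100",
    # otu4 has no hits at all and therefore no line here.
]

print(sorted(unconnected_clusters(example_hits, example_lengths)))
# prints ['otu2', 'otu3', 'otu4']
```

In this sketch a cluster counts as connected as soon as a single hit passes both thresholds, which mirrors the edge definition in the Figure 1 caption. Clusters with no hits never appear in the BLASTN output, which is why the full set of cluster identifiers is taken from the query lengths rather than from the hit table.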

