Séance: reference-based phylogenetic analysis for 18S rRNA studies.

Medlar A, Aivelo T, Löytynoja A - BMC Evol. Biol. (2014)

Bottom Line: Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We demonstrate both improved OTU picking at higher levels of sequence similarity for 454 data and show the accuracy of phylogenetic placement to be comparable to maximum likelihood methods for lower numbers of taxa. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created.


Affiliation: Institute of Biotechnology, University of Helsinki, Helsinki, P.O.Box 56, Finland. alan.j.medlar@helsinki.fi.

ABSTRACT

Background: Marker gene studies often use short amplicons spanning one or more hypervariable regions from an rRNA gene to interrogate the community structure of uncultured environmental samples. Target regions are chosen for their discriminatory power, but the limited phylogenetic signal of short high-throughput sequencing reads precludes accurate phylogenetic analysis. This is particularly unfortunate in the study of microscopic eukaryotes where horizontal gene flow is limited and the rRNA gene is expected to accurately reflect the species phylogeny. A promising alternative to full phylogenetic analysis is phylogenetic placement, where a reference phylogeny is inferred using the complete marker gene and iteratively extended with the short sequences from a metagenetic sample under study.

Results: Based on the phylogenetic placement approach we built Séance, a community analysis pipeline focused on the analysis of 18S marker gene data. Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We showcase Séance by analysing 454 data from a longitudinal study of intestinal parasite communities in wild rufous mouse lemurs (Microcebus rufus) as well as in simulation. We demonstrate both improved OTU picking at higher levels of sequence similarity for 454 data and show the accuracy of phylogenetic placement to be comparable to maximum likelihood methods for lower numbers of taxa.

Conclusions: Séance is an open source community analysis pipeline that provides reference-based phylogenetic analysis for rRNA marker gene studies. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created. Séance can be downloaded from http://wasabiapp.org/software/seance/.



Fig. 4: Heatmap with de novo phylogenetic tree. Heatmap showing relative abundances of OTUs together with corresponding phylogenetic tree inferred using just OTU centroid sequences. This figure was generated using the Séance ‘heatmap’ command.

Mentions: To build a reference phylogenetic tree, we extracted the complete 18S rRNA alignment from 1320 members of the phylum Nematoda from the SILVA database (SSURef NR 115) [19]. After removing columns that contained only gaps, a tree was inferred using RAxML (ver. 7.2.8) [20]. RAxML was run with the GTR+Γ substitution model for 10 repetitions. We are aware that the SILVA alignment is not perfect and that alignment errors may adversely affect results, but the use of the original alignment allows for reproducibility. Next, we used Séance’s phylogenetic placement command to place the cluster centroid sequences into the reference tree with Pagan. Figure 3 shows the result but, for the purposes of exposition, we have limited it to only those OTUs that appear in the data for a lemur called Malalako. For comparison, we aligned the cluster centroid sequences using MAFFT (ver. 7.149b) [21] and inferred a tree, de novo, using RAxML. The tree was manually rerooted and is shown in Figure 4. Whilst the two phylogenetic trees are very similar, in this example there is a single topological error in the de novo tree in the location of the Caenorhabditis cluster, which should be more closely related to Strongyloides. We further note that the branch lengths have been underestimated in the de novo phylogeny due to the lower relative proportion of variable to conserved sequence compared to the complete marker gene.
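
The gap-column removal step mentioned above can be illustrated with a minimal Python sketch. This is not Séance's or the authors' code: the function name `drop_gap_only_columns`, the toy taxon names, and the choice to treat both `-` and `.` as gap characters are assumptions made for illustration only. The alignment is represented as a plain dictionary mapping sequence names to equal-length aligned strings.

```python
def drop_gap_only_columns(alignment, gap_chars="-."):
    """Return a copy of the alignment without columns that contain only gaps.

    `alignment` maps sequence names to aligned strings of equal length.
    `gap_chars` lists the characters treated as gaps (an assumption here:
    both '-' and '.' are counted as gaps).
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # zip(*rows) walks the alignment column by column.
    rows = list(alignment.values())
    keep = [
        i
        for i, column in enumerate(zip(*rows))
        if any(ch not in gap_chars for ch in column)
    ]
    # Rebuild every sequence from the retained column indices only.
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment: columns 3 and 4 contain only gap characters.
toy = {
    "taxon_a": "AC--GT-",
    "taxon_b": "A---GTA",
    "taxon_c": "AC-.G--",
}
trimmed = drop_gap_only_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

Columns in which at least one sequence has a residue are kept even when most entries are gaps, so the relative positions of the remaining sites are preserved for the subsequent tree inference.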

