Séance: reference-based phylogenetic analysis for 18S rRNA studies.

Medlar A, Aivelo T, Löytynoja A - BMC Evol. Biol. (2014)

Bottom Line: Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We demonstrate improved OTU picking at higher levels of sequence similarity for 454 data and show that the accuracy of phylogenetic placement is comparable to maximum likelihood methods for lower numbers of taxa. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created.


Affiliation: Institute of Biotechnology, University of Helsinki, P.O. Box 56, Helsinki, Finland. alan.j.medlar@helsinki.fi

ABSTRACT

Background: Marker gene studies often use short amplicons spanning one or more hypervariable regions from an rRNA gene to interrogate the community structure of uncultured environmental samples. Target regions are chosen for their discriminatory power, but the limited phylogenetic signal of short high-throughput sequencing reads precludes accurate phylogenetic analysis. This is particularly unfortunate in the study of microscopic eukaryotes where horizontal gene flow is limited and the rRNA gene is expected to accurately reflect the species phylogeny. A promising alternative to full phylogenetic analysis is phylogenetic placement, where a reference phylogeny is inferred using the complete marker gene and iteratively extended with the short sequences from a metagenetic sample under study.

Results: Based on the phylogenetic placement approach, we built Séance, a community analysis pipeline focused on the analysis of 18S marker gene data. Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We showcase Séance by analysing 454 data from a longitudinal study of intestinal parasite communities in wild rufous mouse lemurs (Microcebus rufus) as well as simulated data. We demonstrate improved OTU picking at higher levels of sequence similarity for 454 data and show that the accuracy of phylogenetic placement is comparable to maximum likelihood methods for lower numbers of taxa.

Conclusions: Séance is an open source community analysis pipeline that provides reference-based phylogenetic analysis for rRNA marker gene studies. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created. Séance can be downloaded from http://wasabiapp.org/software/seance/.



Figure 5: Phylogenetic placement versus de novo phylogeny heatmap. Comparison of normalised Robinson-Foulds metric of different phylogenetic methods performed on short 250 bp sequences compared to the ML tree inferred from the complete marker gene: phylogenetic placement of short query sequences produces trees with fewer errors than de novo phylogenetic inference.

Mentions: Figure 5 shows a heatmap of how the accuracy of Pagan’s phylogenetic placement and de novo phylogenetic inference compared over all experiments. Figure 6 shows box plots of the same data broken down by the number of query sequences and includes the results from HMMER+Pplacer and Pagan+Pplacer. In general, phylogenetic placement outperformed de novo phylogenetic inference in reconstructing the original tree because the short sequences contain less phylogenetic information than the complete gene sequence.
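
The normalised Robinson-Foulds metric used in this comparison can be illustrated with a small sketch. The code below is not taken from Séance or Pagan: the nested-tuple tree encoding, the function names and the choice to normalise by the total number of non-trivial splits in both trees are assumptions made for illustration, and the article may normalise differently. It treats both trees as unrooted, collects the leaf bipartitions induced by their internal edges, and reports the fraction of bipartitions that are not shared.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions of an unrooted tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate fewer than two leaves from the rest.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def normalised_rf(tree_a: Tree, tree_b: Tree) -> float:
    """Robinson-Foulds distance divided by the total number of splits (assumed normalisation)."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must have identical leaf sets")
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


# Toy example: a reference topology and an inferred topology with A and B misplaced.
reference = (("A", "B"), ("C", ("D", "E")))
inferred = (("A", "C"), ("B", ("D", "E")))
print(normalised_rf(reference, reference))  # 0.0
print(normalised_rf(reference, inferred))   # 0.5
```

A value of 0 means the inferred tree recovers every split of the reference tree, and a value of 1 means it recovers none. For fully resolved trees with n leaves the denominator equals 2(n - 3), the usual maximum of the Robinson-Foulds distance.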

