A comprehensive metatranscriptome analysis pipeline and its validation using human small intestine microbiota datasets.

Leimena MM, Ramiro-Garcia J, Davids M, van den Bogert B, Smidt H, Smid EJ, Boekhorst J, Zoetendal EG, Schaap PJ, Kleerebezem M - BMC Genomics (2013)

Bottom Line: Reproducibility of the metatranscriptome sequencing approach was established by independent duplicate experiments. In addition, comparison of metatranscriptome analysis employing single- or paired-end sequencing methods indicated that the latter approach does not provide improved functional or phylogenetic insights. The set-up of the pipeline is very generic and can be applied for (bacterial) metatranscriptome analysis in any chosen niche.


Affiliation: TI Food and Nutrition (TIFN), P.O. Box 557, 6700 AN, Wageningen, The Netherlands.

ABSTRACT

Background: Next generation sequencing (NGS) technologies can be applied in complex microbial ecosystems for metatranscriptome analysis by employing direct cDNA sequencing, which is known as RNA sequencing (RNA-seq). RNA-seq generates large datasets of great complexity, the comprehensive interpretation of which requires a reliable bioinformatic pipeline. In this study, we focus on the development of such a metatranscriptome pipeline, which we validate using Illumina RNA-seq datasets derived from the small intestine microbiota of two individuals with an ileostomy.

Results: The metatranscriptome pipeline developed here enabled effective removal of rRNA-derived sequences, followed by confident assignment of the predicted function and taxonomic origin of the mRNA reads. Phylogenetic analysis of the small intestine metatranscriptome datasets revealed a strong similarity with the community composition profiles obtained from 16S rDNA and rRNA pyrosequencing, indicating considerable congruency between community composition (rDNA) and the taxonomic distribution of overall (rRNA) and specific (mRNA) activity among its microbial members. Reproducibility of the metatranscriptome sequencing approach was established by independent duplicate experiments. In addition, comparison of metatranscriptome analysis employing single- or paired-end sequencing methods indicated that the latter approach does not provide improved functional or phylogenetic insights. Metatranscriptome functional mapping allowed the analysis of global and genus-specific activity of the microbiota, and illustrated the potential of these approaches to unravel syntrophic interactions in microbial ecosystems.

Conclusions: A reliable pipeline for metatranscriptome data analysis was developed and evaluated using RNA-seq datasets obtained for the human small intestine microbiota. The set-up of the pipeline is very generic and can be applied for (bacterial) metatranscriptome analysis in any chosen niche.


Figure 1: Flow diagram of the bioinformatics analysis pipeline. The rRNA/tRNA reads were removed from the unique Illumina reads using SortMeRNA software, followed by BLASTN alignment to the NCBI and SILVA ribosomal databases. The mRNA reads were assigned to the NCBI prokaryote genomes using MegaBLAST followed by BLASTN, and were then classified according to alignment bit score, with minimum bit scores of 148 and 110 for prediction of phylogenetic origin at genus and family level, respectively. The genome-assigned reads were classified as protein-encoding or non-coding, followed by COG and KEGG functional annotation and metabolic mapping. For evaluation purposes, additional functional assignment was performed by aligning a randomly selected 10% of the unassigned reads (bit score ≤74) to the NCBI protein database, followed by the MetaHIT and SI metagenome databases, using BLASTX (see methods for details).
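The bit-score classification and the 10% random subsampling described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "intermediate" label for bit scores between 74 and 110 (a range the caption does not describe), and the fixed random seed are assumptions made here for illustration only.

```python
import random

# Thresholds taken from the Figure 1 caption.
GENUS_MIN_BITS = 148.0       # minimum bit score for genus-level prediction
FAMILY_MIN_BITS = 110.0      # minimum bit score for family-level prediction
UNASSIGNED_MAX_BITS = 74.0   # reads at or below this score count as unassigned


def classify_by_bit_score(bit_score):
    """Return the taxonomic resolution supported by an alignment bit score."""
    if bit_score >= GENUS_MIN_BITS:
        return "genus"
    if bit_score >= FAMILY_MIN_BITS:
        return "family"
    if bit_score <= UNASSIGNED_MAX_BITS:
        return "unassigned"
    # Assumption: the caption does not say how scores in (74, 110) are handled.
    return "intermediate"


def sample_unassigned(read_ids, fraction=0.10, seed=0):
    """Randomly pick a fraction of unassigned read IDs (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = round(len(read_ids) * fraction)
    return rng.sample(read_ids, k)


if __name__ == "__main__":
    # Hypothetical best-hit bit scores, not data from the study.
    best_hits = {"read_1": 180.0, "read_2": 120.5, "read_3": 60.0}
    for read_id, bits in best_hits.items():
        print(read_id, classify_by_bit_score(bits))
```

Keeping the thresholds as named constants makes it easy to see how a different cut-off would shift reads between the genus, family, and unassigned groups.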

Mentions: Both single- and paired-end Illumina cDNA libraries had an insert size of 200-300 bp. Two independent single-end cDNA libraries of sample A were constructed and sequenced, yielding dataset ‘A’, which contained ~29.7 million reads, and dataset ‘A-rep’, which was sequenced at roughly 3-fold lower depth and contained ~9 million reads. The mRNA-enriched RNA of sample B was used to construct a paired-end sequencing library, the sequencing of which generated approximately 42.2 million read pairs. Both single- and paired-end sequencing reads had a read length of 101 nt. The paired-end sequencing dataset of sample B was split into two individual datasets, arbitrarily designated B-left and B-right, corresponding to the forward and reverse Illumina reads, respectively. The resulting four datasets (A, A-rep, B-left, B-right) were used for the development and validation of a bioinformatics analysis pipeline (Figure 1), and for primary functional analyses of the resulting activity patterns of the human small intestine microbiota.
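The depth figures quoted above can be checked with a short Python sketch. The read counts are the approximate values from the text (B-left and B-right each hold one read per pair, so about 42.2 million reads each); the dictionary layout and function names are assumptions made here for illustration.

```python
READ_LENGTH_NT = 101  # read length stated for both library types

# Approximate read counts per dataset, as given in the text.
READS = {
    "A": 29.7e6,
    "A-rep": 9.0e6,
    "B-left": 42.2e6,   # forward reads of the 42.2 million read pairs
    "B-right": 42.2e6,  # reverse reads of the same pairs
}


def fold_difference(deeper, shallower):
    """Ratio of read counts between two datasets."""
    return deeper / shallower


def total_nucleotides(read_count, read_length=READ_LENGTH_NT):
    """Total sequenced nucleotides for a dataset of fixed-length reads."""
    return read_count * read_length


if __name__ == "__main__":
    ratio = fold_difference(READS["A"], READS["A-rep"])
    print(f"A vs A-rep depth ratio: {ratio:.1f}-fold")
    for name, count in READS.items():
        print(f"{name}: {total_nucleotides(count) / 1e9:.2f} Gnt")
```

The ratio of 29.7 million to 9 million reads comes out at about 3.3, consistent with the "3-fold lower depth" stated for A-rep.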

