Strand-specific community RNA-seq reveals prevalent and dynamic antisense transcription in human gut microbiota.

Bao G, Wang M, Doak TG, Ye Y - Front Microbiol (2015)

Bottom Line: Metagenomics and other meta-omics approaches (including metatranscriptomics) provide insights into the composition and function of microbial communities living in different environments or animal hosts. Metatranscriptomics research provides an unprecedented opportunity to examine gene regulation for many microbial species simultaneously, and more importantly, for the majority that are unculturable microbial species, in their natural environments (or hosts). Current analyses of metatranscriptomic datasets focus on the detection of gene expression levels and the study of the relationship between changes of gene expression and changes of environment.


Affiliation: School of Informatics and Computing, Indiana University, Bloomington, IN, USA.

ABSTRACT
Metagenomics and other meta-omics approaches (including metatranscriptomics) provide insights into the composition and function of microbial communities living in different environments or animal hosts. Metatranscriptomics research provides an unprecedented opportunity to examine gene regulation for many microbial species simultaneously, and, more importantly, for the majority of microbial species that are unculturable, in their natural environments (or hosts). Current analyses of metatranscriptomic datasets focus on detecting gene expression levels and on studying the relationship between changes in gene expression and changes in the environment. As a demonstration of using metatranscriptomics beyond these common analyses, we developed a computational and statistical procedure to analyze antisense transcripts in strand-specific metatranscriptomic datasets. Antisense RNAs encoded on the DNA strand opposite a gene's CDS have the potential to form extensive base-pairing interactions with the corresponding sense RNA, and can have important regulatory functions. Most studies of antisense RNAs in bacteria are rather recent, are mostly based on transcriptome analysis, and have been applied mainly to single bacterial species. Applying our approach to human gut-associated metatranscriptomic datasets allowed us to survey antisense transcription in a large number of bacterial species associated with humans. The proportion of protein-coding genes with antisense transcription ranges from 0 to 35.8% (median = 10.0%) among 47 species. Our results show that antisense transcription is dynamic, varying between human individuals. Functional enrichment analysis revealed that antisense transcription is preferentially associated with certain gene functions, with transposase genes among the most prominent (although we also observed antisense transcription in bacterial housekeeping genes).




Figure 7: Highly expressed genes tend to be dominated by either sense or antisense transcription. Each circle represents a gene. The y-axis shows gene expression (log(FPKM)) and the x-axis shows d, which is close to 1 for genes with mostly sense transcription and close to -1 for genes with mostly antisense transcription. RNA-seq data from individual 1 (X310763260) were used for this plot. To limit the bias that may be introduced by rare species or by genes with low expression levels, we included only genes (3,689 in total) that are each supported by at least 20 RNA-seq reads and that come from species (23 in total) each having at least 100 genes with RNA-seq read support. See Supplementary Figure S2 for the plot using the original gene expression data, involving 30,493 genes from all 47 strains, one for each species.
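The gene and species filters described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_genes` and its input dictionaries are hypothetical, and we assume that a gene counts toward its species' 100-gene threshold if it has at least one mapped read (sense or antisense).

```python
from collections import defaultdict

# Thresholds taken from the Figure 7 caption.
MIN_READS_PER_GENE = 20
MIN_SUPPORTED_GENES_PER_SPECIES = 100


def filter_genes(read_counts, gene_species,
                 min_reads=MIN_READS_PER_GENE,
                 min_genes=MIN_SUPPORTED_GENES_PER_SPECIES):
    """Return the set of gene IDs kept for the plot (hypothetical helper).

    read_counts: dict gene_id -> total (sense + antisense) read count.
    gene_species: dict gene_id -> species name.
    Assumption: a gene has "read support" if it has at least one read.
    """
    # Count, per species, how many genes have any read support.
    supported = defaultdict(int)
    for gene, count in read_counts.items():
        if count > 0:
            supported[gene_species[gene]] += 1
    kept_species = {sp for sp, n in supported.items() if n >= min_genes}

    # Keep well-supported genes that belong to a retained species.
    return {
        gene for gene, count in read_counts.items()
        if count >= min_reads and gene_species[gene] in kept_species
    }
```

Applying the species filter before the per-gene read threshold keeps species whose many weakly expressed genes still indicate substantial presence in the sample.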

Mentions: Based on their sense and antisense transcription, genes can be roughly grouped into three categories: genes with mostly sense transcripts, genes with mostly antisense transcripts, and genes in between. We define d = (#sense reads - #antisense reads)/(#sense reads + #antisense reads), so that d is close to 1 for genes with mostly sense transcripts and close to -1 for genes with mostly antisense transcripts. Figure 7 plots gene expression levels against d for expressed genes from 23 species (each having at least 100 genes with detectable expression), based on the RNA-seq dataset of individual 1 (X310763260; see Supplementary Figure S2 for the plot using all 47 strains, with only one strain included per species). We used FPKM (Fragments Per Kilobase of transcript per Million mapped reads; Garber et al., 2011) to quantify gene expression, normalizing read counts by gene length and by the sequencing depth of each RNA-seq experiment. The number of mapped reads for a dataset was computed as the total number of reads that map to any of the 116 strains. The plot reveals a "U" shape: genes with either sense- or antisense-dominated transcription are typically highly expressed, whereas genes in between have relatively low expression. Because a U shape means that expression increases with the distance of d from 0, we tested the correlation between log(FPKM) and |d|: for the genes shown in Figure 7 (each recruiting at least 20 RNA-seq reads), excluding genes with d = 1 or -1, Spearman's correlation coefficient is 0.57 (p-value < 2.2e-16). Similar results are observed using the unfiltered dataset from this individual (Spearman's r = 0.69, p-value < 2.2e-16) and using RNA-seq datasets from other individuals.

