Mining RNA-seq data for infections and contaminations.

Bonfert T, Csaba G, Zimmer R, Friedel CC - PLoS ONE (2013)

Bottom Line: In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights in infection processes and tumorigenesis can be obtained.

Affiliation: Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany.

ABSTRACT
RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore sources of expression other than the host cell and are poorly equipped to address the problems arising from redundancies and gaps among sequenced microbe and virus genomes. We show that screening of sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences) of species/strains can be assessed. Performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. Our study illustrates the importance and potential of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights in infection processes and tumorigenesis can be obtained.

Figure 2. Characterization of HPV–18 infection in HeLa cells. (A) Distribution of reads across the HPV–18 genome for the mock-transfected and miR–155-transfected cells. Read numbers are shown on a log scale. Expressed genes include E1 as well as E6 and E7, which are required for ongoing proliferation in cervical carcinoma [28]. L1 also appeared to be weakly expressed; however, the expression pattern did not exactly correspond to the annotated gene coordinates: while the start of the gene was not expressed, L1 expression extended into a region downstream of the gene. (B) Coverage as a function of increasing sequencing depth, evaluated by randomly sampling from the miR–155 data set. Coverage is shown as the average of ten repeated samplings for HPV–18 (black) and other species (gray). Sample sizes are annotated at the HPV–18 data points.
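The subsampling analysis in panel (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ContextMap's implementation: coverage is assumed to mean the fraction of genome positions covered by at least one mapped read, reads are assumed to be already mapped and given as hypothetical `(species, start, end)` tuples with 0-based, half-open coordinates, and the function names `genome_coverage` and `coverage_curve` are illustrative only. As in the caption, reads are drawn from the whole data set (host and non-host) and coverage is averaged over ten repeated samplings.

```python
import random

# Minimal sketch (assumption): "coverage" = fraction of genome positions
# covered by at least one mapped read. Reads are hypothetical
# (species, start, end) tuples with 0-based, half-open coordinates.


def genome_coverage(intervals, genome_length):
    """Fraction of positions in [0, genome_length) covered by any interval."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, genome_length)  # clip to genome
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def coverage_curve(reads, genome_lengths, species, sample_sizes,
                   repeats=10, seed=0):
    """Average coverage of one species when subsampling the whole data set.

    For each sample size n, draw n reads without replacement from all mapped
    reads, keep those assigned to `species`, compute that genome's coverage,
    and average over `repeats` independent draws.
    """
    rng = random.Random(seed)
    curve = []
    for n in sample_sizes:
        n = min(n, len(reads))  # cannot sample more reads than exist
        total = 0.0
        for _ in range(repeats):
            sample = rng.sample(reads, n)
            hits = [(s, e) for sp, s, e in sample if sp == species]
            total += genome_coverage(hits, genome_lengths[species])
        curve.append((n, total / repeats))
    return curve


if __name__ == "__main__":
    # Toy data only: two hypothetical genomes and randomly placed 100-bp reads.
    rng = random.Random(1)
    lengths = {"host": 100_000, "virus": 8_000}
    reads = [("host", p, p + 100) for p in rng.choices(range(100_000), k=5_000)]
    reads += [("virus", p, p + 100) for p in rng.choices(range(8_000), k=300)]
    for n, cov in coverage_curve(reads, lengths, "virus", [500, 1_000, 2_000, 5_300]):
        print(f"{n:>6} reads sampled -> virus coverage {cov:.3f}")
```

Merging sorted intervals keeps the coverage computation proportional to the number of reads rather than the genome length, which matters when the same routine is applied to many reference genomes.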

Mentions: Tables S1 and S2 show coverage, mapping confidence and compared to the human reference genome for all species with at least 1,000 mapped reads. Figure S2 illustrates coverage for all microbial or virus hits. Here, HPV–18 is the only virus or microbe with a coverage of 0.34–0.37, high confidence () and small () in both samples. This confirms previous reports of HPV–18 expression in HeLa cells [12]. In contrast, no reads were mapped to HPV–16, which is not expressed in HeLa cells. Figure 2 A shows the distribution of reads across the HPV–18 genome in both the mock-transfected and miR–155-transfected cells. Results were highly reproducible between the two samples, with peaks in read counts at the same genomic locations. Mapping reads to genes showed that only the E6, E7 and E1 genes were strongly expressed. In addition, expression weaker by an order of magnitude was observed for L1 as well as for a region covering the end of E1 and the start of E2. However, as no reads were observed for the rest of E2, E2 is likely not expressed. The same holds for genes E4, E5 and L2. These observations are in accordance with recent results showing that the oncogenes E6 and E7 are essential for continued proliferation in cervical carcinoma [28]. Both genes are transcriptionally repressed by the E2 protein, and loss of E2 expression leads to upregulation of E6 and E7 [29]. Thus, the loss of E2 expression and the high E6 and E7 expression in HeLa cells are consistent with the origin of these cells from cervical carcinoma and with their ongoing proliferation.
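The read-level bookkeeping described in this paragraph (keeping species with at least 1,000 mapped reads, counting reads per viral gene, and comparing genes on a log scale) can be illustrated with a small Python sketch. It is not the authors' pipeline and makes several assumptions: the gene names and coordinates are hypothetical placeholders rather than HPV–18 annotations, a read overlapping several genes is counted for each of them, counts are not normalized by gene length, and the 10-fold cutoff separating "strong" from "weaker" genes is chosen only to mirror the "order of magnitude" wording. The names `species_to_report`, `reads_per_gene` and `classify_expression` are illustrative.

```python
import math

# Minimal sketch (assumptions): reads are (start, end) intervals, 0-based and
# half-open, already mapped to one viral genome; `genes` holds hypothetical
# annotations; raw read counts are compared without length normalization.


def species_to_report(read_counts, min_reads=1000):
    """Species with at least `min_reads` mapped reads, sorted by name."""
    return sorted(sp for sp, n in read_counts.items() if n >= min_reads)


def reads_per_gene(reads, genes):
    """Count reads overlapping each gene; a read overlapping several genes counts for each."""
    counts = {name: 0 for name in genes}
    for r_start, r_end in reads:
        for name, (g_start, g_end) in genes.items():
            if r_start < g_end and g_start < r_end:  # half-open overlap test
                counts[name] += 1
    return counts


def classify_expression(counts):
    """Label each gene relative to the most highly covered gene on a log10 scale.

    Assumption: genes within 10-fold of the top gene are "strong".
    """
    top = max(counts.values(), default=0)
    labels = {}
    for name, n in counts.items():
        if n == 0:
            labels[name] = "no reads"
        elif math.log10(top / n) < 1.0:
            labels[name] = "strong"
        else:
            labels[name] = "weaker by >= 1 order of magnitude"
    return labels


if __name__ == "__main__":
    # Toy example with hypothetical genes and coordinates (not HPV-18 annotations).
    genes = {"geneA": (0, 500), "geneB": (400, 1000), "geneC": (2000, 2500)}
    reads = [(10, 60)] * 100 + [(450, 500)] * 5
    counts = reads_per_gene(reads, genes)
    print(counts)  # {'geneA': 105, 'geneB': 5, 'geneC': 0}
    print(classify_expression(counts))
    print(species_to_report({"virusX": 5000, "bacteriumY": 999, "phageZ": 1000}))
    # ['phageZ', 'virusX']
```

Counting a shared read for every overlapping gene matters in compact viral genomes, where neighbouring genes such as E1 and E2 overlap; a stricter variant could instead discard or split such reads, which would change how a region "covering the end of E1 and the start of E2" is attributed.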

