Vy-PER: eliminating false positive detection of virus integration events in next generation sequencing data.

Forster M, Szymczak S, Ellinghaus D, Hemmrich G, Rühlemann M, Kraemer L, Mucha S, Wienbrandt L, Staa M, UFO Sequencing Consortium within I-BFM Study Group, Franke A - Sci Rep (2015)

Bottom Line: We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform.


Affiliation: Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Schleswig-Holstein, D-24105 Kiel, Germany.

ABSTRACT
Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.



Figure 4: Vy-PER ideogram summary plot. HBV integration loci into the liver cancer genome detected at low stringency (threshold: 1 chimera), also showing detected phiX singletons.

Mentions: L526401A liver cancer sample (RNA). The known HBV integration loci were reported by Chen and colleagues [26], and we reproduced these loci using their VirusSeq pipeline (Supplementary Table S1 online). All reported HBV loci were also detected by Vy-PER. Figure 3 displays the virus candidate loci in the genome that were detected by Vy-PER with the stringent default setting of at least ten supporting chimeras, i.e. at least ten virus/host paired-ends are required to support a virus integration locus, and Table 6 lists the virus candidate loci in 1000 bp bins. Note that there are three integration loci on chromosome 16, one more than reported by VirusSeq. However, no split read supports this last locus, only the respective paired read that was aligned to hg19. In transcriptome data, a paired-end library may conceivably span two or more exons, which could lead to an additional candidate locus that is 4000 bp distant from the true integration locus. Table 7 shows the number of virus candidates reported by Vy-PER if singletons are enabled, and Fig. 4 shows the corresponding candidate loci. The VirusSeq pipeline includes gene annotation of the integration loci. However, the annotation is occasionally misleading, here for the integration locus chr4:63647816-63648816, which is 1.5 megabases distant from the 3’ end of the TECRL gene. This locus is nevertheless annotated as “TECRL/3-prime”, an annotation that the VirusSeq authors also carried over into their publication [26]. The locus that VirusSeq computed is correct, but it lies in a gene desert. The nearest gene, LPHN3, is less than half as far from the locus as TECRL. The ViralFusionSeq pipeline reported 10 clipped fusion sequences, most of which are duplicates, but did not report their integration loci (Supplementary Table S2 online). Confusingly, the 100 bp fusion sequences reported by ViralFusionSeq are tandem repeats of identical 50 bp reads.
A BLAT search for the integration locus did not place any of these fusion reads correctly, owing to their short length of only 50 nucleotides and the lack of paired-end information. Only one such read aligned to the correct locus on chromosome 11, but the alternative locus on chromosome 20 was ranked higher in the BLAT search. The VirusFinder pipeline did not report any virus integrations at all (Table 4 and Supplementary Table S1 online). Finally, no other pipeline reported the phiX chimeras seen by Vy-PER, but VirusFinder reported non-integrated phiX sequences.
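
The binning and support threshold described above can be illustrated with a minimal sketch. This is not Vy-PER's implementation: the `Chimera` record, the `candidate_loci` function, and the choice of 0-based bins aligned to multiples of the bin size are assumptions made for illustration. Only the 1000 bp bin size, the default threshold of ten supporting chimeras, and the singleton threshold of one come from the text.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Chimera(NamedTuple):
    """Hypothetical record for one virus/host paired-end chimera."""
    chrom: str   # host chromosome of the human mate
    pos: int     # host alignment position of the human mate
    virus: str   # name of the virus the other mate aligned to


def candidate_loci(
    chimeras: Iterable[Chimera],
    bin_size: int = 1000,
    min_support: int = 10,
) -> Dict[Tuple[str, int, str], int]:
    """Count chimeras per (chromosome, bin start, virus) and keep the bins
    that reach the required number of supporting chimeras."""
    counts = Counter(
        (c.chrom, (c.pos // bin_size) * bin_size, c.virus) for c in chimeras
    )
    return {key: n for key, n in counts.items() if n >= min_support}


# Illustrative input only: twelve HBV chimeras in one bin, one phiX singleton.
chimeras = [Chimera("chr4", 63647816 + i, "HBV") for i in range(12)]
chimeras.append(Chimera("chr1", 500, "phiX174"))

strict = candidate_loci(chimeras)                  # stringent default
relaxed = candidate_loci(chimeras, min_support=1)  # singletons enabled
```

With the stringent setting only the well-supported HBV bin survives, whereas enabling singletons also reports the lone phiX chimera, mirroring the difference between the Figure 3 and Figure 4 settings.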

