vipR: variant identification in pooled DNA using R.

Altmann A, Weber P, Quast C, Rex-Haffner M, Binder EB, Müller-Myhsok B - Bioinformatics (2011)

Bottom Line: The resulting sequence data, however, poses a bioinformatics challenge: the discrimination of sequencing errors from real sequence variants present at a low frequency in the DNA pool. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity.


Affiliation: Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany. altmann@mpipsykl.mpg.de

ABSTRACT

Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparation of samples for a large number of individuals is still very cost- and labor-intensive. Thus, recently, screens for rare sequence variants were carried out in samples of pooled DNA, in which equimolar amounts of DNA from multiple individuals are mixed prior to sequencing with HTS. The resulting sequence data, however, poses a bioinformatics challenge: the discrimination of sequencing errors from real sequence variants present at a low frequency in the DNA pool.

Results: Our method vipR uses data from multiple DNA pools in order to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of aiming at discriminating sequence variants from sequencing errors, vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance of the methods was computed on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity.
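
The idea of comparing minor allele counts between two pools with the Skellam distribution (the distribution of the difference of two independent Poisson counts) can be illustrated with a minimal sketch. It is not taken from the vipR code: the function names, the coverage-scaled null rates and the two-sided p-value convention are all assumptions made for illustration only.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson probability P(N = n) for a rate mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2, tail=200):
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    Evaluated as a truncated sum over joint Poisson probabilities
    rather than through Bessel functions.
    """
    biggest = max(mu1, mu2)
    upper = int(biggest + 10 * math.sqrt(biggest) + tail)
    start = max(0, -k)  # both n2 and n1 = n2 + k must be non-negative
    return sum(
        math.exp(poisson_logpmf(n2 + k, mu1) + poisson_logpmf(n2, mu2))
        for n2 in range(start, upper)
    )


def skellam_two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value for the difference in minor allele counts.

    x1, x2: minor allele read counts at one position in pools 1 and 2.
    n1, n2: read coverage at that position in pools 1 and 2.
    Assumed null hypothesis: both pools share one minor allele
    frequency, estimated from the combined counts.
    """
    pooled = (x1 + x2) / (n1 + n2)
    if pooled == 0:
        return 1.0  # no minor allele reads in either pool
    mu1, mu2 = n1 * pooled, n2 * pooled
    d = x1 - x2
    mean = mu1 - mu2
    margin = int(10 * math.sqrt(mu1 + mu2)) + 50
    lo = int(min(d, mean)) - margin
    hi = int(max(d, mean)) + margin
    lower = sum(skellam_pmf(k, mu1, mu2) for k in range(lo, d + 1))
    upper = sum(skellam_pmf(k, mu1, mu2) for k in range(d, hi + 1))
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    # Similar minor allele counts: consistent with a shared error rate.
    print(skellam_two_sided_p(20, 2000, 20, 2000))
    # Strongly different counts: candidate variant in pool 1.
    print(skellam_two_sided_p(60, 2000, 5, 2000))
```

In this sketch a position with the same background error rate in both pools yields a large p-value, whereas a position carrying a true variant in only one pool yields a small one, which is the contrast the abstract describes.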

Availability: The code of vipR is freely available via: http://sourceforge.net/projects/htsvipr/

Contact: altmann@mpipsykl.mpg.de.



Figure 3: Runtime of variant calling algorithms on the TMEM132D dataset as a function of the number of pools. Time was measured in seconds and assessed on a single Intel core at 2.67 GHz (and 6 GB memory).

Figure 3 depicts the runtime behavior of all four tested algorithms on a single Intel core at 2.67 GHz (and 6 GB memory). The reported times comprise the computational time required for producing the output file starting from a pileup file, i.e. the generation of the pileup file was not part of the performance assessment. As expected, for the two programs that analyze DNA pools independently, the runtime grows linearly with the number of pools. The more interesting case concerns the tools that analyze the DNA pools in parallel. Here, vipR clearly outperforms CRISP: where CRISP ranges from 1.5 days for two pools to 9.5 days for all four pools, vipR requires only ≈20 min for all four pools. Remarkably, vipR was even quicker than Poisson. This, however, was mainly due to the longer output files generated by the Poisson model (comprising on average 650 SNPs versus 371 SNPs with vipR; Table 2). Moreover, the majority of time for vipR and Poisson was required during the pre-processing step (indicated by the difference between dashed and solid lines in Fig. 3).

