Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie.

Giannoulatou E, Park SH, Humphreys DT, Ho JW - BMC Bioinformatics (2014)

Bottom Line: MT alleviates the problems associated with the lack of a gold standard by checking that the results from multiple executions of a program satisfy a set of expected or desirable properties that can be derived from the software specification or user expectations. Notably, multiple executions of the same aligner on a slightly modified input FASTQ sequence file, for example after randomly re-ordering the reads, may produce different alignment results. This paper demonstrates a different framework for testing a program, one that checks its properties and thereby greatly expands the number and repertoire of test cases that can be applied in practice.


ABSTRACT

Background: Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the software testing literature, namely Metamorphic Testing (MT), to systematically test three widely used short-read sequence alignment programs.

Results: MT alleviates the problems associated with the lack of a gold standard by checking that the results from multiple executions of a program satisfy a set of expected or desirable properties that can be derived from the software specification or user expectations. We tested BWA, Bowtie and Bowtie2 using simulated data and one HapMap dataset. Notably, multiple executions of the same aligner on a slightly modified input FASTQ sequence file, for example after randomly re-ordering the reads, may produce different alignment results. Furthermore, we found that the list of variant calls can be affected unless strict quality control is applied during variant calling.

Conclusion: Thorough testing of bioinformatics software is important in delivering clinical genomic medicine. This paper demonstrates a different framework to test a program that involves checking its properties, thus greatly expanding the number and repertoire of test cases we can apply in practice.
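The read re-ordering property described in the Results can be illustrated with a minimal sketch. It is not the authors' test harness: `toy_align` is a hypothetical stand-in for a real aligner (it reports the leftmost exact match only), and the reference and reads are made up for illustration. The point is the shape of a metamorphic check, which compares two runs against each other instead of against a gold standard.

```python
import random
from typing import Callable, Dict, List, Tuple

Read = Tuple[str, str]        # (read_id, sequence)
Alignment = Dict[str, int]    # read_id -> 0-based position, -1 if unmapped


def toy_align(reference: str, reads: List[Read]) -> Alignment:
    """Hypothetical stand-in aligner: leftmost exact match of each read."""
    return {read_id: reference.find(seq) for read_id, seq in reads}


def check_reorder_relation(
    align: Callable[[str, List[Read]], Alignment],
    reference: str,
    reads: List[Read],
    seed: int = 0,
) -> List[str]:
    """Return the IDs of reads whose alignment changes after shuffling the input.

    An empty list means the relation "re-ordering reads does not change
    any read's alignment" held for this input.
    """
    source_output = align(reference, reads)

    # Follow-up test case: same reads, different order.
    shuffled = reads[:]
    random.Random(seed).shuffle(shuffled)
    follow_up_output = align(reference, shuffled)

    # Compare per read ID, so the output order itself does not matter.
    return sorted(
        read_id
        for read_id in source_output
        if source_output[read_id] != follow_up_output.get(read_id)
    )


# Made-up data, for illustration only.
REFERENCE = "ACGTACGTTAGCCGATAGGCTA"
READS = [("r1", "ACGT"), ("r2", "TAGC"), ("r3", "GGCTA"), ("r4", "TTTT")]

violations = check_reorder_relation(toy_align, REFERENCE, READS)
print(violations)
```

For a real aligner, `align` would wrap a command-line invocation and parse the resulting SAM/BAM records by read name; a non-empty result would flag reads worth inspecting, not necessarily a bug.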


Figure 2: Number of variants called using original read mapping and mapping after the application of MR1, MR5 and MR7. A. Using all the reads. B. After removal of non-uniquely mapped reads.

Mentions: To investigate the effect of these properties on downstream WGS or WES analysis, we ran a commonly used pipeline that involves BWA alignment followed by variant calling with the Genome Analysis Toolkit (GATK) [38]. We ran this pipeline on the exome-sequenced sample NA12872. Since our MRs do not apply any filtering to the BAM (mapping) files, the analysis was repeated using only the uniquely mapped reads. We found that, prior to any filtering, the number of variants called differs between the original BAM file and the BAM files produced after applying MR1, MR5 or MR7 (Figure 2A).
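One way to make such a comparison concrete is to diff the variant call sets from the original run and a follow-up run. The sketch below is an assumption about how this could be done, not the authors' procedure: it reads only the first five tab-separated VCF columns, and the records shown are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def parse_variants(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect (chrom, pos, ref, alt) keys from VCF text, skipping headers."""
    variants: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list alternates separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_call_sets(original: Set[Variant], follow_up: Set[Variant]) -> Dict[str, int]:
    """Count variants shared by both runs and those unique to each run."""
    return {
        "shared": len(original & follow_up),
        "only_original": len(original - follow_up),
        "only_follow_up": len(follow_up - original),
    }


# Invented records, for illustration only.
ORIGINAL_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t250\t.\tC\tT",
    "chr2\t75\t.\tG\tA,C",
]
FOLLOW_UP_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t75\t.\tG\tA,C",
    "chr3\t10\t.\tT\tC",
]

summary = compare_call_sets(parse_variants(ORIGINAL_VCF), parse_variants(FOLLOW_UP_VCF))
print(summary)
```

Comparing sets of variant keys, not just totals, matters here: two runs can report the same number of variants while disagreeing on which ones.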
