Challenges in exome analysis by LifeScope and its alternative computational pipelines.

Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V - BMC Res Notes (2015)

Bottom Line: We summarized different approaches with regard to coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that LifeScope's computational pipeline is superior. We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. The coverage threshold that a variant must meet to be considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.


Affiliation: Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania. erinija.pranckeviciene@mf.vu.lt.

ABSTRACT

Background: Every next-generation sequencing (NGS) platform relies on proprietary and open-source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case for AB SOLiD systems. We applied several computational and variant-calling pipelines to analyze targeted exome sequencing data obtained with the AB SOLiD 5500 system. The investigated tools comprised LifeScope's proprietary pipeline in combination with open-source, color-space-competent mapping programs and a variant caller. We present instrumental details of the pipelines used and a quantitative comparative analysis of the variant lists generated by LifeScope's pipeline versus the open-source tools.

Results: All investigated pipelines achieved sufficient coverage of the targeted regions. The identities of the called variants varied widely across the mapping programs: variant lists produced by approaches based on different mapping algorithms were less than 50% concordant. We summarized the approaches with regard to the coverage (DP) and quality (QUAL) properties that GATK assigns to the variants and found that LifeScope's computational pipeline is superior. Combining information on mapping profiles (pileups) at the genomic positions of variants across several different alignments proved to be a useful strategy for assessing questionable singleton variants.

Conclusions: We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged, because aggregating information from other mapping and variant-calling approaches helps to resolve questionable calls and increases confidence in a call. The coverage threshold that a variant must meet to be considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.

Fig3: Empirical cumulative distribution function (ECDF) of the variant quality (QUAL) property assigned by GATK to variants identified in alignments produced by different mapping programs. Only variants identified by all approaches were used to compute the ECDFs. The ECDFs of the different alignments are color-coded: BFAST in blue, LifeScope-GATK in black, MAQ in green and SHRiMP in red. Panels correspond to the family exomes: the proband exome is at the top, the mother's exome in the middle and the father's exome at the bottom. The median QUAL value of LifeScope-GATK consistently appears around 300 in all exomes; for BFAST-GATK it is around 200. The other approaches differ across the exomes.
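
The ECDF in the caption above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `ecdf` and the QUAL values are hypothetical and chosen only to show how an ECDF and a per-method median could be computed from lists of QUAL scores.

```python
from bisect import bisect_right
from statistics import median


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("an ECDF needs at least one value")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical QUAL values for variants shared by two methods (made-up numbers)
qual_by_method = {
    "LifeScope-GATK": [120.0, 290.0, 310.0, 450.0],
    "BFAST-GATK": [80.0, 190.0, 210.0, 400.0],
}

for method, quals in qual_by_method.items():
    F = ecdf(quals)
    print(method, "median QUAL:", median(quals), "F(250):", F(250.0))
```

Reading the median off an ECDF plot corresponds to finding the QUAL value at which the curve crosses 0.5.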

Mentions: GATK assigns depth of coverage (DP) and variant quality (QUAL) properties to the called variants. DP is the number of reads that overlap the genomic position of a variant. QUAL is a Phred-scaled score that GATK assigns to the variant to express call quality, and it can be very large. We assume that a better variant-calling approach produces variants with high DP and QUAL values. To compare DP and QUAL across the methods, we used the variants identified by all of the methods simultaneously. Figure 3 illustrates the per-method, per-exome differences in the QUAL property by means of its empirical distribution functions. The best QUAL values were achieved by SHRiMP-GATK, followed by MAQ-GATK, then LifeScope-GATK, and finally BFAST-GATK. The overall result for the DP property is presented in Table 5. With regard to DP, the highest variant coverage is achieved by LifeScope. Variants produced by both SHRiMP-GATK and LifeScope-GATK have higher median coverage than those produced by MAQ-GATK and BFAST-GATK.
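
The comparison described above, restricted to variants found by every method, could be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes standard tab-separated VCF records with a numeric QUAL column and a `DP=` entry in the INFO column, and the helper names (`parse_vcf_line`, `load_variants`, `shared_summary`) are hypothetical.

```python
from statistics import median


def parse_vcf_line(line):
    """Extract (variant key, QUAL, DP) from one VCF data line.

    Assumes QUAL is numeric and INFO contains a DP=<int> entry.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return (chrom, int(pos), ref, alt), float(qual), int(info_map["DP"])


def load_variants(lines):
    """Map variant key -> (QUAL, DP), skipping header and blank lines."""
    variants = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        key, qual, dp = parse_vcf_line(line)
        variants[key] = (qual, dp)
    return variants


def shared_summary(calls_by_method):
    """Median QUAL and DP per method over variants called by all methods."""
    shared = set.intersection(*(set(v) for v in calls_by_method.values()))
    if not shared:
        raise ValueError("no variant is shared by all methods")
    summary = {}
    for method, variants in calls_by_method.items():
        summary[method] = {
            "median_QUAL": median(variants[k][0] for k in shared),
            "median_DP": median(variants[k][1] for k in shared),
        }
    return shared, summary


# Made-up records for two hypothetical call sets
calls = {
    "method_A": load_variants([
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t300.5\tPASS\tDP=40;AF=0.5",
        "1\t200\t.\tC\tT\t50\tPASS\tDP=10",
    ]),
    "method_B": load_variants([
        "1\t100\t.\tA\tG\t200\tPASS\tAF=0.5;DP=30",
    ]),
}
shared, summary = shared_summary(calls)
print(len(shared), summary)
```

Keying variants on chromosome, position, reference and alternate allele is one possible definition of "the same variant"; a different definition would change which variants count as shared.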

