Challenges in exome analysis by LifeScope and its alternative computational pipelines.

Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V - BMC Res Notes (2015)

Bottom Line: We summarized the different approaches with regard to the coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that LifeScope's computational pipeline is superior. We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. The coverage threshold for a variant to be considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.


Affiliation: Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania. erinija.pranckeviciene@mf.vu.lt.

ABSTRACT

Background: Every next-generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case for AB SOLiD systems. We applied several computational and variant calling pipelines to analyze targeted exome sequencing data obtained using the AB SOLiD 5500 system. The investigated tools comprised the proprietary LifeScope pipeline in combination with open source, color-space competent mapping programs and a variant caller. We present instrumental details of the pipelines that were used and a quantitative comparative analysis of the variant lists generated by LifeScope's pipeline versus the open source tools.

Results: Sufficient coverage of the targeted regions was achieved by all investigated pipelines. High variability was observed in the identities of variants across the mapping programs. We observed less than 50% concordance between variant lists produced by approaches based on different mapping algorithms. We summarized the different approaches with regard to the coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that LifeScope's computational pipeline is superior. Fusing information on the mapping profiles (pileup) at the genomic positions of variants across several different alignments proved to be a useful strategy for assessing questionable singleton variants.
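The concordance between variant lists reported above can be illustrated with a minimal Python sketch. The abstract does not state how concordance was measured, so this sketch assumes a Jaccard-style ratio (shared variants divided by the union of both lists). The function names, the tuple representation of a variant, and the idea of listing singletons by the pipeline that called them are all illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def concordance(calls_a, calls_b):
    """Jaccard-style concordance (an assumed measure): shared / union."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(call_sets):
    """Concordance for every pair of pipelines.

    call_sets maps a pipeline name to a set of variants, each variant
    being a hypothetical (chrom, pos, ref, alt) tuple.
    """
    return {
        (m1, m2): concordance(call_sets[m1], call_sets[m2])
        for m1, m2 in combinations(sorted(call_sets), 2)
    }


def singletons(call_sets):
    """Variants called by exactly one pipeline, mapped to that pipeline."""
    support = {}
    for name, calls in call_sets.items():
        for variant in calls:
            support.setdefault(variant, []).append(name)
    return {v: names[0] for v, names in support.items() if len(names) == 1}
```

Singletons found this way are the calls that would be candidates for the pileup-based review described in the Results.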

Conclusions: We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged, because aggregating information from other mapping and variant calling approaches helps to resolve questionable calls and increases confidence in a call. The coverage threshold for a variant to be considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.
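One way to read "data-driven" for the coverage threshold is to inspect how many variants each candidate DP cutoff would discard before fixing it. The abstract does not describe the authors' procedure, so the following Python sketch is only an assumption: it picks the largest cutoff that still retains a chosen fraction of the called variants. The function names and the `min_retained` and `candidates` parameters are hypothetical.

```python
def retained_fraction(depths, threshold):
    """Fraction of variants whose DP is at least the threshold."""
    depths = list(depths)
    if not depths:
        return 0.0
    return sum(d >= threshold for d in depths) / len(depths)


def largest_threshold_keeping(depths, min_retained=0.95, candidates=range(1, 101)):
    """Largest candidate DP cutoff that retains at least min_retained
    of the variants (an assumed selection rule, not the paper's).

    Returns None if no candidate satisfies the requirement.
    """
    depths = list(depths)
    best = None
    for t in candidates:
        if retained_fraction(depths, t) >= min_retained:
            best = t
    return best
```

Applied to the DP values of a call set, `retained_fraction(depths, 20)` shows how much would be lost under a fixed cutoff of 20 reads, which can then be compared with the cutoff suggested by the data.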


Fig. 2: Coverage of target regions by mapping methods. The average coverage of the targeted exome regions by the mapped reads in the family exomes is shown. The coverage categories that we created to assess the mappings of the different methods are presented in the legend. They comprise the five intervals [1,10), [10,20), [20,30), [30,60), [60,100) and a category for coverage of 100 or higher. Each individual barplot shows the percentage of the targeted regions falling into each coverage category for one mapping method: LifeScope, SHRiMP, MAQ and BFAST. The targeted regions are mostly covered by 30–60 reads in all mapping methods.


Mentions: All mapping methods covered 97 % of the targeted regions. How the methods compare to each other in mapping is presented in Fig. 2. The largest fraction of the targeted exome regions is covered by 30–60 reads. LifeScope and SHRiMP produced better coverage than MAQ and BFAST. We computed the fraction of the regions with low coverage by LifeScope (fewer than 20 reads) that are covered better by the other mapping programs. SHRiMP improved coverage in 6 % and MAQ in 1 % of those regions. Analysis of the agreement between the individual aligners shows that the alternative aligners can map only a negligible fraction of the reads left unmapped by LifeScope. Compared with each other, MAQ can map about 28 % of the reads unmapped by BFAST and about 19 % of the reads unmapped by SHRiMP. BFAST can align 12 % and SHRiMP can align 19 % of the reads unmapped by MAQ.
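The coverage summary described above can be sketched in Python as follows. The interval boundaries are taken from the Fig. 2 caption; everything else is an assumption made for illustration: the function names, the extra "<1" category for uncovered regions, the per-region mean coverage input, and the reading of "covered better" as a strictly higher mean coverage in the alternative mapper.

```python
from bisect import bisect_right
from collections import Counter

# Interval boundaries from the Fig. 2 caption; "<1" is an added
# catch-all for regions with no coverage (an assumption).
EDGES = [1, 10, 20, 30, 60, 100]
LABELS = ["<1", "[1,10)", "[10,20)", "[20,30)", "[30,60)", "[60,100)", ">=100"]


def coverage_category(mean_cov):
    """Label of the half-open coverage interval containing mean_cov."""
    return LABELS[bisect_right(EDGES, mean_cov)]


def category_percentages(region_coverages):
    """Percentage of regions per coverage category for one mapper."""
    covs = list(region_coverages)
    counts = Counter(coverage_category(c) for c in covs)
    total = len(covs)
    return {lab: (100.0 * counts[lab] / total if total else 0.0) for lab in LABELS}


def improved_fraction(ref_cov, alt_cov, low=20):
    """Among regions the reference mapper covers with fewer than `low`
    reads, the fraction where the alternative mapper's coverage is
    strictly higher. Both arguments map region id -> mean coverage.
    """
    low_regions = [r for r, c in ref_cov.items() if c < low]
    if not low_regions:
        return 0.0
    better = sum(alt_cov.get(r, 0) > ref_cov[r] for r in low_regions)
    return better / len(low_regions)
```

Calling `category_percentages` once per mapper gives the values behind a barplot like Fig. 2, and `improved_fraction` with LifeScope as the reference corresponds to the comparison of low-coverage regions under the stated assumptions.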

