Challenges in exome analysis by LifeScope and its alternative computational pipelines.

Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V - BMC Res Notes (2015)

Bottom Line: We summarized the different approaches with regard to the coverage (DP) and quality (QUAL) properties of the variants reported by GATK and found that LifeScope's computational pipeline is superior. We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. It was noted that the coverage threshold a variant must meet to be considered for further analysis has to be chosen in a data-driven way to prevent loss of important information.

Affiliation: Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania. erinija.pranckeviciene@mf.vu.lt.

ABSTRACT

Background: Every next-generation sequencing (NGS) platform relies on proprietary and open-source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case for AB SOLiD systems. We applied several computational and variant calling pipelines to analyze targeted exome sequencing data obtained with the AB SOLiD 5500 system. The investigated tools comprised the proprietary LifeScope pipeline in combination with open-source, color-space-competent mapping programs and a variant caller. We present the implementation details of the pipelines used and a quantitative comparative analysis of the variant lists generated by the LifeScope pipeline versus the open-source tools.
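
Comparing variant lists starts with reading each pipeline's VCF output. The following Python is a minimal sketch, assuming tab-separated VCF 4.x data lines in which GATK reports read depth as the `DP` key of the INFO column and call quality in the QUAL column. The names `Call`, `parse_vcf_line` and `read_calls` are hypothetical, and a real workflow would more likely rely on an established VCF parser. Multi-allelic ALT fields are split into one record per allele so that variants can later be compared by identity.

```python
from typing import List, NamedTuple, Optional


class Call(NamedTuple):
    """One variant allele from a VCF data line (hypothetical container)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: Optional[float]  # None when QUAL is "."
    dp: Optional[int]      # None when INFO carries no DP value


def parse_vcf_line(line: str) -> List[Call]:
    """Parse one tab-separated VCF data line; returns one Call per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts, qual, _filter, info = fields[:8]
    info_map = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value
    dp = int(info_map["DP"]) if info_map.get("DP") else None
    q = None if qual == "." else float(qual)
    return [Call(chrom, int(pos), ref, alt, q, dp) for alt in alts.split(",")]


def read_calls(lines) -> List[Call]:
    """Skip header and blank lines and collect Calls from all data lines."""
    calls: List[Call] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        calls.extend(parse_vcf_line(line))
    return calls
```

The tuple `(c.chrom, c.pos, c.ref, c.alt)` of each returned call can then serve as the variant identity when lists from different pipelines are compared; this ignores indel normalization, which would need separate handling.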

Results: All investigated pipelines achieved sufficient coverage of the targeted regions. High variability was observed in the identities of the variants called from different mapping programs. Variant lists produced by approaches based on different mapping algorithms showed less than 50% concordance. We summarized the different approaches with regard to the coverage (DP) and quality (QUAL) properties of the variants reported by GATK and found that LifeScope's computational pipeline is superior. Fusing the mapping profiles (pileups) at the genomic positions of variants across several different alignments proved to be a useful strategy for assessing questionable singleton variants.
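
The abstract does not state which concordance measure was used. One common choice, assumed here, is the Jaccard index: the number of variants shared by two lists divided by the size of their union. The sketch below computes it for every pair of pipelines, flags singletons (variants reported by only one pipeline), and pools hypothetical reference/alternative read counts from several alignments at one position, in the spirit of the pileup fusion described above. All function names and the (chrom, pos, ref, alt) variant identity are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed identity


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared variants divided by the union; 1.0 when both lists are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], float]:
    """Jaccard concordance for every pair of pipelines (names sorted)."""
    return {(x, y): jaccard(calls[x], calls[y])
            for x, y in combinations(sorted(calls), 2)}


def singletons(calls: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """Variants reported by exactly one pipeline, keyed by that pipeline."""
    result = {}
    for name, variants in calls.items():
        others = set().union(*(v for n, v in calls.items() if n != name))
        result[name] = variants - others
    return result


def pooled_alt_fraction(pileups: List[Tuple[int, int]]) -> float:
    """Pool (ref_reads, alt_reads) counts at one position across alignments."""
    ref = sum(r for r, _ in pileups)
    alt = sum(a for _, a in pileups)
    total = ref + alt
    return alt / total if total else 0.0
```

A singleton whose alternative allele is also visible in the pileups of the other alignments is more plausible than one supported by a single alignment only; any cut-off for that judgement would have to be chosen per study.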

Conclusions: We quantitatively supported the conclusion that LifeScope's pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged, because aggregating information from other mapping and variant calling approaches helps to resolve questionable calls and increases confidence in each call. It was noted that the coverage threshold a variant must meet to be considered for further analysis has to be chosen in a data-driven way to prevent loss of important information.
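
The study does not specify how a data-driven coverage threshold should be derived. As one possible rule, assumed here purely for illustration, the threshold can be set at a low quantile of the DP values actually observed in a sample rather than at a fixed number, and the two options can be compared by counting how many variants each would discard. The `quantile` helper interpolates linearly between sorted values (the same rule as NumPy's default); the function names, the 5% quantile and the fixed cut-off of 10 are hypothetical.

```python
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile of the values, with 0 <= q <= 1."""
    if not values:
        raise ValueError("no values")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)                  # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def lost_below(depths: Sequence[int], threshold: float) -> int:
    """Number of variants whose DP is strictly below the threshold."""
    return sum(1 for d in depths if d < threshold)


def compare_thresholds(depths: Sequence[int], fixed: float = 10, q: float = 0.05) -> dict:
    """Contrast a fixed DP cut-off with a quantile-based one (both hypothetical)."""
    data_driven = quantile(depths, q)
    return {
        "fixed": fixed,
        "lost_fixed": lost_below(depths, fixed),
        "data_driven": data_driven,
        "lost_data_driven": lost_below(depths, data_driven),
    }
```

A quantile rule always removes roughly a fraction q of the calls, so it is only one of several ways to tie the threshold to the observed coverage; inspecting the DP distribution of known true variants would be another.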

Fig. 4: Agreement between different variant calling approaches with respect to called SNPs that are present in the ClinVar and COSMIC databases. The Venn diagrams show how well the different approaches agree in identifying harmful variants. The central area of each diagram shows the number of variants common to all methods; the outer regions show the number of variants specific to each method. The diagrams on the left represent SNPs in the ClinVar database, and those on the right show the distribution of identified SNPs present in the COSMIC database. The top part of the figure shows the diagrams for the proband, the middle part for the mother and the bottom part for the father. Substantial agreement between the methods was observed for pathogenic and drug-response ClinVar variants.

Mentions: ClinVar is a public archive of reports on the relationships between medically important variants and phenotypes. ClinVar receives data from OMIM, GeneReviews and dbSNP, as well as direct submissions from scientists. The database covers 19,774 genes, which include 149,202 variants from 248 submitters [25]. The COSMIC (Catalogue of Somatic Mutations in Cancer) database is designed to store and display somatic mutation information and contains information relating to human cancers: publications, samples and mutations. COSMIC describes over 2500 cancer disease classifications from 47 primary tissue types and represents full literature curation of 136 genes and 12,542 cancer genomes [26]. We analyzed how many harmful variants from ClinVar and COSMIC were identified by each investigated variant calling approach and how these approaches complemented each other in detecting important deleterious variants. A summary of this analysis is presented in Table 6. Figure 4 illustrates the agreement of the variant calling approaches in detecting these deleterious variants. The largest number of deleterious variants was detected by LifeScope's pipeline. The pipelines based on MAQ and BFAST performed similarly to each other.
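
Per-method totals and Venn region counts of the kind shown in Fig. 4 can be derived from set operations. The Python below is a rough sketch, assuming each pipeline's calls and each database (for example, the ClinVar or COSMIC variants of interest) are represented as sets of (chrom, pos, ref, alt) tuples; the function names and this representation are illustrative assumptions, not the study's implementation, and exact allele matching ignores indel normalization.

```python
from typing import Dict, FrozenSet, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed identity


def hits_per_method(calls: Dict[str, Set[Variant]], database: Set[Variant]) -> Dict[str, int]:
    """How many database variants each method called (per-method totals)."""
    return {name: len(variants & database) for name, variants in calls.items()}


def venn_regions(calls: Dict[str, Set[Variant]],
                 database: Set[Variant]) -> Dict[FrozenSet[str], int]:
    """Count database variants in each exclusive Venn region.

    A key is the set of methods that called the variant: the key containing
    every method is the central area, single-method keys are the outer regions.
    """
    hits = {name: variants & database for name, variants in calls.items()}
    regions: Dict[FrozenSet[str], int] = {}
    for variant in set().union(*hits.values()):
        members = frozenset(n for n, h in hits.items() if variant in h)
        regions[members] = regions.get(members, 0) + 1
    return regions
```

The count of variants common to all methods is then `regions.get(frozenset(calls), 0)`, and the count specific to a method `m` is `regions.get(frozenset({m}), 0)`.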

