Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

Damiati E, Borsani G, Giacopuzzi E - Hum. Genet. (2016)

Bottom Line: Performance in variant detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. The proposed low-, medium- or high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNPs, respectively. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.


Affiliation: Unit of Genetics, Department of Molecular and Translational Medicine, University of Brescia, 25123, Brescia, Italy.

ABSTRACT
The Ion Proton platform enables whole exome sequencing (WES) at low cost, with rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variant detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low-, medium- or high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
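The sensitivity, FDR and F1 figures quoted in the abstract are related by standard definitions, which the following minimal Python sketch illustrates. It is not the authors' evaluation code: the function names `f1_from_rates` and `metrics_from_counts` are hypothetical, and the inputs are simply the rounded values from the abstract.

```python
def f1_from_rates(sensitivity, fdr):
    """F1 score from sensitivity (recall) and false discovery rate.

    Precision is 1 - FDR; F1 is the harmonic mean of precision and recall.
    """
    precision = 1.0 - fdr
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics_from_counts(tp, fp, fn):
    """Sensitivity, FDR and F1 from true-positive, false-positive and
    false-negative counts (hypothetical helper for illustration)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "fdr": fdr,
        "f1": f1_from_rates(sensitivity, fdr),
    }


# Rounded values taken from the abstract (indels / SNPs).
indel_f1 = f1_from_rates(sensitivity=0.71, fdr=0.37)
snp_f1 = f1_from_rates(sensitivity=0.975, fdr=0.02)
```

With these rounded inputs the sketch gives an F1 of roughly 0.67 for indels and 0.98 for SNPs, in line with the approximate ~0.66/0.98 values reported; the small difference for indels is expected given that the published rates are themselves approximations.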


Fig. 3: Analysis of variant calling errors in HiQ datasets. A detailed characterization of errors in variants identified by the TVC using the optimized parameters provided by the manufacturer. a Fraction of true-positive (red) and false-positive (black) variants occurring with more than 2 or more than 3 alternate alleles in the corresponding VCF file. b Fraction of false-negative SNPs (blue) and indels (green) with read depth <10 in the corresponding sample. c Indel length of false-positive (red), true-positive (green) and false-negative (blue) calls identified across the nine HiQ datasets. d Density plot of the recurrence of false positives (red) and true positives (green): true-positive calls are highly consistent across the nine HiQ samples, while false-positive calls are often run specific.

Mentions: To investigate the nature of errors in variant identification on the Ion Proton platform, we performed a detailed characterization of both the false-positive and false-negative variants described above. We first evaluated the distribution of 11 parameters reported by the Torrent Variant Caller (see “Materials and methods”) across false-positive and true-positive variants, for indels and SNPs separately (Fig. 4). We also assessed the proportion of false-positive and true-positive variants represented as multiallelic calls in the original VCF file. For indels, variants occurring with three or more alternate alleles represented 8.5–12.5 % of false positives and only 0–0.4 % of true positives (Fig. 3a). No significant differences were detected for SNPs (data not shown). For false-negative variants, we evaluated the proportion of missed calls attributable to low read depth (<10 reads): 26–41 % of false-negative SNPs and 10–23 % of false-negative indels are due to low coverage (Fig. 3b). Further inspection of the false-negative calls with read depth >10× revealed that triplet repeats and homopolymeric regions are recurrent among missed variants (data not shown). Analysis of indel length showed that most false positives and false negatives are short (1–2 bp) insertions/deletions, while large ones above 100 bp are almost all erroneous calls (Fig. 3c). We then compared read length distribution and variant identification performance in the nine HiQ datasets. The NA12878_HiQ dataset, which showed lower performance in variant identification (see Supplementary table 10), revealed a substantial deviation from the expected distribution, with a loss of long fragments (Supplementary figure 6).
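The two observations above (indels with three or more alternate alleles are enriched among false positives, and read depth below 10 explains part of the false negatives) can be turned into simple per-record flags. The sketch below is only an illustration and is not the authors' filtering strategy, which is based on Torrent Variant Caller parameters not detailed in this section. The function names, the flag labels and the use of the standard `DP` INFO key as the read depth are assumptions; the thresholds are the ones mentioned in the text.

```python
def parse_vcf_record(line):
    """Split one tab-delimited VCF data line into the fields used here."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    info = {}
    if len(fields) > 7 and fields[7] != ".":
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
    return chrom, pos, ref, alt.split(","), info


def flag_record(line, min_alt_alleles=3, min_depth=10):
    """Return a list of illustrative warning flags for one VCF record.

    min_alt_alleles: an indel with at least this many ALT alleles is flagged.
    min_depth: a record whose INFO DP is below this value is flagged.
    """
    _chrom, _pos, ref, alts, info = parse_vcf_record(line)
    flags = []
    # An allele whose length differs from REF is treated as an indel.
    is_indel = any(len(a) != len(ref) for a in alts)
    if is_indel and len(alts) >= min_alt_alleles:
        flags.append("multiallelic_indel")
    depth = info.get("DP", "")
    if depth.isdigit() and int(depth) < min_depth:
        flags.append("low_depth")
    return flags


# Hypothetical records, made up for illustration only.
suspicious = "chr1\t100\t.\tA\tAT,ATT,ATTT\t50\tPASS\tDP=8"
clean = "chr1\t200\t.\tA\tG\t50\tPASS\tDP=40"
suspicious_flags = flag_record(suspicious)
clean_flags = flag_record(clean)
```

Flagging rather than discarding records keeps the decision about stringency separate from the detection of suspicious calls, which mirrors the idea of offering filters at several stringency levels.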

