Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

Damiati E, Borsani G, Giacopuzzi E - Hum. Genet. (2016)

Bottom Line: Performance in variant detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. The proposed low-, medium- or high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNPs, respectively. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.


Affiliation: Unit of Genetics, Department of Molecular and Translational Medicine, University of Brescia, 25123, Brescia, Italy.

ABSTRACT
The Ion Proton platform enables whole exome sequencing (WES) at low cost, with rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein-coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variant detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low-, medium- or high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
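The sensitivity, FDR and F1 figures quoted in the abstract are linked by a standard identity: precision is 1 − FDR, and F1 is the harmonic mean of precision and sensitivity. The following is a minimal sketch of that relationship, not the authors' evaluation code; the function name is hypothetical and the inputs are the rounded values from the abstract, so the result for indels (about 0.67) only approximates the reported ~0.66.

```python
def f1_from_sensitivity_and_fdr(sensitivity: float, fdr: float) -> float:
    """Return the F1 score given sensitivity (recall) and false discovery rate.

    Precision is 1 - FDR; F1 is the harmonic mean of precision and recall.
    """
    precision = 1.0 - fdr
    if precision + sensitivity == 0:
        return 0.0  # avoid division by zero when nothing is called correctly
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rounded values taken from the abstract (HiQ chemistry).
indel_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.71, fdr=0.37)
snp_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.975, fdr=0.02)

print(f"indel F1 = {indel_f1:.3f}")
print(f"SNP F1   = {snp_f1:.3f}")
```

Because the abstract reports approximate percentages, small differences between the recomputed and published F1 values are expected.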


Fig. 4: Study of parameters for variant filtering. We evaluated the distribution of 11 parameters reported by the Torrent Variant Caller (TVC) for indel (a) and SNP (b) variants identified from the HiQ datasets. AO: alternate allele observations; DP: read depth; FAO: flow-space alternate allele observations; FDP: flow-space read depth; FXX: flow evaluator failed reads ratio; GQ: genotype quality; HRUN: length of homopolymer; QD: quality per read length; QUAL: variant quality; STB: strand bias ratio; STBP: strand bias p value. The five parameters selected for filtering are reported as solid lines.

Mentions: To investigate the nature of errors in variant identification on the Ion Proton platform, we performed a detailed characterization of both the false-positive and false-negative variants described above. We first evaluated the distribution of 11 parameters reported by the Torrent Variant Caller (see “Materials and methods”) across false-positive and true-positive variants, for indels and SNPs separately (Fig. 4). Moreover, we assessed the proportion of false-positive and true-positive variants represented as multiallelic variant calls in the original VCF file. For indels, variants with three or more alternate alleles represented 8.5–12.5 % of false positives and only 0–0.4 % of true positives (Fig. 3a). No significant differences were detected for SNP variants (data not shown). For false-negative variants, we evaluated the proportion of missed calls due to low read depth (<10 reads), showing that 26–41 % of false-negative SNPs and 10–23 % of false-negative indels are due to low coverage (Fig. 3b). Further inspection of the false-negative calls with read depth >10× revealed that triplet repeats and homopolymeric regions are recurrent among missed variants (data not shown). Analysis of indel length showed that most false positives and false negatives are short (1–2 bp) insertions/deletions, while large ones above 100 bp are almost all erroneous calls (Fig. 3c). We then compared read length distribution and variant identification performance in the nine HiQ datasets. The NA12878_HiQ dataset, which showed lower performance in variant identification (see Supplementary table 10), revealed a substantial deviation from the expected distribution, with a loss of long fragments (Supplementary figure 6).
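Two of the observations above translate directly into simple per-record checks: indel calls with three or more alternate alleles are enriched among false positives, and read depth below 10 accounts for a sizeable share of missed calls. The sketch below illustrates both checks on plain VCF text lines. It is not the authors' filtering strategy and does not reproduce their stringency thresholds; the function names, flag labels and example records are hypothetical, and it assumes depth is available as a `DP` key in the INFO column.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column ('K1=V1;K2=V2;FLAG') into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def triage(vcf_line: str, min_depth: int = 10, min_alt_alleles: int = 3) -> list:
    """Return illustrative warning flags for one tab-separated VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alts, info = cols[3], cols[4].split(","), parse_info(cols[7])

    flags = []
    # Treat a record as an indel if any ALT allele differs in length from REF.
    is_indel = any(len(alt) != len(ref) for alt in alts)
    if is_indel and len(alts) >= min_alt_alleles:
        flags.append("multiallelic_indel")

    # Low read depth (<10 reads) is the criterion used in the text for missed calls;
    # it is applied to called records here purely for illustration.
    depth = info.get("DP", "")
    if depth.isdigit() and int(depth) < min_depth:
        flags.append("low_depth")
    return flags


# Hypothetical example records (not taken from the study data).
record_a = "chr1\t1000\t.\tAT\tA,ATT,ATTT\t50\tPASS\tDP=40;HRUN=5"
record_b = "chr1\t2000\t.\tG\tA\t30\tPASS\tDP=7"

print(triage(record_a))
print(triage(record_b))
```

Flags like these only mark records for closer inspection; the study's filters are built on five TVC parameters selected from the distributions in Fig. 4.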

