Next Generation Sequencing of Pooled Samples: Guideline for Variants’ Filtering


ABSTRACT

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling of DNA creates new problems and challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors can be confounded with alleles present at low frequency in the pools, possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data of the individual subjects in the pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic in nature and could be easily applied in other Pool-seq experiments.



Fig. 3: Pool sequencing AF vs. AF obtained from individual genotyping by ImmunoChip SNP-array. (a) Correlation scatterplot. The points are colour coded according to the absolute difference (delta) between the two frequencies; the number of points for corresponding ranges of delta is shown in the top-left inset. (b) Pool-by-pool correlation. A representative scatter plot for one of the pools (12 individuals) for 1535 SNVs is shown.
© Copyright Policy - open-access

Mentions: The subjects of 50 pools (out of 83 in total), 600 individuals altogether, were each genotyped individually using Illumina’s Immunochip [20,21,22] SNP-genotyping platform. The Immunochip platform tested 1535 variants covered by our targeted sequencing, for which a comparison between the two platforms was possible. AFs obtained from Pool-seq show an excellent correlation (R² = 0.987) with AFs obtained from individual genotyping, with the majority of the AF-pairs (N = 69237, 90.32%) differing by <0.05 between the two sets (~1 variant chromosome out of the 24 autosomal chromosomes in a pool of 12 individuals) [Fig. 3(a)]. The relative differences (absolute delta/AF) are also small for both common and rare variants (Supplementary Fig. S8). In addition, the pool-by-pool correlation was very high: the mean correlation across all pools was 0.987 ± 0.001 [Fig. 3(b)]. These results further show that the estimation of AF in Pool-seq is reliable and robust.
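The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`genotype_af`, `pearson_r`, `compare_afs`) and the toy numbers are hypothetical, and it assumes that R² is the squared Pearson correlation and that each individual is diploid, so a pool of 12 individuals carries 24 chromosome copies per autosomal site.

```python
from math import sqrt


def genotype_af(alt_allele_counts):
    """AF in a pool from per-individual alternate-allele counts (0, 1 or 2).

    Assumes diploid individuals, so the denominator is 2 * pool size.
    """
    return sum(alt_allele_counts) / (2 * len(alt_allele_counts))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def compare_afs(pool_af, geno_af, threshold=0.05):
    """Summarise agreement between Pool-seq and genotyping AFs.

    Returns R^2 (assumed here to be squared Pearson r) and the number
    and fraction of AF-pairs whose absolute difference (delta) is
    below `threshold`.
    """
    deltas = [abs(p - g) for p, g in zip(pool_af, geno_af)]
    n_within = sum(d < threshold for d in deltas)
    r = pearson_r(pool_af, geno_af)
    return {
        "r_squared": r * r,
        "n_within": n_within,
        "frac_within": n_within / len(deltas),
    }


# One heterozygous carrier among 12 individuals: 1 of 24 chromosome copies,
# which is below the 0.05 delta threshold used in the text.
one_chromosome = genotype_af([1] + [0] * 11)

# Toy AF-pairs (made-up values, not data from the study).
summary = compare_afs([0.0, 0.1, 0.5, 1.0], [0.0, 0.125, 0.5, 0.9])
```

In this sketch a single differing chromosome copy shifts the pool AF by 1/24 (about 0.042), which is why a delta threshold of 0.05 corresponds to roughly one chromosome in a 12-individual pool.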

