TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.

Upadhyay P, Gardi N, Desai S, Sahoo B, Singh A, Togar T, Iyer P, Prasad R, Chandrani P, Gupta S, Dutt A - Database (Oxford) (2016)

Bottom Line: The current build of dbSNP, the most comprehensive public SNP database, inadequately represents several non-European Caucasian populations, which limits cancer genomic analyses of data from these populations. Using an institutionally generated whole-exome dataset of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% of false-positive somatic events remaining after dbSNP depletion in tongue, gallbladder and cervical cancer samples of Indian origin, respectively. Beyond somatic cancer analyses, we anticipate that TMC-SNPdb will be useful for several Mendelian germline diseases.
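The depletion step summarised above can be sketched as a sequential filter. This is a minimal illustration, not the authors' pipeline: it assumes each variant is matched on an exact (chromosome, position, reference, alternate) key, that dbSNP and TMC-SNPdb are already loaded as in-memory sets of such keys, and that the reported percentage is relative to the calls remaining after dbSNP depletion. The names `Variant`, `deplete` and `extra_depletion`, and the toy records, are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical exact-match key for a variant call."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def deplete(calls: Iterable[Variant], germline_db: Set[Variant]) -> List[Variant]:
    """Keep only calls whose exact key is absent from a germline database."""
    return [v for v in calls if v not in germline_db]


def extra_depletion(
    calls: Iterable[Variant], dbsnp: Set[Variant], tmc_snpdb: Set[Variant]
) -> Tuple[List[Variant], float]:
    """Filter against dbSNP, then TMC-SNPdb; return the surviving calls and
    the fraction of post-dbSNP calls removed by the second database."""
    after_dbsnp = deplete(calls, dbsnp)
    after_tmc = deplete(after_dbsnp, tmc_snpdb)
    if not after_dbsnp:
        return after_tmc, 0.0
    removed = len(after_dbsnp) - len(after_tmc)
    return after_tmc, removed / len(after_dbsnp)


# Toy example: four candidate somatic calls, one known to dbSNP, one to TMC-SNPdb.
calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 3000, "G", "A"),
    Variant("chr3", 4000, "T", "C"),
]
dbsnp = {Variant("chr1", 1000, "A", "G")}
tmc_snpdb = {Variant("chr2", 3000, "G", "A")}

kept, frac = extra_depletion(calls, dbsnp, tmc_snpdb)
print(kept)           # the chr1:2000 and chr3:4000 calls remain
print(f"{frac:.0%}")  # 33%
```

Exact-key matching is the simplest choice for a sketch; it does not normalise indel representation, so differently written but equivalent indels would not match.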

Affiliation: Integrated Genomics Laboratory, Advanced Centre for Treatment, Research and Education in Cancer (ACTREC).

Figure 2. Overview of the characteristic features of TMC-SNPdb. (A) Circle plot of coding and non-coding variants in the dataset. (B) Percentage minor allele frequency distribution of variants in TMC-SNPdb across 62 normal samples; percentage frequencies are shown above each bar. (C) Genome-wide distribution of the percentage frequency of variants on each chromosome compared with the dbSNP database.

Mentions: A total of 114 309 variants were annotated for functional features using Oncotator (25). The distribution of coding (17 973) and non-coding (96 336) germline variants is shown in Figure 2A. Of the 17 973 coding variants, 11 466 were non-synonymous (NS; ∼64%) and 6507 were synonymous (S; ∼36%), giving an NS/S ratio of 1.76, consistent with previous reports for exome data from normal samples (27, 28). Missense (∼58%) and silent (∼30%) variants were considerably more frequent than indels (∼3%), nonsense (∼2%) and splice-site (∼6%) variants (Supplementary Figure S1A). Across the genome, the SNPs in TMC-SNPdb were distributed as follows: protein-coding exons (15.7%), introns (40%), intergenic regions (IGR; 25.8%), 3′UTRs (9.5%), 5′UTRs (2.37%), RNA (3.74%) and lincRNA (1.7%), consistent with earlier reports from exome sequencing data (Supplementary Figure S1B) (29, 30). Next, we computed the allele frequency of all 114 309 variants in TMC-SNPdb across 62 samples. TMC-SNPdb predominantly lists low-frequency germline variants prevalent in the Indian population. This mirrors 1000 Genomes and ExAC, in which about 99% of SNPs are estimated to have a minor allele frequency below 1% (8, 26); similarly, >90% of the variants in TMC-SNPdb have a minor allele frequency ≤5% (Figure 2B).
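The allele-frequency and NS/S calculations above can be illustrated with a short sketch. It assumes biallelic sites with diploid genotypes coded as alternate-allele counts (0, 1 or 2) for each of the 62 samples, folds the alternate-allele frequency into a minor allele frequency (MAF), and uses hypothetical bins (≤1%, 1 to 5%, >5%) rather than the exact bins of Figure 2B, which are not stated here. The function names and toy sites are illustrative only.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical MAF bins; the exact bins used in Figure 2B are not given in the text.
BINS = ("<=1%", "1-5%", ">5%")


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one biallelic site from per-sample alternate-allele counts (0, 1 or 2)."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # diploid: 2 alleles per sample
    return min(alt_freq, 1.0 - alt_freq)  # fold so the rarer allele is reported


def maf_bin(maf: float) -> str:
    """Assign a MAF to one of the hypothetical bins."""
    if maf <= 0.01:
        return BINS[0]
    if maf <= 0.05:
        return BINS[1]
    return BINS[2]


def maf_distribution(sites: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Percentage of sites falling in each MAF bin."""
    if not sites:
        raise ValueError("no sites supplied")
    counts = Counter(maf_bin(minor_allele_frequency(g)) for g in sites.values())
    total = sum(counts.values())
    return {label: 100.0 * counts[label] / total for label in BINS}


def ns_s_ratio(nonsynonymous: int, synonymous: int) -> float:
    """Ratio of non-synonymous to synonymous coding variants."""
    return nonsynonymous / synonymous


# Toy cohort of 62 samples (124 alleles per site).
N_SAMPLES = 62
sites = {
    "site_a": [1] + [0] * (N_SAMPLES - 1),        # 1 heterozygote -> MAF 1/124
    "site_b": [1] * 4 + [0] * (N_SAMPLES - 4),    # 4 heterozygotes -> MAF 4/124
    "site_c": [2] * 10 + [0] * (N_SAMPLES - 10),  # 10 alt homozygotes -> MAF 20/124
}
dist = maf_distribution(sites)
print(dist)

# Coding-variant counts reported in the text.
print(round(ns_s_ratio(11466, 6507), 2))  # 1.76
```

With 62 diploid samples the smallest non-zero MAF is 1/124 (about 0.8%), so a variant seen once as a heterozygote already falls in the lowest bin.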

