A computational framework discovers new copy number variants with functional importance.

Banerjee S, Oldridge D, Poptsova M, Hussain WM, Chakravarty D, Demichelis F - PLoS ONE (2011)

Bottom Line: Using data generated for a subset of individuals by a 42 million marker platform, we validated the majority of the variants; the highest validation rate (66.7%) was for variants larger than 1 kb. We investigated possible enrichment for the variants' regulatory effect and found that smaller variants (<1 kb) are more likely to regulate gene transcripts than larger variants (p-value = 2.04e-08). Our results support the validity of the computational framework for detecting novel variants relevant to disease susceptibility studies and provide evidence of the importance of genetic variants in regulatory network studies.


Affiliation: Department of Public Health, Weill Cornell Medical College, New York, New York, United States of America.

ABSTRACT
Structural variants that cause changes in copy number constitute an important component of genomic variability. They account for 0.7% of genomic differences between two individual genomes, of which copy number variants (CNVs) are the largest component. A recent population-based CNV study revealed the need for better characterization of CNVs, especially the small ones (<500 bp). We propose a three-step computational framework (Identification of germline Changes in Copy Number, or IgC2N) to discover and genotype germline CNVs. First, we detect candidate CNV loci by combining information across multiple samples without imposing restrictions on the number of coverage markers or on the variant size. Second, we fine-tune the detection of rare variants and infer the putative copy number classes for each locus. Last, for each variant we combine the relative distance between consecutive copy number classes with genetic information in a novel attempt to estimate the reference model bias. This computational approach is applied to genome-wide data from 1250 HapMap individuals. Novel variants were discovered and characterized in terms of size, minor allele frequency, type of polymorphism (gains, losses or both), and mechanism of formation. Using data generated for a subset of individuals by a 42 million marker platform, we validated the majority of the variants; the highest validation rate (66.7%) was for variants larger than 1 kb. Finally, we queried RNA-sequencing transcriptomic data from 129 individuals as further validation and to assess the functional role of the new variants. We investigated possible enrichment for the variants' regulatory effect and found that smaller variants (<1 kb) are more likely to regulate gene transcripts than larger variants (p-value = 2.04e-08). Our results support the validity of the computational framework for detecting novel variants relevant to disease susceptibility studies and provide evidence of the importance of genetic variants in regulatory network studies.


Figure 4 (pone-0017539-g004): Functional impact of CNVs on human transcriptome. (A) Proportion of functional variants with respect to variant size and type of polymorphism. Percentages are evaluated with respect to each subclass. (B) Significance of associations with respect to gene-variant distance. The cis analysis included 2 Mb windows. Minus log 10 of the q-values is plotted against the distance between the midpoints of genes and variants. Up and down arrows depict the direction of the association. Red symbols identify data points corresponding to the new CNVs. (C) List of top-ranked associations involving new variants residing within protein coding regions. (D) Examples of new variants showing a significant effect on gene transcripts. mRNA levels are plotted against the copy number states of new variants identified by IgC2N (box plots) and against the copy number intensity ratios (scatter plots). P-values from the regression analysis against copy number states are reported.
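As an illustration of the quantity plotted in panel B, the following minimal Python sketch converts p-values to q-values and then to minus log 10 scores. The source does not state how the q-values were computed; the Benjamini-Hochberg adjustment used here is an assumption, and the function names and input p-values are hypothetical.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed definition of q-values)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def minus_log10(values):
    """Transform q-values to the -log10 scale (requires every value > 0)."""
    return [-math.log10(v) for v in values]


# Hypothetical p-values for four gene-variant pairs (not data from the study).
pvals = [0.001, 0.02, 0.03, 0.5]
qvals = bh_qvalues(pvals)
scores = minus_log10(qvals)
```

Larger scores correspond to stronger associations, so the most significant gene-variant pairs appear highest on the vertical axis of such a plot.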

Mentions: Focusing on CEU population datasets, we then investigated possible enrichment for the CNVs' regulatory effect with respect to MAF, genomic complexity (segmental duplication), mechanism of formation, variant size and type of polymorphism within the cis analysis window. Small variants are more likely to regulate gene transcripts than larger variants (p = 2.04e-08), with no preference in terms of type of polymorphism. Among variants larger than 1 kb, however, those involving copy number gains are overall more effective than deletions (p = 8.5e-05) (Figure 4A). No clear pattern for CNV effect versus CNV-gene distance was observed, nor was there a preference for direct or inverse effects (Figure 4B).
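The size comparison above can be illustrated with a test on a 2x2 contingency table (small versus large variants, functional versus non-functional). The text does not name the test behind p = 2.04e-08; this sketch assumes a two-sided Fisher exact test, the function name is hypothetical, and all counts are made up rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables that are no more probable than the observed one;
    # the small tolerance guards against floating point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Made-up counts: rows are small (<1 kb) and large (>1 kb) variants,
# columns are functional and non-functional variants.
p_value = fisher_exact_two_sided(30, 70, 10, 90)
```

A small p-value here would indicate that the proportion of functional variants differs between the two size classes; it says nothing about the direction, which has to be read from the proportions themselves.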


