The CNVrd2 package: measurement of copy number at complex loci using high-throughput sequencing data.

Nguyen HT, Merriman TR, Black MA - Front Genet (2014)

Bottom Line: The CNVrd2 method first uses observed read-count ratios to refine segmentation results in one population. The performance of CNVrd2 was compared to that of two other read depth-based methods (CNVnator, cn.mops) at the CCL3L1 and DEFB103A loci. The highest concordance with the paralog ratio test method was observed for CNVrd2 (77.8/90.4% for CNVrd2, 36.7/4.8% for cn.mops and 7.2/1% for CNVnator at CCL3L1/DEFB103A, respectively).


Affiliation: Department of Biochemistry, University of Otago Dunedin, New Zealand ; Department of Mathematics and Statistics, University of Otago Dunedin, New Zealand ; Department of Biochemistry, Virtual Institute of Statistical Genetics, University of Otago Dunedin, New Zealand.

ABSTRACT
Recent advances in high-throughput sequencing technologies have made it possible to accurately assign copy number (CN) at CN variable loci. However, current analytic methods often perform poorly in regions in which complex CN variation is observed. Here we report the development of a read depth-based approach, CNVrd2, for investigation of CN variation using high-throughput sequencing data. This methodology was developed using data from the 1000 Genomes Project from the CCL3L1 locus, and tested using data from the DEFB103A locus. In both cases, samples were selected for which paralog ratio test data were also available for comparison. The CNVrd2 method first uses observed read-count ratios to refine segmentation results in one population. Then a linear regression model is applied to adjust the results across multiple populations, in combination with a Bayesian normal mixture model to cluster segmentation scores into groups for individual CN counts. The performance of CNVrd2 was compared to that of two other read depth-based methods (CNVnator, cn.mops) at the CCL3L1 and DEFB103A loci. The highest concordance with the paralog ratio test method was observed for CNVrd2 (77.8/90.4% for CNVrd2, 36.7/4.8% for cn.mops and 7.2/1% for CNVnator at CCL3L1/DEFB103A, respectively). CNVrd2 is available as an R package as part of the Bioconductor project: http://www.bioconductor.org/packages/release/bioc/html/CNVrd2.html.
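The clustering step described in the abstract (grouping segmentation scores so that each group corresponds to one CN count) can be illustrated with a small sketch. This is not the CNVrd2 implementation: the package is written in R and uses a Bayesian normal mixture model, whereas the sketch below swaps in a plain maximum-likelihood EM fit of a one-dimensional normal mixture, written in Python. The function names (fit_mixture, assign_groups), the starting standard deviation, and the convention that each component index stands for one CN group are assumptions made for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(scores, init_means, n_iter=200, init_sd=0.5, min_sd=1e-3):
    """Fit a 1-D normal mixture to segmentation scores by EM.

    Illustrative stand-in for a Bayesian mixture fit (assumption).
    Returns (weights, means, sds), one entry per component.
    """
    k = len(init_means)
    means = list(init_means)
    sds = [init_sd] * k
    weights = [1.0 / k] * k
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and sd of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, scores)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, scores)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds


def assign_groups(scores, weights, means, sds):
    """Assign each score to the component with the highest weighted density."""
    groups = []
    for x in scores:
        dens = [w * normal_pdf(x, m, s)
                for w, m, s in zip(weights, means, sds)]
        groups.append(max(range(len(dens)), key=dens.__getitem__))
    return groups
```

In use, one would call fit_mixture with one starting mean per expected CN group and then assign_groups to label each sample. How a component index maps to an actual CN count is left to the analyst in this sketch.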



Figure 3: Comparison of copy number assignments of high-throughput sequencing-based with PRT-based methods. (A) CCL3L1 on 180 samples [only 111 samples measured by Sudmant et al. (2010) overlapped]. (B) DEFB103A on 104 samples.

Mentions: Copy number assignments at CCL3L1 by cn.mops (Klambauer et al., 2012), CNVnator (Abyzov et al., 2011) and Sudmant et al. (2010) were compared to assignments of CNVrd2 on the 180 samples measured by the modified PRT. To run CNVnator, we downloaded all of the chromosome 17 data for the 180 samples. CNVnator was applied as previously described (Nguyen et al., 2013). cn.mops uses read-count information from single and multiple samples to detect CN variable regions, and uses a Poisson mixture model to automatically infer CN. We used CNVrd2 to obtain matrices of read counts in the 2 Mb region (chr17:33670000-34670000) in eight different scenarios: constant windows of 25000, 20000, 10000, 5000, 2000, 1000, 500, and 200 bp, with default values for other parameters. The 5000 bp window for cn.mops had the highest proportion of samples with CN > 3 and the highest concordance with other methods, and was used in the Figure 3 comparison. The Sudmant et al. (2010) read count-based method utilized all possible mapping locations of a read combined with singly unique nucleotide positions to measure CN for 169 samples from the 1000 Genomes Project at different loci, including the CCL3L1 gene. The CCL3L1 gene results were validated using Q-PCR-based assays, with high correlation being observed (r = 0.95) (Sudmant et al., 2010). Of the 169 samples, 111 overlapped with the samples analyzed here using CNVrd2, and by Carpenter et al. (2011) using the PRT-based methods. The coordinates of CCL3L1 in the Sudmant et al. (2010) data were chr17:34623842-34625730 (hg19).
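The two bookkeeping steps in the paragraph above, counting reads in constant-width windows across a region and scoring agreement between two methods on the samples they share, can be sketched as follows. This is a minimal Python illustration, not code from CNVrd2, cn.mops or CNVnator. The function names, the use of read start positions, the half-open region convention, and the definition of concordance as the percentage of shared samples with identical CN calls are all assumptions.

```python
# Window widths (bp) listed in the text for the eight scenarios.
WINDOW_SIZES = [25000, 20000, 10000, 5000, 2000, 1000, 500, 200]


def window_counts(read_starts, region_start, region_end, window):
    """Count reads per constant-width window of [region_start, region_end).

    A read is assigned to the window containing its start position
    (an assumption; real tools may handle spanning reads differently).
    """
    n_windows = -(-(region_end - region_start) // window)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // window] += 1
    return counts


def concordance(calls_a, calls_b):
    """Percentage of shared samples whose CN calls are identical.

    calls_a and calls_b map sample ID -> integer CN. Samples present in
    only one of the two dictionaries are ignored.
    """
    shared = set(calls_a) & set(calls_b)
    if not shared:
        raise ValueError("the two call sets share no samples")
    agree = sum(calls_a[s] == calls_b[s] for s in shared)
    return 100.0 * agree / len(shared)
```

Restricting the comparison to shared samples mirrors the situation described in the text, where only a subset of the Sudmant et al. (2010) samples overlapped with those analyzed here. Repeating window_counts for each entry in WINDOW_SIZES gives one count vector per scenario for a given sample.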

