Copy number variation analysis based on AluScan sequences.

Yang JF, Ding XF, Chen L, Mat WK, Xu MZ, Chen JF, Wang JM, Xu L, Poon WS, Kwong A, Leung GK, Tan TC, Yu CH, Ke YB, Xu XY, Ke XY, Ma RC, Chan JC, Wan WQ, Zhang LW, Kumar Y, Tsang SY, Li S, Wang HY, Xue H - J Clin Bioinforma (2014)

Bottom Line: The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features that were capable of separating between liver cancers and other types of cancers. Since the method is applicable to any human DNA sample with or without the availability of a paired control, it can also be employed to analyze the constitutional CNVs of individuals.


Affiliation: Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China.

ABSTRACT

Background: AluScan combines inter-Alu PCR using multiple Alu-based primers with opposite orientations and next-generation sequencing to capture a huge number of Alu-proximal genomic sequences for investigation. Its requirement of only sub-microgram quantities of DNA facilitates the examination of large numbers of samples. However, the special features of AluScan data make it difficult to call copy number variations (CNVs) directly with the calling algorithms designed for whole genome sequencing (WGS) or exome sequencing.

Results: In this study, an AluScanCNV package has been assembled for efficient CNV calling from AluScan sequencing data employing a Geary-Hinkley transformation (GHT) of read-depth ratios between either paired test-control samples, or between test samples and a reference template constructed from reference samples, to call the localized CNVs, followed by use of a GISTIC-like algorithm to identify recurrent CNVs and circular binary segmentation (CBS) to reveal large extended CNVs. To evaluate the utility of CNVs called from AluScan data, the AluScans from 23 non-cancer and 38 cancer genomes were analyzed in this study. The glioma samples analyzed yielded the familiar extended copy-number losses on chromosomes 1p and 9. Also, the recurrent somatic CNVs identified from liver cancer samples were similar to those reported for liver cancer WGS with respect to a striking enrichment of copy-number gains in chromosomes 1q and 8q. When localized or recurrent CNV-features capable of distinguishing between liver and non-liver cancer samples were selected by correlation-based machine learning, a highly accurate separation of the liver and non-liver cancer classes was attained.

Conclusions: The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features that were capable of separating between liver cancers and other types of cancers. Since the method is applicable to any human DNA sample with or without the availability of a paired control, it can also be employed to analyze the constitutional CNVs of individuals.


Figure 3: Poisson binomial distribution of CNVs among samples. The frequency for any window is the percentage of total samples that display a CNV at that window, and the density is the fraction of all the windows analyzed that display a given frequency. Accordingly, CNVs that give rise to frequencies to the right of the cut-off frequency (indicated by red line) represent CNVs that occur at an exceptionally high percentage of samples with p < 0.01, and are therefore regarded as recurrent CNVs. The curve shown was calculated using localized CNVs called from the AluScans of the 38 cancer samples in column 2 of Additional file 1: Table S1, in each case employing for comparison the 23-sample reference template.

Mentions: where $F_k$ is the set of all subsets of k integers encountered, $A$ the set of matrix elements with value ‘1’, $A^C$ the set of matrix elements with value ‘0’, $p_i$ the frequency of ‘1’ elements in the samples and $(1 - p_j)$ the frequency of ‘0’ elements in the samples. Based on Eqn. 13, the ‘poibin’ package in R [29] is employed to calculate the cut-off frequency in the P(k) distribution that gives rise to p < 0.01, which is the criterion for the identification of a recurrent CNV (Figure 3).
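The cut-off described above can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the ‘poibin’ R package: it computes the Poisson binomial distribution with a simple dynamic-programming recursion instead of summing over all subsets. It assumes that each sample i contributes one probability (the fraction of its windows that carry a CNV), and the function names, the `alpha` parameter and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """Return P(K = k) for k = 0..n, where K counts successes among
    n independent trials with success probabilities `probs`."""
    pmf = [1.0]  # with zero trials, K = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this sample shows no CNV
            nxt[k + 1] += mass * p      # this sample shows a CNV
        pmf = nxt
    return pmf


def recurrence_cutoff(probs: Sequence[float], alpha: float = 0.01) -> int:
    """Return the smallest count k with P(K >= k) < alpha.

    Returns n + 1 when no count is rare enough to pass the threshold.
    """
    pmf = poisson_binomial_pmf(probs)
    cutoff = len(pmf)  # n + 1: nothing significant yet
    tail = 0.0
    for k in range(len(pmf) - 1, -1, -1):
        tail += pmf[k]  # tail now equals P(K >= k)
        if tail >= alpha:
            break
        cutoff = k
    return cutoff


if __name__ == "__main__":
    # Hypothetical per-sample CNV rates for 38 samples (assumed values).
    sample_rates = [0.05] * 38
    k_min = recurrence_cutoff(sample_rates, alpha=0.01)
    print(f"count cut-off: {k_min} of {len(sample_rates)} samples")
    print(f"frequency cut-off: {100.0 * k_min / len(sample_rates):.1f}%")
```

Under these assumptions, a window would be treated as carrying a recurrent CNV when the number of samples showing a CNV there reaches the returned count; dividing by the number of samples gives the cut-off frequency plotted on the horizontal axis of Figure 3.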

