A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer.

Li M, Wen Y, Fu W - Cancer Inform (2015)



Affiliation: Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.

ABSTRACT
Accumulating evidence has shown that structural variations, arising from insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. At the same time, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden Markov model (HMM)-based method, referred to as PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each SNP on a single array from the raw fluorescence values, and then standardizes the estimated copy number abundance to put multiple arrays on an equal footing. This method requires no between-array normalization and thus maintains data integrity and the independence of samples among individual subjects. In addition to applying new statistical techniques to the raw fluorescence values, we apply the HMM to the standardized copy number abundance to reduce experimental noise. Through simulations, we show that the refined method infers copy number variants accurately. Application of the proposed method to a breast cancer dataset identified genomic regions significantly associated with the disease.
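The abstract describes two steps: per-array standardization of the estimated copy number abundance, followed by HMM smoothing of the standardized values. This summary does not give the PICR abundance estimator, the standardization formula, or the HMM parameters, so the following is only a minimal sketch under loudly stated assumptions. The raw per-SNP abundance is taken as given; standardization is assumed to rescale each array so that its median equals two copies (i.e., assuming most SNPs are diploid); and the HMM is assumed to have hidden states of 0 to 4 copies, Gaussian emissions centered on each state's copy number with a hypothetical standard deviation `sd`, and a "sticky" transition probability `p_stay`. Decoding uses the Viterbi algorithm; whether PICR-CNV uses Viterbi or another decoding rule is not stated here. The names `standardize` and `viterbi_copy_number` are hypothetical.

```python
import math
import statistics


def standardize(raw_abundance, diploid_cn=2.0):
    # Hypothetical standardization: rescale one array so its median abundance
    # equals diploid_cn, assuming most SNPs on the array are diploid. Only this
    # array's values are used, so no between-array normalization is involved.
    med = statistics.median(raw_abundance)
    if med <= 0:
        raise ValueError("median abundance must be positive")
    return [diploid_cn * x / med for x in raw_abundance]


def gaussian_logpdf(x, mean, sd):
    # Log density of a normal distribution N(mean, sd^2) evaluated at x.
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_copy_number(obs, states=(0, 1, 2, 3, 4), sd=0.3, p_stay=0.99):
    # Most likely hidden copy number path under an illustrative HMM (assumed
    # parameters): emissions N(state, sd^2), uniform initial distribution, and
    # transitions that keep the current state with probability p_stay and
    # otherwise move to any other state with equal probability.
    if not obs:
        return []
    n = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (n - 1))
    # score[k] = best log-probability of any path ending in state k so far.
    score = [-math.log(n) + gaussian_logpdf(obs[0], s, sd) for s in states]
    backpointers = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for k, s in enumerate(states):
            # Score of arriving in state k from each possible predecessor j.
            cands = [score[j] + (log_stay if j == k else log_move) for j in range(n)]
            best_j = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best_j] + gaussian_logpdf(x, s, sd))
            pointers.append(best_j)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the highest-scoring final state.
    k = max(range(n), key=score.__getitem__)
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [states[i] for i in path]


if __name__ == "__main__":
    # Toy single-array input: a run of four SNPs at about half the usual intensity.
    raw = [1000, 1010, 990, 1005, 995, 500, 510, 490, 505, 1000, 1002, 998]
    print(viterbi_copy_number(standardize(raw)))
    # prints [2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

Because each array is rescaled using only its own median, the sketch never consults other arrays, which mirrors the single-array property emphasized above. The sticky transition matrix is what smooths noise: a brief departure from two copies is only called when the emission evidence across several consecutive SNPs outweighs the cost of switching states twice.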


Fig. 2. Raw and standardized copy number abundance for a randomly selected HapMap sample (NA12892).

In this study, we have proposed an HMM-based method (PICR-CNV) for copy number inference. Through simulations, we have shown that the proposed method is highly accurate for copy number inference and robust against misspecification of the predetermined model parameter. Because the true copy number status is unknown in real data, copy number inference cannot be evaluated directly there; instead, we evaluated the proposed standardization approach by its effect on genotyping accuracy. We applied PICR to 90 HapMap samples assayed with Affymetrix Mapping 100K arrays and found that genotyping accuracy was higher with standardized copy number abundance than with raw copy number abundance (99.70% vs 99.63%). Empirically, the standardized copy number abundance also provided better genotype clustering than the raw abundance (Fig. 2). We further illustrated the proposed method with an application to breast cancer datasets. This analysis identified a few genomic regions significantly associated with breast cancer development, most of which have been reported in the literature as potentially involved in breast cancer. One SNP in region 4q31.23 was recently reported to be significantly associated with breast cancer progression.39 The ARHGAP10-NR3C2 gene, located in this region, is also known to be related to carcinogenesis through structural alteration.40 Possible copy number changes of the region have also been observed in cancer cell line data.41 Regions 1p21.1 and 10q21.1 have also been repeatedly reported as potentially associated with breast cancer.
Chromosome arm 1p has been suggested to contain multiple tumor suppressor genes.42 Structural alterations of 1p21.1 have been observed in many studies.42–45 Region 10q21.1 also harbors multiple candidate tumor suppressors, such as ANX7 and CDC2.46,47 Interestingly, region 6q22.33 was identified by the initial GWAS as a novel locus for breast cancer development.37 Our analysis confirmed this finding and further suggested that copy number changes in this region may play an important role.
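The genotyping accuracies reported for the 90 HapMap samples are easier to compare as error rates. The following is simple arithmetic on the two published percentages, not a figure reported by the authors:

```latex
\begin{aligned}
\text{error}_{\text{raw}} &= 1 - 0.9963 = 0.0037,\\
\text{error}_{\text{std}} &= 1 - 0.9970 = 0.0030,\\
\text{relative reduction} &= \frac{0.0037 - 0.0030}{0.0037} \approx 0.19.
\end{aligned}
```

In other words, standardization corresponds to roughly 7 fewer discordant genotype calls per 10,000 (about a 19% relative reduction in error), even though the absolute accuracy changes by only 0.07 percentage points.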

