Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data.

Mosén-Ansorena D, Aransay AM, Rodríguez-Ezpeleta N - BMC Bioinformatics (2012)

Bottom Line: This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones. We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data.


Affiliation: Genome Analysis Platform, CIC bioGUNE-CIBERehd, Technologic Park of Bizkaia, Building 502, 48160 Derio, Spain. dmosen.gn@cicbiogune.es

ABSTRACT

Background: The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model that integrates normal cell contamination and intra-tumour heterogeneity, and that can be translated into synthetic data on which to perform benchmarks, is indispensable.

Results: We propose such a model and implement it in an R package called CnaGen, which synthetically generates a wide range of alterations under different normal cell contamination levels. Six recently published methods for CNA and loss of heterozygosity (LOH) detection in tumour samples were assessed on these synthetic data and on a dilution series of a breast cancer cell line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report recall rates in terms of normal cell contamination level and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels. The assessed methods are in general better at detecting alterations with low copy number and under low levels of normal cell contamination. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA are the worst-performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones.

Conclusions: We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data. Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis.
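The Results describe CnaGen generating alterations under different levels of normal cell contamination, but this page does not give CnaGen's actual model. As an illustration only, the sketch below uses a common simplified two-component mixture (an assumption, not CnaGen's code): a fraction of cells are normal diploid heterozygous, the rest carry a tumour segment with allele copy numbers `n_a` and `n_b`. The names `Segment` and `expected_signals` are hypothetical, and ploidy rescaling, intra-tumour heterogeneity and probe noise are ignored.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    """Tumour allele-specific copy numbers for one segment (hypothetical naming)."""
    n_a: int  # copies of allele A in tumour cells
    n_b: int  # copies of allele B in tumour cells


def expected_signals(seg: Segment, contamination: float) -> Tuple[float, float]:
    """Expected (LogR, BAF) at a germline-heterozygous SNP in an admixed sample.

    Assumed two-component mixture: a fraction `contamination` of cells are
    normal (one copy of each allele), the rest carry the tumour segment.
    Ignores ploidy rescaling, intra-tumour heterogeneity and probe noise.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be in [0, 1]")
    tumour = 1.0 - contamination
    # Average number of A and B allele copies per cell in the mixture
    a = tumour * seg.n_a + contamination
    b = tumour * seg.n_b + contamination
    total = a + b
    if total == 0:
        # Homozygous deletion in a pure tumour sample: no signal at all
        return float("-inf"), float("nan")
    return math.log2(total / 2), b / total


for c in (0.0, 0.5):
    for name, seg in (("hemizygous deletion", Segment(1, 0)),
                      ("copy-neutral LOH", Segment(2, 0))):
        log_r, baf = expected_signals(seg, c)
        print(f"{name:20s} contamination={c:.1f} LogR={log_r:+.3f} BAF={baf:.3f}")
```

Under this simplified model, 50% contamination moves a hemizygous deletion from LogR -1 and BAF 0 to roughly LogR -0.42 and BAF 1/3. Both signals are pulled toward the diploid heterozygous values (LogR 0, BAF 0.5). This offers one intuitive reading of why the assessed methods detect alterations less well as contamination rises.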



Figure 3: Recall rates by method, contamination, and alteration copy number and length. Colour code: GAP (orange), updated GAP (golden), ASCAT (purple), GPHMM (black), OncoSNP (blue), GenoCNA (green), MixHMM (grey). (A) Recall rates (y-axis) of each assessed method, calculated by contamination over each of the 5 synthetic sample sets. (B) Recall rates (y-axis) of each assessed method, calculated by contamination and alteration copy number over each of the 5 synthetic sample sets.

Mentions: To determine the effect of each tested factor on each method's performance, recall rates were plotted against the different values tested for copy number and length (see Figure 3 and Additional file 3). Graphs were grouped by sample pattern and normal cell contamination level. Recall of LOH status was assessed by counting as correct only those calls that matched both the copy number and the LOH status (absence or presence of LOH, regardless of whether germline or somatic), and similar graphs were generated under this criterion (Additional file 4). The methods tested include an updated version of GAP released in September 2011 (hereafter "updated GAP"). See Additional file 5 for the specifications and parameters used for each method.
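The two recall criteria described above (copy number only, and copy number plus LOH status) can be made concrete with a small sketch. It assumes a per-SNP evaluation in which every SNP has a true and a called state `(copy number, LOH present)`. The excerpt does not say whether the authors scored per SNP or per segment, so this framing, the diploid baseline of 2, and all names here are hypothetical. The sketch also computes a per-copy-number false discovery rate, the other metric named in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Per-SNP state: (copy number, LOH present?). Hypothetical representation.
State = Tuple[int, bool]
DIPLOID = 2  # assumed normal copy number


def _check(truth: List[State], called: List[State]) -> None:
    if len(truth) != len(called):
        raise ValueError("truth and calls must cover the same SNPs")


def recall_by_copy_number(
    truth: List[State], called: List[State], match_loh: bool = False
) -> Dict[int, float]:
    """Recall per true copy number over truly altered SNPs.

    Copy-number mode: a SNP is altered if its true copy number is not diploid,
    and a call is correct if its copy number matches.
    LOH mode: copy-neutral LOH also counts as altered, and a call must match
    both the copy number and the presence/absence of LOH.
    """
    _check(truth, called)
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for (t_cn, t_loh), (c_cn, c_loh) in zip(truth, called):
        altered = (t_cn, t_loh) != (DIPLOID, False) if match_loh else t_cn != DIPLOID
        if not altered:
            continue
        totals[t_cn] += 1
        if c_cn == t_cn and (not match_loh or c_loh == t_loh):
            hits[t_cn] += 1
    return {cn: hits[cn] / totals[cn] for cn in sorted(totals)}


def fdr_by_called_copy_number(
    truth: List[State], called: List[State]
) -> Dict[int, float]:
    """Among SNPs called with non-diploid copy number k, the fraction whose true copy number is not k."""
    _check(truth, called)
    false_pos: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for (t_cn, _), (c_cn, _) in zip(truth, called):
        if c_cn == DIPLOID:
            continue
        totals[c_cn] += 1
        if t_cn != c_cn:
            false_pos[c_cn] += 1
    return {cn: false_pos[cn] / totals[cn] for cn in sorted(totals)}


# Toy example: 7 SNPs with made-up true and called states.
truth = [(2, False), (1, True), (1, True), (3, False), (3, False), (2, True), (2, False)]
called = [(2, False), (1, True), (1, False), (3, False), (2, False), (2, False), (3, False)]
print(recall_by_copy_number(truth, called))                  # {1: 1.0, 3: 0.5}
print(recall_by_copy_number(truth, called, match_loh=True))  # {1: 0.5, 2: 0.0, 3: 0.5}
print(fdr_by_called_copy_number(truth, called))              # {1: 0.0, 3: 0.5}
```

The stricter LOH criterion can only lower recall for a given copy number: the second SNP with true state (1, LOH) is called with the correct copy number but no LOH. It therefore counts under the first criterion and not the second.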
