Classification of human cancers based on DNA copy number amplification modeling.

Myllykangas S, Tikka J, Böhling T, Knuutila S, Hollmén J - BMC Med Genomics (2008)

Bottom Line: The distribution of classification terms in the amplification-model based clustering of cancer cases revealed cancer classes that were associated with specific DNA copy number amplification models. The boundaries of amplification patterns were shown to be enriched with fragile sites, telomeres, centromeres, and light chromosome bands. Furthermore, statistical evidence showed that specific chromosomal features co-localize with amplification breakpoints and link them in the amplification process.


Affiliation: Department of Pathology, Haartman Institute and HUSLAB, University of Helsinki and Helsinki University Central Hospital, P.O. Box 21, FI-00014, University of Helsinki, Helsinki, Finland. samuel.myllykangas@helsinki.fi

ABSTRACT

Background: DNA amplifications alter gene dosage in cancer genomes by multiplying the gene copy number. Amplifications are characteristic of a considerable number of advanced cancers arising at various anatomical sites. The aims of this study were to classify human cancers based on their amplification patterns, to explore the biological and clinical basis of this amplification-pattern based classification, and to identify the features of human genome architecture that are associated with amplification mechanisms.

Methods: We applied a machine learning approach to model DNA copy number amplifications using a data set of binary amplification records at chromosome sub-band resolution from 4400 cases that represent 82 cancer types. Amplification data was fused with background data: clinical, histological and biological classifications, and cytogenetic annotations. Statistical hypothesis testing was used to mine associations between the data sets.
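The abstract describes the amplification data only as binary records per case and sub-band, modeled by "probabilistic clustering" of each chromosome. A finite mixture of multivariate Bernoulli distributions fitted by expectation-maximization (EM) is a standard probabilistic model for such 0/1 data, and the sketch below assumes that model; the function names, the random initialization, and the fixed iteration count are illustrative choices, not the authors' procedure. Each row of `data` is one case and each column one sub-band of a single chromosome.

```python
import math
import random


def fit_bernoulli_mixture(data, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a k-component mixture of multivariate Bernoulli distributions by EM.

    data: list of equal-length 0/1 lists (cases x chromosome sub-bands).
    Returns (weights, probs, resp): mixing weights, per-component
    amplification probability for each sub-band, and per-case
    responsibilities (posterior cluster memberships).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Random start for the per-band probabilities, away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # E-step: posterior membership of each case, computed in log space.
        for i, x in enumerate(data):
            logs = [
                math.log(weights[j])
                + sum(math.log(p) if bit else math.log(1.0 - p)
                      for bit, p in zip(x, probs[j]))
                for j in range(k)
            ]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            total = sum(ex)
            resp[i] = [e / total for e in ex]
        # M-step: re-estimate mixing weights and per-band probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, eps)
            for t in range(d):
                p = sum(resp[i][j] * data[i][t] for i in range(n)) / max(nj, eps)
                # Clip so that log(p) and log(1 - p) stay finite.
                probs[j][t] = min(max(p, eps), 1.0 - eps)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, probs, resp


def hard_assign(resp):
    """Assign each case to its most probable component."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]
```

In practice the number of components `k` would have to be selected separately for each chromosome, for example by comparing held-out likelihoods; that step is omitted here.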

Results: Probabilistic clustering of each chromosome identified 111 amplification models and divided the cancer cases into clusters. The distribution of classification terms in the amplification-model based clustering of cancer cases revealed cancer classes that were associated with specific DNA copy number amplification models. Amplification patterns (finite, bounded descriptions of the ranges of the amplifications along the chromosome) were extracted from the clustered data and expressed in the original cytogenetic nomenclature. This was achieved by mining maximal frequent itemsets from the cluster-specific data sets. The boundaries of amplification patterns were shown to be enriched with fragile sites, telomeres, centromeres, and light chromosome bands.
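Maximal frequent itemset mining treats each case in a cluster as a "transaction" whose items are its amplified sub-bands, and reports the largest band sets that occur in at least a given fraction of cases. The following is a minimal Apriori-style sketch, not the authors' implementation; the support threshold and the example band names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for all itemsets whose support
    (fraction of transactions that contain them) is >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(items):
        return sum(1 for t in sets if items <= t) / n

    singles = {frozenset([i]) for t in sets for i in t}
    level = [s for s in singles if support(s) >= min_support]
    frequent = list(level)
    size = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (size-1)-itemsets with `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, size - 1))]
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
        size += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]
```

This level-wise version enumerates every frequent subset before filtering, so its cost grows exponentially with the length of the longest frequent band set; dedicated maximal-itemset miners avoid that enumeration.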

Conclusions: Our results demonstrate that amplifications are non-random chromosomal changes that are specifically selected in the tumor tissue microenvironment. Furthermore, statistical evidence showed that specific chromosomal features co-localize with amplification breakpoints and link them in the amplification process.
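The co-localization claim amounts to asking whether pattern-boundary sub-bands carry a given feature (a fragile site, say) more often than randomly chosen sub-bands would. One common way to test this is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2×2 table); the abstract does not name the test used, so this choice and the parameter names below are assumptions.

```python
from math import comb


def enrichment_p_value(n_total, n_feature, n_boundary, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_total: sub-bands considered; n_feature: sub-bands carrying the feature
    (e.g. a fragile site); n_boundary: sub-bands that are pattern boundaries;
    n_overlap: boundary sub-bands that carry the feature.
    """
    denom = comb(n_total, n_boundary)
    upper = min(n_feature, n_boundary)
    # Sum the probabilities of every overlap at least as large as observed.
    return sum(comb(n_feature, i) * comb(n_total - n_feature, n_boundary - i)
               for i in range(n_overlap, upper + 1)) / denom
```

For instance, `enrichment_p_value(400, 80, 50, 20)` would evaluate a hypothetical genome of 400 sub-bands, 80 of them fragile sites, with 20 of 50 boundary bands at fragile sites; these numbers are illustrative, not the paper's. The same tail probability is available as `scipy.stats.hypergeom.sf(k - 1, M, n, N)`.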

Figure 4: Etiological factors of cancers. Etiological data were compiled from WHO sources [9]. Each row represents a cancer type, and black boxes mark the etiological factors that have been associated with it. Cancer type rows and etiological factor columns are ordered by hierarchical clustering, using the between-groups linkage method and the squared Euclidean distance measure for binary data.
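For binary data the squared Euclidean distance between two rows is simply the number of positions in which they differ, and between-groups linkage (average linkage, also called UPGMA) defines the distance between two clusters as the mean distance over all cross-cluster pairs. The naive sketch below orders rows that way; it is an illustration under those assumptions, not the software used for the figure, and it breaks ties by taking the first pair found.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance; for 0/1 vectors this is the mismatch count."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage_order(rows):
    """Naive agglomerative clustering with between-groups (average) linkage.

    Returns the leaf order of the final dendrogram, i.e. the order in which
    rows would be displayed in a clustered heat map.
    """
    n = len(rows)
    d = {(i, j): sq_euclidean(rows[i], rows[j]) for i in range(n) for j in range(n)}
    # Each cluster is a list of row indices, kept in dendrogram leaf order.
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                # Between-groups linkage: mean distance over all cross pairs.
                dist = sum(d[i, j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0]
```

Applying the same function to the transposed matrix orders the columns. For larger matrices, SciPy's `linkage(..., method='average')` on `pdist(..., metric='sqeuclidean')` builds an average-linkage hierarchy on the same distances far more efficiently.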

Mentions: Classification data for 95 human neoplasms were collected from the literature [see Additional file 1]. The analysis was restricted to malignant cancers of 82 different cancer types. Figure 3 presents the distribution of classifications based on cell lineage, age, and gender. Classification based on cell lineage comprised anatomical system, organ, tissue, differentiation, and embryonic lineages. These attributes were divided into classification terms, e.g., nervous system (anatomical system), brain (organ), and glioma (cell). Classification terms can partially overlap across attributes. Differentiation lineage (e.g., adenocarcinoma) refers to the histological type of the malignancy. Embryonic lineage divides cases into four main developmental compartments: epithelial, mesenchymal, hematopoietic, and neuroepithelial. The clinical attributes were age (pediatric, young adults, and adults) and gender. In addition, 19 different etiological factors were collected (Figure 4). In all, 29 attributes and 100 classification terms were accumulated. Classification terms were assigned at the level of cancer type, because primary data on individual cases were not available in the amplification data compilation. The compilation of DNA copy number amplification data was revised according to the new annotations [12].
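Because classification terms were assigned per cancer type rather than per case, every case inherits the full term set of its type. A sketch of how term distributions per amplification cluster could then be tabulated, assuming a simple mapping from type to term set; the mapping and function below are hypothetical and the example terms are only illustrative.

```python
from collections import Counter

# Hypothetical type-level annotations for illustration only; the actual
# terms are listed in the paper's Additional file 1.
TYPE_TERMS = {
    "glioma": {"nervous system", "brain", "neuroepithelial"},
    "breast adenocarcinoma": {"breast", "adenocarcinoma", "epithelial"},
}


def terms_by_cluster(case_types, cluster_labels, type_terms):
    """Count classification terms per cluster, giving every case the
    terms of its cancer type (no case-level annotations are assumed)."""
    counts = {}
    for cancer_type, label in zip(case_types, cluster_labels):
        counts.setdefault(label, Counter()).update(type_terms[cancer_type])
    return counts
```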