Cancer progression modeling using static sample data.

Sun Y, Yao J, Nowak NJ, Goodison S - Genome Biol. (2014)

Bottom Line: We demonstrate the reliability of the method with simulated data, and describe the application to breast cancer data. Our findings support a linear, branching model for breast cancer progression. An interactive model facilitates the identification of key molecular events in the advance of disease to malignancy.


ABSTRACT
As molecular profiling data continues to accumulate, the design of integrative computational analyses that can provide insights into the dynamic aspects of cancer progression becomes feasible. Here, we present a novel computational method for the construction of cancer progression models based on the analysis of static tumor samples. We demonstrate the reliability of the method with simulated data, and describe the application to breast cancer data. Our findings support a linear, branching model for breast cancer progression. An interactive model facilitates the identification of key molecular events in the advance of disease to malignancy.


Figure 5: Model constructed using the METABRIC dataset and its association with clinical and genetic variables. (A) Breast cancer progression tree. Each node represents a cluster and the node size is proportional to the number of samples in the corresponding cluster. Nodes are color-coded based on the PAM50 labels of the majority of the samples in the node. (B,C) Molecular grade and CNA frequency were highly correlated with the N-B (first column) and N-H (second column) progression branches. (D) Spearman’s rank correlation analysis of histological grade, molecular grade, mutation rate and patient age with the two main progression paths. The numbers in parentheses are P values. CNA, copy number alteration; N-B, normal through luminal to basal phenotype; N-H, normal through luminal to HER2+ phenotype; TCGA, The Cancer Genome Atlas.
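
Panel D reports Spearman's rank correlations between clinical variables and the progression paths. As a minimal sketch of how such a coefficient is computed, the Python below ranks both variables (tied values share the mean of their ranks) and takes the Pearson correlation of the ranks. The names `path_position` and `grade` and their values are hypothetical toy data, not values from the paper, and no P value is computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: position of six samples along one progression
# path, and an ordinal grade assigned to each sample.
path_position = [1, 2, 3, 4, 5, 6]
grade = [1, 1, 2, 2, 3, 3]

rho = spearman_rho(path_position, grade)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks enter the calculation, the coefficient suits ordinal variables such as grade, where the spacing between levels is not meaningful.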

Mentions: The data visualization analysis provided an overview of data distribution, and informed the design of the novel two-pronged method used to model the cancer progression process formally. Application of the spectral clustering method [28] to the METABRIC data revealed 13 distinct clusters (Figure 4A). To promote a robust clustering assignment, a resampling-based consensus clustering analysis [32] was performed. From the generated consensus matrix shown in Figure 4B, we can clearly identify 13 diagonal blocks, which suggests that the clustering assignment is very stable. This result was further confirmed by silhouette width analysis. The clustering analysis classified 1,900 out of 1,989 (96%) samples with a positive silhouette width and yielded an average silhouette width of 0.47 (Figure 4C). Cluster 11 contains only three samples and thus was omitted in downstream analyses. The second step was to extract a principal curve to define mathematically the general trend of the data. To overcome the difficulty of extracting a self-intersected curve embedded in a high-dimensional space, we applied our new principal curve method (described above). The parameter was estimated using the elbow method [39] (Additional file 2: Figure S11). Finally, we combined the clustering and principal curve results using the principal curve as a backbone to build a breast cancer progression trajectory (Figure 5). Each node on the figure represents an identified cluster and the node size is proportional to the number of samples in the corresponding cluster. Two connected nodes indicate a possible inter-relationship, and the length of an edge connecting two nodes is proportional to the distance between the centers of the two nodes. We note that the overall structure of the model is consistent with the results of our data visualization analysis, suggesting that the constructed model faithfully reflects the data distribution.
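
Two quantities in the passage above can be illustrated with a short Python sketch: the per-sample silhouette width used to check the clustering, and the node size and edge length used to draw the tree. This is not the authors' implementation. It assumes Euclidean distance, takes cluster labels as given (spectral clustering, consensus clustering and principal curve extraction are not reproduced), and uses hypothetical two-dimensional toy points in place of the METABRIC data.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Return one silhouette width per sample; needs at least two clusters."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for a cluster of size one
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


def cluster_nodes(points, labels):
    """Return {label: (size, center)}: sample count and centroid per cluster."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return {
        label: (len(members), [sum(c) / len(members) for c in zip(*members)])
        for label, members in clusters.items()
    }


# Hypothetical toy data: two well-separated groups of three samples each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

widths = silhouette_widths(points, labels)
n_positive = sum(w > 0 for w in widths)    # samples with positive width
mean_width = sum(widths) / len(widths)     # average silhouette width

nodes = cluster_nodes(points, labels)
# Edge length between two connected nodes: distance between their centers.
edge_length = euclidean(nodes[0][1], nodes[1][1])

print(n_positive, round(mean_width, 2), nodes[0][0], round(edge_length, 2))
```

A silhouette width near 1 means a sample sits much closer to its own cluster than to the nearest other cluster, so the share of positive widths and the average width summarize how well separated the clusters are.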

