Cancer progression modeling using static sample data.
As molecular profiling data continue to accumulate, the design of integrative computational analyses that can provide insights into the dynamic aspects of cancer progression becomes feasible. Here, we present a novel computational method for the construction of cancer progression models based on the analysis of static tumor samples. We demonstrate the reliability of the method with simulated data, and describe the application to breast cancer data. Our findings support a linear, branching model for breast cancer progression. An interactive model facilitates the identification of key molecular events in the advance of disease to malignancy.
The data visualization analysis provided an overview of the data distribution and informed the design of the novel two-pronged method used to formally model the cancer progression process. Applying the spectral clustering method to the METABRIC data revealed 13 distinct clusters (Figure 4A). To obtain a robust cluster assignment, a resampling-based consensus clustering analysis was performed. The consensus matrix shown in Figure 4B contains 13 clearly identifiable diagonal blocks, which suggests that the cluster assignment is very stable. Silhouette width analysis further confirmed this result: 1,900 of the 1,989 samples (96%) had a positive silhouette width, and the average silhouette width was 0.47 (Figure 4C). Cluster 11 contains only three samples and was therefore omitted from downstream analyses.
The second step was to extract a principal curve that mathematically defines the general trend of the data. To overcome the difficulty of extracting a self-intersecting curve embedded in a high-dimensional space, we applied our new principal curve method (described above). The parameter was estimated using the elbow method (Additional file 2: Figure S11).
Finally, we combined the clustering and principal curve results, using the principal curve as a backbone, to build a breast cancer progression trajectory (Figure 5). Each node in the figure represents an identified cluster, and the node size is proportional to the number of samples in the corresponding cluster. Two connected nodes indicate a possible inter-relationship, and the length of an edge connecting two nodes is proportional to the distance between the centers of the two corresponding clusters. The overall structure of the model is consistent with the results of our data visualization analysis, suggesting that the constructed model faithfully reflects the data distribution.
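The resampling-based consensus clustering step can be sketched as follows. Each entry of the consensus matrix is the fraction of resampling runs in which two samples were placed in the same cluster, among the runs in which both were drawn; a stable assignment shows up as diagonal blocks of values near 1. This is a minimal pure-Python sketch, not the authors' implementation: the names `consensus_matrix` and `resample_runs`, the 80% subsampling fraction, and the `cluster_fn` callback (a hypothetical stand-in for a clustering step such as spectral clustering) are all assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

def consensus_matrix(n: int, runs: Sequence[Dict[int, int]]) -> List[List[float]]:
    """M[i][j] = (#runs with i and j in the same cluster) / (#runs containing both i and j)."""
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for run in runs:  # each run maps a sampled index -> its cluster label in that run
        for i, li in run.items():
            for j, lj in run.items():
                sampled[i][j] += 1
                if li == lj:
                    together[i][j] += 1
    # Pairs never drawn together get 0.0 (no evidence of co-clustering).
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def resample_runs(n: int, cluster_fn: Callable[[List[int]], List[int]],
                  n_resamples: int = 100, fraction: float = 0.8,
                  seed: int = 0) -> List[Dict[int, int]]:
    """Cluster random subsamples; cluster_fn is a hypothetical stand-in for the clustering step."""
    rng = random.Random(seed)
    size = max(2, int(fraction * n))
    runs = []
    for _ in range(n_resamples):
        subset = sorted(rng.sample(range(n), size))
        runs.append(dict(zip(subset, cluster_fn(subset))))
    return runs
```

With a perfectly stable clustering the matrix is block-diagonal, with ones inside each block and zeros between blocks; in practice the blocks are only approximately this clean.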
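The silhouette statistics quoted above (the fraction of samples with a positive width and the average width) follow from the standard per-sample definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own cluster and b(i) is the smallest mean distance from sample i to the members of any other cluster. The sketch below assumes Euclidean distance, which the text does not specify; the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

def silhouette_widths(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-sample silhouette width; needs at least two clusters. Distance assumed Euclidean."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        # a(i): mean distance to the rest of the sample's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): mean distance to the nearest other cluster.
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

def summarize(widths: Sequence[float]) -> Tuple[int, float, float]:
    """Return (#positive widths, fraction positive, average width)."""
    positive = sum(1 for w in widths if w > 0)
    return positive, positive / len(widths), sum(widths) / len(widths)
```

For the reported figures, 1,900 / 1,989 is about 0.955, which rounds to the 96% stated in the text. A negative width flags a sample that lies, on average, closer to another cluster than to its own.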
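The elbow method used to estimate the principal curve parameter is often applied by visual inspection. One common way to automate it, assumed here rather than taken from the source, rescales both axes to [0, 1] and picks the candidate value whose point lies farthest from the straight line joining the first and last points of the criterion curve. The names `elbow_index`, `params`, and `scores` are hypothetical, and `scores` stands for whatever criterion is evaluated at each candidate parameter value.

```python
import math
from typing import List, Sequence

def _rescale(values: Sequence[float]) -> List[float]:
    """Map values to [0, 1] so both axes carry equal weight."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def elbow_index(params: Sequence[float], scores: Sequence[float]) -> int:
    """Index of the point farthest from the chord joining the first and last points."""
    xs, ys = _rescale(params), _rescale(scores)
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    if norm == 0:  # degenerate curve: no chord to measure against
        return 0
    # Perpendicular distance from (x, y) to the chord via the 2D cross product.
    dists = [abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm for x, y in zip(xs, ys)]
    return max(range(len(dists)), key=dists.__getitem__)
```

Rescaling matters because the chord-distance heuristic is not invariant to the units of either axis; without it, the axis with the larger numeric range dominates the choice.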
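The final assembly step, in which clusters become nodes arranged along the principal curve backbone, can be illustrated with a simplified sketch: compute each cluster's center and size, find each center's nearest vertex on a principal curve given as an ordered polyline, and link clusters that are consecutive along the curve, with each edge length set to the distance between cluster centers. This assumes a single non-branching polyline; the branching trajectory described in the text would need a tree-shaped backbone instead. All names here are hypothetical, and this is not the authors' procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

def centers_and_sizes(points: Sequence[Sequence[float]], labels: Sequence[int]
                      ) -> Tuple[Dict[int, List[float]], Dict[int, int]]:
    """Cluster centers (coordinate-wise means) and cluster sizes (used for node size)."""
    sums: Dict[int, List[float]] = {}
    sizes: Dict[int, int] = {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for k, v in enumerate(p):
            acc[k] += v
        sizes[lab] = sizes.get(lab, 0) + 1
    centers = {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}
    return centers, sizes

def backbone_edges(centers: Dict[int, List[float]], curve: Sequence[Sequence[float]]
                   ) -> List[Tuple[int, int, float]]:
    """Link clusters that are consecutive along the curve; edge length = center distance."""
    # Position of each cluster along the curve = index of its nearest curve vertex.
    position = {lab: min(range(len(curve)), key=lambda k: math.dist(c, curve[k]))
                for lab, c in centers.items()}
    order = sorted(centers, key=position.__getitem__)
    return [(a, b, math.dist(centers[a], centers[b])) for a, b in zip(order, order[1:])]
```

When plotting, node sizes would be drawn proportional to `sizes`, and edge lengths proportional to the third element of each edge tuple, mirroring the encoding described for Figure 5.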