Applying unmixing to gene expression data for tumor phylogeny inference.

Schwartz R, Shackney SE - BMC Bioinformatics (2010)

Bottom Line: Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development. Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop.

Affiliation: Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA. russells@andrew.cmu.edu

ABSTRACT

Background: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state but at the cost of losing information about intra-tumor heterogeneity.

Results: The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development.

Conclusions: Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.


Figure 2: Examples of mixture components inferred from simulated data sets. Green circles show the true mixture components, red points the simulated data points that serve as the input to the algorithms, and blue X's the inferred mixture components. (a) A uniform mixture of three independent components with no noise. Each data point is a mixture of all three components. Inferred mixture fractions for the three components, averaged over all points, are (0.295 0.367 0.339). (b) A tree-embedded mixture of three components with noise equal to signal. Each data point is a mixture of a root component (top, labeled 1) and one of two leaf components (bottom, labeled 2 and 3). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.410 0.567 0.025) and (0.410 0.020 0.535). (c) A tree-embedded mixture of five components with 10% noise. Each data point contains a portion of the root component (bottom, labeled 1), a subset contains portions of one of two internal components (far left, labeled 2, and far right, labeled 4), and subsets of these contain portions of one of two leaf components (center left, labeled 3, and center right, labeled 5). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.356 0.462 0.141 0.006 0.005) and (0.387 0.072 0.008 0.187 0.378).

Mentions: Fig. 2 shows a few illustrative examples of simulated data sets along with their true and inferred mixture components. Fig. 2(a) shows a trivial case of the problem, a uniform mixture of three components without noise, resulting in a triangular point cloud. The close overlap of the true mixture components (circles) and the inferred components (X's) shows that the method could infer the mixture components in this case with high accuracy. Fig. 2(b) shows a tree-embedded sample of three components in the presence of high noise (signal equal to noise). Performance was somewhat degraded, apparently mainly because the simplex produced by the true mixture components was a poorer fit to the noisy data. Fig. 2(c) shows a more complicated evolutionary scenario consisting of five tree-embedded mixture components, with low (10%) noise. The scenario models two progression lineages, with each sample consisting of a component of the root state and zero, one, or two states along a single progression lineage. The result is a simplicial complex consisting of two triangular faces joined at the root point. While there was a clear correspondence between true and inferred mixture components, performance quality was noticeably lower than that for the simpler scenarios.
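
The two three-component scenarios described above can be illustrated with a minimal Python sketch. It is not the authors' unmixing method: it assumes the three components are already known, places them at hypothetical 2-D coordinates, omits noise, and recovers each point's mixture fractions as barycentric coordinates. All function names and coordinates are illustrative assumptions. The uniform case draws fractions over the whole triangle, while the tree-embedded case mixes the root with exactly one leaf, so points fall only on the two edges that meet at the root.

```python
import random

def mix(fractions, components):
    """Convex combination of 2-D components weighted by mixture fractions."""
    x = sum(f * c[0] for f, c in zip(fractions, components))
    y = sum(f * c[1] for f, c in zip(fractions, components))
    return (x, y)

def uniform_fractions(k, rng):
    """Sample fractions uniformly from the simplex (normalized exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def tree_fractions(rng):
    """Root (index 0) mixed with exactly one of two leaves (index 1 or 2)."""
    root_share = rng.random()
    leaf = rng.choice([1, 2])
    fractions = [root_share, 0.0, 0.0]
    fractions[leaf] = 1.0 - root_share
    return fractions

def barycentric(point, components):
    """Mixture fractions of a 2-D point with respect to three known components."""
    (ax, ay), (bx, by), (cx, cy) = components
    px, py = point
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    fa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    fb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return [fa, fb, 1.0 - fa - fb]

# Hypothetical layout: root on top, two leaves below (assumed coordinates).
components = [(0.0, 1.0), (-1.0, 0.0), (1.0, 0.0)]
rng = random.Random(0)

uniform_true = [uniform_fractions(3, rng) for _ in range(2000)]
tree_true = [tree_fractions(rng) for _ in range(2000)]

uniform_est = [barycentric(mix(f, components), components) for f in uniform_true]
tree_est = [barycentric(mix(f, components), components) for f in tree_true]

# Average inferred fractions over all points of the uniform mixture.
uniform_mean = [sum(f[i] for f in uniform_est) / len(uniform_est) for i in range(3)]
print("mean fractions (uniform mixture):", uniform_mean)
```

Without noise the fractions are recovered exactly up to floating-point error. Inferring the components themselves from the point cloud, and coping with noise, is the harder problem that the unmixing method addresses and that this sketch does not attempt.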

