Limits...
Reconstructing genome mixtures from partial adjacencies.

Mahmoody A, Kahn CL, Raphael BJ - BMC Bioinformatics (2012)

Bottom Line: In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes.We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome.We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Brown University, Providence, RI, USA. ahmad@cs.brown.edu

ABSTRACT
Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.

Show MeSH

Related in: MedlinePlus

Genome and genome graph. (a) A genome  on the set of genes {1, 2, 3, 4, 5} with two chromosomes (one linear and one circular). . (b) The genome graph (black edges) of  with additional edges (dotted) connecting the extremities of the same gene. There is one cycle component and one path component.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3526445&req=5

Figure 1: Genome and genome graph. (a) A genome on the set of genes {1, 2, 3, 4, 5} with two chromosomes (one linear and one circular). . (b) The genome graph (black edges) of with additional edges (dotted) connecting the extremities of the same gene. There is one cycle component and one path component.

Mentions: A gene g is an oriented sequence of nucleotides, with two extremities: a head gh and a tail gt. An adjacency is an unordered pair of gene extremities. A genome on n genes is a set of adjacencies such that each of the 2n gene extremities in is a member of at most one adjacency in . The gene extremities which are not members of any adjacency in are called telomeres of , and we denote the set of all telomeres by (Figure 1-a). Through this work, we assume that the genes of a genome are distinct.


Reconstructing genome mixtures from partial adjacencies.

Mahmoody A, Kahn CL, Raphael BJ - BMC Bioinformatics (2012)

Genome and genome graph. (a) A genome  on the set of genes {1, 2, 3, 4, 5} with two chromosomes (one linear and one circular). . (b) The genome graph (black edges) of  with additional edges (dotted) connecting the extremities of the same gene. There is one cycle component and one path component.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3526445&req=5

Figure 1: Genome and genome graph. (a) A genome on the set of genes {1, 2, 3, 4, 5} with two chromosomes (one linear and one circular). . (b) The genome graph (black edges) of with additional edges (dotted) connecting the extremities of the same gene. There is one cycle component and one path component.
Mentions: A gene g is an oriented sequence of nucleotides, with two extremities: a head gh and a tail gt. An adjacency is an unordered pair of gene extremities. A genome on n genes is a set of adjacencies such that each of the 2n gene extremities in is a member of at most one adjacency in . The gene extremities which are not members of any adjacency in are called telomeres of , and we denote the set of all telomeres by (Figure 1-a). Through this work, we assume that the genes of a genome are distinct.

Bottom Line: In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes.We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome.We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Brown University, Providence, RI, USA. ahmad@cs.brown.edu

ABSTRACT
Many cancer genome sequencing efforts are underway with the goal of identifying the somatic mutations that drive cancer progression. A major difficulty in these studies is that tumors are typically heterogeneous, with individual cells in a tumor having different complements of somatic mutations. However, nearly all DNA sequencing technologies sequence DNA from multiple cells, thus resulting in measurement of mutations from a mixture of genomes. Genome rearrangements are a major class of somatic mutations in many tumors, and the novel adjacencies (i.e. breakpoints) resulting from these rearrangements are readily detected from DNA sequencing reads. However, the assignment of each rearrangement, or adjacency, to an individual cancer genome in the mixture is not known. Moreover, the quantity of DNA sequence reads may be insufficient to measure all rearrangements in all genomes in the tumor. Motivated by this application, we formulate the k-minimum completion problem (k-MCP). In this problem, we aim to reconstruct k genomes derived from a single reference genome, given partial information about the adjacencies present in the mixture of these genomes. We show that the 1-MCP is solvable in linear time in the cases where: (i) the measured, incomplete genome has a single circular or linear chromosome; (ii) there are no restrictions on the chromosomal content of the measured, incomplete genome. We also show that the k-MCP problem, for k ≥ 3 in general, and the 2-MCP problem with the double-cut-and-join (DCJ) distance are NP-complete, when there are no restriction on the chromosomal structure of the measured, incomplete genome. These results lay the foundation for future algorithmic studies of the k-MCP and the application of these algorithms to real cancer sequencing data.

Show MeSH
Related in: MedlinePlus