Limits...
A minimal descriptor of an ancestral recombinations graph.

Parida L, Palamara PF, Javed A - BMC Bioinformatics (2011)

Bottom Line: Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of G and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of G.Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms.We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

View Article: PubMed Central - HTML - PubMed

Affiliation: Computational Genomics, IBM T J Watson Research, Yorktown, New York, USA. parida@us.ibm.com

ABSTRACT

Background: Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, as well as genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations.

Results: In this paper, we identify structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of G and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of G. We also prove that even an unbounded G has a finite minimal descriptor, that continues to preserve certain (graph-theoretic) properties of G and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential.

Conclusions: Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

Show MeSH
(a) Unbounded G (b) G′←G\U (c) Gmd Bounded Gmd of unbounded G: (a) An unbounded G. Here K = 2 corresponding to the samples numbered 1 and 2 and M = 2, for the two segments colored red and blue. The pattern of vertices and edges can be repeated along the dashed edges to give an unbounded structure. The coalescent nodes that are not t-coalescent are marked by circles: let U be the set of all such nodes. (b) G′ is obtained from G after removing the vertices in U. (c) Gmd is obtained by the 1-vertex compactification of G′ with the new vertex v′.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3044314&req=5

Figure 8: (a) Unbounded G (b) G′←G\U (c) Gmd Bounded Gmd of unbounded G: (a) An unbounded G. Here K = 2 corresponding to the samples numbered 1 and 2 and M = 2, for the two segments colored red and blue. The pattern of vertices and edges can be repeated along the dashed edges to give an unbounded structure. The coalescent nodes that are not t-coalescent are marked by circles: let U be the set of all such nodes. (b) G′ is obtained from G after removing the vertices in U. (c) Gmd is obtained by the 1-vertex compactification of G′ with the new vertex v′.

Mentions: By the definition of the parameters of the ARG, G must have at least one genetic exchange vertex encoding (M – 1) genetic exchange events. However, the above count excludes the leaf nodes and it is possible to encode these (M – 1) events in a single genetic exchange leaf node. Hence a lower bound of 0 for ne. When a vertex in the ARG is displayed as a linear ordering of the non-mixing segments with distinct colors for each segment (as in Fig 6(b)), then the potential junctions for the exchange events are at the junction of the colored segments. Recall that by the definition of the ARG, each unit has at most M non-mixing segments, hence can have no more than M – 1 genetic exchange events. Thus there are K(M – 1) such junctions potentially each representing a recombination (or exchange) event. We adopt the following convention: each non-mixing segment in a vertex v contributes to a junction to its left. Thus, by this convention, the left-most segment has no associated junction. See vertex v in Fig 8 as an illustration. Each distinct non-mixing segment is shown by a distinct color; gap is shown as a white segment and junctions are marked by arrows of the same color as that of the segment associated with it. Thus the green, red, blue, magenta colored segments show the associated junctions, and the leftmost green segment has none in Fig 8(b). We call the non leftmost as interior segments. Also, note that a gap does not contribute to a potential junction by our convention. For a recombination event to occur multiple times at the same junction involving the same or similar set of samples, it is clear that a coalescence must occur. Then the following can be verified.


A minimal descriptor of an ancestral recombinations graph.

Parida L, Palamara PF, Javed A - BMC Bioinformatics (2011)

(a) Unbounded G (b) G′←G\U (c) Gmd Bounded Gmd of unbounded G: (a) An unbounded G. Here K = 2 corresponding to the samples numbered 1 and 2 and M = 2, for the two segments colored red and blue. The pattern of vertices and edges can be repeated along the dashed edges to give an unbounded structure. The coalescent nodes that are not t-coalescent are marked by circles: let U be the set of all such nodes. (b) G′ is obtained from G after removing the vertices in U. (c) Gmd is obtained by the 1-vertex compactification of G′ with the new vertex v′.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3044314&req=5

Figure 8: (a) Unbounded G (b) G′←G\U (c) Gmd Bounded Gmd of unbounded G: (a) An unbounded G. Here K = 2 corresponding to the samples numbered 1 and 2 and M = 2, for the two segments colored red and blue. The pattern of vertices and edges can be repeated along the dashed edges to give an unbounded structure. The coalescent nodes that are not t-coalescent are marked by circles: let U be the set of all such nodes. (b) G′ is obtained from G after removing the vertices in U. (c) Gmd is obtained by the 1-vertex compactification of G′ with the new vertex v′.
Mentions: By the definition of the parameters of the ARG, G must have at least one genetic exchange vertex encoding (M – 1) genetic exchange events. However, the above count excludes the leaf nodes and it is possible to encode these (M – 1) events in a single genetic exchange leaf node. Hence a lower bound of 0 for ne. When a vertex in the ARG is displayed as a linear ordering of the non-mixing segments with distinct colors for each segment (as in Fig 6(b)), then the potential junctions for the exchange events are at the junction of the colored segments. Recall that by the definition of the ARG, each unit has at most M non-mixing segments, hence can have no more than M – 1 genetic exchange events. Thus there are K(M – 1) such junctions potentially each representing a recombination (or exchange) event. We adopt the following convention: each non-mixing segment in a vertex v contributes to a junction to its left. Thus, by this convention, the left-most segment has no associated junction. See vertex v in Fig 8 as an illustration. Each distinct non-mixing segment is shown by a distinct color; gap is shown as a white segment and junctions are marked by arrows of the same color as that of the segment associated with it. Thus the green, red, blue, magenta colored segments show the associated junctions, and the leftmost green segment has none in Fig 8(b). We call the non leftmost as interior segments. Also, note that a gap does not contribute to a potential junction by our convention. For a recombination event to occur multiple times at the same junction involving the same or similar set of samples, it is clear that a coalescence must occur. Then the following can be verified.

Bottom Line: Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of G and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of G.Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms.We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

View Article: PubMed Central - HTML - PubMed

Affiliation: Computational Genomics, IBM T J Watson Research, Yorktown, New York, USA. parida@us.ibm.com

ABSTRACT

Background: Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, as well as genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations.

Results: In this paper, we identify structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of G and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of G. We also prove that even an unbounded G has a finite minimal descriptor, that continues to preserve certain (graph-theoretic) properties of G and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential.

Conclusions: Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

Show MeSH