Extracting conflict-free information from multi-labeled trees.

Deepak A, Fernández-Baca D, McMahon MM - Algorithms Mol Biol (2013)

Bottom Line: We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximal reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. In the experiments, the maximally reduced form is often much smaller than the original tree, yet retains most of the taxa.


Affiliation: Department of Computer Science, Iowa State University, Ames, Iowa, USA. akshayd@iastate.edu.

ABSTRACT

Background: A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more leaves share a label, e.g., a species name. A MUL-tree can imply multiple conflicting phylogenetic relationships for the same set of taxa, but can also contain conflict-free information that is of interest and yet is not obvious.

Results: We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximal reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. This introduces an equivalence relation among MUL-trees with potential applications to comparing MUL-trees. We present an efficient algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its performance on empirical datasets in terms of both quality of the reduced tree and the degree of data reduction achieved.

Conclusions: Our measure of conflict-free information content based on quartets is simple and topologically appealing. In the experiments, the maximally reduced form is often much smaller than the original tree, yet retains most of the taxa. The reduction algorithm is quadratic in the number of leaves and its complexity is unaffected by the multiplicity of leaf labels or the degree of the nodes.
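
The idea of conflict-free quartet topologies can be illustrated with a small brute-force sketch. It rests on an assumption that is not taken from the article: a quartet topology ab|cd is treated as conflict-free when some edge of the tree has a and b only on one side and c and d only on the other. The function names (`side_labels`, `conflict_free_quartets`) and the edge-list representation are hypothetical, and this is not the quadratic reduction algorithm the abstract describes.

```python
from itertools import combinations


def side_labels(adj, labels, start, blocked):
    """Collect leaf labels reachable from `start` without crossing into `blocked`."""
    seen = {start, blocked}
    stack = [start]
    found = set()
    while stack:
        node = stack.pop()
        if node in labels:  # only leaves carry labels
            found.add(labels[node])
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def conflict_free_quartets(edges, labels):
    """Return quartets ab|cd witnessed by an edge with a, b only on one side
    and c, d only on the other (an assumed, simplified reading)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    quartets = set()
    for u, v in edges:
        side_u = side_labels(adj, labels, u, v)
        side_v = side_labels(adj, labels, v, u)
        only_u = sorted(side_u - side_v)  # labels absent from the other side
        only_v = sorted(side_v - side_u)
        for ab in combinations(only_u, 2):
            for cd in combinations(only_v, 2):
                quartets.add(frozenset({ab, cd}))
    return quartets


# A five-leaf caterpillar tree; x, y, z are internal nodes.
edges = [("l1", "x"), ("l2", "x"), ("x", "y"), ("l3", "y"),
         ("y", "z"), ("l4", "z"), ("l5", "z")]
single_labels = {"l1": "a", "l2": "b", "l3": "c", "l4": "d", "l5": "e"}
# Same shape, but leaf l3 is a second copy of taxon "a" (a MUL-tree).
mul_labels = {"l1": "a", "l2": "b", "l3": "a", "l4": "d", "l5": "e"}

single_quartets = conflict_free_quartets(edges, single_labels)
mul_quartets = conflict_free_quartets(edges, mul_labels)
```

In this toy example the singly labeled tree yields all five quartets on its taxa, while the duplicated label leaves only ab|de, since the second copy of a places that label on both sides of the edge between x and y.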


Figure 11: Experimental results: Quality of reduced singly-labeled trees.

Mentions: In addition to the issue of taxon loss, we investigated the effect of our reduction on edge loss, i.e., the level of resolution within the resulting singly-labeled tree. Input MUL-trees were binary and, because they have more leaves than distinct taxa, had more nodes than twice the number of taxa (Figure 11, solid line), whereas a binary tree on singly labeled taxa would have approximately twice as many nodes as taxa (Figure 11, dashed line). We found that, although there was some edge loss, the number of nodes in the reduced singly-labeled trees (Figure 11, dotted line) corresponded well to the total possible, indicating low levels of edge loss. Note that each point on the dotted or solid lines represents an average over all trees with the same number of taxa.
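
The "twice the number of taxa" baseline follows from a standard count for binary trees. The short derivation below assumes rooted trees and uses symbols (n, m, V) that are introduced here for illustration and do not come from the article; for unrooted binary trees the total is 2n - 2 instead, which does not change the comparison.

```latex
% n = number of distinct taxa, m = number of leaves (m >= n in a MUL-tree).
% A rooted binary tree with k leaves has k - 1 internal nodes.
\begin{align*}
  |V(\text{singly labeled binary tree})| &= n + (n - 1) = 2n - 1 \approx 2n, \\
  |V(\text{binary MUL-tree})|            &= m + (m - 1) = 2m - 1 \ge 2n - 1 .
\end{align*}
% Contracting an internal edge removes exactly one node, so a reduced
% singly labeled tree with c contracted edges has 2n - 1 - c nodes.
```

Under these assumptions, the gap between the dashed and dotted lines in Figure 11 counts contracted edges, which is why a small gap indicates low edge loss.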

