Limits...
Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH

Related in: MedlinePlus

Alternative alignments of the sequences CATCGA and CCGATA. The alignment on the left is colinear if the dinucleotides AT (red) are interpreted as insertion or deletion. Alternatively, the AT dinucleotides can be aligned and the CG dinucleotides interpreted as insertion or deletion. Non-colinear aligners that allow for translocations may align the AT dinucleotides in addition to the CG dinucleotides. The alignment on the right shows a non-colinear alternative that interprets the four nucleotides ATCG as inversion (reverse complement). In this example, we expect non-colinear aligners to prefer the inversion (right) over the translocation (left) since it creates fewer segments.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4020321&req=5

Figure 1: Alternative alignments of the sequences CATCGA and CCGATA. The alignment on the left is colinear if the dinucleotides AT (red) are interpreted as insertion or deletion. Alternatively, the AT dinucleotides can be aligned and the CG dinucleotides interpreted as insertion or deletion. Non-colinear aligners that allow for translocations may align the AT dinucleotides in addition to the CG dinucleotides. The alignment on the right shows a non-colinear alternative that interprets the four nucleotides ATCG as inversion (reverse complement). In this example, we expect non-colinear aligners to prefer the inversion (right) over the translocation (left) since it creates fewer segments.

Mentions: Considering the large search space, genome alignment is an ambitious task and is usually accomplished using heuristic approaches. The first step in genome alignment is commonly the computation of a set of local alignments. It is essential for most methods that the set of local alignments covers all main genomic similarities, whereas additional spurious similarities have a smaller impact. In colinear alignment, such a set usually constitutes a superposition of several alignment possibilities with some local alignments in conflict regarding the colinearity constraint (see Figure 1). The task is then to select the best conflict-free subset according to a given optimization function. In genome alignment, as opposed to colinear alignment, any set of local alignments can be viewed as a valid solution, one that induces a segmentation. However, the induced segmentation can be improved by selecting a subset of local alignments. The subset should contain those local alignments that are most likely to represent homologies when viewed in the context of the whole set of local alignments. The final step is then to find the best segmentation according to the set of local alignments and possibly a subsequent realignment of segments with a colinear alignment method.


Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

Alternative alignments of the sequences CATCGA and CCGATA. The alignment on the left is colinear if the dinucleotides AT (red) are interpreted as insertion or deletion. Alternatively, the AT dinucleotides can be aligned and the CG dinucleotides interpreted as insertion or deletion. Non-colinear aligners that allow for translocations may align the AT dinucleotides in addition to the CG dinucleotides. The alignment on the right shows a non-colinear alternative that interprets the four nucleotides ATCG as inversion (reverse complement). In this example, we expect non-colinear aligners to prefer the inversion (right) over the translocation (left) since it creates fewer segments.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4020321&req=5

Figure 1: Alternative alignments of the sequences CATCGA and CCGATA. The alignment on the left is colinear if the dinucleotides AT (red) are interpreted as insertion or deletion. Alternatively, the AT dinucleotides can be aligned and the CG dinucleotides interpreted as insertion or deletion. Non-colinear aligners that allow for translocations may align the AT dinucleotides in addition to the CG dinucleotides. The alignment on the right shows a non-colinear alternative that interprets the four nucleotides ATCG as inversion (reverse complement). In this example, we expect non-colinear aligners to prefer the inversion (right) over the translocation (left) since it creates fewer segments.
Mentions: Considering the large search space, genome alignment is an ambitious task and is usually accomplished using heuristic approaches. The first step in genome alignment is commonly the computation of a set of local alignments. It is essential for most methods that the set of local alignments covers all main genomic similarities, whereas additional spurious similarities have a smaller impact. In colinear alignment, such a set usually constitutes a superposition of several alignment possibilities with some local alignments in conflict regarding the colinearity constraint (see Figure 1). The task is then to select the best conflict-free subset according to a given optimization function. In genome alignment, as opposed to colinear alignment, any set of local alignments can be viewed as a valid solution, one that induces a segmentation. However, the induced segmentation can be improved by selecting a subset of local alignments. The subset should contain those local alignments that are most likely to represent homologies when viewed in the context of the whole set of local alignments. The final step is then to find the best segmentation according to the set of local alignments and possibly a subsequent realignment of segments with a colinear alignment method.

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH
Related in: MedlinePlus