Limits...
Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH

Related in: MedlinePlus

Genome alignment cycles have a one-to-one correspondence only in the structure of Enredo graphs.(A) A cycle in the Enredo graph structure may correspond to several overlapping cycles in the alignment graph structure. In this example, two cycles in the alignment graph structure are shaded in red and blue. (B) Cycles caused by inversions appear only in the Enredo graph structure. In this example, the upper cycle in the Enredo graph structure is due to an inversion in block A, hence, does not appear in the A-Bruijn graph structure.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4020321&req=5

Figure 9: Genome alignment cycles have a one-to-one correspondence only in the structure of Enredo graphs.(A) A cycle in the Enredo graph structure may correspond to several overlapping cycles in the alignment graph structure. In this example, two cycles in the alignment graph structure are shaded in red and blue. (B) Cycles caused by inversions appear only in the Enredo graph structure. In this example, the upper cycle in the Enredo graph structure is due to an inversion in block A, hence, does not appear in the A-Bruijn graph structure.

Mentions: The definition of genome alignment cycles corresponds to simple mixed cycles in the Enredo graph structure. They mostly appear in the A-Bruijn graph structure and alignment graph structure as (mixed) simple cycles, too, but there is no one-to-one correspondence: The alignment graph structure can have more than one cycle for a single genome alignment cycle (see Figure 9A); and genome alignment cycles that are caused by inversions are not visible in the alignment graph structure and A-Bruijn graph structure. Figure 9B shows an example for two genome alignment cycles that appear as a single cycle in the A-Bruijn graph structure. Despite these essential differences, cycles in the alignment graph structure, A-Bruijn graph structure, and Enredo graph structure always correspond to genome alignment cycles as opposed to cycles in the cactus graph structure. Subgraphs in the structures of alignment graphs, A-Bruijn graphs, and Enredo graphs that correspond to cycles in the cactus graph structure are not even necessarily connected (see below).


Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

Genome alignment cycles have a one-to-one correspondence only in the structure of Enredo graphs.(A) A cycle in the Enredo graph structure may correspond to several overlapping cycles in the alignment graph structure. In this example, two cycles in the alignment graph structure are shaded in red and blue. (B) Cycles caused by inversions appear only in the Enredo graph structure. In this example, the upper cycle in the Enredo graph structure is due to an inversion in block A, hence, does not appear in the A-Bruijn graph structure.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4020321&req=5

Figure 9: Genome alignment cycles have a one-to-one correspondence only in the structure of Enredo graphs.(A) A cycle in the Enredo graph structure may correspond to several overlapping cycles in the alignment graph structure. In this example, two cycles in the alignment graph structure are shaded in red and blue. (B) Cycles caused by inversions appear only in the Enredo graph structure. In this example, the upper cycle in the Enredo graph structure is due to an inversion in block A, hence, does not appear in the A-Bruijn graph structure.
Mentions: The definition of genome alignment cycles corresponds to simple mixed cycles in the Enredo graph structure. They mostly appear in the A-Bruijn graph structure and alignment graph structure as (mixed) simple cycles, too, but there is no one-to-one correspondence: The alignment graph structure can have more than one cycle for a single genome alignment cycle (see Figure 9A); and genome alignment cycles that are caused by inversions are not visible in the alignment graph structure and A-Bruijn graph structure. Figure 9B shows an example for two genome alignment cycles that appear as a single cycle in the A-Bruijn graph structure. Despite these essential differences, cycles in the alignment graph structure, A-Bruijn graph structure, and Enredo graph structure always correspond to genome alignment cycles as opposed to cycles in the cactus graph structure. Subgraphs in the structures of alignment graphs, A-Bruijn graphs, and Enredo graphs that correspond to cycles in the cactus graph structure are not even necessarily connected (see below).

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH
Related in: MedlinePlus