Limits...
Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH

Related in: MedlinePlus

An alignment of three genomes with eleven blocks in all four graph representations. The example covers multiple non-colinear events: Blocks A, E, I, K are conserved in all three genomes without large structural changes. Blocks B, C, and D, as well as G, and H appear in different orders and orientations. Block F is missing in the red genome and J occurs twice. Colors denote the three genomes. In the alignment graph, dashed edges indicate the alignment of a segment with its reverse complement. We consider the information provided by line styles not to be part of the graph structures. In the Enredo graph, components connected by adjacency edges are shaded in gray. For the cactus graph, the figure additionally shows a precursor. Furthermore, enlarged vertices for the precursor and the final cactus graph show adjacencies in vertices.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4020321&req=5

Figure 4: An alignment of three genomes with eleven blocks in all four graph representations. The example covers multiple non-colinear events: Blocks A, E, I, K are conserved in all three genomes without large structural changes. Blocks B, C, and D, as well as G, and H appear in different orders and orientations. Block F is missing in the red genome and J occurs twice. Colors denote the three genomes. In the alignment graph, dashed edges indicate the alignment of a segment with its reverse complement. We consider the information provided by line styles not to be part of the graph structures. In the Enredo graph, components connected by adjacency edges are shaded in gray. For the cactus graph, the figure additionally shows a precursor. Furthermore, enlarged vertices for the precursor and the final cactus graph show adjacencies in vertices.

Mentions: We limit our comparison to alignment graphs, A-Bruijn graphs, Enredo graphs, and cactus graphs. The original publications of these graphs use varying terminology. We describe all four graphs using the same terminology, namely the above defined terms segment, block and adjacency. Figure 4 displays an example alignment with eleven blocks as alignment graph, A-Bruijn graph, Enredo graph, and cactus graph.


Genome alignment with graph data structures: a comparison.

Kehr B, Trappe K, Holtgrewe M, Reinert K - BMC Bioinformatics (2014)

An alignment of three genomes with eleven blocks in all four graph representations. The example covers multiple non-colinear events: Blocks A, E, I, K are conserved in all three genomes without large structural changes. Blocks B, C, and D, as well as G, and H appear in different orders and orientations. Block F is missing in the red genome and J occurs twice. Colors denote the three genomes. In the alignment graph, dashed edges indicate the alignment of a segment with its reverse complement. We consider the information provided by line styles not to be part of the graph structures. In the Enredo graph, components connected by adjacency edges are shaded in gray. For the cactus graph, the figure additionally shows a precursor. Furthermore, enlarged vertices for the precursor and the final cactus graph show adjacencies in vertices.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4020321&req=5

Figure 4: An alignment of three genomes with eleven blocks in all four graph representations. The example covers multiple non-colinear events: Blocks A, E, I, K are conserved in all three genomes without large structural changes. Blocks B, C, and D, as well as G, and H appear in different orders and orientations. Block F is missing in the red genome and J occurs twice. Colors denote the three genomes. In the alignment graph, dashed edges indicate the alignment of a segment with its reverse complement. We consider the information provided by line styles not to be part of the graph structures. In the Enredo graph, components connected by adjacency edges are shaded in gray. For the cactus graph, the figure additionally shows a precursor. Furthermore, enlarged vertices for the precursor and the final cactus graph show adjacencies in vertices.
Mentions: We limit our comparison to alignment graphs, A-Bruijn graphs, Enredo graphs, and cactus graphs. The original publications of these graphs use varying terminology. We describe all four graphs using the same terminology, namely the above defined terms segment, block and adjacency. Figure 4 displays an example alignment with eleven blocks as alignment graph, A-Bruijn graph, Enredo graph, and cactus graph.

Bottom Line: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs.Still, many ideas are shared among all graph-based approaches.Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Freie Universit├Ąt Berlin, Takustr, 9, 14195 Berlin, Germany. birte.kehr@fu-berlin.de.

ABSTRACT

Background: Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference.Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment.

Results: We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures.

Conclusion: We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools.

Show MeSH
Related in: MedlinePlus