Metassembler: merging and optimizing de novo genome assemblies.

Wences AH, Schatz MC - Genome Biol. (2015)

Bottom Line: Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We apply our metassembler algorithm to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition.


Affiliation: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. alhernan@cshl.edu.

ABSTRACT
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net.
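
The figure of 120 follows from counting the orderings of 5 assemblies (5! = 120). A minimal sketch of that enumeration is given below; the labels A1 to A5 are hypothetical placeholders, not the names of the actual Assemblathon 1 entries, and treating each permutation as one merge order is an assumption drawn from the abstract's wording rather than from the metassembler software itself.

```python
from itertools import permutations
from math import factorial

# Hypothetical placeholder labels: the abstract does not name the five assemblies.
assemblies = ["A1", "A2", "A3", "A4", "A5"]

# Assumption: each permutation stands for one candidate merge order.
orderings = list(permutations(assemblies))

# The number of orderings of 5 items is 5! = 120, matching the abstract.
assert len(orderings) == factorial(len(assemblies))

print(len(orderings))
print(" -> ".join(orderings[0]))
```

Enumerating the orderings explicitly, rather than only computing 5!, makes it straightforward to attach a per-ordering evaluation result to each tuple later.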



Fig. 4: Metassembly of fish BCM scaffold FISH00033861. A representation of the changes made to a single scaffold throughout the metassembler pipeline is shown. Scaffold FISH00033861 of the BCM fish assembly (bottom) is taken as the starting point in the metassembly corresponding to the Assemblathon 2 Z score ordering. Vertical blue and green lines represent indel corrections and gap closures made at each merging step.

Mentions: Finally, to show that metassembler is capable of integrating information from different input assemblies, and to illustrate the power of using assemblies computed with different algorithms and heuristics, we picked the largest scaffold of the BCM fish assembly (FISH00033861) and followed the number of corrected indels and gaps closed throughout the metassembly corresponding to the Assemblathon 2 Z score ordering. As shown in Fig. 4, sequences from all input assemblies are used to improve the original scaffold, collectively closing hundreds of gaps and fixing dozens of mis-assemblies in this single scaffold. Moreover, some regions of the original scaffold seem to be preferentially corrected by a particular input assembly or set of input assemblies, thus showing the power of combining multiple assemblies into a single superior metassembly.

