Metassembler: merging and optimizing de novo genome assemblies.

Wences AH, Schatz MC - Genome Biol. (2015)

Bottom Line: Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We apply our metassembler algorithm to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition.


Affiliation: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. alhernan@cshl.edu.

ABSTRACT
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net.



Fig. 6: Schematic diagram of the progressive metassembly of three assemblies. All three input assemblies have gap sequences and a variety of errors such that no pair of assemblies will create a perfect assembly. However, the final metassembly of all three assemblies together will reconstruct the entire correct genome. Gap Seq, gap sequence; Scf, scaffold

Mentions: After the pairwise merging has been completed with the top two assemblies, the algorithm iterates the procedure using that newly formed metassembly and the next best assembly as inputs (Fig. 6). Assemblies are processed according to the user-specified ordering or ranking scheme, such as ordering by assembly contiguity (N50 size, etc.) or completeness metrics (CEGMA, etc.). For example, in our analysis below we have found that ranking assemblies from largest to smallest by their contig N50 size is a generally effective heuristic.
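The ordering and iteration described above can be sketched in a few lines of Python. This is an illustration only, not the Metassembler implementation: an assembly is modelled as a plain list of contig strings, the names `contig_n50`, `rank_by_n50` and `progressive_merge` are hypothetical, and the pairwise merge step is not described in this excerpt, so it is passed in as a placeholder function `merge_pair`.

```python
from functools import reduce
from typing import Callable, Dict, List

# Assumption: an assembly is modelled as a list of contig sequences.
Assembly = List[str]


def contig_n50(contigs: Assembly) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def rank_by_n50(assemblies: Dict[str, Assembly]) -> List[str]:
    """Assembly names ordered from largest to smallest contig N50."""
    return sorted(assemblies, key=lambda name: contig_n50(assemblies[name]),
                  reverse=True)


def progressive_merge(
    assemblies: Dict[str, Assembly],
    merge_pair: Callable[[Assembly, Assembly], Assembly],
) -> Assembly:
    """Merge the top two assemblies, then fold in each next-best assembly."""
    if not assemblies:
        raise ValueError("at least one assembly is required")
    ordered = [assemblies[name] for name in rank_by_n50(assemblies)]
    # reduce() applies merge_pair(current_metassembly, next_assembly) left to right.
    return reduce(merge_pair, ordered)
```

With three inputs, `progressive_merge` calls `merge_pair` twice: once on the two assemblies with the largest contig N50, and once on that result together with the third assembly, mirroring the three-assembly schematic in Fig. 6. Any other ranking scheme (for example a completeness score) could be substituted for `rank_by_n50` without changing the fold.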

