Metassembler: merging and optimizing de novo genome assemblies.

Wences AH, Schatz MC - Genome Biol. (2015)

Bottom Line: Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition.


Affiliation: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. alhernan@cshl.edu.

ABSTRACT
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net .
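The figure of 120 is consistent with counting orderings: assuming each evaluated merge uses all five assemblies and the order in which they are merged matters, there are 5! = 120 possible orderings. The short sketch below only checks that arithmetic; the assembly names are hypothetical placeholders, not the names of the actual Assemblathon entries.

```python
from itertools import permutations
from math import factorial

# Hypothetical placeholder names for the five input assemblies.
assemblies = ["A1", "A2", "A3", "A4", "A5"]

# Every ordering of the five assemblies (first element = first one merged).
orders = list(permutations(assemblies))

print(len(orders))    # number of distinct merge orders
print(factorial(5))   # 5! gives the same count
```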


Fig. 5: Schematic representation of the pairwise merging process. Dark color represents alignment blocks between the primary and secondary assemblies. Light color represents unaligned sequences. 1) For blocks of aligned sequence, the algorithm inserts the primary sequence into the new metassembly. 2) Insertion in the primary with respect to the secondary assembly: because the CE statistic is a large positive value (> 3) for the primary sequence, the algorithm skips the primary insertion and chooses the secondary sequence instead. 3) Both assemblies have an unaligned insertion: because the primary insertion is shorter than the secondary insertion, and because the primary has a large negative CE statistic (< −3), the algorithm chooses the secondary insertion over the primary, thus correcting the CE statistic.

Mentions: The metassembler algorithm scans each primary sequence to identify aligned segments and unaligned segments, the latter indicating gaps or discrepancies between the two assemblies. Every aligned segment of the primary sequence is automatically added to the metassembly; in contrast, when a difference is found, the algorithm compares the compression-expansion (CE) statistic and the coverage at the corresponding breakpoint positions to determine which of the two sequences is added to the metassembly (Fig. 5).
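The selection rule described in the text and in the Fig. 5 caption can be illustrated with a minimal sketch. This is not the Metassembler implementation: the `Segment` type, the function names, and the fallback to the primary sequence are assumptions made for illustration. The sketch uses only the CE thresholds (> 3 and < −3) and the length comparison given in the caption; it omits the coverage comparison mentioned in the text, and it assumes that case 2 applies only when the primary insertion is the longer of the two.

```python
from dataclasses import dataclass

CE_THRESHOLD = 3.0  # |CE| cutoff quoted in the Fig. 5 caption


@dataclass
class Segment:
    """Hypothetical record for one stretch of the primary assembly."""
    aligned: bool            # True if primary and secondary align here
    primary_seq: str         # sequence found in the primary assembly
    secondary_seq: str = ""  # sequence found in the secondary assembly
    primary_ce: float = 0.0  # CE statistic of the primary at the breakpoint


def choose_sequence(seg: Segment) -> str:
    """Pick the sequence to emit for one segment (illustrative rule only)."""
    # Case 1: aligned blocks always take the primary sequence.
    if seg.aligned:
        return seg.primary_seq
    primary_len, secondary_len = len(seg.primary_seq), len(seg.secondary_seq)
    # Case 2: large positive CE suggests an expansion in the primary,
    # so the shorter secondary sequence is used instead.
    if seg.primary_ce > CE_THRESHOLD and primary_len > secondary_len:
        return seg.secondary_seq
    # Case 3: large negative CE suggests a compression in the primary,
    # so the longer secondary insertion is used instead.
    if seg.primary_ce < -CE_THRESHOLD and primary_len < secondary_len:
        return seg.secondary_seq
    # Assumption: with no CE evidence against it, keep the primary.
    return seg.primary_seq


def merge(segments: list[Segment]) -> str:
    """Concatenate the chosen sequence of every segment, in order."""
    return "".join(choose_sequence(seg) for seg in segments)


example = [
    Segment(True, "ACGT", "ACGT"),             # case 1: aligned block
    Segment(False, "TTTT", "", primary_ce=4.2),   # case 2: skip primary insertion
    Segment(True, "GG", "GG"),                 # case 1: aligned block
    Segment(False, "A", "ACA", primary_ce=-3.5),  # case 3: take secondary insertion
    Segment(False, "CC", "C", primary_ce=0.5),    # CE within bounds: keep primary
]
print(merge(example))  # -> ACGTGGACACC
```

Keeping the per-segment decision in its own function makes it straightforward to add further criteria, such as the coverage check, without changing the merge loop.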

