Tools and pipelines for BioNano data: molecule assembly pipeline and FASTA super scaffolding tool.

Shelton JM, Coleman MC, Herndon N, Lu N, Lam ET, Anantharaman T, Sheth P, Brown SJ - BMC Genomics (2015)

Bottom Line: We used a custom assembly workflow to optimize consensus genome map assembly, resulting in an assembly equal to the estimated length of the Tribolium castaneum genome and with an N50 of more than 1 Mb. We used this map for super scaffolding the T. castaneum sequence assembly, more than tripling its N50 with the program Stitch. We report the results of applying these tools to validate and improve a 7x Sanger draft of the T. castaneum genome.


Affiliation: KSU/K-INBRE Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS, USA. sheltonj@ksu.edu.

ABSTRACT

Background: Genome assembly remains an unsolved problem. Assembly projects face a range of hurdles that confound assembly. Thus a variety of tools and approaches are needed to improve draft genomes.

Results: We used a custom assembly workflow to optimize consensus genome map assembly, resulting in an assembly equal to the estimated length of the Tribolium castaneum genome and with an N50 of more than 1 Mb. We used this map for super scaffolding the T. castaneum sequence assembly, more than tripling its N50 with the program Stitch.

Conclusions: In this article we present software that leverages consensus genome maps assembled from extremely long single molecule maps to increase the contiguity of sequence assemblies. We report the results of applying these tools to validate and improve a 7x Sanger draft of the T. castaneum genome.



Fig. 3: Steps of the stitch.pl algorithm. Consensus genome maps (blue) are shown aligned to in silico maps (green). Alignments are indicated with grey lines. CMAP orientation for in silico maps is indicated with a “+” or “-” for positive or negative orientation, respectively.
(a) The in silico maps are used as the reference.
(b) The alignment is inverted and used as input for stitch.pl.
(c) The alignments are filtered based on confidence and on alignment length (purple) relative to total possible alignment length (black). Here, assuming all alignments have high confidence scores and the minimum percent aligned is 30 %, two alignments fail because they align over less than 30 % of their potential alignment length.
(d) Filtering produces an XMAP of high-quality alignments with short (local) alignments removed.
(e) High-quality scaffolding alignments are filtered to keep the longest and highest-confidence alignment for each in silico map. The third alignment (unshaded) is filtered out because the second alignment is the longest alignment for in silico map 2.
(f) Passing alignments are used to super scaffold (captured gaps indicated in dark green).
(g) Stitch is iterated, and additional super scaffolding alignments are found using second-best scaffolding alignments.
(h) Iteration takes advantage of cases where in silico maps scaffold consensus genome maps, as in silico map 2 does. Stitch is run iteratively until all super scaffolding alignments are found.
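
The filtering in panels c to e of Fig. 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the stitch.pl implementation: the `Alignment` record, its field names, the function names, and the default confidence threshold of 10 are hypothetical, and the total possible alignment length is assumed to be precomputed for each alignment. Only the 30 % minimum percent aligned is taken from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one alignment (not the XMAP column layout)."""
    insilico_id: int      # in silico map (sequence scaffold) identifier
    genome_map_id: int    # consensus genome map identifier
    aligned_len: float    # length actually aligned (purple in Fig. 3c)
    potential_len: float  # total possible alignment length (black in Fig. 3c)
    confidence: float     # alignment confidence score


def filter_alignments(alignments, min_percent_aligned=30.0, min_confidence=10.0):
    """Panels c-d: drop low-confidence and short (local) alignments.

    min_confidence=10.0 is an assumed placeholder, not a value from the paper.
    """
    return [
        a for a in alignments
        if a.potential_len > 0
        and a.confidence >= min_confidence
        and 100.0 * a.aligned_len / a.potential_len >= min_percent_aligned
    ]


def best_per_insilico_map(alignments):
    """Panel e: keep the longest, then highest-confidence, alignment per in silico map."""
    best = {}
    for a in alignments:
        current = best.get(a.insilico_id)
        if current is None or (a.aligned_len, a.confidence) > (
            current.aligned_len, current.confidence
        ):
            best[a.insilico_id] = a
    return list(best.values())


# Made-up alignments that mirror the situation described in the caption.
alignments = [
    Alignment(1, 1, 400.0, 500.0, 20.0),  # 80 % aligned: passes
    Alignment(2, 1, 300.0, 400.0, 20.0),  # 75 %: longest for in silico map 2
    Alignment(2, 2, 200.0, 400.0, 20.0),  # 50 %: second best for in silico map 2
    Alignment(3, 2, 50.0, 400.0, 20.0),   # 12.5 %: fails the 30 % threshold
    Alignment(4, 3, 100.0, 500.0, 20.0),  # 20 %: fails the 30 % threshold
]

high_quality = filter_alignments(alignments)
scaffolding = best_per_insilico_map(high_quality)
```

With these made-up inputs, two alignments fail the 30 % threshold and the second-best alignment for in silico map 2 is set aside. Panels g and h describe such second-best alignments being used when Stitch is iterated; that iteration is not modeled here.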

Mentions: The tools described below take raw molecule maps as input, assemble genome maps, and then use these genome maps to super scaffold draft sequence assemblies. The tool AssembleIrysCluster generates consensus genome maps for a range of assembly parameters. We developed AssembleIrysCluster to prepare BNX files for assembly and produce nine customized assembly scripts (Fig. 2c–g). Next, genome maps from the user-selected best assembly are used by the tool Stitch to validate and super scaffold sequence assemblies (Fig. 3).

