Misassembly detection using paired-end sequence reads and optical mapping data.

Muggli MD, Puglisi SJ, Ronen R, Boucher C - Bioinformatics (2015)

Bottom Line: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data.

Affiliation: Department of Computer Science, Colorado State University, Fort Collins, CO 80526, USA; Department of Computer Science, University of Helsinki, Finland; and Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA.

Figure 1: An illustration of the systematic alterations that occur with rearrangements, inversions, collapsed repeats and expanded repeats. (a) Proper read alignment, where mate-pair reads have the correct orientation and distance from each other. A rearrangement or inversion presents as reads with an incorrect orientation and/or a mate-pair distance significantly smaller or larger than the expected insert size, as shown in (b) and (c), respectively. (d) Proper read depth, which is uniform across the genome. (e) A collapsed repeat, which results in read depth greater than expected. (f) An expanded repeat, which results in read depth lower than expected.
© Copyright Policy - creative-commons

misSEQuel first aligns reads to contigs to identify regions that contain abnormal read alignments. A collapsed repeat presents as read coverage greater than the expected genome coverage in the misassembled region, and an expanded repeat as coverage lower than expected. Similarly, inversion and rearrangement errors present as discordant mate-pair alignments, with incorrect orientation and/or distance. Figure 1 illustrates these concordant and discordant read alignments. More specifically, this step consists of aligning all the (paired-end) reads to all the contigs and then calculating three thresholds, ΔL, ΔU and Γ. The range [ΔL, ΔU] defines the acceptable read depth, and Γ defines the maximum allowable number of reads whose mate-pair aligns in an inverted orientation. To calculate these thresholds, we consider all alignments of each read, rather than just the best alignment, because misassembly errors frequently occur within repetitive regions where reads align to multiple locations. misSEQuel performs this step using BWA (version 0.5.9) in paired-end mode with default parameters (Li and Durbin 2009). After alignment, each contig is treated as a series of consecutive 200-bp regions. These regions are sampled uniformly at random a set number of times, and four statistics are calculated from the sampled regions: the mean (µd) and standard deviation (σd) of the read depth, and the mean (µi) and standard deviation (σi) of the number of alignments with a discordant mate-pair orientation. ΔL is set to the maximum of [expression missing in source], ΔU is set to [expression missing in source] and Γ is set to [expression missing in source]. The default number of samples is a fraction of the contig length [value missing in source], and this parameter can be changed in the input to misSEQuel.
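
The thresholding step described above can be sketched in Python. This is only an illustration, not misSEQuel's implementation. The exact threshold expressions are missing from the text, so the sketch assumes a mean plus or minus k standard deviations rule with k = 3, clamped at zero for the lower bound. The function names, the parameter `k`, sampling with replacement, and the use of the population standard deviation are likewise assumptions. The inputs are per-window read depths and per-window counts of discordant mate-pair alignments, one value per consecutive 200-bp region of a contig.

```python
import random
import statistics
from typing import List, NamedTuple, Optional, Sequence

WINDOW_BP = 200  # region size stated in the text


class Thresholds(NamedTuple):
    delta_l: float  # lower bound on acceptable read depth
    delta_u: float  # upper bound on acceptable read depth
    gamma: float    # maximum allowed discordant-orientation alignments


def compute_thresholds(
    depth: Sequence[float],
    discordant: Sequence[float],
    n_samples: int,
    k: float = 3.0,              # ASSUMPTION: multiplier not given in the text
    seed: Optional[int] = None,
) -> Thresholds:
    """Estimate thresholds from windows sampled uniformly at random."""
    if not depth or len(depth) != len(discordant):
        raise ValueError("need equal-length, non-empty per-window lists")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    rng = random.Random(seed)
    # ASSUMPTION: sampling is done with replacement.
    picks = [rng.randrange(len(depth)) for _ in range(n_samples)]
    d = [depth[i] for i in picks]
    c = [discordant[i] for i in picks]
    mu_d, sigma_d = statistics.fmean(d), statistics.pstdev(d)
    mu_i, sigma_i = statistics.fmean(c), statistics.pstdev(c)
    # ASSUMPTION: mean +/- k*sigma rule; the source formulas are missing.
    return Thresholds(
        delta_l=max(0.0, mu_d - k * sigma_d),
        delta_u=mu_d + k * sigma_d,
        gamma=mu_i + k * sigma_i,
    )


def flag_windows(
    depth: Sequence[float], discordant: Sequence[float], t: Thresholds
) -> List[int]:
    """Return indices of windows whose depth falls outside
    [delta_l, delta_u] or whose discordant count exceeds gamma."""
    return [
        i
        for i, (d, c) in enumerate(zip(depth, discordant))
        if d < t.delta_l or d > t.delta_u or c > t.gamma
    ]
```

A window at index `i` covers contig positions `i * WINDOW_BP` to `(i + 1) * WINDOW_BP - 1`, so the indices returned by `flag_windows` map directly back to candidate misassembled regions. Under these assumptions, high-depth windows suggest collapsed repeats, low-depth windows suggest expanded repeats, and windows above `gamma` suggest inversions or rearrangements.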