Improving ancient DNA genome assembly


ABSTRACT

Most reconstruction methods for genomes of ancient origin that are used today require a closely related reference. In order to identify genomic rearrangements or the deletion of whole genes, de novo assembly has to be used. However, because of inherent problems with ancient DNA, its de novo assembly is highly complicated. In order to tackle the diversity in the length of the input reads, we propose a two-layer approach, where multiple assemblies are generated in the first layer, which are then combined in the second layer. We used this two-layer assembly to generate assemblies for two different ancient samples and compared the results to current de novo assembly approaches. We are able to improve the assembly with respect to the length of the contigs and can resolve more repetitive regions.



Fig. 4: The percentage of the reference genome of Mycobacterium leprae TN that could be covered using only contigs longer than the minimum contig length. Results from the first- and second-layer assemblies are shown.

Since one is normally interested in a single genome, we computed the genome coverage after mapping all contigs of at least 1,000 bases against the reference genome of Mycobacterium leprae TN. We used Qualimap2 (Okonechnikov, Conesa & García-Alcalde, 2015) to analyze the mapping. We also analyzed the coverage of the leprosy genome that could be achieved using only contigs longer than 1,000, 1,500, …, 10,000 bp (see Fig. 4). The percentage of the genome covered is always higher after the second-layer assembly than with the first-layer assemblies alone, and the difference becomes more pronounced as the minimum contig length increases: removing shorter contigs reduces coverage less for our two-layer approach than for the first-layer assemblies. With only contigs longer than 1,000 bp, the results are almost the same. With only contigs longer than 10,000 bp, around 90% of the genome is covered by the second-layer assembly with SGA, whereas at most 80% is covered by contigs from first-layer assemblies. In other words, the second layer achieves the same reference coverage with longer contigs than the first layer.

When filtering the Y. pestis data for contigs with a minimum length of 1,000 bp, the best coverage achieved by a first-layer assembly was 60%. The second-layer assemblies achieved coverages between 70 and 83%, each improving on the first-layer assemblies by at least 16% (see Supplemental Information 1). The contigs generated by the second layer mapped almost perfectly against the reference genome, with only some small insertions and deletions.
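The coverage-versus-threshold analysis described above can be illustrated with a minimal Python sketch. It is not the Qualimap2 workflow used for the reported numbers; it only shows the underlying arithmetic, assuming each mapped contig has already been reduced to a half-open interval on the reference plus its contig length. The names `covered_fraction`, `coverage_curve`, and `toy_alignments` are hypothetical, the toy data are invented for illustration, and the filter treats the threshold as inclusive (length at least the minimum), which is an assumption.

```python
from typing import Iterable, List, Tuple

# One mapped contig: (ref_start, ref_end, contig_length), with the
# reference interval half-open [ref_start, ref_end). Hypothetical layout.
Alignment = Tuple[int, int, int]


def covered_fraction(alignments: Iterable[Alignment],
                     genome_length: int,
                     min_contig_length: int) -> float:
    """Fraction of the reference covered by contigs of at least the given length."""
    # Keep only contigs passing the length filter, sorted by start position.
    intervals = sorted((start, end) for start, end, length in alignments
                       if length >= min_contig_length)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block so that
            # bases are not counted twice.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def coverage_curve(alignments: List[Alignment],
                   genome_length: int,
                   thresholds: Iterable[int]) -> List[Tuple[int, float]]:
    """Covered fraction for each minimum contig length threshold."""
    return [(t, covered_fraction(alignments, genome_length, t))
            for t in thresholds]


# Invented toy data on a 10,000 bp reference (not from the paper).
toy_alignments = [(0, 1200, 1200), (1000, 2500, 1500), (4000, 9000, 5000)]

if __name__ == "__main__":
    for threshold, fraction in coverage_curve(toy_alignments, 10_000,
                                              [1000, 1500, 5000]):
        print(f"min length {threshold}: {fraction:.0%} covered")
```

For the thresholds in the text, one would pass `range(1000, 10001, 500)` and run the function once per assembly, so that the first- and second-layer curves can be compared at each minimum contig length.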

