Assessment of de novo assemblers for draft genomes: a case study with fungal genomes.

Abbas MM, Malluhi QM, Balakrishnan P - BMC Genomics (2014)

Bottom Line: We compared the performance of these assemblers by considering both computational and quality metrics. Our results demonstrate that the assemblers ABySS and IDBA-UD perform well on the studied fungal genome data in terms of running time, memory, and quality. The results suggest that whole genome shotgun sequencing projects should make use of different assemblers by considering their merits.

ABSTRACT

Background: Recently, many large bio-projects dealing with the release of different genomes have emerged. Most of these projects use next-generation sequencing platforms. As a consequence, many de novo assembly tools have been developed to assemble the reads generated by these platforms. Each tool has its own inherent advantages and disadvantages, which makes the selection of an appropriate tool a challenging task.

Results: We evaluated the performance of seven frequently used de novo assemblers: ABySS, IDBA-UD, Minia, SOAP, SPAdes, Sparse, and Velvet. The assemblers were assessed on the quality of their output when assembling fungal data. We compared their performance by considering both computational and quality metrics. By analyzing these performance metrics, we rank the assemblers and illustrate a procedure for choosing a candidate assembler.

Conclusions: In this study, we propose an assessment method for selecting de novo assemblers that considers both computational and quality metrics at the draft genome level. We divide the quality metrics into three groups: g1 measures the goodness of the assemblies, g2 measures the problems of the assemblies, and g3 measures the conservation elements in the assemblies. Our results demonstrate that the assemblers ABySS and IDBA-UD perform well on the studied fungal genome data in terms of running time, memory, and quality. The results suggest that whole genome shotgun sequencing projects should make use of different assemblers by considering their merits.

Figure 1: Comparison for N50 size metric for the studied assemblers at contigs level. (a): BcDw1, (b): UCRNP2, (c): UCRPA7, (d): UCREL1, (e): PST21.
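
For readers unfamiliar with the metric in Figure 1, the following is a minimal sketch of the usual N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example lengths are hypothetical and are not taken from the paper; the authors' own tooling may handle ties or minimum contig lengths differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half of the
        # assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (not data from the paper); total is 290.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is grouped with the goodness metrics.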

Mentions: Table S1 (Additional file 1) gives the quality metrics of all assemblers for the BcDw1 dataset. At the contigs level, ABySS, IDBA-UD, and SPAdes perform better on the goodness (g1) metrics than the current draft genome (df_1); at the scaffolds level, ABySS, IDBA-UD, and Velvet perform better than df_1 (see Fig. S1; Additional file 2). For example, the assemblies of ABySS, IDBA-UD, and SPAdes have a superior N50 size (see Figure 1(a)). Based on the g1 metrics at the contigs level, ABySS is the highest-quality assembler and Minia the lowest-quality assembler for the BcDw1 dataset. At the scaffolds level, IDBA-UD is the highest-quality assembler on the g1 metrics and SOAP the lowest. Furthermore, SOAP obtains consistent quality at both the contigs and scaffolds levels.

Sparse generates a large percentage of chaff bases at both the contigs and scaffolds levels, which makes it a low-quality assembler on the problems (g2) metrics. No assembler except ABySS has gaps at the contigs level. Velvet, on the other hand, produces far more gaps than the other assemblers at the scaffolds level. Based on the g2 metrics at the scaffolds level, IDBA-UD and SPAdes are high-quality assemblers, whereas ABySS, Sparse, and Velvet are low-quality assemblers.

At the contigs level, ABySS, IDBA-UD, and Velvet rank best on the conservation (g3) metrics, followed by SOAP. At the scaffolds level, SOAP has the best g3 rank, followed by SPAdes. Overall, SPAdes and IDBA-UD have the best quality ranks at the contigs and scaffolds levels, respectively (see Tables 5, 6).
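
The gap counts discussed above can be illustrated with a short sketch. It assumes, as is conventional for assembly FASTA output, that a gap is a run of `N` (or `n`) characters inside a sequence; this excerpt does not state the exact definition the authors used, so the function `count_gaps`, its `min_run` parameter, and the example sequence are all hypothetical.

```python
import re


def count_gaps(sequence, min_run=1):
    """Return (number of gaps, total gap bases) for one sequence.

    A gap is assumed here to be a run of at least `min_run` N/n characters.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    runs = pattern.findall(sequence)
    return len(runs), sum(len(run) for run in runs)


# Hypothetical scaffold with one 4-base gap and one 2-base gap.
example = "ACGTNNNNACGTTNNACG"
print(count_gaps(example))             # prints (2, 6)
print(count_gaps(example, min_run=3))  # prints (1, 4)
```

Summing these counts over every scaffold in an assembly gives a per-assembler gap total that can be compared across tools.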

