diArk--the database for eukaryotic genome and transcriptome assemblies in 2014.
Bottom Line: Eukaryotic genomes are the basis for understanding the complexity of life from populations to the molecular level. Recent technological innovations have revolutionized the speed of data generation, enabling the sequencing of eukaryotic genomes and transcriptomes within days. In this new version of the database we have also integrated species for which transcriptome assemblies are available, and we provide more analyses of assemblies.
Affiliation: Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, 37085, Germany firstname.lastname@example.org.
Mentions: diArk hosts and analyses whole-genome and transcriptome assemblies. The roughly 6000 assemblies currently hosted comprise mitochondrial, chloroplast, apicoplast, nucleolar and nuclear genomic DNA and are made available to other services such as the gene reconstruction software WebScipio (21). The quality of genome assemblies can vary significantly (22), and an approach that yields an excellent genome for one species might not produce assemblies of similar quality for another. Therefore, diArk provides access to alternative assemblies and several measures for direct comparison, such as number of contigs, genome size (larger = better), N50 value (higher = better), N50 length (higher = better), contig length distributions (A50 and N50 plots), sequencing coverage (higher = better), sequencing methods and the assembly software used. In recent years, both the number of alternative assemblies and the number of species that are redundant in terms of species diversity have increased (Figure 2). Redundant species include, for example, different strains of the same fungal species, different breeds of animals, different varieties of plants and different isolates of protozoa. Within diArk, the respective genome and transcriptome assemblies can be compared directly, and the one most suitable for a given research hypothesis can be identified. diArk also provides chaos game representations (CGRs), which are fingerprints of genomes, and frequency chaos game representations (FCGRs) at different resolutions, which can be used, for example, for phylogenetic reconstructions (23).
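To make two of the measures above concrete, the following minimal Python sketch computes basic contig statistics and an FCGR count grid. It is an illustration under stated assumptions, not diArk's implementation. The function names `assembly_stats` and `fcgr` are hypothetical. The N50 length follows the common definition (the length of the contig at which the cumulative length of the longest contigs first reaches half the assembly size); whether the accompanying contig count corresponds to what the text calls "N50 value" is an assumption. The corner assignment for the chaos game (A bottom-left, C top-left, G top-right, T bottom-right) is one widely used convention and is likewise assumed here.

```python
# Assumed corner convention, given as (x_bit, y_bit) per base.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def assembly_stats(contig_lengths):
    """Return contig count, total size, N50 length and the number of
    longest contigs needed to cover half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:  # reached at least half the assembly size
            return {
                "contigs": len(lengths),
                "size": total,
                "n50_length": length,
                "contigs_to_half": count,
            }
    raise ValueError("at least one contig is required")


def fcgr(sequence, k):
    """Count k-mers on a 2**k x 2**k grid, indexed as grid[row][col].

    Binning chaos game points at resolution k is equivalent to counting
    k-mers: the most recent base selects the coarsest quadrant, the base
    before it the sub-quadrant, and so on. Integer bit shifts are used
    instead of floating-point coordinates so that cells are exact.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    col = row = run = 0
    for base in sequence.upper():
        if base not in CORNER_BITS:
            # Ambiguous base (e.g. N): restart so no k-mer spans the gap.
            col = row = run = 0
            continue
        x_bit, y_bit = CORNER_BITS[base]
        # Newest base becomes the most significant bit; oldest drops out.
        col = (col >> 1) | (x_bit << (k - 1))
        row = (row >> 1) | (y_bit << (k - 1))
        run += 1
        if run >= k:  # a full k-mer of unambiguous bases is available
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(assembly_stats([2, 2, 2, 3, 3, 4, 8, 8]))
    print(fcgr("ACGTACGTNNACGT", 2))
```

Normalizing each grid by its total count would give frequencies that can be compared between assemblies of different size; the choice of distance measure for any downstream phylogenetic analysis is left open here.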