Analysis of the transcriptome of the Indonesian coelacanth Latimeria menadoensis.

Pallavicini A, Canapa A, Barucca M, Alfőldi J, Biscotti MA, Buonocore F, De Moro G, Di Palma F, Fausto AM, Forconi M, Gerdol M, Makapedua DM, Turner-Meier J, Olmo E, Scapigliati G - BMC Genomics (2013)

Bottom Line: Deep RNA sequencing with Illumina technology generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were assembled de novo using a combined Trinity/CLC strategy. Comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full-length transcripts and a high coverage of the predicted full coelacanth transcriptome. Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for improving gene prediction within the genome of L. chalumnae and a valuable resource for investigating many aspects of tetrapod evolution.


ABSTRACT

Background: Latimeria menadoensis is a coelacanth species first identified in 1997 in Indonesia, 10,000 km away from its African congener. To date, only six specimens have been caught and only very limited molecular data are available. In the present work we describe the de novo transcriptome assembly obtained from liver and testis samples collected from the fifth specimen of this species ever caught.

Results: Deep RNA sequencing with Illumina technology generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were assembled de novo using a combined Trinity/CLC strategy. The assembly output was processed and filtered, producing a set of 66,308 contigs whose quality was thoroughly assessed. Comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full-length transcripts and a high coverage of the predicted full coelacanth transcriptome.

Conclusion: Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for improving gene prediction within the genome of L. chalumnae and a valuable resource for investigating many aspects of tetrapod evolution.



Figure 3: Comparison of contig length distribution before (red) and after (blue) the filtering step based on average sequence coverage. The reduction of the fraction of short contigs is represented by the shift of distribution towards the right side of the graph. x-axis: length categories, organized in 100 bp intervals. y-axis: percentage of contigs observed per length category.
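The kind of distribution plotted in Figure 3 can be sketched in a few lines of Python. This is only an illustration: the function name `length_distribution`, the bin boundaries (0-99, 100-199, ...) and the sample contig lengths are assumptions and are not taken from the paper.

```python
from collections import Counter

def length_distribution(lengths, bin_width=100):
    """Return {bin_start: percentage of contigs} for fixed-width length bins.

    The bin boundaries (0-99, 100-199, ...) are an assumption; the paper
    only states that categories are organized in 100 bp intervals.
    """
    if not lengths:
        return {}
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    total = len(lengths)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}

# Hypothetical contig lengths in bp, for illustration only.
before = [150, 220, 260, 310, 480, 950, 1200, 2400]
after = [310, 480, 950, 1200, 2400]  # short contigs removed by a filter

dist_before = length_distribution(before)
dist_after = length_distribution(after)

for start in sorted(set(dist_before) | set(dist_after)):
    print(f"{start:>5}-{start + 99:<5} "
          f"before {dist_before.get(start, 0.0):5.1f}%  "
          f"after {dist_after.get(start, 0.0):5.1f}%")
```

Because each distribution is expressed as a percentage of its own contig set, removing short contigs raises the share of every remaining length category, which is the rightward shift the caption describes.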

Mentions: The goal of these assembly processing steps was to reduce redundancy without losing any valuable sequence data (Figure 2). Although it made use of a large fraction of the original sequencing reads (65.41% of the intact sequence pairs, or fragments, could be mapped to the contigs), the raw Trinity assembly was largely redundant: mapping the reads to the assembled contigs revealed 75% non-specific matches. In contrast, the raw CLC assembly showed virtually no redundancy (~0.01%), but only 33% of the sequenced fragments were used to produce it. Sequence redundancy was drastically reduced to 19.21% after MIRA removed the redundant Trinity contigs, with no loss of sequence data: the total number of reads mapped to the updated assembly increased slightly (+1.19%) owing to the elongation of 8,496 Trinity contigs by CLC. Although a large portion of contigs with low expression was discarded (39,342, or 37.24% of the total), this did not significantly affect the total number of mapped reads (which decreased by only 0.34%) and contributed to a further reduction in sequence redundancy (which dropped to 17.39%). The comparison of contig length distributions before and after the coverage-based filtering step (Figure 3) showed that this procedure appreciably reduced the number of short sequences, especially those shorter than 500 bp, shifting the contig length distribution towards longer and more reliable sequences.
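The filtering figures quoted above can be cross-checked against the final contig count given in the abstract. The sketch below does that arithmetic and shows what a coverage-based filter might look like. The helper `filter_by_coverage`, the dictionary layout of a contig, and the threshold value are hypothetical; the source does not state the cutoff the authors used.

```python
def filter_by_coverage(contigs, min_coverage):
    """Keep contigs whose average coverage is at least min_coverage.

    Both the record layout and the threshold are illustrative assumptions.
    """
    return [c for c in contigs if c["coverage"] >= min_coverage]

# Figures taken from the text: contigs discarded by the low-expression
# filter, and the final number of contigs reported in the abstract.
discarded = 39_342
retained = 66_308

total_before_filter = discarded + retained
discarded_pct = 100 * discarded / total_before_filter

print(total_before_filter)      # contigs entering the filtering step
print(round(discarded_pct, 2))  # share discarded, in percent

# Toy example with made-up contigs and a made-up threshold.
toy_contigs = [
    {"name": "contig_1", "length": 240, "coverage": 1.8},
    {"name": "contig_2", "length": 1350, "coverage": 42.0},
    {"name": "contig_3", "length": 610, "coverage": 5.0},
]
kept = filter_by_coverage(toy_contigs, min_coverage=5.0)
print([c["name"] for c in kept])
```

Adding the 39,342 discarded contigs to the 66,308 retained ones gives 105,650 contigs before filtering, of which the discarded set is 37.24%, matching the percentage reported in the text.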

