Analysis of the transcriptome of the Indonesian coelacanth Latimeria menadoensis.

Pallavicini A, Canapa A, Barucca M, Alfőldi J, Biscotti MA, Buonocore F, De Moro G, Di Palma F, Fausto AM, Forconi M, Gerdol M, Makapedua DM, Turner-Meier J, Olmo E, Scapigliati G - BMC Genomics (2013)

Bottom Line: The deep RNA sequencing performed with Illumina technologies generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were de novo assembled using a Trinity/CLC combined strategy. The comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full-length transcripts and a high coverage of the predicted full coelacanth transcriptome. Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for improving gene prediction within the genome of L. chalumnae and a valuable resource for investigating many aspects of tetrapod evolution.


ABSTRACT

Background: Latimeria menadoensis is a coelacanth species first identified in 1997 in Indonesia, about 10,000 km from its African congener. To date, only six specimens have been caught and only very limited molecular data are available. In the present work we describe the de novo transcriptome assembly obtained from liver and testis samples collected from the fifth specimen of this species ever caught.

Results: The deep RNA sequencing performed with Illumina technologies generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were de novo assembled using a Trinity/CLC combined strategy. The assembly output was processed and filtered producing a set of 66,308 contigs, whose quality was thoroughly assessed. The comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full length transcripts and a high coverage of the predicted full coelacanth transcriptome.

Conclusion: Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for improving gene prediction within the genome of L. chalumnae and a valuable resource for investigating many aspects of tetrapod evolution.



Figure 4: Top BLAST hit species distribution, obtained by BLASTx against the NCBI non-redundant (nr) protein database. The number of top BLAST hits per species is shown on the x-axis. Only the 15 most represented species are shown. The total number of top hits from all other organisms is 7,572.
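
A distribution like the one in Figure 4 can be tallied by keeping the best hit per contig and counting species. The following is only a minimal sketch, not the authors' pipeline: the function name `summarize_top_hits`, the `(contig, species, evalue)` input layout, and the toy data are assumptions made for illustration. The E-value cutoff of 1e-6 and the limit of 15 species are taken from the text; whether the cutoff is inclusive is also an assumption.

```python
from collections import Counter


def summarize_top_hits(hits, evalue_cutoff=1e-6, top_n=15):
    """Tally the best-hit species per contig (hypothetical helper).

    hits: iterable of (contig_id, species, evalue) tuples; this layout is
    an assumption, not a BLAST output format.
    Returns (list of (species, count) for the top_n species, count of
    top hits belonging to all remaining species).
    """
    best = {}  # contig_id -> (species, evalue) of the lowest E-value hit
    for contig, species, evalue in hits:
        if evalue > evalue_cutoff:
            continue  # hit does not pass the cutoff
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (species, evalue)

    counts = Counter(species for species, _ in best.values())
    top = counts.most_common(top_n)
    other = sum(counts.values()) - sum(n for _, n in top)
    return top, other


# Toy data with placeholder species labels (not real results).
toy_hits = [
    ("contig1", "species A", 1e-20),
    ("contig1", "species B", 1e-10),
    ("contig2", "species A", 1e-8),
    ("contig3", "species B", 1e-3),   # fails the cutoff
    ("contig4", "species C", 1e-7),
]
top, other = summarize_top_hits(toy_hits, top_n=1)
print(top, other)
```

With `top_n=15` the second return value plays the role of the "other organisms" total quoted in the caption.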

Mentions: The annotation performed with BLASTx against the NCBI non-redundant (nr) protein database revealed that 23,564 of the assembled contigs (35.54%) had at least one positive hit. The remaining 42,744 contigs gave no BLAST hit at an E-value cutoff of 1 × 10⁻⁶. The BLAST top hit species distribution is shown in Figure 4.

The BLAST2GO annotation, performed directly on the high-quality set of transcripts translated into the six possible reading frames, revealed that 42,667 of the 66,308 total sequences (64.35%) bore at least one InterPro domain. The 25 most abundant InterPro domains are listed in Table 3. IPR000719 (Protein kinase, catalytic domain) was the most represented, with 2,041 annotated transcripts, followed by IPR007087 (Zinc finger, C2H2) and IPR002290 (Serine/threonine-/dual-specificity protein kinase, catalytic domain).

The assembled sequences were also annotated with Gene Ontology (GO) terms, as described in the materials and methods section, according to the three major GO categories: Cellular Component, Molecular Function, and Biological Process. A total of 28,502 transcripts (42.98%) were associated with at least one GO term; at the second level of the ontology, 6,698 were assigned to a Cellular Component category, 13,061 to a Molecular Function category, and 13,030 to a Biological Process category. The summary of Gene Ontology mappings is reported in Additional file 1: Figure S1f. For cellular localization, most annotated transcripts were assigned to cell (GO:0005623), followed by organelle (GO:0043226) and macromolecular complex (GO:0032991). The predominant molecular functions were binding (GO:0005488) and catalytic activity (GO:0003824). Finally, among biological processes, cellular process (GO:0009987) and metabolic process (GO:0008152) were the two most represented GO terms.
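
The percentages quoted above all follow from the 66,308-contig total. The short check below recomputes them from the reported counts; it is only a consistency sketch, and the names `TOTAL_CONTIGS`, `reported`, and `percent` are introduced here for illustration.

```python
TOTAL_CONTIGS = 66_308  # filtered contig set reported in the abstract

# (count, percentage as stated in the text)
reported = {
    "BLASTx hit against nr": (23_564, 35.54),
    "InterPro domain": (42_667, 64.35),
    "GO term": (28_502, 42.98),
}


def percent(count, total=TOTAL_CONTIGS):
    """Share of the contig set, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


for label, (count, stated) in reported.items():
    print(f"{label}: {count:,} / {TOTAL_CONTIGS:,} = {percent(count):.2f}% "
          f"(stated: {stated}%)")

# Contigs without any BLAST hit are the complement of those with a hit.
no_hit = TOTAL_CONTIGS - reported["BLASTx hit against nr"][0]
print(f"no BLAST hit: {no_hit:,}")
```

The complement computed on the last lines matches the 42,744 contigs reported as having no BLAST hit.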

