De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

Gui D, Jia K, Xia J, Yang L, Chen J, Wu Y, Yi M - PLoS ONE (2013)

Bottom Line: We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.


Affiliation: School of Life Sciences, Sun Yat-sen University, Guangzhou, P. R. China ; School of Marine Sciences, Sun Yat-sen University, Guangzhou, P. R. China.

ABSTRACT

Background: The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of a dramatic decline in population size over the past decades, which raises concern about its extinction. So far, this species is poorly characterized at the molecular level because little sequence information is available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generating abundant sequences for functional genomic analyses in species with unsequenced genomes.

Principal findings: We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. A total of 108,751 high-quality sequences were generated from 47,840,388 paired-end reads, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases, respectively (E-value < 1E-5). In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and a BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits.

Conclusion: This study represents the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.


pone-0072417-g002: Characterization of the assembled unigenes against the NR protein database. (A) E-value distribution of BLAST hits for the assembled unigenes with a cutoff of 1E-5. (B) Similarity distribution of the top BLAST hits for the assembled unigenes with a cutoff of 1E-5. (C) Species distribution of the top BLAST hits for the assembled unigenes with a cutoff of 1E-5.

Mentions: For validation and annotation, all the assembled unigenes were searched against the NR and Swiss-Prot protein databases and the NCBI nucleotide sequence database (NT) using the BLASTx program (E-value < 1E-5). The results showed that 48,868 and 46,587 unigene sequences had BLAST hits to annotated proteins in the NR and Swiss-Prot protein databases, respectively (Table 2). Analysis of the E-value distribution indicated that 82.7% of the aligned sequences showed strong homology to entries in the NR database (E-value < 1E-15) (Fig. 2A). Further analysis of the similarity distribution indicated that 73.3% of the matched sequences had alignment identities greater than 80% (Fig. 2B). A large share of the hits matched sequences of Bos taurus (24.8%) and Sus scrofa (18.1%), and the others were identified within the reference protein databases of Equus caballus (7.3%), Saimiri boliviensis (5.7%), Ailuropoda melanoleuca (5.4%), Canis lupus familiaris (4.8%), and Homo sapiens (4.7%) (Fig. 2C). There were also many unigenes without any BLAST hit, which might represent additional genes that are not represented in the annotated protein databases or sequences that were too short to produce hits. In addition, BLASTx of the assembled unigene sequences against the NT database identified 83,676 sequences with at least one significant alignment to an existing gene model (Table 2).
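The summary statistics quoted above (share of hits below a stricter E-value, share of hits above 80% identity) can be recomputed from a BLAST result table. The following is only a minimal sketch, not the authors' pipeline: it assumes the search results were saved in BLAST+ tabular format (`-outfmt 6`, twelve tab-separated columns with percent identity in column 3 and E-value in column 11), and the function names `top_hits` and `summarize`, as well as the toy rows, are hypothetical.

```python
import csv
import io


def top_hits(tabular_text, max_evalue=1e-5):
    """Keep the lowest-E-value hit per query from BLAST tabular text.

    Assumes the standard 12-column layout of `-outfmt 6`:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    Returns {query_id: (percent_identity, evalue)}.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident, evalue = row[0], float(row[2]), float(row[10])
        if evalue >= max_evalue:
            continue  # apply the E-value cutoff (1E-5 in the text)
        if query not in best or evalue < best[query][1]:
            best[query] = (pident, evalue)
    return best


def summarize(best, strong_evalue=1e-15, min_identity=80.0):
    """Fraction of top hits below `strong_evalue` and above `min_identity`."""
    n = len(best)
    if n == 0:
        return {"n_hits": 0, "frac_strong": 0.0, "frac_high_identity": 0.0}
    strong = sum(1 for _, e in best.values() if e < strong_evalue)
    high = sum(1 for p, _ in best.values() if p > min_identity)
    return {
        "n_hits": n,
        "frac_strong": strong / n,
        "frac_high_identity": high / n,
    }


# Toy rows with made-up identifiers, for illustration only.
ROWS = [
    ["u1", "sA", "95.0", "200", "10", "0", "1", "600", "1", "200", "1e-50", "400"],
    ["u1", "sB", "70.0", "100", "30", "0", "1", "300", "1", "100", "1e-8", "60"],
    ["u2", "sC", "75.0", "120", "30", "0", "1", "360", "1", "120", "1e-10", "70"],
    ["u3", "sD", "85.0", "40", "6", "0", "1", "120", "1", "40", "1e-3", "30"],
    ["u4", "sE", "88.0", "150", "18", "0", "1", "450", "1", "150", "1e-20", "120"],
]
EXAMPLE = "\n".join("\t".join(r) for r in ROWS)

if __name__ == "__main__":
    print(summarize(top_hits(EXAMPLE)))
```

In this toy input, `u3` is discarded by the 1E-5 cutoff and only the better of the two `u1` hits is kept, so the fractions are computed over three unigenes. Choosing one hit per query mirrors the "top BLAST hits" wording of the figure caption; whether the authors selected the top hit by E-value or by bit score is not stated in the text.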

