De Novo Assembly and Characterization of the Invasive Northern Pacific Seastar Transcriptome.

Richardson MF, Sherman CD - PLoS ONE (2015)

Bottom Line: Error correction resulted in small but important improvements to the final assembly in terms of mapping statistics and core eukaryotic genes representation. The error-corrected de novo assembly resulted in 115,654 contigs after redundancy clustering. 41,667 assembled contigs were homologous to sequences from NCBI's non-redundant protein and UniProt databases. Our data can be used to study the genetic basis of adaptive change and other important evolutionary processes during a successful invasion.


Affiliation: School of Life and Environmental Sciences, Centre for Integrative Ecology, Deakin University (Waurn Ponds Campus), 75 Pigdons Road, Locked Bag 20000, Geelong, VIC 3220, Australia.

ABSTRACT
Invasive species are a major threat to global biodiversity but can also serve as valuable model systems for examining important evolutionary processes. While the ecological aspects of invasions have been well documented, study of the genetic basis of adaptive change during the invasion process has been hampered by a lack of genomic resources for the majority of invasive species. Here we report the first larval transcriptomic resource for the Northern Pacific Seastar, Asterias amurensis, an invasive marine predator in Australia. Approximately 117.5 million 100 base-pair (bp) paired-end reads were sequenced from a single RNA-Seq library prepared from a pooled set of full-sibling A. amurensis bipinnaria larvae. We evaluated the efficacy of a pre-assembly error correction pipeline on subsequent de novo assembly. Error correction resulted in small but important improvements to the final assembly in terms of mapping statistics and representation of core eukaryotic genes. The error-corrected de novo assembly resulted in 115,654 contigs after redundancy clustering. 41,667 assembled contigs were homologous to sequences from NCBI's non-redundant protein and UniProt databases. We assigned Gene Ontology, KEGG Orthology and Pfam protein domain terms and predicted protein-coding sequences for >36,000 contigs. The final transcriptome dataset generated here provides functional information for 18,319 unique proteins, comprising at least 11,355 expressed genes. Furthermore, we identified 9,739 orthologs to Patiria miniata proteins, evaluated our annotation pipeline and generated a list of 150 candidate genes for responses to several environmental stressors that may be important for adaptation of A. amurensis in the invasive range. Our study has produced a large set of A. amurensis RNA contigs with functional annotations that can serve as a resource for future comparisons with other echinoderm transcriptomes and for gene expression studies. Our data can be used to study the genetic basis of adaptive change and other important evolutionary processes during a successful invasion.
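The headline counts can be related with simple arithmetic. The sketch below assumes, as the abstract implies, that the 41,667 homologous contigs are a subset of the 115,654 clustered contigs, and computes what fraction of the assembly has a database homolog. The helper name `percent` and the constant names are hypothetical, introduced only for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Counts reported in the abstract (assumption: homologous contigs are a subset
# of the clustered contigs).
TOTAL_CONTIGS = 115_654      # contigs after redundancy clustering
HOMOLOGOUS_CONTIGS = 41_667  # contigs with hits in NCBI nr / UniProt

share = percent(HOMOLOGOUS_CONTIGS, TOTAL_CONTIGS)
print(f"{share}% of clustered contigs have a database homolog")
```

Under that assumption, roughly 36% of the clustered contigs have a detectable homolog in these databases, and the remainder have none.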



Fig 3. Gene Ontology (GO) annotations. The top 10 represented GO terms for each of the GO categories: Biological Process, Molecular Function and Cellular Component. GO functional annotations are derived from similarity to the protein databases (Swiss-Prot, TrEMBL and NCBI’s non-redundant database).

Mentions: To functionally categorize the A. amurensis contigs, we mapped the associated GO terms to the 41,667 contigs that had BLAST matches. In total, 258,322 GO terms were mapped to 36,465 annotated contigs. GO terms are divided into three categories (biological process, molecular function and cellular component), which contained 7,144, 2,704 and 1,091 unique GO terms, respectively. The top 10 GO assignments for each of the three categories are detailed in Fig 3. The most represented GO terms for biological process were transcription (2,346), regulation of transcription (1,423) and proteolysis (1,128). For molecular function, the most represented terms relate to binding: ATP binding (4,226), zinc ion binding (3,012) and metal ion binding (2,598). Lastly, the most represented cellular component GO terms were integral to membrane (6,927), cytoplasm (6,338) and nucleus (6,294). We used the KEGG Automatic Annotation Server (KAAS) to assign KEGG Orthology (KO) annotations to the annotated contigs. This resulted in 5,533 unique KO annotations across 24,929 contigs. The top 10 represented KO annotations are shown in Fig 4; the most represented were KRAB domain-containing zinc finger protein (208), Notch (193) and DNAH: dynein heavy chain, axonemal (165).
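A "top 10 per category" summary like Fig 3 amounts to counting how many contigs carry each GO term within each category and keeping the most frequent terms. The following is a minimal sketch, not the authors' pipeline: it assumes a hypothetical input of `(contig_id, go_term, category)` rows, and the toy rows are invented for illustration rather than taken from the study. It also computes the mean number of GO terms per annotated contig from the reported totals, assuming the 258,322 figure counts individual term-to-contig assignments.

```python
from collections import Counter, defaultdict

# Hypothetical annotation rows: (contig_id, go_term, category). Toy data only.
ANNOTATIONS = [
    ("c1", "ATP binding", "molecular_function"),
    ("c2", "ATP binding", "molecular_function"),
    ("c2", "zinc ion binding", "molecular_function"),
    ("c3", "transcription", "biological_process"),
    ("c1", "transcription", "biological_process"),
    ("c3", "proteolysis", "biological_process"),
    ("c3", "nucleus", "cellular_component"),
    ("c3", "nucleus", "cellular_component"),  # duplicate row; counted once
]


def top_terms_by_category(rows, n=10):
    """Return {category: [(term, n_contigs), ...]} with the n most frequent terms.

    Each term is counted at most once per contig, and ties are broken
    alphabetically so the output order is deterministic.
    """
    unique_rows = {(contig, term, cat) for contig, term, cat in rows}
    counts = defaultdict(Counter)
    for _contig, term, cat in unique_rows:
        counts[cat][term] += 1
    return {
        cat: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for cat, counter in counts.items()
    }


top = top_terms_by_category(ANNOTATIONS, n=10)
for category in sorted(top):
    print(category, top[category])

# Mean GO terms per annotated contig, from the totals reported in the text.
mean_terms = 258_322 / 36_465
print(f"{mean_terms:.2f} GO terms per annotated contig on average")
```

Counting distinct contigs per term, rather than raw annotation rows, keeps a contig that receives the same term from several database hits from being counted more than once. With the reported totals, annotated contigs carry about 7.08 GO terms each on average.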

