De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

Fu N, Wang Q, Shen HL - PLoS ONE (2013)

Bottom Line: Large numbers of simple sequence repeats (SSRs) were identified, and their rates of successful amplification and polymorphism were investigated among 31 celery accessions. Our results provide a valuable resource for celery research. The developed molecular markers lay the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.

Affiliation: College of Agriculture and Biotechnology, China Agricultural University, Beijing, China.

ABSTRACT

Background: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, the lack of celery molecular markers slows molecular genetic breeding. High-throughput transcriptome sequencing is an efficient way to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding.

Principal findings: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly generated a collection of 42,280 unigenes (average length 502.6 bp), which represent the first transcriptome of the species. Of these unigenes, 78.43% and 48.93% had significant similarity to proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) were assigned to Clusters of Orthologous Groups (COG). In total, 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45% of all unigenes) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database. Large numbers of simple sequence repeats (SSRs) were identified, and their rates of successful amplification and polymorphism were investigated among 31 celery accessions.

Conclusions: This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers lay the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.

Figure 1. Length distribution of the celery unigenes de novo assembled from 42280 ESTs.

Mentions: Paired-end sequencing yielded 2×101-bp reads from the two ends of each cDNA fragment. In this study, a total of 27,154,728 raw sequencing reads were generated from a 200 bp insert library, encompassing 2.7 Gb of sequence data. After stringent quality assessment and data filtering, reads with Q20 bases (bases with a quality score greater than 20) were selected as high-quality reads for further analysis. In total, 25,915,104 (89.33%) reads were deemed high quality, of which 621,320 (2.4%) were ribosomal. The sequences have been deposited in the DDBJ Sequence Read Archive (DRA, http://trace.ddbj.nig.ac.jp/dra/) under accession number DRA000903. We used Trinity with an optimized k-mer length of 31 for de novo assembly. The short reads were finally assembled into 42,280 unigenes with an average length of 502.6 bp and a median length of 604 bp. Of the high-quality reads, 22,262,653 (85.91%) were assembled into transcripts. The number of reads aligned to each unigene ranged from 1 to 92,345, with an average of 228. Most unigenes (53.46%) were 200–400 bp long, 15,906 unigenes (37.62%) were 401–1000 bp long, and 3,769 unigenes (8.91%) were longer than 1000 bp (Figure 1).
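Two steps in the paragraph above can be sketched in a few lines: the Q20 read filter and the binning of unigene lengths summarized in Figure 1. The filter is hypothetical because the exact criterion is not spelled out; it assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64, so the offset is a parameter) and keeps a read if at least 90% of its bases score above 20. The function names `passes_q20` and `length_distribution` are illustrative, not taken from the study, and the binning adds a `<200` class as a catch-all next to the length classes quoted in the text.

```python
from typing import Dict, Iterable


def passes_q20(quality: str, min_fraction: float = 0.9, offset: int = 33) -> bool:
    """Hypothetical read filter: keep a read if at least `min_fraction` of its
    bases have a Phred score greater than 20 (assumed Phred+33 encoding)."""
    if not quality:
        return False
    scores = [ord(c) - offset for c in quality]
    good = sum(1 for q in scores if q > 20)
    return good / len(scores) >= min_fraction


def length_distribution(lengths: Iterable[int]) -> Dict[str, float]:
    """Percentage of unigenes (rounded to 2 decimals) in each length class."""
    counts = {"<200": 0, "200-400": 0, "401-1000": 0, ">1000": 0}
    total = 0
    for n in lengths:
        total += 1
        if n < 200:
            counts["<200"] += 1
        elif n <= 400:
            counts["200-400"] += 1
        elif n <= 1000:
            counts["401-1000"] += 1
        else:
            counts[">1000"] += 1
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: round(100 * v / total, 2) for k, v in counts.items()}
```

As a consistency check, feeding `length_distribution` a synthetic list of 22,605 lengths of 300 bp, 15,906 of 700 bp and 3,769 of 1,500 bp reproduces the reported 53.46%, 37.62% and 8.91%. The 22,605 figure is derived as 42,280 − 15,906 − 3,769, on the assumption that no unigene is shorter than 200 bp.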

