De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

Fu N, Wang Q, Shen HL - PLoS ONE (2013)

Bottom Line: Large numbers of simple sequence repeats (SSRs) were identified, and the rates of successful amplification and polymorphism were then investigated among 31 celery accessions. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.


Affiliation: College of Agriculture and Biotechnology, China Agricultural University, Beijing, China.

ABSTRACT

Background: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits progress in molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method for generating a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding.

Principal findings: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly generated a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. Of the unigenes, 78.43% and 48.93% had significant similarity to proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). A total of 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were identified, and the rates of successful amplification and polymorphism were then investigated among 31 celery accessions.

Conclusions: This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.

Figure 7 (pone-0057686-g007): Frequency distribution of SSRs based on motif types. The GA/TC di-nucleotide repeat motif was the most abundant motif detected.

Mentions: All 42,280 unigenes generated in this study were used to mine potential microsatellites with MISA software. A total of 2,997 SSRs were identified in 2,601 unigenes. Of the SSR-containing unigenes, 339 contained more than one SSR, and 472 SSRs were present in compound form (Table 1). On average, we found 1 SSR per 10 kb in this study, a frequency lower than that reported in coffee (1 SSR per 1.5 kb) [50] and higher than that in cotton (1 SSR per 20 kb) [51]. Among the 2,997 SSRs identified, di-nucleotide repeat motifs were the most abundant type (43.98%), followed by mono- (32.93%), tri- (21.62%), tetra- (0.97%), penta- (0.37%) and hexa-nucleotide (0.13%) repeat motifs. The frequencies of EST-SSRs with different numbers of tandem repeats are summarized in Table 2. SSRs with six tandem repeats (22.62%) were the most common, followed by those with ten (18.75%), five (12.98%) and seven (12.95%) tandem repeats. Because mono-nucleotide repeats may be inaccurate owing to sequencing errors and assembly mistakes, they were not used for further analysis. The most common di-nucleotide motif was GA/TC, which accounted for 29.55% of the repeats, followed by AG/CT (20.15%) and CA/TG (5.57%). Among the tri-nucleotide repeats, TGA/TCA (2.74%) and CCA/TGG (2.39%) were the most frequent motifs (Figure 7).
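To make the mining step concrete, the following is a minimal Python sketch of how perfect SSRs can be located in a unigene sequence and grouped into motif classes such as GA/TC. It is not MISA and does not reproduce the authors' pipeline: the function names `find_ssrs` and `motif_class` are hypothetical, the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions rather than settings reported in the paper, and mono-nucleotide repeats are skipped to mirror their exclusion from further analysis above. Compound SSRs are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the thresholds actually used in the study are not stated here).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'GAGA')."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # Capture one motif, then require at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_periodic(motif):
                continue  # already covered by a shorter motif, or mono-nucleotide
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


def motif_class(motif):
    """Label a motif together with its reverse complement, e.g. 'GA/TC'."""
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    return "/".join(sorted({motif, revcomp}))


# Toy sequence: (GA)6 starting at position 2 and (TGA)5 starting at position 18.
example = "TT" + "GA" * 6 + "CCTT" + "TGA" * 5 + "AC"
for start, motif, count in find_ssrs(example):
    print(start, motif_class(motif), count)
```

The class label pairs each motif with its reverse complement, which matches the way motifs are reported in the text (GA/TC, CA/TG, TGA/TCA); the order of the two names within a label is alphabetical here and is an arbitrary choice. Cyclic shifts such as GA and AG are kept as separate classes, as they are in the text.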

