De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

Fu N, Wang Q, Shen HL - PLoS ONE (2013)

Bottom Line: Large numbers of simple sequence repeats (SSRs) were identified, and their amplification success rate and polymorphism were then investigated among 31 celery accessions. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.


Affiliation: College of Agriculture and Biotechnology, China Agricultural University, Beijing, China.

ABSTRACT

Background: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, the lack of celery molecular markers slows molecular genetic breeding. High-throughput transcriptome sequencing is an efficient way to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding.

Principal findings: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly generated a collection of 42,280 unigenes (average length 502.6 bp), which represent the first transcriptome of the species. Of these unigenes, 78.43% and 48.93% had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) were assigned to Clusters of Orthologous Groups (COG). A total of 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45% of all unigenes) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Large numbers of simple sequence repeats (SSRs) were identified, and their amplification success rate and polymorphism were then investigated among 31 celery accessions.
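As a quick consistency check, the annotation percentages can be reproduced by dividing each count by the 42,280 assembled unigenes. The sketch below assumes that every percentage uses that total as its denominator, which the numbers bear out (for example, the GO figure of 36.45% is relative to all unigenes, not to the InterPro-annotated subset). The Nr and Swiss-Prot counts are taken from the BLAST results reported alongside Figure 3; the variable and function names are illustrative only.

```python
# Counts reported for the celery assembly; every percentage is assumed to use
# the full set of assembled unigenes as its denominator.
TOTAL_UNIGENES = 42_280

annotated = {
    "Nr": 33_160,
    "Swiss-Prot": 20_686,
    "COG": 10_473,
    "InterPro": 21_126,
    "GO": 15_409,
}


def percent_of_total(count: int, total: int = TOTAL_UNIGENES) -> float:
    """Return count as a percentage of total, rounded to two decimal places."""
    return round(100 * count / total, 2)


# Print each database's share of all unigenes.
for database, count in annotated.items():
    print(f"{database}: {count:,} / {TOTAL_UNIGENES:,} = {percent_of_total(count)}%")
```

Each printed value matches the percentage given in the abstract, which supports reading all of them as shares of the full unigene set.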

Conclusions: This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.


Figure 3. Characteristics of similarity search of unigenes against Nr and Swiss-Prot databases. (A) E-value distribution of BLAST hits for each unigene with a cutoff E-value of 1E-5 in the Nr database. (B) E-value distribution of BLAST hits for each unigene with a cutoff E-value of 1E-5 in the Swiss-Prot database. (C) Similarity distribution of the top BLAST hits for each unigene in the Nr database. (D) Similarity distribution of the top BLAST hits for each unigene in the Swiss-Prot database.

Mentions: A BLASTx similarity search was conducted against the Nr database (E-value < 1e−5) and the Swiss-Prot database (E-value < 1e−10); 33,160 (78.43%) and 20,686 (48.93%) unigenes showed homology with sequences in the Nr and Swiss-Prot databases, respectively. More than 80% of unigenes longer than 400 bp had BLAST matches in the Nr database, whereas only 65.34% of unigenes shorter than 300 bp did (Figure 2). The same trend was observed in the search against Swiss-Prot. We also analyzed the E-value and similarity distributions of the top hits in the Nr database: 35.92% of the sequences showed strong homology (E-value < 1e−50) and 35.49% showed high similarity (greater than 80%) (Figure 3A and 3C). For the top hits in the Swiss-Prot database, the corresponding percentages were 32.87% and 28.71% (Figure 3B and 3D).
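The statistics in this paragraph (unigenes with a hit, hit rate by length, and the share of top hits passing stricter E-value and similarity thresholds) can be computed from BLASTx tabular output. The following is a minimal sketch, not the authors' pipeline. It assumes BLAST+ was run with a custom `-outfmt "6 qseqid sseqid ppos evalue bitscore"`, uses `ppos` (percentage of positive-scoring matches) as the "similarity" measure, treats the top hit as the one with the lowest E-value (ties broken by bit score), and uses the number of unigenes with a hit as the denominator for the top-hit percentages. The function names, length bins and toy records are hypothetical.

```python
from collections import defaultdict


# Hypothetical helper: assumes BLAST+ tabular output produced with
#   -outfmt "6 qseqid sseqid ppos evalue bitscore"
def best_hits(tabular_lines, evalue_cutoff):
    """Return the top hit per query among hits with E-value below the cutoff.

    Top hit = lowest E-value; ties are broken by the higher bit score.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        qseqid, sseqid, ppos, evalue, bitscore = line.rstrip("\n").split("\t")[:5]
        evalue, bitscore = float(evalue), float(bitscore)
        if evalue >= evalue_cutoff:
            continue  # hit does not pass the database-specific cutoff
        rank = (evalue, -bitscore)  # smaller tuple = better hit
        if qseqid not in best or rank < best[qseqid]["rank"]:
            best[qseqid] = {"rank": rank, "subject": sseqid,
                            "similarity": float(ppos), "evalue": evalue}
    return best


def hit_rate_by_length(lengths, hits, bins):
    """Percentage of unigenes with a hit in each half-open length bin [lo, hi)."""
    totals, with_hit = defaultdict(int), defaultdict(int)
    for uid, length in lengths.items():
        for label, lo, hi in bins:
            if lo <= length < hi:
                totals[label] += 1
                with_hit[label] += uid in hits  # bool counts as 0 or 1
                break
    return {label: 100 * with_hit[label] / totals[label] for label in totals}


def top_hit_fractions(hits, strong_evalue=1e-50, high_similarity=80.0):
    """Percentages of top hits with E-value < strong_evalue and similarity > high_similarity."""
    n = len(hits)
    if n == 0:
        return 0.0, 0.0
    strong = sum(h["evalue"] < strong_evalue for h in hits.values())
    similar = sum(h["similarity"] > high_similarity for h in hits.values())
    return 100 * strong / n, 100 * similar / n


# Toy records for illustration only (not data from the study).
toy = [
    "# qseqid sseqid ppos evalue bitscore",
    "u1\tsp|A\t92.5\t1e-80\t300",
    "u1\tsp|B\t70.0\t1e-30\t120",
    "u2\tsp|C\t55.0\t3e-08\t45",
    "u3\tsp|D\t40.0\t0.01\t20",
]
lengths = {"u1": 900, "u2": 350, "u3": 250}  # unigene lengths in bp
bins = [("<300", 0, 300), ("300-400", 300, 400), (">=400", 400, float("inf"))]

hits = best_hits(toy, evalue_cutoff=1e-5)  # u3 fails the cutoff
print(hit_rate_by_length(lengths, hits, bins))
print(top_hit_fractions(hits))
```

The E-value cutoff is a parameter, so the same code can be run with 1e−5 for Nr and 1e−10 for Swiss-Prot, as stated in the text.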

