De novo transcriptome assembly of the wild relative of tea tree (Camellia taliensis) and comparative analysis with tea transcriptome identified putative genes associated with tea quality and stress response.

Zhang HB, Xia EH, Huang H, Jiang JJ, Liu BY, Gao LZ - BMC Genomics (2015)

Bottom Line: To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between these two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.


Affiliation: Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, the Chinese Academy of Sciences, Kunming, 650204, China. zhanghaibin@mail.kib.ac.cn.

ABSTRACT

Background: Camellia taliensis is one of the most important wild relatives of the cultivated tea tree, C. sinensis. The species occupies a broad range of mountainous habitats, reflecting wide-ranging abiotic tolerance and biotic resistance, and thus harbors valuable gene resources that may greatly benefit genetic improvement of the cultivated tea tree. However, owing to its large (~3 Gb) and structurally complex genome, there is fairly limited genetic information, and particularly few genomic resources are publicly available for this species. To better understand the key pathways determining tea flavor and to enhance tea tree breeding programs, we performed high-throughput transcriptome sequencing of C. taliensis.

Results: In this study, approximately 241.5 million high-quality paired-end reads, accounting for ~24 Gb of sequence data, were generated from tender shoots, young leaves, flower buds and flowers using the Illumina HiSeq 2000 platform. De novo assembly with further processing and filtering yielded a set of 67,923 transcripts with an average length of 685 bp and an N50 of 995 bp. Based on sequence similarity searches against public databases, a total of 39,475 transcripts were annotated with gene descriptions, conserved protein domains or gene ontology (GO) terms. Candidate genes for major metabolic pathways involved in tea quality were identified and experimentally validated using RT-qPCR. Gene expression profiles further showed that they are differentially regulated at different developmental stages. To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between these two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis.

Conclusion: We report the first large-coverage transcriptome datasets for C. taliensis generated using next-generation sequencing technology. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.
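
The assembly statistics quoted in the Results (average length and N50) can be illustrated with a minimal sketch. The function name and the toy lengths below are hypothetical and are not taken from the study; the sketch only shows the standard N50 definition, not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths in bp (illustrative only).
toy_lengths = [2000, 1500, 1000, 800, 500, 300, 200]
toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)
print(f"mean length = {toy_mean:.0f} bp, N50 = {toy_n50} bp")
```

Because N50 weights sequences by the bases they contribute, it is typically larger than the mean length when many short transcripts are present, which is consistent with the reported 995 bp N50 against a 685 bp average.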



Figure 2: Characteristics of the homology search of unigenes against the NR database. (a) Effects of query sequence length on percentage of significant matches. The cut-off value was set at 1.0e-5. The proportion of sequences with matches in the NR database at NCBI is greater among the longer assembled sequences. (b) Similarity distribution of the best BLAST hits for each unigene. (c) E-value distribution of the top BLAST hits for each unigene. (d) Species distribution is shown as the percentage of the total homologous sequences.

Mentions: To predict and analyze the function of the 67,923 transcripts, all transcript sequences were first aligned against the NCBI non-redundant (NR) protein database using BLASTx. A total of 38,947 significant BLAST top hits were returned with a cut-off E-value of 1e-5 (57.3% of all transcripts; see Table 2 and Additional file 1). As reported in a previous study [4], the length of transcript sequences is crucial in determining the efficiency of BLAST searches. Our results showed that the matching efficiency was 98% for sequences longer than 2,000 bp, whereas it decreased to about 68% for those ranging from 500 to 1,000 bp and to 40% for sequences between 200 and 500 bp (see Figure 2a). The similarity distribution of the top hits in the NR database showed that 38.9% of the mapped sequences had similarities higher than 80%, while 61.1% of the hits had similarities ranging from 20% to 80% (see Figure 2b). The E-value distribution had a comparable pattern: 49.6% of the mapped sequences showed high homology (E-values smaller than 1e-50), whereas 50.4% of the homologous sequences had E-values between 1e-5 and 1e-50 (see Figure 2c). For species distribution, 30.4% of the distinct sequences had their top match (first hit) with sequences from Vitis vinifera, followed by Arabidopsis thaliana (12.4%), Theobroma cacao (9.3%), Solanum lycopersicum (5.9%) and Prunus persica (5.8%) (see Figure 2d).
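
The filtering and length-binning described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), and the function names, bin edges, and toy rows are hypothetical.

```python
import bisect
import csv
import io

E_VALUE_CUTOFF = 1e-5                      # cut-off stated in the text
LENGTH_EDGES = (500, 1000, 2000)           # assumed bin edges, in bp
LENGTH_LABELS = ("<500", "500-999", "1000-1999", ">=2000")


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep, per query, the hit with the lowest E-value at or below cutoff.

    Columns follow BLAST tabular output: query id, subject id,
    percent identity, ..., E-value (column 11), bit score (column 12).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][2]:
            best[query] = (subject, identity, evalue)
    return best


def matched_fraction_by_length(lengths, hits):
    """Fraction of queries with a significant hit, per length bin."""
    totals = [0] * len(LENGTH_LABELS)
    matched = [0] * len(LENGTH_LABELS)
    for query, length in lengths.items():
        i = bisect.bisect_right(LENGTH_EDGES, length)
        totals[i] += 1
        if query in hits:
            matched[i] += 1
    return {
        label: matched[i] / totals[i]
        for i, label in enumerate(LENGTH_LABELS)
        if totals[i]
    }


# Hypothetical toy data (not from the study).
TOY_ROWS = [
    ("t1", "protA", "85.0", "300", "45", "0", "1", "900", "1", "300", "1e-60", "250"),
    ("t1", "protB", "60.0", "280", "110", "2", "1", "840", "5", "284", "1e-20", "120"),
    ("t2", "protC", "45.0", "90", "49", "1", "1", "270", "10", "99", "1e-3", "40"),
    ("t3", "protD", "72.5", "150", "41", "0", "1", "450", "1", "150", "1e-8", "60"),
]
TOY_TABLE = "\n".join("\t".join(row) for row in TOY_ROWS)
TOY_LENGTHS = {"t1": 2400, "t2": 350, "t3": 700, "t4": 420}

toy_hits = best_hits(TOY_TABLE)
toy_fractions = matched_fraction_by_length(TOY_LENGTHS, toy_hits)
print(toy_hits)
print(toy_fractions)
```

In the toy data, `t2` is dropped because its only hit exceeds the 1e-5 cut-off and `t4` has no hit at all, so both count as unmatched in the shortest bin. The overall figure in the text follows the same logic: 38,947 of 67,923 transcripts is 57.3%.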

