De novo transcriptome assembly of the wild relative of tea tree (Camellia taliensis) and comparative analysis with tea transcriptome identified putative genes associated with tea quality and stress response.

Zhang HB, Xia EH, Huang H, Jiang JJ, Liu BY, Gao LZ - BMC Genomics (2015)

Bottom Line: To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between the two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.


Affiliation: Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, the Chinese Academy of Sciences, Kunming, 650204, China. zhanghaibin@mail.kib.ac.cn.

ABSTRACT

Background: Camellia taliensis is one of the most important wild relatives of the cultivated tea tree, C. sinensis. The species is widely distributed across mountainous habitats, reflecting a wide range of abiotic tolerance and biotic resistance, and thus harbors valuable gene resources that may greatly benefit genetic improvement of the cultivated tea tree. However, owing to its large (~3 Gb) and structurally complex genome, genetic information is fairly limited and few genomic resources are publicly available for this species. To better understand the key pathways determining tea flavor and to enhance tea tree breeding programs, we performed high-throughput transcriptome sequencing of C. taliensis.

Results: In this study, approximately 241.5 million high-quality paired-end reads, accounting for ~24 Gb of sequence data, were generated from tender shoots, young leaves, flower buds and flowers using the Illumina HiSeq 2000 platform. De novo assembly with further processing and filtering yielded a set of 67,923 transcripts with an average length of 685 bp and an N50 of 995 bp. Based on sequence similarity searches against public databases, a total of 39,475 transcripts were annotated with gene descriptions, conserved protein domains or gene ontology (GO) terms. Candidate genes for major metabolic pathways involved in tea quality were identified and experimentally validated using RT-qPCR. Gene expression profiles further showed that these genes are differentially regulated at different developmental stages. To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between the two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis.

Conclusion: We report the first large-coverage transcriptome datasets for C. taliensis generated using next-generation sequencing technology. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.



Fig1: Summary of the C. taliensis transcriptome assembly. (a) Size distribution of the assembled unigenes. (b) Random distribution of the sequencing reads in the unigenes. The x-axis indicates the relative position in the unigenes. The orientation is from 5’ end to 3’ end.

Mentions: To comprehensively construct the transcriptome of C. taliensis, four tissues representing various developmental stages (tender shoots, young leaves, flower buds and flowers) were harvested for RNA isolation. Following the manufacturer’s instructions (Illumina, San Diego, CA, USA), shotgun libraries were constructed and sequenced on the Illumina HiSeq 2000 platform. In total, ~241.5 million paired-end reads with a read length of 100 bp were generated (see Table 1). After quality checks, adapter trimming, and size selection, de novo assembly was performed using Trinity [12], and 278,085 transcripts were reconstructed. To reduce redundancy and potential assembly errors, we clustered the 278,085 transcripts into 145,738 unigenes using CD-HIT [13], and then filtered out likely artifact transcripts with FPKM (Fragments Per Kilobase per Million mapped fragments) values below 1. As a result, a final high-quality dataset of 67,923 transcripts longer than 200 bp, with an average length of 685 bp and an N50 of 995 bp, was obtained (see Table 2). Their size distribution is shown in Figure 1a. To evaluate the quality of the assembly, we randomly selected six transcripts and designed primer pairs for RT-PCR amplification. Five of the six primer pairs produced bands of the expected sizes, and the identity of all five PCR products was confirmed by Sanger sequencing (data not shown). In addition, we assessed sequencing bias by examining the distribution of reads along the ORFs of the assembled transcripts (see Figure 1b). Although the 3’ ends of the ORFs contained relatively fewer reads, the other positions showed greater and more evenly distributed read counts. These experimental validations and data analyses suggest that the quality of our dataset is comparable to similar reports in other non-model plant species [4,14].
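
The filtering and summary statistics described above (FPKM cutoff of 1, minimum length of 200 bp, average length, N50) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the excerpt does not state which tool produced the FPKM values, and the function names, the toy transcript table and the total of one million mapped fragments are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def n50(lengths: List[int]) -> int:
    """Length L such that transcripts of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty input


def filter_transcripts(
    transcripts: Dict[str, Tuple[int, int]],
    total_mapped_fragments: int,
    min_fpkm: float = 1.0,   # transcripts with FPKM below 1 are discarded
    min_length: int = 200,   # only transcripts longer than 200 bp are kept
) -> Dict[str, int]:
    """Return {name: length} for transcripts passing both thresholds."""
    kept = {}
    for name, (length_bp, fragments) in transcripts.items():
        if length_bp <= min_length:
            continue
        if fpkm(fragments, length_bp, total_mapped_fragments) < min_fpkm:
            continue
        kept[name] = length_bp
    return kept


# Hypothetical toy data: name -> (length in bp, mapped fragments).
toy = {
    "t1": (1000, 50),  # FPKM 50, kept
    "t2": (2000, 1),   # FPKM 0.5, dropped as a likely artifact
    "t3": (150, 30),   # too short, dropped
    "t4": (400, 4),    # FPKM 10, kept
    "t5": (600, 6),    # FPKM 10, kept
}
kept = filter_transcripts(toy, total_mapped_fragments=1_000_000)
lengths = list(kept.values())
mean_length = sum(lengths) / len(lengths)
print(sorted(kept), round(mean_length, 1), n50(lengths))
```

Because N50 is weighted by sequence length, it is typically larger than the average length in an assembly dominated by short transcripts, which is consistent with the reported 995 bp N50 against a 685 bp average.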

