De novo transcriptome assembly of the wild relative of tea tree (Camellia taliensis) and comparative analysis with tea transcriptome identified putative genes associated with tea quality and stress response.

Zhang HB, Xia EH, Huang H, Jiang JJ, Liu BY, Gao LZ - BMC Genomics (2015)

Bottom Line: To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between the two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.


Affiliation: Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, the Chinese Academy of Sciences, Kunming, 650204, China. zhanghaibin@mail.kib.ac.cn.

ABSTRACT

Background: Camellia taliensis is one of the most important wild relatives of the cultivated tea tree, C. sinensis. The species occupies extensive mountainous habitats, reflecting a wide range of abiotic tolerance and biotic resistance, and thus harbors valuable gene resources that may greatly benefit genetic improvement of the cultivated tea tree. However, owing to its large (~3 Gb) and structurally complex genome, genetic information is fairly limited and few genomic resources are publicly available for this species. To better understand the key pathways determining tea flavor and to enhance tea tree breeding programs, we performed high-throughput transcriptome sequencing of C. taliensis.

Results: In this study, approximately 241.5 million high-quality paired-end reads, accounting for ~24 Gb of sequence data, were generated from tender shoots, young leaves, flower buds and flowers using the Illumina HiSeq 2000 platform. De novo assembly with further processing and filtering yielded a set of 67,923 transcripts with an average length of 685 bp and an N50 of 995 bp. Based on sequence similarity searches against public databases, a total of 39,475 transcripts were annotated with gene descriptions, conserved protein domains or gene ontology (GO) terms. Candidate genes for major metabolic pathways involved in tea quality were identified and experimentally validated using RT-qPCR. Further gene expression profiles showed that they are differentially regulated at different developmental stages. To gain insights into the evolution of these genes, we aligned them to the previously cloned orthologous genes in C. sinensis and found considerable nucleotide variation within several genes involved in important secondary metabolic biosynthesis pathways, of which the flavone synthase II gene (FNSII) is the most variable between the two species. Moreover, comparative analyses revealed that C. taliensis shows a remarkable expansion of LEA genes compared to C. sinensis, which might contribute to the observed stronger stress resistance of C. taliensis.

Conclusion: We report the first large-coverage transcriptome dataset for C. taliensis generated with next-generation sequencing technology. Such comprehensive EST datasets provide an unprecedented opportunity for identifying genes involved in several major metabolic pathways and will accelerate functional genomic studies and genetic improvement efforts of tea trees in the future.


Fig4: Venn diagram showing the BLAST results of C. taliensis transcriptome against five databases. De novo reconstructed transcript sequences were used to search against public databases including NR, UniRef90, TAIR10, KOG and PFAM. The number of transcripts that have significant hits against the five databases is shown in each intersection of the Venn diagram.

Mentions: Because conserved domains within a gene help in deducing its function, we annotated potential domains in the assembled transcripts. To facilitate this procedure, the open reading frame (ORF) of each transcript was first extracted using a set of programs included in the Trinity package (see Methods), and all transcripts with predicted ORFs were then searched against the PFAM database using profile hidden Markov model methods. Overall, 20,748 transcripts were categorized into 3,707 domains/families. Figure 3a shows the size distribution of the domains/families: most contain a small number of transcripts, and a small proportion occur more frequently. Based on the number of transcripts contained in each Pfam domain, we ranked the Pfam domains/families and listed the ten most abundant in Figure 3b; the results were similar to those of the previous study [5]. Among these domains/families, “protein kinase domain” and its subclass “protein tyrosine kinase” are known to regulate the majority of cellular pathways, proteins with a “leucine-rich repeats” domain are frequently involved in protein–protein interactions, and “PPR repeat” has been reported to be a large protein family in plants with versatile functions [15]. Other protein families with basic functions in plants, such as “RNA recognition motif”, “WD domain, G-beta repeat”, and “cytochrome P450”, were also among the top ten. Taken together, 39,475 transcripts had best hits to known proteins in at least one of the five databases, and 13,698 transcripts showed similarity to proteins in all five databases (see Figure 4 and Table 2).
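
The two tallies described above (ranking Pfam domains by the number of transcripts they contain, and counting transcripts with hits in at least one versus all five databases) amount to simple counting and set operations. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function names `rank_domains` and `venn_summary`, the input shapes, and the toy transcript IDs are all assumptions made for illustration.

```python
from collections import Counter


def rank_domains(domain_hits, top=10):
    """Rank domains by how many distinct transcripts carry them.

    domain_hits: iterable of (transcript_id, domain_name) pairs
    (hypothetical input shape, e.g. parsed from a domain search table).
    """
    # De-duplicate so a transcript counts once per domain.
    unique_pairs = set(domain_hits)
    counts = Counter(domain for _, domain in unique_pairs)
    return counts.most_common(top)


def venn_summary(hits_by_db):
    """Return (hit in at least one database, hit in all databases).

    hits_by_db: dict mapping a database name to the set of transcript
    IDs with a significant hit in that database (hypothetical shape).
    """
    sets = list(hits_by_db.values())
    if not sets:
        return 0, 0
    any_hit = set().union(*sets)        # union: annotated somewhere
    all_hit = set.intersection(*sets)   # intersection: annotated everywhere
    return len(any_hit), len(all_hit)


# Toy data only; these IDs and hits are invented for the example.
pfam_hits = [
    ("t1", "Pkinase"), ("t2", "Pkinase"), ("t1", "Pkinase"),
    ("t2", "LRR_1"), ("t3", "PPR"),
]
hits = {
    "NR": {"t1", "t2", "t3", "t4"},
    "UniRef90": {"t1", "t2", "t3"},
    "TAIR10": {"t1", "t2"},
    "KOG": {"t1"},
    "PFAM": {"t1", "t2", "t3"},
}

ranked = rank_domains(pfam_hits)
n_any, n_all = venn_summary(hits)
print(ranked[0], n_any, n_all)
```

In the article's terms, `n_any` corresponds to the 39,475 transcripts with a hit in at least one database and `n_all` to the 13,698 transcripts with hits in all five; the remaining regions of the Venn diagram would follow from the same sets by further intersections and differences.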