De novo sequencing and comparative analysis of holy and sweet basil transcriptomes.

Rastogi S, Meena S, Bhattacharya A, Ghosh S, Shukla RK, Sangwan NS, Lal RK, Gupta MM, Lavania UC, Gupta V, Nagegowda DA, Shasany AK - BMC Genomics (2014)

Bottom Line: The sequence assembly resulted in 69117 and 130043 transcripts with an average length of 1646 ± 1210.1 bp and 1363 ± 1139.3 bp for O. sanctum and O. basilicum, respectively. Several CYP450 (26) and TF (40) families were identified having probable roles in primary and secondary metabolism. SSR and SNP markers were also identified in the transcriptomes of both species, with many SSRs linked to phenylpropanoid and terpenoid pathway genes.

Affiliation: Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, 226015 Lucknow, U.P., India. da.nagegowda@cimap.res.in.

ABSTRACT

Background: Ocimum L. of family Lamiaceae is a well known genus for its ethnobotanical, medicinal and aromatic properties, which are attributed to innumerable phenylpropanoid and terpenoid compounds produced by the plant. To enrich genomic resources for understanding various pathways, de novo transcriptome sequencing of two important species, O. sanctum and O. basilicum, was carried out by Illumina paired-end sequencing.

Results: The sequence assembly resulted in 69117 and 130043 transcripts with an average length of 1646 ± 1210.1 bp and 1363 ± 1139.3 bp for O. sanctum and O. basilicum, respectively. Out of the total transcripts, 59648 (86.30%) and 105470 (81.10%) from O. sanctum and O. basilicum, respectively, were annotated by UniProt blastx against Arabidopsis, rice and Lamiaceae. KEGG analysis identified 501 and 952 transcripts from O. sanctum and O. basilicum, respectively, related to secondary metabolism, with a higher percentage of transcripts for biosynthesis of terpenoids in O. sanctum and of phenylpropanoids in O. basilicum. Higher digital gene expression in O. basilicum was validated through qPCR and correlated with higher essential oil content and chromosome number (O. sanctum, 2n = 16; O. basilicum, 2n = 48). Several CYP450 (26) and TF (40) families were identified having probable roles in primary and secondary metabolism. SSR and SNP markers were also identified in the transcriptomes of both species, with many SSRs linked to phenylpropanoid and terpenoid pathway genes.

Conclusion: This is the first report of a comparative transcriptome analysis of Ocimum species. It can be utilized to characterize genes related to secondary metabolism and their regulation, and to breed special chemotypes with unique essential oil composition in Ocimum.

Figure 1: Transcript abundance and length summary of assembled transcripts. (A) Length of the assembled transcripts vs. number of transcripts. (B, C) Venn diagrams representing datasets from the Lamiaceae, Arabidopsis and rice databases: (B) number of shared and unique transcripts among the three databases in O. sanctum; (C) number of shared and unique transcripts among the three databases in O. basilicum.

Mentions: In recent years, the Illumina sequencing platform has been widely used for transcriptome analysis of plants lacking reference genomes [20–22]. To generate transcriptome sequences, complementary DNA (cDNA) libraries prepared from leaf tissues of Ocimum were sequenced on the Illumina HiSeq1000 platform. Paired-end sequencing-by-synthesis (SBS) yielded 4.75 Gb and 5.23 Gb of raw data for O. sanctum and O. basilicum, respectively. After filtering and adapter removal, 45969831 (45.97 million) and 50836347 (50.84 million) reads, comprising 4542127604 and 5025102762 high-quality nucleotide bases for O. sanctum and O. basilicum, respectively, were retained for assembly. Filtered reads were assembled into contigs using the Velvet assembler at a hash length of 45, which generated 75978 and 290284 contigs for O. sanctum and O. basilicum, respectively. Transcripts were then generated with Oases-0.2.08 at the same hash length, resulting in 69117 and 130043 transcripts for O. sanctum and O. basilicum, respectively. The average lengths were 1646 ± 1210.1 bp and 1363 ± 1139.3 bp, with N50 values of 2199 and 1929, for O. sanctum and O. basilicum, respectively (Table 1). Average transcript lengths obtained on the Illumina platform for the Curcuma longa, cabbage and goosegrass transcriptomes have been reported as 1304.1 bp, 1419 bp and 1153.74 bp, respectively [21–23].

The assembled transcripts ranged in length from 180 to >5000 bases. In O. sanctum, the largest class was 501–1000 bp with 12640 transcripts (18.29%), followed by 1001–1500 bp with 12613 transcripts (18.25%). In O. basilicum, the largest class was 180–500 bp (31594 transcripts, 24.30%), followed by 501–1000 bp (27208 transcripts, 20.92%). In both species the smallest class was 4501–5000 bp, with 591 transcripts (0.86%) in O. sanctum and 641 (0.49%) in O. basilicum (Figure 1A). In the root transcriptome of Ipomoea batatas, 65.76% of unigenes were in the 101–500 bp range, followed by 20.79% of transcripts of 501–1000 bp length [20]. Similarly, in Medicago sativa, Boehmeria nivea, Apium graveolens, C. longa and Centella asiatica, the highest number of transcripts/unigenes was reported in the 75–500 bp length range [21–23]. Further, transcripts from both Ocimum samples were clustered using CD-HIT-v4.5.4 at 95% identity and query coverage, resulting in a total of 130996 transcripts.

A blastx search of the assembled sequences of O. sanctum and O. basilicum was conducted against UniProt databases restricted to proteins of Arabidopsis, rice and the Lamiaceae family, and each unigene was assigned the GO terms annotated to its corresponding homologue (Table 2; Additional file 1, Additional file 2, Additional file 3). In O. sanctum, 59380 transcripts (86%) were annotated with Arabidopsis, 56753 (82%) with rice and 11704 (17%) with the Lamiaceae family, whereas 104856 (81%), 102721 (79%) and 18427 (14%) O. basilicum transcripts were annotated with Arabidopsis, rice and the Lamiaceae family, respectively. In total, 442, 694 and 225 transcripts of O. sanctum and 107, 2601 and 507 transcripts of O. basilicum were uniquely annotated to the Lamiaceae, Arabidopsis and rice databases, respectively (Figure 1B and C). The total number of transcripts annotated across all databases was 59648 (86.30%) and 105470 (81.10%) for O. sanctum and O. basilicum, respectively.
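The summary statistics quoted above (N50, length classes, percentage of transcripts annotated) can be reproduced from a list of transcript lengths. The following is a minimal sketch, not the authors' pipeline: the function names `n50`, `length_bins` and `percent` are illustrative, the toy lengths are invented for demonstration, and the bins start at 1 bp rather than at the 180 bp minimum reported for these assemblies.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def length_bins(lengths, width=500, cap=5000):
    """Count sequences per length class: 1-500, 501-1000, ..., >cap.
    (Assumption: the first class starts at 1 bp, not 180 bp.)"""
    counts = {}
    for length in lengths:
        if length > cap:
            label = f">{cap}"
        else:
            upper = ((length - 1) // width + 1) * width
            label = f"{upper - width + 1}-{upper}"
        counts[label] = counts.get(label, 0) + 1
    return counts


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Toy transcript lengths (hypothetical, not taken from the paper).
toy_lengths = [200, 450, 800, 1200, 1500, 2600, 5200]
print("N50:", n50(toy_lengths))
print("Length classes:", length_bins(toy_lengths))

# Annotated fractions recomputed from the counts reported in the text.
print("O. sanctum annotated %:", percent(59648, 69117))
print("O. basilicum annotated %:", percent(105470, 130043))
```

Sorting in descending order and stopping at the first length whose cumulative sum reaches half of the total bases is the usual definition of N50; some tools differ in how they handle ties or a minimum length cutoff, so values from other software may deviate slightly.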

