Characterization of a second secologanin synthase isoform producing both secologanin and secoxyloganin allows enhanced de novo assembly of a Catharanthus roseus transcriptome.

Dugé de Bernonville T, Foureau E, Parage C, Lanoue A, Clastre M, Londono MA, Oudin A, Houillé B, Papon N, Besseau S, Glévarec G, Atehortùa L, Giglioli-Guivarc'h N, St-Pierre B, De Luca V, O'Connor SE, Courdavault V - BMC Genomics (2015)

Bottom Line: The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements. The C. roseus consensus transcriptome can now be used for characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in the synthesis of MIAs and raising the question of the evolutionary events behind what seems like redundancy.


Affiliation: Université François-Rabelais de Tours, EA2106 "Biomolécules et Biotechnologies Végétales", UFR Sciences et Techniques, 37200, Tours, France. Bernonvillethomas.duge@univ-tours.fr.

ABSTRACT

Background: Transcriptome sequencing offers a great resource for the study of non-model plants such as Catharanthus roseus, which produces valuable monoterpenoid indole alkaloids (MIAs) via a complex biosynthetic pathway whose characterization is still ongoing. Transcriptome databases dedicated to this plant were recently developed by several consortia to uncover new biosynthetic genes. However, the identification of missing steps in MIA biosynthesis based on these large datasets may be limited by the erroneous assembly of closely related transcripts and isoforms, even with the multiple available transcriptomes.

Results: Secologanin synthases (SLS) are P450 enzymes that catalyze an unusual ring-opening reaction of loganin in the biosynthesis of the MIA precursor secologanin. We report here the identification and characterization in C. roseus of a new isoform of SLS, SLS2, sharing 97 % nucleotide sequence identity with the previously characterized SLS1. We also discovered that both isoforms further oxidize secologanin into secoxyloganin. SLS2, however, had a different expression profile, being the major isoform in aerial organs, which constitute the main site of MIA accumulation. Unfortunately, we were unable to find a current C. roseus transcriptome database that contained both well-reconstructed sequences of the SLS isoforms and accurate expression levels. After the pair of closely related mRNAs encoding tabersonine 16-hydroxylase (T16H1 and T16H2), this is the second example of improperly assembled transcripts from the MIA pathway in the public transcriptome databases. To construct a more complete transcriptome resource for C. roseus, we re-processed previously published transcriptome data by combining new single assemblies. Particular care was taken during the clustering and filtering steps to remove redundant contigs while retaining transcripts encoding potential isoforms, by monitoring the reconstruction quality of MIA genes and of the specific SLS and T16H isoforms. The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements.

Conclusions: The C. roseus consensus transcriptome can now be used for characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in the synthesis of MIAs and raising the question of the evolutionary events behind what seems like redundancy.



Fig. 9: Composition of the clustered dataset (CD97) resulting from the processing of the combination of all single assemblies with CD-HIT-EST at 97 % identity. (a) Integration of contigs from single assemblies into clusters. Contigs which cannot be grouped with others are called singletons. True clusters, i.e. those containing at least two different contigs, may have been formed by the combination of contigs from one or more initial single assemblies. (b) Correlation plot of single assemblies. Contigs found in each cluster (singletons and true clusters) were identified and counted per initial assembly. Two initial assemblies are therefore strongly correlated (Pearson correlation coefficient) if their contigs are found in the same clusters. (c) Composition of true clusters. This graph shows how many single assemblies are represented within clusters.

Participation of initial single assemblies in CD97 clusters was homogeneous, except for SRR122238, for which only 10 % of contigs (4.3 % in clusters, 5.7 % in singletons; Additional file 4: Table S2) were used by CD-HIT-EST. For the SRR122238 single assembly, the low proportion of contigs used in CD97 was probably due to its very high number of contigs (1,666,984) in comparison with the other assemblies. For the other single assemblies, more than 90 % of contigs were used by CD-HIT-EST, with at least 50 % in true clusters (Fig. 9a; Additional file 4: Table S2). The composition of true clusters revealed a somewhat preferential association of contigs from single assemblies obtained in the same study (Fig. 9b). Correlation coefficients calculated on the pattern of participation of each single assembly in true clusters were higher for 4 groups of samples: (i) SRR1144633 and SRR1144634 (SRP035766, leafy flower transition study), (ii) SRR646596, SRR646604 and SRR646572 (SRP017832, MeJA treatments on shoots), (iii) SRR122237 and SRR122236 (SRP005953, mixed libraries from different organs) and (iv) SRR924147, SRR924148, SRR648707 and SRR648705 (SRP026417 and SRP017947, cell suspension MeJA and ORCA overexpression). This preferential association is more likely to be due to the inherent genetic diversity between C. roseus cultivars than to experimental conditions. However, high correlation coefficients (>0.6) were also observed between independent studies, as exemplified by SRR122236 and SRR1144634. The strongest differences were observed for samples of the NIPGR study (SRR1271857, SRR1271858 and SRR1271859) and for SRR122238. For the latter, this might be linked to its higher participation in singletons than in true clusters (Fig. 9a). In CD97, 105,730 clusters contained contigs from 2 to 5 different single assemblies, 31,055 clusters contained contigs from more than 10 single assemblies and 3,506 clusters were composed of contigs from all 19 single assemblies (Fig. 9c). These 31,055 clusters might represent the core transcriptome of C. roseus. Indeed, 25,692 of them had significant (e-value < 1e-20) identities with proteins of the UniprotKB database (Blastx) (Table 1).
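The cluster-composition analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the standard CD-HIT `.clstr` text layout and a hypothetical naming convention in which every contig ID begins with the accession of its single assembly followed by an underscore (e.g. `SRR122238_contig1`). The function names and the toy data are illustrative only. The sketch separates singletons from true clusters, counts how many assemblies are represented in each true cluster, and computes a Pearson correlation between two assemblies from their per-cluster contig counts.

```python
import re
from collections import Counter
from math import sqrt

# ASSUMPTION (hypothetical naming): contig IDs look like "<assembly>_<contig>",
# so the text between ">" and the first "_" names the single assembly.
CONTIG_RE = re.compile(r">(?P<assembly>[^_\s]+)_\S*?\.\.\.")


def parse_clstr(lines):
    """Return one Counter per cluster mapping assembly -> number of contigs."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append(Counter())
        else:
            match = CONTIG_RE.search(line)
            if match and clusters:
                clusters[-1][match.group("assembly")] += 1
    return clusters


def is_true_cluster(cluster):
    """A true cluster holds at least two contigs; a singleton holds one."""
    return sum(cluster.values()) >= 2


def assemblies_per_true_cluster(clusters):
    """Histogram: number of assemblies represented -> number of true clusters."""
    return Counter(len(c) for c in clusters if is_true_cluster(c))


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / sqrt(var_x * var_y)


def assembly_correlation(clusters, a, b):
    """Correlate the per-cluster contig counts of assemblies a and b."""
    return pearson([c[a] for c in clusters], [c[b] for c in clusters])


# Toy input with three made-up assemblies A, B and C (not real data).
EXAMPLE = (
    ">Cluster 0\n"
    "0\t1500nt, >A_c1... *\n"
    "1\t1400nt, >B_c1... at +/98.50%\n"
    ">Cluster 1\n"
    "0\t900nt, >A_c2... *\n"
    "1\t880nt, >A_c3... at +/97.80%\n"
    "2\t850nt, >B_c2... at +/99.10%\n"
    ">Cluster 2\n"
    "0\t700nt, >C_c1... *\n"
).splitlines()

clusters = parse_clstr(EXAMPLE)
print(assemblies_per_true_cluster(clusters))
print(round(assembly_correlation(clusters, "A", "B"), 3))
```

In the toy input, assemblies A and B share both true clusters and therefore correlate positively, whereas C contributes only a singleton. The sketch correlates over all clusters passed in; to restrict the calculation to true clusters, filter the list with `is_true_cluster` first.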

