Characterization of a second secologanin synthase isoform producing both secologanin and secoxyloganin allows enhanced de novo assembly of a Catharanthus roseus transcriptome.

Dugé de Bernonville T, Foureau E, Parage C, Lanoue A, Clastre M, Londono MA, Oudin A, Houillé B, Papon N, Besseau S, Glévarec G, Atehortùa L, Giglioli-Guivarc'h N, St-Pierre B, De Luca V, O'Connor SE, Courdavault V - BMC Genomics (2015)

Bottom Line: The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements. The C. roseus consensus transcriptome can now be used for the characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in MIA synthesis and raising the question of the evolutionary events behind what seems like redundancy.


Affiliation: Université François-Rabelais de Tours, EA2106 "Biomolécules et Biotechnologies Végétales", UFR Sciences et Techniques, 37200, Tours, France. Bernonvillethomas.duge@univ-tours.fr.

ABSTRACT

Background: Transcriptome sequencing is a valuable resource for the study of non-model plants such as Catharanthus roseus, which produces valuable monoterpenoid indole alkaloids (MIAs) via a complex biosynthetic pathway whose characterization is still ongoing. Transcriptome databases dedicated to this plant were recently developed by several consortia to uncover new biosynthetic genes. However, the identification of missing steps in MIA biosynthesis based on these large datasets may be limited by the erroneous assembly of closely related transcripts and isoforms, even with the multiple available transcriptomes.

Results: Secologanin synthases (SLS) are P450 enzymes that catalyze an unusual ring-opening reaction of loganin in the biosynthesis of the MIA precursor secologanin. We report here the identification and characterization in C. roseus of a new isoform of SLS, SLS2, sharing 97 % nucleotide sequence identity with the previously characterized SLS1. We also discovered that both isoforms further oxidize secologanin into secoxyloganin. SLS2, however, had a different expression profile, being the major isoform in aerial organs, which constitute the main site of MIA accumulation. Unfortunately, we were unable to find a current C. roseus transcriptome database containing both well-reconstructed sequences of the SLS isoforms and accurate expression levels. After a pair of closely related mRNAs encoding tabersonine 16-hydroxylase (T16H1 and T16H2), this is the second example of improperly assembled transcripts from the MIA pathway in the public transcriptome databases. To construct a more complete transcriptome resource for C. roseus, we re-processed previously published transcriptome data by combining new single assemblies. Particular care was taken during the clustering and filtering steps to remove redundant contigs while retaining transcripts encoding potential isoforms, by monitoring the reconstruction quality of MIA genes and of specific SLS and T16H isoforms. The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements.

Conclusions: The C. roseus consensus transcriptome can now be used for the characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in MIA synthesis and raising the question of the evolutionary events behind what seems like redundancy.



Fig. 8: Clustering of redundant contigs in the dataset resulting from the combination of all single assemblies. Contigs sharing a given % of identity were clustered with CD-HIT-EST. (a) Number of clusters after CD-HIT-EST at % identity thresholds fixed from 90 to 100 %. (b) Reconstruction quality of MIA genes in the current resources (A = ccOrcae, B = mpgrCra, C = NIPGR, D = PMS454, E = PMSIllu) and in the datasets resulting from clustering by CD-HIT-EST at each % identity threshold. Reference MIA gene sequences were BLASTed against each assembly, and the resulting bitscore was compared to that of an ideal sequence (bitscore of the reference sequence against itself, i.e. bitscore ratio = 1).
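The bitscore ratio described in this caption can be illustrated with a minimal Python sketch. It assumes BLAST results in the default 12-column tabular format (`-outfmt 6`, bitscore in the last column); the function names, gene name, contig names, and scores below are hypothetical and are not taken from the article.

```python
from typing import Dict, Iterable


def best_bitscores(blast_tab_lines: Iterable[str]) -> Dict[str, float]:
    """Return the highest bitscore per query from BLAST tabular (outfmt 6) lines."""
    best: Dict[str, float] = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, bitscore = fields[0], float(fields[11])  # column 12 is the bitscore
        if bitscore > best.get(query, 0.0):
            best[query] = bitscore
    return best


def bitscore_ratios(assembly_hits: Iterable[str],
                    self_hits: Iterable[str]) -> Dict[str, float]:
    """Ratio of the best bitscore in an assembly to the self-hit ("ideal") bitscore.

    A ratio of 1 means the reference gene is reconstructed as well as the
    reference aligned against itself; genes without any hit get 0.
    """
    best = best_bitscores(assembly_hits)
    ideal = best_bitscores(self_hits)
    return {gene: best.get(gene, 0.0) / score for gene, score in ideal.items()}


# Toy data (illustrative values only, not results from the article).
assembly_hits = [
    "geneA\tcontig_1\t99.1\t1000\t9\t0\t1\t1000\t1\t1000\t0.0\t920",
    "geneA\tcontig_2\t95.0\t600\t30\t0\t1\t600\t1\t600\t0.0\t500",
]
self_hits = ["geneA\tgeneA\t100.0\t1000\t0\t0\t1\t1000\t1\t1000\t0.0\t1000"]

ratios = bitscore_ratios(assembly_hits, self_hits)
print(ratios)
```

Only the best hit per reference gene is kept, since the ratio is meant to score the best available reconstruction of each gene in a given assembly.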

Mentions: In the second approach, we merged all single Trinity assemblies (see above) and ran different filtering procedures to decrease the resulting redundancy without altering transcript quality. A total of 3,145,245 contigs from single assemblies were combined. This brought very high quality transcripts together within one new assembly, which nevertheless contained evident redundancy due to the merging procedure. Indeed, the resulting large dataset is expected to cover a large number of isoforms. These may be real transcripts, such as the isoforms of SLS and T16H, which have to be differentiated, or alleles from different cultivars, which should be integrated into a single reference sequence. Running CD-HIT-EST with different sequence identity thresholds succeeded in combining contigs into clusters (Fig. 8a). This algorithm clusters similar sequences and uses one of them as the representative sequence. A slight decrease in sequence quality was observed at lower identity thresholds for 16OMT (bitscore/ideal bitscore of 0.94 in the non-clustered dataset and 0.92 at a clustering threshold of 98 %), and for IS and SLS2 at clustering thresholds below 94 % (Fig. 8b). The transcript with the lowest quality was T16H2 (0.88 at clustering thresholds above 96 %). However, its quality was similar to that of the best reconstruction in current resources (0.92 in NIPGR). Two other genes, IDI1 and STR, did not display ideal reconstruction with respect to the reference sequence. IDI1 was slightly better reconstructed in PMS454, and STR was better reconstructed in NIPGR and PMS454. The origin of these discrepancies is unclear, but they might have been caused by higher polymorphism, leading to a reference sequence that differs from the representative clusters obtained here. Based on the quality of MIA biosynthetic gene reconstruction, we retained the clustered dataset obtained with a sequence identity threshold of 97 %. This threshold should be permissive enough to combine alleles differing by only a few SNPs. The resulting clustered dataset, hereafter named CD97, was composed of a total of 534,979 clusters: 357,652 singletons (contigs displaying insufficient identity with any other contig) and 177,327 real clusters containing two or more contigs (which may originate from the same single assembly or from different single assemblies). A total of 249,423 sequences had hits (e-value < 1e-20) against sequences of the UniProt database (BLASTX), and 9,283 proteins found in this database were represented at 90 % of their length by at least one cluster in CD97.
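The split between singletons and real clusters reported for CD97 can be recomputed from the `.clstr` file that CD-HIT-EST writes alongside its representative sequences. The following Python sketch assumes the standard `.clstr` layout (a `>Cluster N` header followed by one line per member); the function names and the toy contigs are hypothetical, and this is not the authors' own script.

```python
from typing import Dict, Iterable, List


def cluster_sizes(clstr_lines: Iterable[str]) -> List[int]:
    """Return the number of member sequences in each cluster of a .clstr file."""
    sizes: List[int] = []
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)      # a new cluster begins
        elif sizes:
            sizes[-1] += 1       # any other line is a member of the current cluster
    return sizes


def summarize(sizes: List[int]) -> Dict[str, int]:
    """Count singletons (one member) and multi-member clusters."""
    singletons = sum(1 for size in sizes if size == 1)
    return {
        "total": len(sizes),
        "singletons": singletons,
        "multi_member": len(sizes) - singletons,
    }


# Toy .clstr content (illustrative only).
toy_clstr = [
    ">Cluster 0",
    "0\t1500nt, >contig_a... *",
    "1\t1480nt, >contig_b... at +/98.20%",
    ">Cluster 1",
    "0\t900nt, >contig_c... *",
]

summary = summarize(cluster_sizes(toy_clstr))
print(summary)

# Consistency check on the counts reported in the text for CD97:
# singletons plus multi-member clusters should equal the total.
assert 357_652 + 177_327 == 534_979
```

In practice the lines would be read from the cluster file itself, for example with `open(path)` on whatever `.clstr` file the clustering run produced.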

