Characterization of a second secologanin synthase isoform producing both secologanin and secoxyloganin allows enhanced de novo assembly of a Catharanthus roseus transcriptome.

Dugé de Bernonville T, Foureau E, Parage C, Lanoue A, Clastre M, Londono MA, Oudin A, Houillé B, Papon N, Besseau S, Glévarec G, Atehortùa L, Giglioli-Guivarc'h N, St-Pierre B, De Luca V, O'Connor SE, Courdavault V - BMC Genomics (2015)

Bottom Line: The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements. The C. roseus consensus transcriptome can now be used for the characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in MIA synthesis and raising the question of the evolutionary events behind what seems like redundancy.


Affiliation: Université François-Rabelais de Tours, EA2106 "Biomolécules et Biotechnologies Végétales", UFR Sciences et Techniques, 37200, Tours, France. Bernonvillethomas.duge@univ-tours.fr.

ABSTRACT

Background: Transcriptome sequencing offers a great resource for the study of non-model plants such as Catharanthus roseus, which produces valuable monoterpenoid indole alkaloids (MIAs) via a complex biosynthetic pathway whose characterization is still ongoing. Transcriptome databases dedicated to this plant were recently developed by several consortia to uncover new biosynthetic genes. However, the identification of missing steps in MIA biosynthesis based on these large datasets may be limited by the erroneous assembly of closely related transcripts and isoforms, even with the multiple available transcriptomes.

Results: Secologanin synthases (SLS) are P450 enzymes that catalyze an unusual ring-opening reaction of loganin in the biosynthesis of the MIA precursor secologanin. We report here the identification and characterization in C. roseus of a new isoform of SLS, SLS2, sharing 97% nucleotide sequence identity with the previously characterized SLS1. We also discovered that both isoforms further oxidize secologanin into secoxyloganin. SLS2, however, had a different expression profile, being the major isoform in aerial organs, which constitute the main site of MIA accumulation. Unfortunately, we were unable to find a current C. roseus transcriptome database containing both well-reconstructed sequences of the SLS isoforms and accurate expression levels. After the pair of closely related mRNAs encoding tabersonine 16-hydroxylase (T16H1 and T16H2), this is the second example of improperly assembled transcripts from the MIA pathway in the public transcriptome databases. To construct a more complete transcriptome resource for C. roseus, we re-processed previously published transcriptome data by combining new single assemblies. Particular care was taken during the clustering and filtering steps to remove redundant contigs but not transcripts encoding potential isoforms, by monitoring the reconstruction quality of MIA genes and of the specific SLS and T16H isoforms. The new consensus transcriptome allowed a precise estimation of the abundance of SLS and T16H isoforms, similar to qPCR measurements.

Conclusions: The C. roseus consensus transcriptome can now be used for the characterization of new genes of the MIA pathway. Furthermore, additional isoforms of genes encoding distinct MIA biosynthetic enzymes could be predicted, suggesting a higher level of complexity in MIA synthesis and raising the question of the evolutionary events behind what seems like redundancy.




Fig. 10: Characterization of clusters in the clustered dataset (CD97). These graphs represent the number of clusters falling below different threshold values for the number of contigs (a), the summed FPKM over the 42 available samples (b) and cluster length (c). Red bars show the thresholds that were retained to filter poorly supported sequences from CD97. Sequences which met at least two of those criteria were discarded (d) (total of 485,641).

Mentions: To further clean CD97, all putative clusters were tested against three criteria: (i) length, (ii) number of contigs and (iii) expression level, taken as the sum of Fragments Per Kilobase per Million reads (FPKM) calculated over the 42 samples (19 paired-end and 23 single-end; see Additional file 3: Table S1). Visual inspection of the number of clusters potentially removed by each filter (Fig. 10) was used to choose appropriate values. The objective was to eliminate poorly represented clusters, which could be reconstruction artefacts. Even low thresholds for the number of contigs and FPKM removed a large number of clusters (427,494 with fewer than 3 contigs and 315,357 with sum FPKM < 5; Fig. 10a and b). For these two filters, we chose to retain the values at which the number of removed clusters varied less: 10 contigs per cluster and a sum of FPKM > 50. For cluster length, the distribution was more gradual (Fig. 10c). To avoid removing weakly expressed or small genes, we chose to discard only clusters that failed at least two of the three filters (Fig. 10d). We expected this procedure to reduce the loss of weakly expressed genes or rare isoforms. By fixing a minimal length of 500 bp, a number of contigs > 10 and a sum of FPKM > 50, we found that a large number of sequences (245,395) did not pass any of the three filters. This indicated that many clusters whose size was < 500 bp have both poor representation and weak expression levels. We also found 233,752 clusters that had a sum of FPKM < 50 and contained fewer than 10 contigs. All sequences having a sum of FPKM fell in this class. Out of the 543,979 clusters of CD97, a total of 485,641 sequences did not pass the filters. The resulting dataset, which contained 58,338 clusters, was retained and called CDF97.
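
The two-of-three filtering rule described above can be illustrated with a minimal Python sketch. This is not the authors' code: the `Cluster` record, its field names and the example values are hypothetical, and the handling of values exactly at a threshold (for instance, exactly 10 contigs) is an assumption, since the text gives both "> 10" and "less than 10". Only the three thresholds (500 bp, 10 contigs, sum of FPKM of 50) and the rule of discarding a cluster that fails at least two criteria come from the text.

```python
from dataclasses import dataclass

# Thresholds reported in the text for filtering CD97.
MIN_LENGTH_BP = 500
MIN_CONTIGS = 10
MIN_SUM_FPKM = 50.0


@dataclass
class Cluster:
    """Hypothetical summary record for one cluster."""
    name: str
    length_bp: int    # cluster length in bp
    n_contigs: int    # number of contigs grouped in the cluster
    sum_fpkm: float   # FPKM summed over all samples


def failed_criteria(c: Cluster) -> int:
    """Count how many of the three criteria the cluster fails.

    Boundary handling is an assumption: a cluster passes with
    length >= 500 bp, more than 10 contigs and a summed FPKM above 50.
    """
    return sum([
        c.length_bp < MIN_LENGTH_BP,
        c.n_contigs <= MIN_CONTIGS,
        c.sum_fpkm <= MIN_SUM_FPKM,
    ])


def keep_cluster(c: Cluster) -> bool:
    """Keep a cluster unless it fails at least two of the three criteria."""
    return failed_criteria(c) < 2


def filter_clusters(clusters: list[Cluster]) -> list[Cluster]:
    """Return only the clusters that survive the two-of-three rule."""
    return [c for c in clusters if keep_cluster(c)]


# Made-up example values, for illustration only.
examples = [
    Cluster("short_but_supported", 300, 25, 400.0),  # fails length only
    Cluster("rare_isoform", 1500, 40, 12.0),         # fails expression only
    Cluster("likely_artefact", 250, 2, 1.5),         # fails all three
    Cluster("long_but_unsupported", 900, 3, 8.0),    # fails contigs and FPKM
]
kept_names = [c.name for c in filter_clusters(examples)]
```

Requiring two failures before discarding means that a short but well-supported cluster, or a well-assembled but weakly expressed one, survives. This matches the stated aim of limiting the loss of small genes, weakly expressed genes and rare isoforms.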

