Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly.

Debat HJ, Grabiele M, Aguilera PM, Bubillo RE, Otegui MB, Ducasse DA, Zapata PD, Marti DA - PLoS ONE (2014)

Bottom Line: We have also pinpointed several members of the gene silencing pathway, and characterized the silencing effector Argonaute1. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. Moreover, we provide a collection of over 10,800 SSRs accessible to the scientific community interested in yerba mate genetic improvement.

Affiliation: Instituto de Patología Vegetal, Centro de Investigaciones Agropecuarias, Instituto Nacional de Tecnología Agropecuaria (IPAVE-CIAP-INTA), Córdoba, Argentina.

ABSTRACT
Yerba mate (Ilex paraguariensis A. St.-Hil.) is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total production of more than 1,000,000 t. Sequence information on yerba mate is severely limited: NCBI GenBank has no EST database for the species and holds only 80 DNA sequences, most of them uncharacterized. To elucidate the yerba mate gene landscape by means of NGS, we explored and discovered a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced on an Illumina HiSeq 2000, yielding 72,031,388 paired-end 100 bp reads. High-quality reads were de novo assembled into 44,907 transcripts encompassing 40 million bases, with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we identified ∼1,000 putative transcription factors and genes involved in heat and oxidative stress, pathogen response, disease resistance and hormone response. Based on sequence homology searches, we also identified novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We pinpointed several members of the gene silencing pathway and characterized the silencing effector Argonaute1. We predicted a diverse supply of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of the yerba mate caffeine synthase are presented. Moreover, we provide a collection of over 10,800 SSRs accessible to the scientific community interested in yerba mate genetic improvement.
This contribution broadly expands the limited knowledge of yerba mate genes, and is presented as the first genomic resource of this important crop.
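
The 180X estimate can be checked as total sequenced bases divided by assembled bases. The sketch below is a back-of-the-envelope reconstruction, not the authors' stated calculation: it assumes that 72,031,388 counts individual 100 bp reads (counting it as read pairs would double the result to about 360X) and uses the rounded 40 Mb assembly size from the abstract.

```python
# Back-of-the-envelope check of the coverage estimate in the abstract.
# Assumption: 72,031,388 is the number of individual 100 bp reads (not pairs),
# and the assembly size is the rounded "40 million bases".
READS = 72_031_388
READ_LENGTH_BP = 100
ASSEMBLY_SIZE_BP = 40_000_000


def estimated_coverage(n_reads: int, read_len: int, assembly_bp: int) -> float:
    """Mean depth = total sequenced bases / assembled bases."""
    return n_reads * read_len / assembly_bp


coverage = estimated_coverage(READS, READ_LENGTH_BP, ASSEMBLY_SIZE_BP)
print(f"{coverage:.1f}X")  # prints 180.1X
```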

Figure 3. Proportion and frequencies of predicted SSRs in the Ilex paraguariensis transcriptome. (a) Proportion of SSRs predicted in the yerba mate transcriptome, categorized by motif length. (b) ct/ag-tc/ga accounts for 84% of the di-nucleotide SSRs found in yerba mate. (c) Frequency of tri-nucleotide SSRs predicted in yerba mate. With over 26% of the hits, aag/ctt-tct/aga-ttc/gaa is the most common tri-nucleotide SSR class found in Ilex paraguariensis.
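
The labels in panels (b) and (c), such as ct/ag-tc/ga, appear to pool a motif with its rotations (ct, tc) and the reverse complements of both (ag, ga), because these all describe the same repeat read from a different starting base or from the opposite strand. The minimal Python sketch below reproduces that grouping. The function names are hypothetical, and the assumption that the figure's classes were formed exactly this way is inferred from the labels rather than stated in the caption.

```python
# Hypothetical helpers: group an SSR motif with its rotations and the
# rotations of its reverse complement, as the figure labels suggest.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def rotations(motif: str) -> list[str]:
    """All cyclic shifts of a motif, e.g. 'aag' -> ['aag', 'aga', 'gaa']."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> frozenset[str]:
    """Rotations of the motif plus rotations of its reverse complement."""
    m = motif.lower()
    return frozenset(rotations(m) + rotations(reverse_complement(m)))


def canonical(motif: str) -> str:
    """Alphabetically smallest member, used as a single key per class."""
    return min(motif_class(motif))


print(canonical("tc"), canonical("ttc"))  # prints: ag aag
```

With this grouping, tallying SSRs by `canonical(motif)` pools ct, tc, ag and ga repeats into one bin, which is how a single class can account for 84% of the di-nucleotide SSRs.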

Simple sequence repeat (SSR) markers are well known and widely used as valuable tools for assessing genetic diversity. SSRs are useful in the development of genetic maps, comparative genomics and marker-assisted selection breeding [28]. In parallel, the yerba mate transcript library was therefore comprehensively analyzed in search of SSRs. A total of 10,813 SSRs were identified in 8,449 sequences across the transcriptome. SSRs were predicted in silico using minimum repeat numbers of 6, 4, 3, 3 and 3 for di-, tri-, tetra-, penta- and hexa-nucleotide motifs, respectively. Under these criteria, di-nucleotide motifs represented 40.9% of all SSRs found, while tri-nucleotide motifs constituted roughly 35.8% (Figure 3a). The most represented SSR was the di-nucleotide motif class ct/ag-tc/ga (Figure 3b), which accounted for over 84% of the 4,429 di-nucleotide SSRs (Table S11). Among the tri-nucleotide repeats, aag/ctt-tct/aga-ttc/gaa was the most common class in I. paraguariensis, with over 26% of the hits (Figure 3c). In most plant transcriptome studies, tri-nucleotide repeats are the most frequent SSRs. However, the relative abundance of repeat motifs in plant transcriptomes depends on the criteria used for in silico SSR prediction. Studies differ in the minimum number of repeats required for di-, tri-, tetra-, penta- and hexa-nucleotide motifs, e.g. 6,5,4,4,4 in Salvia splendens [25], 6,5,5,5,5 in Saccharum spp. [28], 6,5,5,4,4 in Capsicum frutescens [23], 6,4,3,3,3 in Curcuma longa [29] and 4,4,4,4,4 in Ipomoea batatas [30]. To remain consistent with the literature, we adopted the 6,4,3,3,3 criterion, the same one used for Curcuma longa [29]. With this criterion, di-nucleotides were the most abundant SSR class, followed by tri-nucleotides.
This non-standard distribution, in which di-nucleotides outnumber tri-nucleotides, has also been described for Salvia splendens [25], with di- and tri-nucleotide frequencies of 39.9% and 29.3%, respectively, for sweet potato (43.3%/42.4%) [30] and rubber tree (38%/34%) [31], and for several other plants such as cucumber [32], sesame [33], kiwi [34] and coffee [35], where di-nucleotides are also the most represented SSR class.
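
The excerpt does not name the SSR-detection tool, so the following is a hypothetical regex-based sketch of how minimum-repeat criteria such as 6,4,3,3,3 act on a sequence, compared against the 6,5,5,5,5 criterion cited for Saccharum spp. It detects only perfect repeats, ignores compound SSRs and mono-nucleotide runs, and the toy sequence is invented for illustration.

```python
import re

# Minimum number of tandem copies per motif length (di- to hexa-nucleotide).
# Both criteria sets come from the text; the detection logic is a sketch.
CRITERIA = {
    "6,4,3,3,3": {2: 6, 3: 4, 4: 3, 5: 3, 6: 3},
    "6,5,5,5,5": {2: 6, 3: 5, 4: 5, 5: 5, 6: 5},
}


def minimal_period(unit: str) -> int:
    """Length of the shortest string whose repetition spells `unit`."""
    for p in range(1, len(unit) + 1):
        if len(unit) % p == 0 and unit[:p] * (len(unit) // p) == unit:
            return p
    return len(unit)


def find_ssrs(seq: str, min_repeats: dict[int, int]) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for each perfect SSR found in `seq`."""
    seq = seq.lower()
    hits = []
    for k, m in sorted(min_repeats.items()):
        # A k-mer followed by at least m - 1 further copies of itself.
        pattern = re.compile(rf"([acgt]{{{k}}})\1{{{m - 1},}}")
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "ctct" at k=4: it is really a di-nucleotide repeat.
            if minimal_period(motif) < k:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def count_by_length(hits: list[tuple[int, str, int]]) -> dict[int, int]:
    """Number of SSRs per motif length."""
    counts: dict[int, int] = {}
    for _, motif, _ in hits:
        counts[len(motif)] = counts.get(len(motif), 0) + 1
    return counts


toy = "ct" * 8 + "g" + "aag" * 4 + "c" + "tagc" * 3
for name, criteria in CRITERIA.items():
    print(name, count_by_length(find_ssrs(toy, criteria)))
```

In this toy sequence the four-copy tri-nucleotide and three-copy tetra-nucleotide repeats are called only under 6,4,3,3,3, while the eight-copy di-nucleotide repeat passes both criteria. This illustrates why the di-/tri-nucleotide proportions reported across studies depend on the thresholds each one applies.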

