An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora.

Mondego JM, Vidal RO, Carazzolle MF, Tokuda EK, Parizzi LP, Costa GG, Pereira LF, Andrade AC, Colombo CA, Vieira LG, Pereira GA, Brazilian Coffee Genome Project Consorti - BMC Plant Biol. (2011)

Bottom Line: OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.


Affiliation: Centro de Recursos Genéticos Vegetais, Instituto Agronômico de Campinas, CP 28, 13001-970, Campinas-SP, Brazil.

ABSTRACT

Background: Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70% and 30% of commercial production, respectively. C. arabica is an allotetraploid that arose from a recent hybridization of the diploid species C. canephora and C. eugenioides. C. arabica has lower genetic diversity and yields a higher-quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency.

Results: Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using KA/KS analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.

Conclusion: We present the first comprehensive genome-wide transcript profile study of C. arabica and C. canephora, which can be freely accessed by the scientific community at http://www.lge.ibi.unicamp.br/coffea. Our data reveal the presence of species-specific/prevalent genes in coffee that may help to explain particular characteristics of these two crops. The identification of differentially expressed transcripts offers a starting point for the correlation between gene expression profiles and Coffea spp. developmental traits, providing valuable insights for coffee breeding and biotechnology, especially concerning sugar metabolism and stress tolerance.



Figure 4: Comparative chart between the relative percentage of Pfam domains in C. arabica and C. canephora EST databases.

Mentions: We performed a comparison of C. arabica and C. canephora gene clusters with the CDD-PFAM databank to catalog the protein domains present in the Coffea EST datasets. The submission of the clusters to RPS-BLAST resulted in 30% (9,886) of C. arabica and 32% (5,478) of C. canephora clusters containing an assigned domain. To compare the prevalence of protein domains in Coffea species, the number of clusters assigned to each domain was normalized by dividing by the total number of clusters containing a domain. Serine/threonine kinases (Pfam00069), cytochrome P450 monooxygenases (Pfam00067), tyrosine kinases (Pfam07714) and proteins containing RNA recognition motifs (RRM; Pfam00076) are among the top 20 PFAM families in Coffea species (Additional File 5). Next, we plotted the percentage of protein domains in Coffea datasets in a comparative histogram. Protein domain analysis revealed significant differences between the two species datasets (Figure 4). For example, the C. arabica dataset contains a higher percentage of cytochrome P450 monooxygenases, tyrosine kinases, extensin-like proteins, glycine-rich proteins, sugar transporters, UDP-glucosyltransferases, NAD-dependent epimerases, DNA-J proteins, NB-ARC proteins, cellulose synthases, raffinose synthases, D-mannose-binding lectins and flavin amine oxidoreductases than the C. canephora dataset (Figure 4). In contrast, the C. canephora dataset contains a higher percentage of transcripts coding for proteins containing RRM motifs, ubiquitin conjugation enzymes, ABC transporters, Ras/Rab/Rac proteins, 2-OG oxygenases, cupin proteins, HSP20s, HSP70s, ADP-ribosylation factors, dehydrins, glutenins and seed maturation proteins (Figure 4). Although these dissimilarities between datasets may be caused by the different tissues used for constructing the C. arabica and C. canephora cDNA libraries, such results offer clues for further comparative research.
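
The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name and the per-domain cluster counts are hypothetical placeholders chosen only for illustration, while the totals (9,886 and 5,478 clusters with an assigned domain, out of 32,007 and 16,665 clusters) are the figures reported in the text.

```python
from typing import Dict

# Totals reported in the text.
ARABICA_CLUSTERS = 32007             # all C. arabica clusters
CANEPHORA_CLUSTERS = 16665           # all C. canephora clusters
ARABICA_WITH_DOMAIN = 9886           # C. arabica clusters with an assigned domain
CANEPHORA_WITH_DOMAIN = 5478         # C. canephora clusters with an assigned domain


def relative_domain_percentages(
    domain_counts: Dict[str, int], clusters_with_domain: int
) -> Dict[str, float]:
    """Normalize per-domain cluster counts by the number of clusters
    that contain any assigned domain, expressed as a percentage."""
    if clusters_with_domain <= 0:
        raise ValueError("clusters_with_domain must be positive")
    return {
        domain: 100.0 * count / clusters_with_domain
        for domain, count in domain_counts.items()
    }


# HYPOTHETICAL per-domain counts, for illustration only; the real counts
# are in the paper's Additional File 5 and are not reproduced here.
arabica_counts = {"Pfam00069": 400, "Pfam00067": 150}
canephora_counts = {"Pfam00069": 200, "Pfam00067": 50}

arabica_pct = relative_domain_percentages(arabica_counts, ARABICA_WITH_DOMAIN)
canephora_pct = relative_domain_percentages(canephora_counts, CANEPHORA_WITH_DOMAIN)

# Share of all clusters that received a domain assignment in each species.
arabica_share = 100.0 * ARABICA_WITH_DOMAIN / ARABICA_CLUSTERS
canephora_share = 100.0 * CANEPHORA_WITH_DOMAIN / CANEPHORA_CLUSTERS

if __name__ == "__main__":
    for domain in arabica_counts:
        print(domain, round(arabica_pct[domain], 2), round(canephora_pct[domain], 2))
    print(round(arabica_share, 1), round(canephora_share, 1))
```

Dividing by the number of clusters with a domain, rather than by the total number of clusters, keeps the two species comparable even though their datasets differ in size and in the fraction of clusters that could be annotated.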

