An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora.

Mondego JM, Vidal RO, Carazzolle MF, Tokuda EK, Parizzi LP, Costa GG, Pereira LF, Andrade AC, Colombo CA, Vieira LG, Pereira GA, Brazilian Coffee Genome Project Consorti - BMC Plant Biol. (2011)

Bottom Line: OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.


Affiliation: Centro de Recursos Genéticos Vegetais, Instituto Agronômico de Campinas, CP 28, 13001-970, Campinas-SP, Brazil.

ABSTRACT

Background: Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economies of producing countries. Coffea arabica and C. canephora account for 70% and 30% of commercial production, respectively. C. arabica is an allotetraploid arising from a recent hybridization between the diploid species C. canephora and C. eugenioides. Compared with C. canephora, C. arabica has lower genetic diversity but produces a higher-quality beverage. Research initiatives have been launched to produce genomic and transcriptomic data for Coffea spp. as a strategy to improve breeding efficiency.

Results: Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. The two species showed different GC3 profiles, which are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using Ka/Ks analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between the Coffea spp. datasets, mainly in relation to complex sugar synthases and nucleotide-binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Notable annotated families include new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.
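GC3 is the proportion of guanine or cytosine at the third position of codons. As an illustration of how per-contig GC3 values might be computed and binned into a profile for comparison between species, here is a minimal sketch. It assumes in-frame coding sequences that start at the first codon; the function names are hypothetical, and it does not reproduce the authors' pipeline (for example, how reading frames were predicted from contigs, or whether Met, Trp and stop codons were excluded).

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumes the sequence starts at the first codon; a trailing partial codon
    is ignored, and ambiguous third-position bases (e.g. N) are skipped.
    """
    seq = cds.upper()
    # Third base of every complete codon: indices 2, 5, 8, ...
    third = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    counted = [b for b in third if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in counted) / len(counted)


def gc3_histogram(seqs: list[str], bins: int = 10) -> list[int]:
    """Count sequences per GC3 bin of width 1/bins over [0, 1]."""
    counts = [0] * bins
    for s in seqs:
        # A GC3 of exactly 1.0 is placed in the last bin.
        idx = min(int(gc3(s) * bins), bins - 1)
        counts[idx] += 1
    return counts
```

For example, `gc3("ATGGCCGGG")` returns 1.0, because the third positions of ATG, GCC and GGG are G, C and G. Comparing the histograms of two species' contig sets is one simple way to see whether their GC3 distributions differ in shape.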

Conclusion: We present the first comprehensive genome-wide transcript profile study of C. arabica and C. canephora, which can be freely accessed by the scientific community at http://www.lge.ibi.unicamp.br/coffea. Our data reveal the presence of species-specific/prevalent genes in coffee that may help to explain particular characteristics of these two crops. The identification of differentially expressed transcripts offers a starting point for correlating gene expression profiles with Coffea spp. developmental traits, providing valuable insights for coffee breeding and biotechnology, especially concerning sugar metabolism and stress tolerance.


Figure 6: Hierarchical clustering of coffee cDNA libraries and clusters based on EST distribution. a) C. canephora: hierarchical clustering of 443 differentially expressed clusters vs. the eight cDNA library assemblies. b) C. arabica: hierarchical clustering of 331 differentially expressed clusters vs. the 23 cDNA library assemblies. Hierarchical clustering was performed using a correlation matrix constructed from EST frequencies for differentially expressed C. arabica and C. canephora contigs. Black intensity indicates relative transcript abundance in a given library, as inferred from EST frequency within each contig. Library abbreviations: C. canephora: LF, young leaves; PP1, pericarp, all developmental stages; SE1, whole cherries, 18 and 22 weeks after pollination; SE2, whole cherries, 18 and 22 weeks after pollination; SE3, endosperm and perisperm, 30 weeks after pollination; SE4, endosperm and perisperm, 42 and 46 weeks after pollination; EC1, embryogenic calli; SH1, leaves from water-deficit-stressed plants; and SH3, leaves from water-deficit-stressed plants (drought-resistant clone). C. arabica: PC1, non-embryogenic cell line induced with 2,4-D; CA1, non-embryogenic calli; IC1, non-embryogenic cell line without 2,4-D; EA; EA2, embryogenic calli; IA2, embryogenic cell line induced with 2,4-D; PA1, primary embryogenic calli; EM1, zygotic embryos from mature germinating seeds; SI3, germinating whole seeds; LV4, young leaves from orthotropic branches; LV5, young leaves from orthotropic branches; LV8, mature leaves from plagiotropic branches; LV9, mature leaves from plagiotropic branches; FB1, floral buds at developmental stages 1 and 2; FB2, floral buds at developmental stages 1 and 2; FB4, floral buds at developmental stages 3 and 4; FR1, floral buds, pinhead fruits, fruit developmental stages 1 and 2; FR2, floral buds, pinhead fruits, fruit developmental stages 1 and 2; SS1, well-watered field plant tissues; SH2, water-stressed plant tissues; CB1, suspension cells treated with acibenzolar-S-methyl and brassinosteroids; CS1, suspension cells under osmotic stress; AR1, leaves treated with arachidonic acid; LP1, plantlets treated with arachidonic acid; RT5, roots treated with acibenzolar-S-methyl; CL2, hypocotyls treated with acibenzolar-S-methyl; BP1, suspension cells treated with acibenzolar-S-methyl; RT8, root suspension cells under aluminum stress; RX1, Xylella spp.-infected stems; NS1, nematode-infected roots; and RM1, leaves infected with leaf miner and coffee leaf rust.
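The clustering described in this caption (a correlation matrix built from EST frequencies) can be sketched in plain Python. This is a minimal illustration, not the authors' software: the caption does not state the distance transform or linkage method, so Pearson correlation with distance 1 - r and average linkage (UPGMA) are assumptions, as are all function names. Raw EST counts are first divided by library size to give per-library frequencies.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: treated as uncorrelated (an assumption)
    return sxy / sqrt(sxx * syy)


def to_frequencies(counts: dict[str, list[int]],
                   library_sizes: list[int]) -> dict[str, list[float]]:
    """Divide each contig's EST count by the size of the library it came from."""
    return {contig: [c / size for c, size in zip(row, library_sizes)]
            for contig, row in counts.items()}


def average_linkage(profiles: dict[str, list[float]]):
    """Agglomerative clustering (UPGMA) on the distance 1 - Pearson r.

    Returns the merge steps as (members_a, members_b, distance) tuples.
    """
    # Pairwise distances between individual contigs.
    leaf_d = {}
    for a, b in combinations(profiles, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        leaf_d[(a, b)] = leaf_d[(b, a)] = d

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(leaf_d[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        d = cluster_distance(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    # Toy data (invented): EST counts of three contigs in three libraries.
    counts = {"c1": [10, 0, 0], "c2": [8, 1, 0], "c3": [0, 0, 12]}
    for step in average_linkage(to_frequencies(counts, [1000, 1000, 1000])):
        print(step)
```

Because the figure clusters both contigs and libraries, the same function could be applied to the transposed matrix (one profile per library) for the other axis. The exhaustive closest-pair search is slow but adequate for a few hundred contigs; a production analysis would normally use an established clustering library.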

Mentions: To identify genes uniquely or preferentially expressed in specific coffee EST libraries, the R statistic [55] and the Audic-Claverie (AC) statistic [56] were applied using IDEG6, a web tool for the statistical analysis of gene expression data [57]. Libraries containing fewer than 300 ESTs were excluded from these analyses, because small libraries tend to distort the prediction of differentially expressed genes. After manual clustering, we observed that several libraries derived from the same tissues (EA1, IA1 and IA2; EM1 and SI3; LV4, LV5, LV8 and LV9; FB1 and FB4; and FR1 and FR2) shared the same set of genes differentially expressed relative to the other libraries, so they were combined for further analyses. Merging the results of the AC and R analyses yielded 331 contigs from C. arabica and 443 contigs from C. canephora. Hierarchical clustering was then applied to these data using a correlation matrix constructed from EST frequencies for the differentially expressed C. arabica and C. canephora contigs (Figure 6; Additional File 8). The differences among C. canephora libraries were more evident than among C. arabica libraries, likely because fewer libraries were available for C. canephora (Figure 6A and 6B).
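The two statistics named above can be sketched from their published formulas. For the AC test [56], the probability of observing y ESTs of a gene in a library of N2 ESTs, given x ESTs in a library of N1 ESTs, is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The R statistic [55] for one gene across libraries j is R = Σ x_j ln(x_j / (N_j f)), where f = Σ x_j / Σ N_j is the gene's pooled frequency. The sketch below assumes these forms, uses a one-sided tail in the direction of the observed change, and applies the 300-EST library cutoff stated in the text. It does not reproduce IDEG6's exact tail definitions or significance thresholds, and all function names are hypothetical.

```python
from math import exp, lgamma, log

MIN_LIBRARY_SIZE = 300  # libraries with fewer ESTs were discarded (from the text)


def keep_large_libraries(library_sizes: dict[str, int]) -> dict[str, int]:
    """Drop libraries below the EST-count cutoff."""
    return {lib: n for lib, n in library_sizes.items() if n >= MIN_LIBRARY_SIZE}


def ac_log_p(x: int, y: int, n1: int, n2: int) -> float:
    """log p(y | x) in the Audic-Claverie model.

    x: ESTs of the gene in library 1 (n1 ESTs in total);
    y: ESTs of the gene in library 2 (n2 ESTs in total).
    """
    r = n2 / n1
    return (y * log(r) + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
            - (x + y + 1) * log(1 + r))


def ac_pvalue(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided tail probability in the direction of the observed change."""
    if y >= x * n2 / n1:
        # P(Y >= y | x) = 1 - P(Y <= y - 1 | x); loses precision for tiny p-values.
        return max(0.0, 1.0 - sum(exp(ac_log_p(x, k, n1, n2)) for k in range(y)))
    # P(Y <= y | x)
    return min(1.0, sum(exp(ac_log_p(x, k, n1, n2)) for k in range(y + 1)))


def r_statistic(counts: list[int], library_sizes: list[int]) -> float:
    """R = sum_j x_j * ln(x_j / (N_j * f)), taking 0 * ln 0 = 0."""
    f = sum(counts) / sum(library_sizes)  # pooled frequency of the gene
    if f == 0:
        return 0.0
    return sum(x * log(x / (n * f)) for x, n in zip(counts, library_sizes) if x > 0)
```

For example, a gene seen 20 times in one library of 10,000 ESTs and twice in another library of the same size gives `ac_pvalue(20, 2, 10000, 10000)` well below 0.001, whereas equal proportions give `r_statistic([10, 10], [1000, 1000])` of zero. When many genes are tested, some form of multiple-testing control would also be needed.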

