The landscape of nucleotide polymorphism among 13,500 genes of the conifer Picea glauca, relationships with functions, and comparison with Medicago truncatula.

Pavy N, Deschênes A, Blais S, Lavigne P, Beaulieu J, Isabel N, Mackay J, Bousquet J - Genome Biol Evol (2013)

Bottom Line: Conifer-specific sequences were also generally associated with the highest A/S ratios. These patterns confirm that the breadth of gene expression is a contributing factor to the evolution of nucleotide polymorphism. However, a number of families with high A/S ratios were found specific to P. glauca, suggesting cases of divergent evolution at the functional level.


Affiliation: Canada Research Chair in Forest and Environmental Genomics, Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, Canada.

ABSTRACT
Gene families differ in composition, expression, and chromosomal organization between conifers and angiosperms, but little is known regarding nucleotide polymorphism. Using various sequencing strategies, an atlas of 212k high-confidence single nucleotide polymorphisms (SNPs) with a validation rate of more than 92% was developed for the conifer white spruce (Picea glauca). Nonsynonymous and synonymous SNPs were annotated over the corresponding 13,498 white spruce genes representative of 2,457 known gene families. Patterns of nucleotide polymorphisms were analyzed by estimating the ratio of nonsynonymous to synonymous numbers of substitutions per site (A/S). A general excess of synonymous SNPs was expected and observed. However, analysis from several perspectives made it possible to identify groups of genes harboring an excess of nonsynonymous SNPs, and thus potentially under positive selection. Four known gene families harbored such an excess: dehydrins, ankyrin-repeats, AP2/DREB, and leucine-rich repeat. Conifer-specific sequences were also generally associated with the highest A/S ratios. A/S values were also distributed asymmetrically across genes specifically expressed in megagametophytes, roots, or in both, which harbored on average an excess of nonsynonymous SNPs. These patterns confirm that the breadth of gene expression is a contributing factor to the evolution of nucleotide polymorphism. The A/S ratios of Medicago truncatula genes were also analyzed: several gene families shared between the P. glauca and M. truncatula data sets had a similar excess of synonymous or nonsynonymous SNPs. However, a number of families with high A/S ratios were found specific to P. glauca, suggesting cases of divergent evolution at the functional level.
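
As a rough illustration of the A/S statistic described in the abstract, the following sketch computes a per-site ratio of nonsynonymous to synonymous SNPs for one gene. The function name, its arguments, and the example counts are hypothetical; how the authors counted nonsynonymous and synonymous sites is not stated in this text, so the site counts are simply taken as inputs.

```python
def a_s_ratio(nonsyn_snps, syn_snps, nonsyn_sites, syn_sites):
    """Per-site ratio of nonsynonymous to synonymous SNPs (A/S).

    All four arguments are hypothetical inputs for one gene:
    SNP counts and the number of sites of each class.
    Returns None when the ratio is undefined.
    """
    if syn_snps == 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        return None  # no synonymous SNPs or no sites: ratio undefined
    a = nonsyn_snps / nonsyn_sites  # nonsynonymous SNPs per nonsynonymous site
    s = syn_snps / syn_sites        # synonymous SNPs per synonymous site
    return a / s


# Made-up gene with an excess of synonymous SNPs (A/S below 1).
conserved_like = a_s_ratio(nonsyn_snps=3, syn_snps=6, nonsyn_sites=750, syn_sites=250)

# Made-up gene with an excess of nonsynonymous SNPs (A/S above 1).
divergent_like = a_s_ratio(nonsyn_snps=9, syn_snps=2, nonsyn_sites=750, syn_sites=250)
```

Under this per-site definition, a value above 1 corresponds to the excess of nonsynonymous SNPs that the abstract treats as a possible sign of positive selection.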


Fig. 5. Relationship between sequence conservation with angiosperm genes and A/S ratios in Picea glauca. (a) Distribution of the 13,097 annotated P. glauca genes according to their conservation with angiosperm genes (BlastX e-value <e−10) and their A/S values. Hatched boxes: genes with a homolog in pines or Douglas-fir or both, but not in angiosperms (neither Amborella, Arabidopsis, nor rice) (14.6% of the data set). Solid boxes: genes with a homolog in angiosperms (at least in Amborella, Arabidopsis, or rice) (85.4% of the data set). A total of 401 orphan P. glauca genes with no match (BlastX e-value >e−10) were excluded from this analysis. (b) Histogram illustrating the representation of P. glauca genes in the sets of genes with low (in green) or high (in blue) A/S values, according to their conservation with angiosperm sequences (gene set enrichment analysis, Fisher’s exact test, two-tailed, adjusted P < 0.01). Numbers indicate the average A/S for each data set with high or low A/S ratios.
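
The three gene categories in the caption (conserved with angiosperms, conifer-specific, orphan) can be read as a simple decision rule on BlastX e-values. The sketch below is one possible reading, not the authors' pipeline: the function name, the dictionary layout, and the taxon keys are assumptions made for illustration, and only the 1e-10 cutoff and the taxon groups come from the caption.

```python
E_CUTOFF = 1e-10  # BlastX e-value cutoff quoted in the caption

# Taxon labels are hypothetical dictionary keys, chosen to mirror the caption.
ANGIOSPERMS = ("Amborella", "Arabidopsis", "rice")
CONIFERS = ("pine", "Douglas-fir")


def classify_gene(best_evalues, cutoff=E_CUTOFF):
    """Classify one spruce gene from its best BlastX e-value per taxon.

    best_evalues: dict mapping a taxon label to the best e-value found
    against that taxon; taxa with no hit at all are simply absent.
    """
    def has_hit(taxa):
        return any(best_evalues.get(t, float("inf")) < cutoff for t in taxa)

    if has_hit(ANGIOSPERMS):
        return "angiosperm-conserved"  # solid boxes in panel (a)
    if has_hit(CONIFERS):
        return "conifer-specific"      # hatched boxes in panel (a)
    return "orphan"                    # excluded from the analysis


# Made-up e-values for three hypothetical genes.
gene_a = classify_gene({"Arabidopsis": 1e-30, "pine": 1e-50})
gene_b = classify_gene({"pine": 1e-20, "rice": 1e-3})
gene_c = classify_gene({"pine": 0.5})
```

Passing a different value for `cutoff` changes which genes count as conserved, which is the kind of sensitivity check the cutoff choice invites.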
© Copyright Policy - creative-commons

Mentions: The PFAM-based approach relied on sequence conservation with known proteins and thus excluded the most divergent sequences. The P. glauca data set encompassed 1,911 conifer-specific sequences (see Materials and Methods) and 11,186 sequences that were conserved with angiosperms (BlastX e-value <e−10). According to the gene set enrichment analysis, the largest group of genes was the one with the lowest A/S ratios (12,001 genes; 86.3%), whereas a minority of genes had the highest A/S ratios (1,096 genes; 7.9%) (fig. 5a). Interestingly, the conifer-specific genes were more abundant among genes with the highest A/S (45.3%) than among those with the lowest A/S values (11.8%) (fig. 5a). In contrast, P. glauca genes conserved with an angiosperm homolog (BlastX e-value <e−10) were significantly overrepresented among the genes with the lowest A/S (gene set enrichment analysis; Fisher’s exact test, two-tailed, adjusted P < 0.01) (fig. 5b). Our approach based on Blast search (e-value <e−10) may have retained, among the conifer-specific set, spruce sequences sharing short motifs with angiosperms; such motifs could be found in fast-evolving genes. To filter out such cases, we also ran the analysis with a much more stringent criterion for conifer specificity, an e-value cutoff of 0.5 (data not shown). Only 486 putative conifer-specific genes remained, but for this reduced subset the trend was the same as described above (gene set enrichment analysis; Fisher’s exact test, two-tailed, adjusted P < 0.01) (fig. 5).
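
To make the enrichment result more concrete, the sketch below runs a two-sided Fisher's exact test on a 2x2 table of conifer-specific versus angiosperm-conserved genes in the high and low A/S sets. The cell counts are approximations back-calculated here from the reported percentages (45.3% of 1,096 and 11.8% of 12,001), not the authors' raw counts, and the function names are hypothetical. The test is written with the standard library only and applies no multiple-testing adjustment, unlike the adjusted P values reported above.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one. For very
    extreme tables the result can underflow to 0.0.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    total = 0.0
    for x in range(lowest, highest + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # small tolerance for ties
            total += exp(lp)
    return min(1.0, total)


# Approximate counts reconstructed from the reported percentages (assumption).
high_specific = round(0.453 * 1096)      # conifer-specific genes, high A/S set
high_conserved = 1096 - high_specific    # angiosperm-conserved, high A/S set
low_specific = round(0.118 * 12001)      # conifer-specific genes, low A/S set
low_conserved = 12001 - low_specific     # angiosperm-conserved, low A/S set

odds_ratio = (high_specific * low_conserved) / (high_conserved * low_specific)
p_value = fisher_exact_two_sided(high_specific, high_conserved,
                                 low_specific, low_conserved)
```

An odds ratio well above 1 here means conifer-specific genes are overrepresented in the high A/S set, which matches the direction of the trend described in the text.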

