The landscape of nucleotide polymorphism among 13,500 genes of the conifer Picea glauca, relationships with functions, and comparison with Medicago truncatula.

Pavy N, Deschênes A, Blais S, Lavigne P, Beaulieu J, Isabel N, Mackay J, Bousquet J - Genome Biol Evol (2013)

Bottom Line: Conifer-specific sequences were also generally associated with the highest A/S ratios. These patterns confirm that the breadth of gene expression is a contributing factor to the evolution of nucleotide polymorphism. However, a number of families with high A/S ratios were found specific to P. glauca, suggesting cases of divergent evolution at the functional level.


Affiliation: Canada Research Chair in Forest and Environmental Genomics, Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, Canada.

ABSTRACT
Gene families differ in composition, expression, and chromosomal organization between conifers and angiosperms, but little is known regarding nucleotide polymorphism. Using various sequencing strategies, an atlas of 212k high-confidence single nucleotide polymorphisms (SNPs) with a validation rate of more than 92% was developed for the conifer white spruce (Picea glauca). Nonsynonymous and synonymous SNPs were annotated over the corresponding 13,498 white spruce genes representative of 2,457 known gene families. Patterns of nucleotide polymorphism were analyzed by estimating the ratio of nonsynonymous to synonymous numbers of substitutions per site (A/S). A general excess of synonymous SNPs was expected and observed. However, analysis from several perspectives made it possible to identify groups of genes harboring an excess of nonsynonymous SNPs, and thus potentially under positive selection. Four known gene families harbored such an excess: dehydrins, ankyrin-repeats, AP2/DREB, and leucine-rich repeat. Conifer-specific sequences were also generally associated with the highest A/S ratios. A/S values were also distributed asymmetrically across genes specifically expressed in megagametophytes, roots, or in both, which harbored on average an excess of nonsynonymous SNPs. These patterns confirm that the breadth of gene expression is a contributing factor to the evolution of nucleotide polymorphism. The A/S ratios of Medicago truncatula genes were also analyzed: several gene families shared between the P. glauca and M. truncatula data sets showed a similar excess of synonymous or nonsynonymous SNPs. However, a number of families with high A/S ratios were found specific to P. glauca, suggesting cases of divergent evolution at the functional level.
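
As a rough illustration of the A/S ratio described in the abstract, the sketch below divides the number of nonsynonymous SNPs per nonsynonymous site by the number of synonymous SNPs per synonymous site. The function names, the way sites are counted, and the example numbers are all assumptions made for illustration; the abstract does not specify the exact estimator used by the authors.

```python
def per_site_rate(n_snps: int, n_sites: float) -> float:
    """Number of SNPs divided by the number of sites of that class."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_snps / n_sites


def a_over_s(n_nonsyn: int, nonsyn_sites: float, n_syn: int, syn_sites: float):
    """Hypothetical A/S: nonsynonymous per-site rate over synonymous per-site rate.

    Returns None when there are no synonymous SNPs, because the ratio
    is then undefined.
    """
    s_rate = per_site_rate(n_syn, syn_sites)
    if s_rate == 0:
        return None
    return per_site_rate(n_nonsyn, nonsyn_sites) / s_rate


# Made-up gene: 3 nonsynonymous SNPs over 600 nonsynonymous sites,
# 6 synonymous SNPs over 300 synonymous sites.
ratio = a_over_s(3, 600.0, 6, 300.0)
print(ratio)
```

Under this sketch, a value below 1 corresponds to the excess of synonymous SNPs that the abstract reports as the general pattern, and a value above 1 to an excess of nonsynonymous SNPs.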


Fig. 1.—Tools and data used to delineate the Picea glauca atlas of 212,765 high-confidence SNPs.
Notes:
1. The P. glauca reference gene catalog was described by Rigault et al. (2011). It encompassed 27,720 sequences representative of distinct transcribed genes.
2. http://bioinformatics.bc.edu/marthlab/wiki/index.php/Software (last accessed October 11, 2013).
3. Koboldt et al. (2009).
4. A nonsingleton SNP is a nucleotide polymorphism that is present on at least two reads.
5. Pavy et al. (2013) described the genotyping array and released the genotyping data.
6. A full-length insert clone (FLIC) sequence represents a sequence encompassing the entire length of a cloned cDNA insert.
7 & 8. RNA transcript sequence completion was determined by Rigault et al. (2011). Complete cds were similar to a reference protein (Arabidopsis, rice, poplar, grape, Swissprot BlastX e-value <e−10). The sequence was declared as confirmed complete cds if it was similar over the entire protein (note 7). It was declared as predicted complete cds if the cds was similar over part of the protein but the transcript extended long enough on either side to cover the entire protein length.
9. The other cds were either partial or complete but with no match with a reference protein.

Mentions: A P. glauca SNP atlas was constructed from 33.5 million quality reads, which were assigned to 27,645 distinct coding sequences from the P. glauca gene catalog (Rigault et al. 2011) (fig. 1). The identification of high-confidence SNPs then considered three main criteria: the sequencing depth, the VarScan P value, and the minor allele frequency (MAF) (table 1). From the sequence alignments, 373,686 nonsingleton SNPs (i.e., SNPs present in at least two reads) were identified with a MAF ≥0.01. Genotyping data were available for 5,938 of the SNPs (Infinium array PgAS1 in Pavy et al. 2013). They were used to determine the true-positive (TP) rates and to assess how these varied as a function of sequencing depth, the MAF, and the P value obtained with the variant-calling software VarScan (table 1). The findings are summarized in supplementary methods S2, Supplementary Material online, and were used to define three main criteria to develop an atlas of high-confidence P. glauca SNPs: 1) selection of nonsingleton SNPs, 2) a sequence depth ≥10, and 3) a VarScan P < 0.10. With these criteria, the overall TP rate was maximized at 92.1%. This validation rate is a conservative estimate, given that the Infinium iSelect (Illumina) genotyping array used, PgAS1, had a maximum success rate of 92.3% for SNPs previously confirmed using genotyping arrays based on the Illumina Golden Gate assay (Pavy et al. 2013). Hence, most of the failures (7.9%) are attributable not to miscalled SNPs but to the inherent limits of the hyperplex genotyping assay. However, a high false-negative rate (27.8%) was one drawback of applying stringent criteria to maximize the TP rate: 1,662 of the 5,986 in silico SNPs successfully genotyped with the array were not present in our final set of high-confidence SNPs. When we did not apply the alignment depth and the VarScan P value criteria, the TP rate of non-validated SNPs decreased to 87.7% among SNPs tested with the genotyping array.
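
The three selection criteria and the false-negative arithmetic quoted above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `SnpCall` record, its field names, and the example calls are hypothetical, and only the thresholds (at least two supporting reads, depth ≥10, VarScan P < 0.10) and the counts 1,662 and 5,986 come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical record for one in silico SNP call."""
    snp_id: str
    supporting_reads: int  # reads carrying the variant allele
    depth: int             # alignment depth at the SNP position
    varscan_p: float       # P value reported by the variant caller


def is_high_confidence(call: SnpCall,
                       min_reads: int = 2,
                       min_depth: int = 10,
                       max_p: float = 0.10) -> bool:
    """Apply the three criteria: nonsingleton, depth >= 10, P < 0.10."""
    return (call.supporting_reads >= min_reads
            and call.depth >= min_depth
            and call.varscan_p < max_p)


def false_negative_rate(n_genotyped: int, n_excluded: int) -> float:
    """Share of successfully genotyped SNPs left out of the final set."""
    return n_excluded / n_genotyped


# Made-up calls, one passing and three failing a single criterion each.
calls = [
    SnpCall("snp_a", supporting_reads=5, depth=40, varscan_p=0.01),
    SnpCall("snp_b", supporting_reads=1, depth=40, varscan_p=0.01),  # singleton
    SnpCall("snp_c", supporting_reads=3, depth=8, varscan_p=0.01),   # shallow
    SnpCall("snp_d", supporting_reads=3, depth=25, varscan_p=0.10),  # P not < 0.10
]
kept = [c.snp_id for c in calls if is_high_confidence(c)]
print(kept)

# Counts quoted in the text: 1,662 of 5,986 genotyped SNPs were excluded.
print(round(100 * false_negative_rate(5986, 1662), 1))
```

The last line reproduces the 27.8% false-negative rate from the reported counts, which shows that this figure is computed relative to the 5,986 successfully genotyped in silico SNPs.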

