Identification of single nucleotide polymorphisms and analysis of linkage disequilibrium in sunflower elite inbred lines using the candidate gene approach.

Fusari CM, Lia VV, Hopp HE, Heinz RA, Paniego NB - BMC Plant Biol. (2008)

Bottom Line: On average, 1 SNP was found per 69 nucleotides and 38 indels were identified in the complete data set. Two putative gene pools were identified (G1 and G2), with a large proportion of the inbred lines being assigned to one of them (G1). Knowledge about the patterns of diversity and the genetic relationships between breeding materials could be an invaluable aid in crop improvement strategies.


Affiliation: Instituto Nacional de Tecnología Agropecuaria, Instituto de Biotecnología (CNIA), CC 25, Castelar (B1712WAA), Buenos Aires, Argentina. cfusari@cnia.inta.gov.ar

ABSTRACT

Background: Association analysis is a powerful tool to identify gene loci that may contribute to phenotypic variation. This includes the estimation of nucleotide diversity, the assessment of linkage disequilibrium (LD) structure and the evaluation of selection processes. Trait mapping by allele association requires a high-density map, which could be obtained by adding Single Nucleotide Polymorphisms (SNPs) and short insertions and/or deletions (indels) to SSR and AFLP genetic maps. Nucleotide diversity analysis of randomly selected candidate regions is a promising approach for the success of association analysis and fine mapping in the sunflower genome. Moreover, knowledge of the distance over which LD persists in agronomically meaningful sunflower accessions is important to establish the density of markers and the experimental design for association analysis.

Results: A set of 28 candidate genes related to biotic and abiotic stresses was studied in 19 sunflower inbred lines. A total of 14,348 bp of sequence alignment was analyzed per individual. On average, 1 SNP was found per 69 nucleotides and 38 indels were identified in the complete data set. The mean nucleotide polymorphism was moderate (θ = 0.0056), as expected for inbred materials. The number of haplotypes per region ranged from 1 to 9 (mean = 3.54 ± 1.88). Model-based population structure analysis allowed detection of admixed individuals within the set of accessions examined. Two putative gene pools were identified (G1 and G2), with a large proportion of the inbred lines being assigned to one of them (G1). Consistent with the absence of population sub-structuring, LD for G1 decayed more rapidly (r² = 0.48 at 643 bp; trend line, pooled data) than the LD trend line for the entire set of 19 individuals (r² = 0.64 for the same distance).

Conclusion: Knowledge about the patterns of diversity and the genetic relationships between breeding materials could be an invaluable aid in crop improvement strategies. The relatively high frequency of SNPs within the elite inbred lines studied here, along with the predicted extent of LD over distances of 100 kbp (r² approximately 0.1), suggests that high-resolution association mapping in sunflower could be achieved with marker densities lower than those usually reported in the literature.



Figure 2: Linkage disequilibrium. A: LD plot from 24 genes pooled together for the 19 inbred lines. The logarithmic trend line reaches a value of 0.64 at 643 bp. B: LD plot from the whole gene data calculated for the G1 subset of individuals identified in the STRUCTURE analysis (HA52, HA61, HA89, HA370, HAR3, HAR5, KLM280, PAC2, RHA266, RHA274, RHA293 and RHA374).

Mentions: The presence of population structure can lead to spurious results and must be considered in the statistical analysis [51]. Therefore, as a preliminary step to the assessment of LD, population structure was analyzed using the model-based approach reported by Pritchard et al. [52], employing 136 unlinked SNP loci derived from the 9 genes shared between the 19 inbred lines studied in this work and the 32 wild and cultivated individuals previously reported by Liu and Burke [46]. This step helps avoid spurious associations that arise for reasons other than physical proximity and allows the real extent of LD to be assessed. The highest log likelihood scores were obtained when the number of populations was set to five. Each individual's inferred ancestry in the five model-based populations is presented in Figure 1. The 19 elite accessions examined here are mainly composed of contributions from two gene pools (yellow and light-blue, Figure 1), with most of their inferred ancestries being higher than 80%. These two gene pools are also the main constituents, but in different proportions, of the cultivated accessions analyzed by Liu and Burke [46]. As expected, the wild accessions have a more diverse ancestry, with contributions from all five model-based populations identified. On the basis of the population structure analysis, two groups can be defined within the 19 inbred lines studied in this work. The first group (G1) is composed of HA52, HA61, HA89, HA370, HAR3, HAR5, KLM280, PAC2, RHA266, RHA274, RHA293 and RHA374 (yellow gene pool); the second group (G2) includes the inbred lines HA292, HA303, HA369, HA821, HAR2, RHA801 and V94 (light-blue gene pool).

According to the method's assumptions, these two groups are characterized by different sets of allele frequencies. For this reason, pairwise estimates of LD (i.e. r²) were calculated for: (i) the entire set of inbred lines (Figure 2A), and (ii) the subset of inbred lines from G1 (Figure 2B). The G2 subset was not included in this analysis because of its small number of individuals. Figure 2 displays the scatter plots of r² versus the physical distance between all pairs of SNPs within a gene, pooled for the 24 polymorphic regions included in this work. Since all regions are <1 kbp long, this analysis reveals disequilibrium patterns only at short distances. For the entire set of genotypes, the logarithmic trend line declines very slowly, reaching a value of 0.64 at 643 bp (Figure 2A). Conversely, when the LD plot includes only the genotypes belonging to the G1 group, the logarithmic trend line decays more rapidly and its value is 0.48 at the same distance (Figure 2B). As expected, there is a clear bias towards higher levels of LD when the population structure in the sample is not factored into the analysis. Interlocus analyses revealed no LD between loci (data not shown).
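
The pairwise r² estimates and the logarithmic trend line described above can be illustrated with a minimal Python sketch. This is not the authors' software. It assumes that each inbred line contributes a single haplotype, that SNPs are biallelic and coded 0/1, and that the trend line has the form r² = a + b·ln(distance), fitted by ordinary least squares. The function names (`r_squared`, `pairwise_ld`, `fit_log_trend`) are hypothetical.

```python
import math
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic SNPs coded 0/1, one value per haplotype.

    Assumption: each inbred line is treated as a single haplotype.
    Returns None when either SNP is monomorphic in the sample.
    """
    n = len(a)
    p_a = sum(a) / n                      # frequency of allele 1 at SNP a
    p_b = sum(b) / n                      # frequency of allele 1 at SNP b
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                  # disequilibrium coefficient D
    return d * d / denom


def pairwise_ld(positions, snps):
    """Return (distance_bp, r^2) for every pair of SNPs within one gene."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(snps[i], snps[j])
        if r2 is not None:
            pairs.append((abs(positions[j] - positions[i]), r2))
    return pairs


def fit_log_trend(pairs):
    """Least-squares fit of r^2 = a + b * ln(distance); returns (a, b)."""
    xs = [math.log(d) for d, _ in pairs if d > 0]
    ys = [r2 for d, r2 in pairs if d > 0]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

To mirror the comparison in Figure 2, one would pool the output of `pairwise_ld` across genes, once for all 19 lines and once for the G1 lines only, and then evaluate `a + b * math.log(643)` for each fit.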

