Genetic distance as an alternative to physical distance for definition of gene units in association studies.

Rodriguez-Fontenla C, Calaza M, Gonzalez A - BMC Genomics (2014)

Bottom Line: The definition based on genetic distance was more concordant with the results of two large association studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR≥2 genes led to a fully concordant interpretation in 17 loci; the ±50 Kb genes did so in only 6. A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance.


Affiliation: Laboratorio de Investigacion 10 and Rheumatology Unit, Instituto de Investigacion Sanitaria - Hospital Clinico Universitario de Santiago, Santiago de Compostela, Spain. antonio.gonzalez.martinez-pedrayo@sergas.es.

ABSTRACT

Background: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ±50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distance, for which physical distance is only a poor substitute.

Results: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR producing flanking sequences near the ±50 Kb offset that has been common in previous studies. SRR≥2 was selected because it led to gene extensions with a median length of 45.3 Kb and has the simplicity of an integer value. As expected, the boundaries of genes defined with the ±50 Kb and SRR≥2 rules were rarely concordant. The impact of these differences was illustrated by interpreting the top association signals from two large studies, each reporting many hits together with a detailed analysis based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR≥2 genes led to a fully concordant interpretation in 17 loci; the ±50 Kb genes did so in only 6. Interpretation of the 43 putative functional genes of the second study based on the SRR≥2 definition missed only 4 genes, whereas that based on the ±50 Kb definition missed 10.

Conclusions: A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance. The genome coordinates for each gene are provided so that the new definitions remain simple to use.
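The SRR≥2 rule lends itself to a short illustration. The sketch below is not the authors' implementation. It assumes that the recombination map is available as non-overlapping bins of (start, end, SRR) on a single chromosome, and that each flank is extended outward from the gene until the first bin with SRR ≥ 2 is met. Applying the threshold to a single bin in this way is an assumption, and the names `extend_gene` and `fill_missing` and the toy bin values are hypothetical. Flanks with no qualifying bin are returned as `None` and then replaced by the median of the available lengths, in the spirit of the treatment described for genes near telomeres and centromeres.

```python
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical bin layout: (start_bp, end_bp, standardized recombination rate)
Bin = Tuple[int, int, float]


def extend_gene(
    gene_start: int,
    gene_end: int,
    bins: List[Bin],
    threshold: float = 2.0,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (left_len, right_len) in bp for the two flanks of a gene.

    Each flank runs from the gene boundary to the near edge of the first
    bin whose SRR reaches the threshold. None means no such bin exists.
    """
    left = None
    # Walk leftwards: closest bins first (largest end coordinate first).
    for start, end, srr in sorted(bins, key=lambda b: b[1], reverse=True):
        if end <= gene_start and srr >= threshold:
            left = gene_start - end
            break

    right = None
    # Walk rightwards: closest bins first (smallest start coordinate first).
    for start, end, srr in sorted(bins, key=lambda b: b[0]):
        if start >= gene_end and srr >= threshold:
            right = start - gene_end
            break

    return left, right


def fill_missing(lengths: List[Optional[int]]) -> List[float]:
    """Replace missing flank lengths with the median of the known ones."""
    known = [x for x in lengths if x is not None]
    fallback = median(known)
    return [fallback if x is None else x for x in lengths]


# Toy map, invented for illustration only.
BINS: List[Bin] = [
    (0, 10_000, 0.5),
    (10_000, 20_000, 2.4),
    (20_000, 30_000, 0.3),
    (30_000, 40_000, 0.8),
    (60_000, 70_000, 0.1),
    (70_000, 80_000, 1.0),
    (80_000, 90_000, 3.1),
]

if __name__ == "__main__":
    print(extend_gene(40_000, 60_000, BINS))
```

Under this reading, a gene in a region of low recombination receives long flanks and a gene next to a hotspot receives short ones, which is why the resulting extensions vary in length instead of being fixed at 50 Kb.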


Figure 2: Length distribution of the 36 044 gene extensions according to the SRR ≥ 2 rule. The 5′ and 3′ extensions of each gene were considered separately. All followed the SRR ≥ 2 rule except for 2669 from genes near telomeres and centromeres, where information is incomplete; these were replaced by the median extension length in their chromosomes, mostly in the 40–50 Kb range.

Mentions: Concordance between the median length of the extensions based on SRR ≥ 2 and the ± 50 Kb rule made a direct comparison possible. However, the new definitions obtained here account for recombination and are variable (Figure 2), not uniform. They range from less than 10 Kb (8.8% of the extensions) to more than 500 Kb (1.2% of the extensions). The distribution of extension lengths implies that most gene boundaries are discordant between the two definitions. In fact, only 21.3% of the extensions obtained with one definition are within ± 10 Kb of those obtained with the other, and the proportion is even lower (6.1%) when the two extensions of a gene are considered simultaneously.
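The two concordance figures quoted above correspond to a simple calculation, sketched here under stated assumptions. The function name `concordance`, the input layout (one pair of flank lengths per gene) and the toy values are hypothetical, and the example makes no attempt to reproduce the 21.3% and 6.1% reported in the paper.

```python
from typing import List, Tuple


def concordance(
    extensions: List[Tuple[int, int]],
    fixed: int = 50_000,
    tol: int = 10_000,
) -> Tuple[float, float]:
    """Compare variable flank lengths against a fixed-length rule.

    extensions: one (left_len, right_len) pair in bp per gene.
    Returns (fraction of single extensions within tol of fixed,
             fraction of genes with BOTH extensions within tol of fixed).
    """
    flat = [length for pair in extensions for length in pair]
    single = sum(abs(length - fixed) <= tol for length in flat) / len(flat)
    both = sum(
        all(abs(length - fixed) <= tol for length in pair)
        for pair in extensions
    ) / len(extensions)
    return single, both


# Toy flank lengths, invented for illustration only.
TOY = [
    (45_000, 52_000),
    (8_000, 41_000),
    (120_000, 600_000),
    (58_000, 9_000),
]

if __name__ == "__main__":
    print(concordance(TOY))
```

Requiring both flanks of a gene to agree is the stricter criterion, so the second fraction can never exceed the first when all genes contribute two extensions.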

