Genetic distance as an alternative to physical distance for definition of gene units in association studies.

Rodriguez-Fontenla C, Calaza M, Gonzalez A - BMC Genomics (2014)

Bottom Line: The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR≥2 genes led to a fully concordant interpretation in 17 loci; the ±50 Kb genes did so in only 6. A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance.


Affiliation: Laboratorio de Investigacion 10 and Rheumatology Unit, Instituto de Investigacion Sanitaria - Hospital Clinico Universitario de Santiago, Santiago de Compostela, Spain. antonio.gonzalez.martinez-pedrayo@sergas.es.

ABSTRACT

Background: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or because they inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ±50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distances, for which physical distances are only a poor substitute.

Results: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR threshold producing flanking sequences near the ±50 Kb offset that has been common in previous studies. An SRR≥2 was selected because it led to gene extensions with a median length of 45.3 Kb and has the simplicity of an integer value. As expected, the boundaries of the genes defined with the ±50 Kb and with the SRR≥2 rules were rarely concordant. The impact of these differences was illustrated with the interpretation of top association signals from two large studies that included many hits and a detailed analysis of them based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR≥2 genes led to a fully concordant interpretation in 17 loci; the ±50 Kb genes did so in only 6. Interpretation of the 43 putative functional genes of the second study based on the SRR≥2 definition missed only 4 of the genes, whereas interpretation based on the ±50 Kb definition missed 10 genes.

Conclusions: A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance. The genome coordinates for each gene are provided so that the new definitions remain simple to use.
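For orientation, the customary ±50 Kb rule that the abstract takes as its baseline can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the tools named above: the `Gene` class, the `genes_for_snp` function, and the gene names and coordinates are all hypothetical.

```python
from dataclasses import dataclass

FLANK_BP = 50_000  # the customary ±50 Kb extension described in the abstract


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; coordinates are in base pairs.
    name: str
    chrom: str
    start: int
    end: int


def genes_for_snp(chrom, pos, genes, flank=FLANK_BP):
    """Return the names of genes whose flank-extended interval contains the SNP."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


# Made-up genes, used only to show the behaviour of a fixed physical window.
genes = [
    Gene("GENE_A", "chr1", 1_000_000, 1_020_000),
    Gene("GENE_B", "chr1", 1_100_000, 1_150_000),
]
print(genes_for_snp("chr1", 1_060_000, genes))
```

With a fixed physical window, a SNP lying between two genes is assigned to both whenever the windows overlap, regardless of whether recombination separates it from either gene.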



Fig1: Distribution of the standardized recombination rate (SRR) in the human genome. Number of 10 Kb bins from the deCODE recombination map [16] within each interval of SRR values.

Mentions: It is well known that the recombination rate is very irregular along the human genome [15–18]. This irregularity leads to a skewed distribution of SRR along the genome (Figure 1) [16], including a large fraction of bins, 42.6%, with no recombination (SRR = 0) and 78.4% of the bins with less than the average rate (SRR < 1). Therefore, most of the recombination takes place in the remaining 21.6% of bins. Analysis of the SRR distribution showed that extensions of genes based on an SRR ≥ 2 have a median physical length of 45.3 Kb (IQR = 22.9-90.2 Kb). This median length is similar to the most common physical distance extension used until now, which is 50 Kb. An SRR ≥ 2 is found in only a minor fraction of bins, 12.9%. The remaining 87.1% of the 10 Kb bins showed a lower SRR. No detailed optimization of the SRR threshold was attempted; we preferred to keep the simplicity of an integer value.
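A recombination-bounded extension of this kind can be sketched as follows. The text here does not spell out the exact extension procedure, so this sketch assumes that a gene is extended outward over consecutive 10 Kb bins and stops just before the first bin with SRR ≥ 2. The function name `extend_gene`, the list layout of the bins, and the SRR values are hypothetical.

```python
BIN_SIZE = 10_000      # bin width of the recombination map (10 Kb)
SRR_THRESHOLD = 2.0    # threshold selected in the text (SRR >= 2)


def extend_gene(start, end, srr_bins, bin_size=BIN_SIZE, threshold=SRR_THRESHOLD):
    """Extend [start, end] outward until a bin with SRR >= threshold is met.

    Assumption: srr_bins[i] holds the SRR of the bin covering
    [i * bin_size, (i + 1) * bin_size), and the high-SRR bin itself is
    excluded from the extended gene.
    """
    last = len(srr_bins) - 1
    left = min(start // bin_size, last)
    right = min(end // bin_size, last)
    # Walk upstream while the next bin is below the threshold.
    while left > 0 and srr_bins[left - 1] < threshold:
        left -= 1
    # Walk downstream while the next bin is below the threshold.
    while right < last and srr_bins[right + 1] < threshold:
        right += 1
    return left * bin_size, (right + 1) * bin_size


# Made-up SRR values for ten consecutive 10 Kb bins.
srr = [0.0, 3.1, 0.0, 0.4, 0.0, 0.0, 1.2, 2.5, 0.0, 0.0]
print(extend_gene(40_000, 55_000, srr))  # (20000, 70000)
```

In this toy example the extension is 20 Kb upstream and 15 Kb downstream, which illustrates why boundaries set by recombination need not match a symmetric ±50 Kb window.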

