A scale invariant clustering of genes on human chromosome 7.

Kendal WS - BMC Evol. Biol. (2004)

Bottom Line: Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.

Affiliation: Department of Radiation Oncology, Ottawa Regional Cancer Centre, 503 Smyth, Ottawa, Ontario K1H 1C4, Canada. wayne.kendal@orcc.on.ca

ABSTRACT

Background: Vertebrate genes often appear to cluster within the background of nontranscribed genomic DNA. Here an analysis of the physical distribution of gene structures on human chromosome 7 was performed to confirm the presence of clustering, and to elucidate possible underlying statistical and biological mechanisms.

Results: Clustering of genes was confirmed by virtue of a variance of the number of genes per unit physical length that exceeded the respective mean. Further evidence for clustering came from a power function relationship between the variance and mean that possessed an exponent of 1.51. This power function implied that the spatial distribution of genes on chromosome 7 was scale invariant, and that the underlying statistical distribution had a Poisson-gamma (PG) form. A PG distribution for the spatial scattering of genes was validated by stringent comparisons of both the predicted variance to mean power function and its cumulative distribution function to data derived from chromosome 7.
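The variance to mean power function described above can be illustrated with a short sketch. It is not the author's code: the function names, the bin sizes and the use of an ordinary least-squares fit on log-log axes are assumptions made for illustration. The idea is to count genes per bin at several bin sizes, then regress log(variance) on log(mean); the slope estimates the exponent.

```python
import math
import random


def bin_counts(positions, length, bin_size):
    """Count how many positions fall into each complete bin of width bin_size."""
    n_bins = int(length // bin_size)
    counts = [0] * n_bins
    for p in positions:
        i = int(p // bin_size)
        if i < n_bins:  # ignore anything past the last complete bin
            counts[i] += 1
    return counts


def variance_mean_exponent(positions, length, bin_sizes):
    """Fit variance = a * mean**b across bin sizes; return (b, a)."""
    xs, ys = [], []
    for b in bin_sizes:
        counts = bin_counts(positions, length, b)
        n = len(counts)
        m = sum(counts) / n
        v = sum((c - m) ** 2 for c in counts) / n
        if m > 0 and v > 0:
            xs.append(math.log(m))
            ys.append(math.log(v))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, math.exp(y_bar - slope * x_bar)


if __name__ == "__main__":
    # Hypothetical data: uniformly scattered positions, NOT chromosome 7 data.
    rng = random.Random(0)
    length = 1_000_000
    positions = [rng.uniform(0, length) for _ in range(20000)]
    exponent, coeff = variance_mean_exponent(
        positions, length, [500, 1000, 2000, 5000, 10000]
    )
    print(f"fitted exponent: {exponent:.2f}")
```

For positions scattered uniformly at random, the counts are approximately Poisson, so the variance roughly equals the mean and the fitted exponent should be near 1. An exponent clearly above 1, such as the 1.51 reported here, indicates that the variance grows faster than the mean, which is the signature of clustering.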

Conclusion: The PG distribution was consistent with at least two different biological models. In the microrearrangement model, the number of genes per unit length of chromosome represented the contribution of a random number of smaller chromosomal segments that had originated by random breakage and reconstruction of more primitive chromosomes. Each of these smaller segments would have necessarily contained (on average) a gamma-distributed number of genes. In the gene cluster model, genes would be scattered randomly to begin with. Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.
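The microrearrangement model can be read as a compound sum: a Poisson-distributed number of segments, each contributing a gamma-distributed amount. The sketch below simulates that reading under stated assumptions. The parameter names (lam, alpha, theta), their values and the helper functions are hypothetical and are not taken from the article. For such a compound Poisson-gamma sum the mean is lam * alpha * theta, the variance is lam * alpha * (alpha + 1) * theta**2, and the variance to mean exponent is (alpha + 2) / (alpha + 1), which always lies between 1 and 2.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def pg_sample(rng, lam, alpha, theta):
    """Sum of a Poisson(lam) number of gamma(shape=alpha, scale=theta) terms."""
    n_segments = poisson(rng, lam)
    return sum(rng.gammavariate(alpha, theta) for _ in range(n_segments))


def tweedie_exponent(alpha):
    """Variance to mean power-law exponent implied by gamma shape alpha."""
    return (alpha + 2.0) / (alpha + 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    lam, alpha, theta = 3.0, 1.0, 2.0  # illustrative values only
    xs = [pg_sample(rng, lam, alpha, theta) for _ in range(20000)]
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    print(f"sample mean {m:.2f}, theory {lam * alpha * theta:.2f}")
    print(f"sample variance {v:.2f}, theory {lam * alpha * (alpha + 1) * theta ** 2:.2f}")
    print(f"implied exponent {tweedie_exponent(alpha):.2f}")
```

Under this parameterization, an exponent of 1.51 would correspond to a gamma shape of roughly (2 - 1.51) / (1.51 - 1), or about 0.96. That figure is a back-calculation from the formula above, not a value reported in the abstract.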

Figure 3: Cumulative distribution function. The empirical CDF, derived from the numbers of genes per 200 kb bin within chromosome 7, is plotted as data points. The solid curve represents the least squares fit of the PG model to these data. The PG model fitted these data very well, as is evident from the low value of the Kolmogorov-Smirnov Dmax and from the normal probability plot of the residuals (inset).

Mentions: How well does the PG distribution fit the observed data? Figure 3 provides the empirical CDF, as obtained from a bin size of 200 kb and fitted to the theoretical PG CDF (Eq. 1). The fit was very good, with at most a 1.4% deviation between theory and observation. An analysis of the residuals (Fig. 3 inset) revealed that they were essentially negligible and normally distributed about zero. A Kolmogorov-Smirnov test additionally confirmed an acceptable fit of the theoretical PG model to the empirical CDF.
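The Kolmogorov-Smirnov Dmax mentioned above is the largest vertical gap between the empirical CDF and the model CDF. The following is a minimal sketch of that statistic, assuming the model CDF is supplied as an ordinary function. The PG CDF of Eq. 1 is not reproduced on this page, so the example uses a simple stand-in model, and the function name ks_dmax is hypothetical.

```python
from collections import Counter


def ks_dmax(sample, model_cdf):
    """Largest absolute gap between the empirical CDF of sample and model_cdf.

    At each distinct observed value the empirical CDF jumps, so the gap is
    checked just below and just at the jump.
    """
    n = len(sample)
    counts = Counter(sample)
    d_max, below = 0.0, 0
    for x in sorted(counts):
        lo = below / n          # empirical CDF just before x
        below += counts[x]
        hi = below / n          # empirical CDF at x
        f = model_cdf(x)
        d_max = max(d_max, abs(hi - f), abs(f - lo))
    return d_max


if __name__ == "__main__":
    # Stand-in model: uniform CDF on [0, 4]; NOT the PG model of the article.
    data = [1, 2, 3, 4]
    print(ks_dmax(data, lambda x: x / 4))
```

A small Dmax, such as the maximum deviation of 1.4% quoted above, means the fitted curve stays close to the empirical CDF over its whole range.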

