A scale invariant clustering of genes on human chromosome 7.

Kendal WS - BMC Evol. Biol. (2004)

Bottom Line: Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone, it was not possible to identify the biological model that best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.


Affiliation: Department of Radiation Oncology, Ottawa Regional Cancer Centre, 503 Smyth, Ottawa, Ontario K1H 1C4, Canada. wayne.kendal@orcc.on.ca

ABSTRACT

Background: Vertebrate genes often appear to cluster within the background of nontranscribed genomic DNA. Here, the physical distribution of gene structures on human chromosome 7 was analyzed to confirm the presence of clustering and to elucidate possible underlying statistical and biological mechanisms.

Results: Clustering of genes was confirmed by the finding that the variance of the number of genes per unit physical length exceeded the corresponding mean. Further evidence for clustering came from a power-function relationship between the variance and the mean, with an exponent of 1.51. This power function implied that the spatial distribution of genes on chromosome 7 was scale invariant, and that the underlying statistical distribution had a Poisson-gamma (PG) form. A PG distribution for the spatial scattering of genes was validated by stringent comparisons of both its predicted variance-to-mean power function and its cumulative distribution function with data derived from chromosome 7.
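For reference, the standard Tweedie form of the Poisson-gamma distribution ties the variance-to-mean exponent to the underlying model parameters. The notation below (mean μ, dispersion φ, power p with 1 < p < 2, Poisson mean λ, gamma shape α and scale θ) is the conventional Tweedie parameterization and is assumed here rather than taken from the article; the last line only evaluates the gamma shape implied by the reported exponent.

```latex
% Compound Poisson-gamma (Tweedie, 1 < p < 2): a Poisson number of gamma terms
Y = \sum_{i=1}^{N} X_i, \qquad N \sim \mathrm{Poisson}(\lambda), \qquad X_i \sim \mathrm{Gamma}(\text{shape } \alpha,\ \text{scale } \theta)

\lambda = \frac{\mu^{2-p}}{\phi\,(2-p)}, \qquad \alpha = \frac{2-p}{p-1}, \qquad \theta = \phi\,(p-1)\,\mu^{p-1}

\mathrm{E}[Y] = \lambda\alpha\theta = \mu, \qquad \mathrm{Var}[Y] = \lambda\alpha(1+\alpha)\theta^{2} = \phi\,\mu^{p}

% Rescaling by c > 0 keeps the same exponent p (only the coefficient changes)
\mathrm{Var}[cY] = c^{2}\phi\,\mu^{p} = \left(\phi\,c^{2-p}\right)(c\mu)^{p}

% Gamma shape implied by the reported exponent p = 1.51
\alpha = \frac{2-1.51}{1.51-1} = \frac{0.49}{0.51} \approx 0.961
```

The closure of this family under rescaling is one formal sense in which a single exponent can describe the data at every measurement scale.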

Conclusion: The PG distribution was consistent with at least two different biological models. In the microrearrangement model, the number of genes per unit length of chromosome represented the contribution of a random number of smaller chromosomal segments that had originated by random breakage and reconstruction of more primitive chromosomes. Each of these smaller segments would necessarily have contained (on average) a gamma-distributed number of genes. In the gene cluster model, genes would be scattered randomly to begin with. Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone, it was not possible to identify the biological model that best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.
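The microrearrangement reading of the PG distribution (a random number of segments, each carrying a gamma-distributed amount) can be illustrated with a small simulation. This is a sketch, not the article's code: it assumes a Poisson number of segments and uses the standard Tweedie parameterization (mean `mu`, dispersion `phi`, power `p` with 1 < p < 2) to set the Poisson and gamma parameters; the function names are hypothetical. Under those assumptions the simulated values should have a mean close to `mu` and a variance close to `phi * mu**p`. Note that the PG variable is continuous apart from a point mass at zero.

```python
import math
import random
from statistics import fmean, variance


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_gamma(mu, phi, p, n, seed=0):
    """Draw n compound Poisson-gamma (Tweedie, 1 < p < 2) values.

    Each draw sums a Poisson number of gamma variates, mirroring the idea of a
    random number of segments that each contribute a gamma-distributed amount.
    """
    if not 1 < p < 2:
        raise ValueError("p must lie strictly between 1 and 2")
    rng = random.Random(seed)
    lam = mu ** (2 - p) / (phi * (2 - p))   # mean number of segments
    shape = (2 - p) / (p - 1)               # gamma shape per segment
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale per segment
    return [
        sum(rng.gammavariate(shape, scale) for _ in range(poisson_sample(lam, rng)))
        for _ in range(n)
    ]


if __name__ == "__main__":
    mu, phi, p = 5.0, 1.0, 1.51  # p from the reported exponent; mu and phi are arbitrary
    draws = simulate_poisson_gamma(mu, phi, p, n=20_000, seed=7)
    # Sample mean and variance, followed by the theoretical variance phi * mu**p.
    print(fmean(draws), variance(draws), phi * mu ** p)
```

Knuth's sampler is chosen only because the Python standard library has no Poisson generator; for large means a different method would be preferable.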


Figure 2: Variance-to-mean power function. Shown here is a log-log plot of the variance versus the mean number of gene structures per bin, as calculated for a range of bin sizes over chromosome 7. The transformed data points described a straight line on the log-log plot, which implied a power-function relationship between the variance and the mean. The solid line represents the theoretical linear relationship determined from the fit of the PG model. A linear model fitted these transformed data very well, as evident from the high value of the squared correlation coefficient, r², and from the normal probability plot (inset) of the residuals between the theoretical straight line and the transformed data points. The broken line represents the best fit of a second model, based on the negative binomial distribution and intended to describe the distribution of genes within conserved segments. It did not fit the data as well as the variance-to-mean power function did.
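The multi-scale estimate behind this plot can be sketched as below. This is a minimal illustration under stated assumptions, not the article's procedure: gene positions are taken to be integer start coordinates in base pairs, bins are contiguous and non-overlapping, a trailing partial bin is dropped, and the sample (n − 1) variance is used. The function names are hypothetical. The slope from `log_log_slope` is an ordinary least-squares fit to the log-transformed points, which is not how the solid line in Figure 2 was obtained; it serves only as a quick empirical check of the exponent.

```python
import math
import random
from statistics import fmean, variance


def bin_counts(positions, chrom_length, bin_size):
    """Count genes in contiguous, non-overlapping bins of bin_size bp.

    Assumes integer start coordinates; genes outside [0, chrom_length) or in a
    trailing partial bin are ignored so every bin has the same length.
    """
    n_bins = chrom_length // bin_size
    counts = [0] * n_bins
    for pos in positions:
        idx = pos // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def variance_mean_pairs(positions, chrom_length, bin_sizes):
    """Return a (mean, sample variance) pair of per-bin counts for each bin size."""
    pairs = []
    for size in bin_sizes:
        counts = bin_counts(positions, chrom_length, size)
        pairs.append((fmean(counts), variance(counts)))
    return pairs


def log_log_slope(pairs):
    """OLS fit of log(variance) = log(a) + p * log(mean); returns (p, a)."""
    xs = [math.log(m) for m, _ in pairs]
    ys = [math.log(v) for _, v in pairs]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, math.exp(y_bar - p * x_bar)


if __name__ == "__main__":
    # Synthetic, uniformly scattered "genes" (not chromosome 7 data): with no
    # clustering the variance stays close to the mean, so the slope is near 1.
    rng = random.Random(3)
    genes = [rng.randrange(4_000_000) for _ in range(20_000)]
    pairs = variance_mean_pairs(genes, 4_000_000, [1_000, 2_000, 5_000, 10_000, 20_000])
    print(log_log_slope(pairs))
```

On clustered positions the same code would return a slope above 1; the logarithm also requires every bin size to yield a nonzero variance.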

Mentions: To determine whether this clustering persisted at other measurement scales, the variance and mean number of gene structures per bin were estimated for a range of bin sizes. Figure 2 provides these data on a log-log plot of variance versus mean. The logarithmically transformed points appeared to describe a linear relationship. Indeed, the squared correlation coefficient between the transformed variance and mean estimates was r² = 0.997, substantiating a linear relationship. In addition, the residuals between the logarithmically transformed variables and a trial linear relationship were essentially negligible and normally distributed about zero (Fig. 2 inset). Note that the linear relationship tested against the data in Fig. 2 was obtained not from a regression fit to the logarithmically transformed data, but from a statistical model that was fitted to the chromosome 7 data and that will be presented later in this article.
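The two checks in this paragraph, r² on the log-transformed estimates and residuals against a fixed trial line, can be sketched as follows. This is an illustration under assumptions: the trial line is passed in as the coefficients `a` and `p` of V = a·M^p (for example, from a separately fitted model; the article's PG fit is not reproduced here), the function names are hypothetical, and Blom plotting positions are one common choice for a normal probability plot, not necessarily the one used in the article.

```python
import math
from statistics import NormalDist, fmean


def squared_correlation(xs, ys):
    """Pearson r**2 between two equal-length sequences."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def residuals_from_trial_line(means, variances, a, p):
    """log(V) minus the fixed line log(a) + p * log(M); no regression is performed."""
    return [math.log(v) - (math.log(a) + p * math.log(m))
            for m, v in zip(means, variances)]


def normal_probability_points(residuals):
    """Pair sorted residuals with standard-normal quantiles (Blom positions)."""
    n = len(residuals)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), r)
            for i, r in enumerate(sorted(residuals), start=1)]


if __name__ == "__main__":
    # Made-up inputs for illustration only, not the chromosome 7 estimates.
    means = [0.5, 1.0, 2.0, 4.0, 8.0]
    variances = [0.36, 1.0, 2.9, 8.1, 22.5]
    log_m = [math.log(m) for m in means]
    log_v = [math.log(v) for v in variances]
    print(squared_correlation(log_m, log_v))
    res = residuals_from_trial_line(means, variances, a=1.0, p=1.5)
    print(normal_probability_points(res))
```

Keeping the trial line fixed, rather than refitting it, matches the distinction drawn above between a model-derived line and a regression of the transformed data.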

