A scale invariant clustering of genes on human chromosome 7.

Kendal WS - BMC Evol. Biol. (2004)

Bottom Line: Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.


Affiliation: Department of Radiation Oncology, Ottawa Regional Cancer Centre, 503 Smyth, Ottawa, Ontario K1H 1C4, Canada. wayne.kendal@orcc.on.ca

ABSTRACT

Background: Vertebrate genes often appear to cluster within the background of nontranscribed genomic DNA. Here an analysis of the physical distribution of gene structures on human chromosome 7 was performed to confirm the presence of clustering, and to elucidate possible underlying statistical and biological mechanisms.

Results: Clustering of genes was confirmed by a variance of the number of genes per unit physical length that exceeded the respective mean. Further evidence for clustering came from a power function relationship between the variance and mean that possessed an exponent of 1.51. This power function implied that the spatial distribution of genes on chromosome 7 was scale invariant, and that the underlying statistical distribution had a Poisson-gamma (PG) form. A PG distribution for the spatial scattering of genes was validated by stringent comparisons of both the predicted variance-to-mean power function and its cumulative distribution function to data derived from chromosome 7.

Conclusion: The PG distribution was consistent with at least two different biological models. In the microrearrangement model, the number of genes per unit length of chromosome represented the contribution of a random number of smaller chromosomal segments that had originated by random breakage and reconstruction of more primitive chromosomes. Each of these smaller segments would necessarily have contained (on average) a gamma-distributed number of genes. In the gene cluster model, genes would be scattered randomly to begin with. Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth, death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral evolutionary mechanisms as the basis for this clustering.
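The variance-to-mean analysis described in the Results can be illustrated with a minimal sketch. It is not the author's code: the function names (`bin_counts`, `variance_mean_curve`, `fit_power_law`), the use of the population variance, and the ordinary least-squares fit on log-log axes are all assumptions made for illustration. Given gene positions along a chromosome, the sketch counts genes per bin at several bin sizes, records the mean and variance of the counts, and estimates the exponent p in variance = a·mean^p.

```python
import math
from statistics import mean, pvariance


def bin_counts(positions, bin_size, length):
    """Count how many positions fall into each consecutive bin of width bin_size."""
    n_bins = int(length // bin_size)
    counts = [0] * n_bins
    for pos in positions:
        i = int(pos // bin_size)
        if 0 <= i < n_bins:  # positions in a trailing partial bin are ignored
            counts[i] += 1
    return counts


def variance_mean_curve(positions, length, bin_sizes):
    """Return one (mean, variance) pair of the per-bin counts for each bin size."""
    points = []
    for bin_size in bin_sizes:
        counts = bin_counts(positions, bin_size, length)
        # Population variance is an assumption; the source does not say which
        # estimator was used.
        points.append((mean(counts), pvariance(counts)))
    return points


def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = log(a) + p * log(mean); returns (a, p).

    All means and variances must be strictly positive for the logarithms to exist.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx
    a = math.exp(y_bar - p * x_bar)
    return a, p
```

In this sketch, an exponent p near 1 would be expected for genes scattered as a homogeneous Poisson process, whereas the abstract reports p = 1.51 for chromosome 7, with the variance exceeding the mean.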



Figure 4: Predicted spacing of the segments within chromosome 7. A CDF for the physical distances between the p-termini of the primitive segments of the microrearrangement model is presented here, based on the assumption of an underlying exponential distribution and on the parameters derived from the best fit of the PG CDF to the chromosome 7 data. Granted these assumptions, about 50% of the primitive segments should be separated by distances of 100 kb or less. If the amount of intervening DNA between adjacent primitive segments could be considered negligible, then this plot would correspond to the size distribution of the primitive segments. Alternatively, under the gene cluster model, this CDF would correspond to the physical distances between gene cluster sites. Inset: Frequency histogram for the number of genes per primitive segment. The parameters provided by the PG model were used to estimate the frequency distribution of genes within the primitive segments of the microrearrangement model. More than 40% of the primitive segments would be expected to contain no recognizable gene structure, and somewhat more than 20% of segments would contain only one gene. Under the alternative gene cluster model, this histogram would represent the frequencies of the number of genes per cluster.

Mentions: where x is the distance between the segments and Δx is the bin size. Figure 4 provides the predicted CDF for the physical spacing between the p-termini of the primitive segments. Here we see that about half of the p-termini were spaced at least 100 kb apart. If one assumed that there was no intervening DNA between the boundaries of the primitive segments, then these predicted distances would correspond to the lengths of the segments, and the average segment length would be about 200/(λ·κ(θ)) = 150 kb.
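The two figures quoted here (a mean spacing of about 150 kb, and roughly half of the spacings falling on either side of 100 kb) can be checked against each other under the exponential assumption stated in the Figure 4 caption. The sketch below is an illustration only: it treats 150 kb as the mean of the exponential distribution, and the names `exponential_cdf` and `exponential_median` are hypothetical helpers, not part of the original analysis.

```python
import math

# Assumption: the "about 150 kb" average segment length quoted in the text
# is used as the mean of the exponential spacing distribution.
MEAN_SPACING_KB = 150.0


def exponential_cdf(x_kb, mean_kb=MEAN_SPACING_KB):
    """P(spacing <= x_kb) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x_kb / mean_kb)


def exponential_median(mean_kb=MEAN_SPACING_KB):
    """Median of an exponential distribution: mean * ln(2)."""
    return mean_kb * math.log(2.0)


if __name__ == "__main__":
    print(round(exponential_cdf(100.0), 3))   # 0.487
    print(round(exponential_median(), 1))     # 104.0
```

With a mean of 150 kb the median spacing is about 104 kb, so close to half of the spacings lie at or below 100 kb and close to half lie above it, which agrees with both the caption and the sentence above.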

