The impact of mutation and gene conversion on the local diversification of antigen genes in African trypanosomes.

Gjini E, Haydon DT, Barry JD, Cobbold CA - Mol. Biol. Evol. (2012)

Bottom Line: We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.

Affiliation: School of Mathematics and Statistics, College of Science and Engineering, University of Glasgow, Glasgow, United Kingdom. egjini@igc.gulbenkian.pt

ABSTRACT
Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences, both to understand the evolution of the pathogen and to assess its significance for infection processes. In this study, we construct a series of models to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite that causes human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and, at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.
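
The idea of separating clustered mismatches from isolated ones with a hidden Markov model can be illustrated with a toy example. The sketch below is not the authors' model (which, per the figure caption, works on inter-mismatch segments and is fitted in a Bayesian framework); it is a generic two-state HMM over a per-site match/mismatch sequence, and every parameter value and name in it (`p_begin`, `p_end`, `e_bg`, `e_tr`, `posterior_tract`) is a hypothetical choice made for illustration only.

```python
def posterior_tract(obs, p_begin=0.01, p_end=0.05, e_bg=0.04, e_tr=0.4):
    """Posterior P(site is inside a conversion tract | obs) for each site.

    obs is a list of 0 (match) / 1 (mismatch). State 0 = background
    (sparse mismatches), state 1 = tract (dense mismatches).
    All default parameter values are illustrative assumptions.
    """
    trans = [[1 - p_begin, p_begin], [p_end, 1 - p_end]]   # trans[from][to]
    emit = [[1 - e_bg, e_bg], [1 - e_tr, e_tr]]            # emit[state][symbol]
    total = p_begin + p_end
    init = [p_end / total, p_begin / total]                # stationary distribution
    n = len(obs)

    # Scaled forward pass: fwd[t][s] is proportional to P(obs[:t+1], state_t = s).
    fwd, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            a = [init[s] * emit[s][o] for s in (0, 1)]
        else:
            a = [emit[s][o] * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        c = sum(a)
        fwd.append([x / c for x in a])
        scales.append(c)

    # Scaled backward pass, reusing the forward scaling constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = obs[t + 1]
        b = [sum(trans[s][r] * emit[r][nxt] * bwd[t + 1][r] for r in (0, 1))
             for s in (0, 1)]
        bwd[t] = [x / scales[t + 1] for x in b]

    # Combine and normalise to get the per-site posterior of the tract state.
    post = []
    for t in range(n):
        g = [fwd[t][s] * bwd[t][s] for s in (0, 1)]
        post.append(g[1] / (g[0] + g[1]))
    return post


# One isolated mismatch (index 10) and one dense cluster (indices 31-36).
obs = [0] * 10 + [1] + [0] * 20 + [1, 1, 0, 1, 1, 1] + [0] * 20
post = posterior_tract(obs)
```

Under these assumed parameters, the isolated mismatch receives a low tract posterior while sites inside the cluster receive a high one, which is the qualitative distinction the abstract describes between point mutations and imported conversion tracts.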

Fig. 3. The posterior probabilities of gene conversion tracts in Model 4. This model gives the best results among the four models considered, on the basis of both DIC and log likelihood. Because a Bayesian approach is adopted, the uncertainty around the most likely hidden path is given by the posterior probability of each inter-mismatch segment being of type within or between conversion. The triplets of closely related genes are presented in each row panel in the order (1,2), (1,3), (2,3) for each triplet.

Mentions: In Model 1, the mean probability of conversion was estimated to be 0.0099 per base pair, whereas the mean probability of point mutation was estimated to be 0.0410, i.e., about four times higher. This suggests that mutation events are more frequent than conversion events with other members of the gene archive in the short time scale after duplication. In Model 2, we considered the case of each triplet being governed by distinct parameter values. We found that the estimated λbegin was in the range 0.0038–0.0175 across triplets, close to the estimate obtained with Model 1. The point mutation probability also varied across triplets (0.0325–0.0623), but the values predicted for each triplet stayed within the same order of magnitude. The ratio m/λbegin increased slightly in Model 3, to ≈4.7, strengthening the dominance of the point mutation process. In Model 4, because the effective event probabilities on each gene pair are obtained by multiplying the baseline values in the reference pair by the corresponding relative ages, the λbegin and m values are pair specific. The values inferred in this model for the reference pair are lower than those obtained in Model 1, for example. The ratio m/λbegin, however, is invariant across gene pairs and independent of their relative ages. We observe that point mutations in this last model occur five times more frequently than conversion events (table 1). Note that to obtain the conversion event and mutation probability per gene per nucleotide, the estimates obtained across all models need to be divided by 2. The posterior probabilities associated with the location of imported gene conversion tracts for Model 4 are shown in figure 3.
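
The arithmetic in this passage can be checked with a short sketch. The Model 1 values are the ones quoted above; the Model 4 baseline values and relative ages are not given in this excerpt, so the numbers used for them below (`baseline_lambda`, `baseline_m`, `relative_ages`) are hypothetical placeholders chosen only to show why the ratio m/λbegin does not depend on relative age.

```python
import math

# Model 1 estimates quoted in the text (per base pair, per gene pair).
lambda_begin = 0.0099   # probability of a conversion event beginning
m = 0.0410              # probability of a point mutation

# Ratio of point mutation to conversion: roughly four, as the text states.
ratio_model1 = m / lambda_begin

# Per gene per nucleotide: the text says to divide pairwise estimates by 2.
lambda_per_gene = lambda_begin / 2
m_per_gene = m / 2

# Model 4 idea: effective probabilities = baseline values x relative age.
# Baseline values and ages below are HYPOTHETICAL, not the paper's estimates.
baseline_lambda = 0.005
baseline_m = 0.025
relative_ages = [1.0, 1.6, 2.3]

effective = [(baseline_lambda * age, baseline_m * age) for age in relative_ages]

# The age factor cancels, so m / lambda_begin is the same for every pair.
ratios_model4 = [m_eff / lam_eff for lam_eff, m_eff in effective]

print(f"Model 1 ratio m/lambda_begin: {ratio_model1:.2f}")
print(f"Per-gene values: lambda={lambda_per_gene:.5f}, m={m_per_gene:.4f}")
print("Ratios under age scaling:", [round(r, 6) for r in ratios_model4])
```

Dividing both estimates by 2 leaves the ratio unchanged as well, so the per-pair and per-gene descriptions give the same relative frequency of mutation and conversion.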

