The impact of mutation and gene conversion on the local diversification of antigen genes in African trypanosomes.

Gjini E, Haydon DT, Barry JD, Cobbold CA - Mol. Biol. Evol. (2012)

Bottom Line: We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.


Affiliation: School of Mathematics and Statistics, College of Science and Engineering, University of Glasgow, Glasgow, United Kingdom. egjini@igc.gulbenkian.pt

ABSTRACT
Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences for understanding the evolution of the pathogen and its significance for infection processes. In this study, a series of models are constructed to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite causative of human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.


Fig. 5. Goodness-of-fit tests for Model 4 using higher order statistics. (A) Pair correlation functions, denoting the density g(r) of mismatches at distance r from each other (supplementary material SI6, Supplementary Material online). (B) Cumulative next-mismatch distance distribution. The gray shaded area represents 95% credibility intervals for the modeled mismatch patterns (100 replicates, with mean estimates for each parameter as in table 1). The lines represent the respective statistics of observed mismatches from the data set. The panels show the VSG gene pairs in the order 1–5 (row 1), 6–10 (row 2), and 11–15 (row 3).
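
The two statistics named in this caption can be illustrated with a minimal Python sketch. It is not the authors' estimator (theirs is described in supplementary material SI6 and is not reproduced here). The function names, and the normalization of g(r) by the pair count expected if mismatches were placed uniformly at random along the alignment, are assumptions made for illustration.

```python
from collections import Counter


def next_mismatch_distances(positions):
    """Distances between consecutive mismatch positions on one alignment."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def empirical_cdf(values, grid):
    """Fraction of values that are <= each grid point."""
    if not values:
        raise ValueError("need at least one distance")
    n = len(values)
    return [sum(v <= g for v in values) / n for g in grid]


def pair_correlation(positions, length, max_r):
    """Crude discrete estimate of g(r) for r = 1..max_r.

    Counts unordered mismatch pairs at distance r and divides by the
    count expected under uniform random placement with the same
    density (an assumed normalization, for illustration only).
    """
    pos = sorted(positions)
    density = len(pos) / length
    counts = Counter()
    for i, a in enumerate(pos):
        for b in pos[i + 1:]:
            r = b - a
            if r > max_r:
                break  # positions are sorted, so later pairs are farther
            counts[r] += 1
    return {
        r: counts[r] / ((length - r) * density ** 2)
        for r in range(1, max_r + 1)
    }
```

Values of g(r) well above 1 at small r would indicate clustering of mismatches. To mimic the comparison in the figure, the same functions would be applied both to the observed mismatch positions and to each simulated replicate.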

Mentions: DIC values for the four formulations ranked Model 4 as the best model, despite its large number of parameters, followed by Model 2, Model 3, and Model 1. Applying the Viterbi algorithm (Forney 1973) within the framework of Model 4 to the observed mismatch patterns on all 15 alignments, we were able to “decode” the most likely hidden path, thus obtaining the most likely locations of point mutations and conversion tracts, shown in figure 4. As expected, the empirical conversion lengths obtained from this maximum-likelihood decoding closely fit the theoretical geometric distribution with parameter E[λ_end] predicted by our model. Further, as independent goodness-of-fit tests, we compared pair correlation functions (Illian et al. 2008) in the original data set with those of data simulated under the best model. We also compared the cumulative distribution of next-mismatch distances in the real data and in data simulated with the estimated parameters, to verify the quality of fit of Model 4. As shown in figure 5, the simulated statistics very closely matched those of the original data set, demonstrating the usefulness of the individual ages model in capturing the diversity pattern displayed by our data set of closely related VSG pairs.
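
The decoding step described above can be sketched in Python as follows. This is a simplified two-state illustration, not the authors' Model 4 (which, according to the text, has many more parameters and pair-specific ages). The parameter names mu, delta, lam_start and lam_end are hypothetical, and so is the choice of starting distribution.

```python
import math


def viterbi_two_state(obs, mu, delta, lam_start, lam_end):
    """Most likely hidden path for a binary mismatch sequence.

    obs: list of 0/1, where 1 marks a mismatch at an aligned site.
    States: 0 = background (mismatch probability mu),
            1 = conversion tract (mismatch probability delta).
    Tracts start with probability lam_start and end with probability
    lam_end, so tract lengths are geometric with mean 1 / lam_end.
    All parameter names are illustrative, not the paper's notation.
    """
    log = math.log
    # trans[i][j] = log P(next state = j | current state = i)
    trans = [
        [log(1.0 - lam_start), log(lam_start)],
        [log(lam_end), log(1.0 - lam_end)],
    ]
    # emit[state][symbol], symbol 1 = mismatch
    emit = [
        [log(1.0 - mu), log(mu)],
        [log(1.0 - delta), log(delta)],
    ]
    # Assumption: the first site is reached from a background state.
    score = [trans[0][s] + emit[s][obs[0]] for s in (0, 1)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in (0, 1):
            cands = [score[i] + trans[i][j] for i in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + emit[j][x])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def tract_lengths(path):
    """Lengths of maximal runs of state 1 (decoded conversion tracts)."""
    lengths, run = [], 0
    for s in path:
        if s == 1:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths
```

Under this sketch, mismatches decoded as state 0 play the role of isolated point mutations, and the lengths returned by `tract_lengths` are the empirical conversion lengths that could be compared against a geometric distribution with parameter `lam_end`.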

