Inferring processes underlying B-cell repertoire diversity.

Elhanati Y, Sethna Z, Marcou Q, Callan CG, Mora T, Walczak AM - Philos. Trans. R. Soc. Lond., B, Biol. Sci. (2015)

Affiliation: Laboratoire de physique théorique, UMR8549, CNRS and École normale supérieure, 24, rue Lhomond, 75005 Paris, France.

ABSTRACT
We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.


Figure 1. (a) BCR heavy chain sequences are formed during VDJ recombination according to a probability distribution P_pre that we infer from the unproductive naive sequence repertoire. The unproductive memory repertoire is used to infer the rate and sequence dependence of somatic hypermutation. Productive sequences are selected for entry into the naive peripheral repertoire with a sequence-dependent factor Q, resulting in the observed distribution of receptor sequences P_post. (b) Recombined sequences arise via a scenario involving independent choices of which gene segments to recombine as well as of numbers of deletions and insertions. The probability distribution of these choices is not known unambiguously from the observed sequences and is estimated probabilistically in an iterative procedure. (c) The selection factor Q is assumed to be a product of factors for V and J gene choice together with factors q_{i;L}(a) for the choice of the specific amino acid a at each position i in a CDR3 of length L. These factors are determined from the naive productive sequence repertoire by an iterative procedure.
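
The product form of the selection factor described in panel (c) can be illustrated with a minimal sketch. It assumes that Q is simply the product of one factor for the V gene, one for the J gene, and one factor per CDR3 position, and that the post-selection probability is Q times the generation probability, with any overall normalization constant left out. The function names, the dictionary layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def selection_factor(v_gene, j_gene, cdr3_aa, q_v, q_j, q_pos):
    """Return Q = q_V * q_J * prod_i q_{i;L}(a_i) for one sequence.

    q_v, q_j : dicts mapping gene name -> selection factor (hypothetical layout)
    q_pos    : dict mapping (i, L, amino_acid) -> factor q_{i;L}(a)
    The product is accumulated in log space for numerical stability.
    Any normalization constant is omitted (assumption).
    """
    length = len(cdr3_aa)
    log_q = math.log(q_v[v_gene]) + math.log(q_j[j_gene])
    for i, aa in enumerate(cdr3_aa):
        log_q += math.log(q_pos[(i, length, aa)])
    return math.exp(log_q)

def p_post(p_pre, q):
    """Post-selection probability, assuming P_post = Q * P_pre."""
    return q * p_pre

# Toy example with made-up gene names and factors.
toy_q_v = {"V_a": 2.0}
toy_q_j = {"J_a": 0.5}
toy_q_pos = {(0, 2, "A"): 1.5, (1, 2, "R"): 2.0}
toy_q = selection_factor("V_a", "J_a", "AR", toy_q_v, toy_q_j, toy_q_pos)
toy_p_post = p_post(1e-9, toy_q)
```

In this toy case the factors multiply to Q = 3, so a sequence generated with probability 1e-9 would be three times more frequent after selection than before. In the paper the factors themselves are fitted iteratively from the naive productive repertoire; that fitting step is not shown here.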

The VDJ recombination process is not guaranteed to produce in-frame sequences or, even when sequences are in frame, functional proteins. If the receptor gene from the initially rearranged chromosome is not functional, the second chromosome may be rearranged. If this second recombination event leads to a functional receptor, the cell has two rearranged chromosomes: one functional and expressed, and the other silenced by allelic exclusion. As a result, the DNA sequence dataset we analysed contains a large fraction of non-productive sequences, which are either out-of-frame or contain a stop codon. These sequences experienced no selection and owe their survival to the receptor expressed by the other chromosome. For this reason, they provide us with the raw, unselected product of the generation process. We used such out-of-frame sequences from the naive subsample to infer the statistics of the VDJ recombination process, and the out-of-frame sequences from the memory subsample to learn the statistics of hypermutations. McCoy et al. [19] previously exploited these differences between in- and out-of-frame sequences in human BCR memory repertoire analysis. The naive productive sequences (in frame and with no stop codon) are expected to have passed a selection process before being admitted to the periphery (henceforth called initial selection, to distinguish it from selection following a recognition event). We used this subsample to learn the selective forces acting on amino acids by comparing how their statistics differ from the raw product of VDJ recombination learned from the naive out-of-frame sequences. Figure 1a summarizes the analysis workflow and emphasizes how the three main processes underlying sequence diversity (VDJ recombination, initial selection and hypermutation) are inferred using three subsamples of the sequences. A typical subsample used in our analysis had approximately 200 000 unique sequences.
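
The split into subsamples described above can be sketched as a small classifier. This is a simplified illustration, not the authors' pipeline: it assumes each read is already trimmed so that it starts in the V reading frame and ends at a codon boundary for in-frame rearrangements, and it checks only length modulo 3 and the three standard stop codons. The function names and labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def classify(seq):
    """Label a nucleotide sequence as out_of_frame, stop_codon or productive.

    Assumes the sequence starts in the correct reading frame (simplification).
    """
    seq = seq.upper()
    if len(seq) % 3 != 0:
        return "out_of_frame"
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop_codon"
    return "productive"

def inference_use(population, label):
    """Map (population, label) to the process it informs, following the text.

    Returns None for combinations the text does not assign to an inference.
    """
    table = {
        ("naive", "out_of_frame"): "VDJ recombination",
        ("memory", "out_of_frame"): "somatic hypermutation",
        ("naive", "productive"): "initial selection",
    }
    return table.get((population, label))

# Toy reads (made up for illustration only).
examples = {
    "ATGGCC": classify("ATGGCC"),   # six bases, no stop codon
    "ATGGC": classify("ATGGC"),     # five bases, frame is broken
    "ATGTAA": classify("ATGTAA"),   # in frame but ends in TAA
}
```

A real analysis would determine the frame from the conserved anchors of the aligned V and J segments rather than from raw read length, so this check should be read only as a statement of the logic.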

