Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence.

Thomas N, Best K, Cinelli M, Reich-Zeliger S, Gal H, Shifrut E, Madi A, Friedman N, Shawe-Taylor J, Chain B - Bioinformatics (2014)

Bottom Line: Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification. The analysis was implemented in R and Python, and source code can be found in Supplementary Data. Contact: b.chain@ucl.ac.uk. Supplementary data are available at Bioinformatics online.


Affiliation: UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK.


CDR3 sequences shared between immunized mice. (a) The frequency (counts per million) of 57 CDR3s that are present in 75% of the immunized mice, but absent from all unimmunized mice (not shown). Each column represents one mouse, grouped according to time after immunization as shown below the x axis. (b) The amino acid sequences of all 57 CDR3s, clustered according to Levenshtein distance. (c) A plot of the frequency of each individual amino acid triplet (i.e. sequence of three consecutive amino acids, see Fig. 1) encoded by the 57 CDR3s, measured within the 57 CDR3s themselves (x axis) versus the frequency of the same triplets within a random sample of 1000 sets of 57 CDR3s selected from the set of CDR3s from all immunized mice (y axis). The diagonal line designates an equal frequency in the shared CDR3s and in the random set. Those triplets that are overrepresented in the shared CDR3s are found in the lower right area of the plot.
© Copyright Policy - creative-commons
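
The comparison described for panel (c) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the choice to average triplet frequencies over the random sets, and the fixed random seed are all assumptions made here for the example.

```python
import random
from collections import Counter


def triplets(cdr3, p=3):
    """Return every contiguous stretch of p amino acids in a CDR3 string."""
    return [cdr3[i:i + p] for i in range(len(cdr3) - p + 1)]


def triplet_frequencies(cdr3s, p=3):
    """Relative frequency of each triplet across a set of CDR3s."""
    counts = Counter(t for s in cdr3s for t in triplets(s, p))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def background_frequencies(pool, set_size, n_sets=1000, p=3, seed=0):
    """Mean triplet frequency over n_sets random sets drawn from pool.

    Assumption: the background value is the average over the random sets;
    the source only says that 1000 sets of 57 CDR3s were sampled.
    """
    rng = random.Random(seed)
    sums = Counter()
    for _ in range(n_sets):
        sample = rng.sample(pool, set_size)  # needs set_size <= len(pool)
        for t, f in triplet_frequencies(sample, p).items():
            sums[t] += f
    return {t: s / n_sets for t, s in sums.items()}


def enrichment(shared, pool, n_sets=1000, p=3, seed=0):
    """Map each triplet in the shared set to (x, y) as plotted in panel (c):
    x = frequency in the shared CDR3s, y = frequency in the random sets."""
    fg = triplet_frequencies(shared, p)
    bg = background_frequencies(pool, len(shared), n_sets, p, seed)
    return {t: (f, bg.get(t, 0.0)) for t, f in fg.items()}
```

Under these assumptions, a triplet with x greater than y falls below the diagonal, i.e. in the lower right area where overrepresented triplets are found. In the setting of the figure, `shared` would hold the 57 shared CDR3s and `pool` the CDR3s from all immunized mice.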

Mentions: We looked next in more detail for individual CDR3s shared between immunized mice. No CDR3s were present in all immunized mice but absent from all unimmunized mice. However, 57 CDR3s were present in 75% of immunized mice but absent from all unimmunized mice. In general, these CDR3s were present at low frequencies (Fig. 3a), although a few CDR3s were amplified in individual mice. Inspection of the CDR3 sequences (Fig. 3b) suggested that they clustered into families, defined by shared short amino acid sequence motifs. To capture this impression quantitatively, the frequency of each amino acid triplet (sequence of three consecutive amino acids) within the 57 CDR3s was compared with its frequency in a large sample of random CDR3s (Fig. 3c). A number of triplets were over-represented in the shared CDR3 set, suggesting they reflected functional similarity between related sets of CDR3s. We therefore investigated in a more systematic way whether CDR3s from immunized mice shared primary protein sequence features that distinguished them from unimmunized mice. For this purpose, we adapted the bag-of-words (BOW) approach (also called the n-gram kernel) originally developed in the context of document recognition (Joachims, 1998), together with a clustering step to reduce the dimensionality of the vocabulary. Details of the method are given above. The codebook used for classification was initially chosen arbitrarily to be 100 clusters, each containing a subset of contiguous, short (length p, where p typically = 3) stretches of amino acids, from the set of contiguous p-tuples found within the CDR3 dataset. The similarity metric for clustering was based on individual amino acid Atchley factors, reflecting similarities in physicochemical characteristics of the amino acids. The contents of each cluster are given in Supplementary Table S1 (SI), and the sizes of the 100 clusters are shown in Supplementary Figure S1.
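
The BOW-with-codebook idea above can be illustrated with a short, self-contained Python sketch. It is not the published implementation. The clustering algorithm is not named in this excerpt, so a plain k-means is used as a stand-in; all function names are hypothetical, and the factor table in the usage example holds made-up two-dimensional placeholder values, not real Atchley factors.

```python
import random
from collections import Counter


def p_tuples(cdr3, p=3):
    """All contiguous stretches of p amino acids in a CDR3 string."""
    return [cdr3[i:i + p] for i in range(len(cdr3) - p + 1)]


def encode(ptuple, factors):
    """Concatenate the per-residue factor vectors of a p-tuple."""
    return [x for aa in ptuple for x in factors[aa]]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def kmeans(vectors, k, n_iter=20, seed=0):
    """Basic k-means; assumed here as a stand-in clustering step."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def build_codebook(cdr3s, factors, k, p=3):
    """Cluster the distinct p-tuples of the dataset into k codewords."""
    vocab = sorted({t for s in cdr3s for t in p_tuples(s, p)})
    return kmeans([encode(t, factors) for t in vocab], k)


def bow_vector(cdr3s, centroids, factors, p=3):
    """Count how many p-tuples of a repertoire fall into each codeword."""
    counts = Counter(
        nearest(encode(t, factors), centroids)
        for s in cdr3s for t in p_tuples(s, p)
    )
    return [counts.get(j, 0) for j in range(len(centroids))]


# Usage with PLACEHOLDER values (illustrative only, not Atchley factors)
# and invented toy sequences.
TOY_FACTORS = {"C": [0.0, 1.0], "A": [1.0, 0.0], "S": [0.5, 0.5],
               "L": [1.0, 1.0], "G": [0.2, 0.8]}
codebook = build_codebook(["CASSL", "CASGL"], TOY_FACTORS, k=2)
features = bow_vector(["CASSL"], codebook, TOY_FACTORS)
```

Each repertoire is thereby reduced to a fixed-length count vector, one entry per codeword, which can be passed to a classifier. In the setting described above, `k` would be 100 and `factors` would hold the Atchley factor values for all 20 amino acids.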

