Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence.
Bottom Line: Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification. The analysis was implemented in R and Python, and source code can be found in Supplementary Data. Contact: email@example.com. Supplementary data are available at Bioinformatics online.
Affiliation: UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK.
Mentions: We next looked in more detail for individual CDR3s shared between immunized mice. No CDR3s were present in all immunized mice but absent from all unimmunized mice. However, 57 CDR3s were present in 75% of immunized mice but absent from all unimmunized mice. In general, these CDR3s were present at low frequencies (Fig. 3a), although a few CDR3s were amplified in individual mice. Inspection of the CDR3 sequences (Fig. 3b) suggested that they clustered into families, defined by shared short amino acid sequence motifs. To capture this impression quantitatively, the frequency of each amino acid triplet (sequence of three consecutive amino acids) within the 57 CDR3s was compared with its frequency in a large sample of random CDR3s (Fig. 3c). A number of triplets were over-represented in the shared CDR3 set, suggesting they reflected functional similarity between related sets of CDR3s. We therefore investigated in a more systematic way whether CDR3s from immunized mice shared primary protein sequence features that distinguished them from those of unimmunized mice. For this purpose, we adapted the BOW approach (also called the n-gram kernel) originally developed in the context of document recognition (Joachims, 1998), together with a clustering step to reduce the dimensionality of the vocabulary. Details of the method are given above. The codebook used for classification was initially chosen arbitrarily to be 100 clusters, each containing a subset of contiguous, short (length p, where typically p = 3) stretches of amino acids, from the set of contiguous p-tuples found within the CDR3 dataset. The similarity metric for clustering was based on individual amino acid Atchley factors, reflecting similarities in the physicochemical characteristics of the amino acids. The contents of each cluster are given in Supplementary Table S1 (SI), and the sizes of the 100 clusters are shown in Supplementary Figure S1.
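The triplet comparison and the bag-of-words encoding described above can be illustrated with a minimal Python sketch. This is not the authors' code (theirs is in the Supplementary Data). The function names, the pseudocount, the toy sequences and the toy codebook are all assumptions made for illustration. The Atchley-factor clustering that builds the real 100-cluster codebook is not reproduced here; the sketch simply assumes a codebook that maps each triplet to a cluster index.

```python
from collections import Counter


def ptuples(cdr3, p=3):
    """Return all overlapping contiguous p-tuples of a CDR3 sequence."""
    return [cdr3[i:i + p] for i in range(len(cdr3) - p + 1)]


def tuple_frequencies(cdr3s, p=3):
    """Relative frequency of each p-tuple across a set of CDR3s."""
    counts = Counter(t for s in cdr3s for t in ptuples(s, p))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def enrichment(shared, background, p=3, pseudo=1e-6):
    """Ratio of p-tuple frequency in the shared set to the background set.

    `pseudo` is a hypothetical pseudocount that avoids division by zero
    for tuples never seen in the background sample.
    """
    f_shared = tuple_frequencies(shared, p)
    f_back = tuple_frequencies(background, p)
    return {t: f / (f_back.get(t, 0.0) + pseudo) for t, f in f_shared.items()}


def bow_vector(cdr3s, codebook, n_clusters, p=3):
    """Count how many p-tuples of a repertoire fall into each cluster.

    `codebook` maps a p-tuple to a cluster index in range(n_clusters);
    tuples missing from the codebook are ignored.
    """
    vec = [0] * n_clusters
    for s in cdr3s:
        for t in ptuples(s, p):
            if t in codebook:
                vec[codebook[t]] += 1
    return vec


if __name__ == "__main__":
    # Made-up sequences, for illustration only (not data from the paper).
    shared = ["CASSLGQAYEQYF", "CASSLGGAYEQYF"]
    background = ["CASSPDRGNTLYF", "CASGDAGGQNTLYF"]
    ratios = enrichment(shared, background)
    for triplet, ratio in sorted(ratios.items(), key=lambda kv: -kv[1])[:5]:
        print(triplet, round(ratio, 2))

    # Toy two-cluster codebook; the paper used 100 clusters.
    toy_codebook = {"CAS": 0, "ASS": 0, "YEQ": 1, "EQY": 1}
    print(bow_vector(shared, toy_codebook, n_clusters=2))
```

Each repertoire (one mouse) would yield one such count vector, which could then be passed to a classifier or clustering step to stratify immunized and unimmunized repertoires.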