Conserved DNA motifs in the type II-A CRISPR leader region


ABSTRACT

The Clustered Regularly Interspaced Short Palindromic Repeats associated (CRISPR-Cas) systems consist of RNA-protein complexes that provide bacteria and archaea with sequence-specific immunity against bacteriophages, plasmids, and other mobile genetic elements. Bacteria and archaea become immune to phage or plasmid infections by inserting short pieces of the intruder DNA (spacers) site-specifically into the leader-repeat junction in a process called adaptation. Previous studies have shown that parts of the leader region, especially the 3′ end of the leader, are indispensable for adaptation. However, a comprehensive analysis of leader ends remains absent. Here, we have analyzed the leader, repeat, and Cas proteins from 167 type II-A CRISPR loci. Our results indicate two distinct conserved DNA motifs at the 3′ leader end: ATTTGAG (noted previously in the CRISPR1 locus of Streptococcus thermophilus DGCC7710) and a newly defined CTRCGAG, associated with the CRISPR3 locus of S. thermophilus DGCC7710. A third group with a very short CG DNA conservation at the 3′ leader end is observed mostly in lactobacilli. Analysis of the repeats and Cas proteins revealed clustering of these CRISPR components that mirrors the leader motif clustering, in agreement with the coevolution of CRISPR-Cas components. Based on our analysis of the type II-A CRISPR loci, we implicate leader end sequences that could confer site-specificity for the adaptation machinery in the different subsets of type II-A CRISPR loci.
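
To make the two 7-nucleotide motifs concrete, the following is a minimal sketch (not the authors' pipeline) of how a leader sequence could be checked against them at its 3′ end. It assumes the leader is given 5′ to 3′ on the strand the motifs are written in, and it reads R in CTRCGAG as the standard IUPAC purine code (A or G). The function names `motif_to_regex` and `classify_leader_end` are hypothetical. The short CG conservation of the third group is deliberately not modeled, because the abstract does not state where within the leader end the CG sits.

```python
import re

# Subset of the IUPAC nucleotide alphabet needed for the two motifs.
# R is the standard code for a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex anchored at the 3' end of a sequence."""
    return re.compile("".join(IUPAC[base] for base in motif) + "$")


# The two 3' leader-end motifs named in the abstract.
LEADER_END_MOTIFS = {
    "ATTTGAG": motif_to_regex("ATTTGAG"),
    "CTRCGAG": motif_to_regex("CTRCGAG"),
}


def classify_leader_end(leader):
    """Return the motif that the leader's last 7 nt match, or None if neither does.

    Assumption: `leader` is written 5'->3' and ends at the leader-repeat junction.
    """
    seq = leader.upper()
    for name, pattern in LEADER_END_MOTIFS.items():
        if pattern.search(seq):
            return name
    return None


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not taken from the paper.
    for toy in ["ggcATTTGAG", "aaaCTACGAG", "aaaCTGCGAG", "aaaCTTCGAG"]:
        print(toy, "->", classify_leader_end(toy))
```

A real analysis would tolerate mismatches, for instance by scoring against a position weight matrix, since the motifs are conserved rather than invariant; the exact-match version above only illustrates what the motif strings mean.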



Phylogenetic analysis of Cas9 and Cas2. (A) Phylogenetic tree generated from the sequence alignment of Cas9. Groups based on the segregation of the Cas1 tree are shown in cyan (Group 1), red (Group 2), and yellow (Group 3). The tree shows 5 different branches, with two branches showing the Group 1 leader end motif, one branch showing the Group 2 motif, and one branch representing the less-conserved Group 3 leader end. One of the branches represents very loosely conserved Group 1 loci. Three members of Group 3 segregated away from the normal cluster; of these, Plo NGRI0510Q has a very short Cas9 sequence, whereas Lru ATCC25644 and Lfa KCTC3681 have Cas9 sequences of normal length. (B) Phylogenetic tree generated from the sequence alignment of Cas2. All four branches segregate similarly to those of the Cas1 phylogenetic tree. WebLogos for both panels of the figure were generated by aligning the last 7 nucleotides of the leader and the first 4 nucleotides of the repeat from the loci within each corresponding branch.
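
The junction windows described in the caption (last 7 nucleotides of the leader plus first 4 nucleotides of the repeat) can be sketched as follows. This is an illustrative assumption about the preprocessing, not the authors' code or the WebLogo tool itself: it only builds the 11-nucleotide windows and tallies per-position base counts, which is the raw input a sequence logo is drawn from. The helper names `junction_window`, `position_counts`, and `consensus` are hypothetical, and the example sequences are invented.

```python
from collections import Counter


def junction_window(leader, repeat, leader_n=7, repeat_n=4):
    """Concatenate the last `leader_n` nt of the leader with the first `repeat_n` nt of the repeat."""
    if len(leader) < leader_n or len(repeat) < repeat_n:
        raise ValueError("leader or repeat is shorter than the requested window")
    return (leader[-leader_n:] + repeat[:repeat_n]).upper()


def position_counts(windows):
    """Return one Counter of bases per alignment column (windows must share a length)."""
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    return [Counter(column) for column in zip(*windows)]


def consensus(windows):
    """Most frequent base at each column; a crude stand-in for reading a logo."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(windows))


if __name__ == "__main__":
    # Toy leader/repeat pairs made up for illustration, standing in for the
    # loci that fall within one branch of a tree.
    loci = [
        ("ccATTTGAG", "GTTTTTGTA"),
        ("ttATTTGAG", "GTTTTAGAG"),
        ("ggATTTGAA", "GTTTTTGTA"),
    ]
    windows = [junction_window(leader, repeat) for leader, repeat in loci]
    print(windows)
    print(consensus(windows))
```

Because every window has a fixed width and is anchored at the leader-repeat junction, no gapped alignment is needed in this sketch; the columns line up by construction.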

Mentions: A similar analysis was done for the Cas2, Csn2, and Cas9 proteins. Sequence alignments of the corresponding Cas proteins were used to build phylogenetic trees (Figs. 5, 6 and Figs. S4, S5). All the clades in the different trees have similar 3′ leader ends, except for a few differences in the Cas9 phylogenetic tree, where some Group 3 members appeared along with Group 1. A closer analysis of the sequences showed high variability in the Cas9 lengths, including an extremely short Cas9 sequence (Plo NGRI0510Q) among the outliers, which may have contributed to the random placement of this Cas9 protein. The Cas9 tree also showed a Group 1 branch (1b) that lacked the prominent leader end conservation observed in branch 1a. Except for these few differences in Cas9, our results indicate the presence of distinct groups within the type II-A CRISPR systems that possess conserved 3′ leader ends and group-specific Cas proteins. It was proposed earlier that the longer version of Csn2 evolved first and that the shorter Csn2 proteins evolved from the longer versions (Chylinski et al. 2014). Our phylogenetic analysis agrees with this and shows a branch that represents the ancestor, with an average Csn2 length of 320 amino acids (Fig. 6). Three main branches evolved from the ancestor; all of them have an average length of 218–230 amino acids but varying 3′ leader ends (Table S4). This suggests that the ATTTGAG motif is ancestral in the type II-A systems, which later developed to have a similar (ATTTGAG), deviating (CTRCGAG), or less conserved (CG) 3′ leader end, with corresponding changes in the protein sequences of all four type II-A Cas proteins. Examining the lengths of Cas1, Cas2, and Cas9 from the different groups, we did not observe a strong correlation between the average length of these Cas proteins and the branching group to which they belonged.

