Conserved DNA motifs in the type II-A CRISPR leader region


ABSTRACT

The Clustered Regularly Interspaced Short Palindromic Repeats associated (CRISPR-Cas) systems consist of RNA-protein complexes that provide bacteria and archaea with sequence-specific immunity against bacteriophages, plasmids, and other mobile genetic elements. Bacteria and archaea become immune to phage or plasmid infections by inserting short pieces of the intruder DNA (spacers) site-specifically into the leader-repeat junction in a process called adaptation. Previous studies have shown that parts of the leader region, especially the 3′ end of the leader, are indispensable for adaptation. However, a comprehensive analysis of leader ends has been lacking. Here, we have analyzed the leader, repeat, and Cas proteins from 167 type II-A CRISPR loci. Our results indicate two distinct conserved DNA motifs at the 3′ leader end: ATTTGAG (noted previously in the CRISPR1 locus of Streptococcus thermophilus DGCC7710) and a newly defined CTRCGAG, associated with the CRISPR3 locus of S. thermophilus DGCC7710. A third group, with a very short CG conservation at the 3′ leader end, is observed mostly in lactobacilli. Analysis of the repeats and Cas proteins revealed clustering of these CRISPR components that mirrors the leader motif clustering, in agreement with the coevolution of CRISPR-Cas components. Based on our analysis of the type II-A CRISPR loci, we implicate leader end sequences that could confer site-specificity for the adaptation machinery in the different subsets of type II-A CRISPR loci.



Fig. 1: Sequence alignment of the last 20 nucleotides of the 3′ end of the leader and the first repeat of selected Group 1 species. Height of the letters in the WebLogo indicates the degree of conservation at specific nucleotide locations. The leader-repeat end is conserved as ATTTGAG/GTTT.
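To make the relationship between letter height and conservation concrete, the following is a minimal sketch of the standard sequence-logo calculation for one alignment column: the stack height is 2 bits minus the Shannon entropy of the column, and each letter takes a share of that height proportional to its frequency. The function names are hypothetical, and the sketch omits the small-sample correction that logo tools typically apply; it is not the procedure used to produce the figure.

```python
import math
from collections import Counter


def column_information(column):
    """Information content (bits) of one DNA alignment column.

    Assumes a 4-letter alphabet, so the maximum is log2(4) = 2 bits.
    No small-sample correction is applied (a simplifying assumption).
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def letter_heights(column):
    """Height of each letter in the stack: frequency times column information."""
    info = column_information(column)
    total = len(column)
    return {base: info * n / total for base, n in Counter(column).items()}
```

Under these assumptions, a fully conserved column reaches the full 2 bits, while a column split evenly between two bases carries 1 bit, shared equally by the two letters.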

Mentions: An initial sequence alignment of the last 20 nucleotides of the leader plus the first repeat showed that the 167 loci clustered into distinct groups. These groups had recognizable conservation at the last 7 nucleotides of the 3′ end of the leader and the first 4 nucleotides of the 5′ end of the first repeat, that is, at the leader-repeat junction. To obtain an unbiased separation of the different groups, a Cas1 phylogenetic tree was constructed based on protein sequence similarity. The loci belonging to the different clades of the Cas1 tree were grouped together, and a sequence alignment of the last 20 nucleotides of the leader along with the first repeat was performed. To facilitate interpretation of the trees and alignments, a smaller representative sample of 60 loci was used to generate the main figures and show the relevant relationships. Figures incorporating all 167 loci can be found in the Supplementary Data. Each of the 3 groups was aligned separately to discern the level of conservation within each group (Figs. 1 and 2 and Fig. S1). Strict conservation is seen at the 3′ end of the leader as well as at the 5′ end of the repeat. Group 1 has the 3′ leader end conserved as ATTTGAG (Fig. 1) and Group 2 has the 3′ leader end conserved as CTRCGAG (where R represents a purine) (Fig. 2A). Group 3 has a shorter, two-nucleotide conservation of CG at the 3′ leader end. In Groups 1 and 2, the last three nucleotides are conserved as GAG (Fig. 2B). An A-rich region is partially conserved adjacent to the CG leader end of Group 3. Interestingly, the CRISPR1 locus of S. thermophilus DGCC7710 has the 3′ leader end conserved as ATTTGAG while the CRISPR3 locus has the 3′ leader end conserved as CTACGAG. Of the type II-A CRISPR loci analyzed, 87 belonged to Group 1, 55 belonged to Group 2, and 25 belonged to Group 3.
Of the 50 genera analyzed, Group 2 comprises only 5 genera (Streptococcus, Enterococcus, Listeria, Lactobacillus and Weissella), while Group 1 is much more diverse, with 42 different genera. Group 3 accounts for 7 genera, but has many loci belonging to the order Lactobacillales. The leader-repeat junction of Groups 1 and 2 is conserved as GAG/GTTT, while in Group 3 it is weakly conserved as CG/GTTT.
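
As an illustration of the three leader-end motifs described above, the sketch below labels a leader sequence by the motif found at its 3′ end, expanding the IUPAC code R to a purine (A or G). This is a simple pattern match written for this summary, not the authors' method, which relied on a Cas1 phylogenetic tree and sequence alignments; the function and variable names are hypothetical. Because the Group 3 signal is only two nucleotides long and weakly conserved, a CG match here is very permissive and should be read as a hint at most.

```python
import re

# IUPAC codes needed for these motifs; R denotes a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}

# 3' leader-end motifs reported in the text, longest (most specific) first.
MOTIFS = [("Group 1", "ATTTGAG"), ("Group 2", "CTRCGAG"), ("Group 3", "CG")]


def motif_to_regex(motif):
    """Compile a motif into a regex anchored at the 3' end of the leader."""
    return re.compile("".join(IUPAC[base] for base in motif) + "$")


PATTERNS = [(name, motif_to_regex(motif)) for name, motif in MOTIFS]


def classify_leader_end(leader):
    """Return the first group whose motif ends the leader, or None."""
    seq = leader.upper()
    for name, pattern in PATTERNS:
        if pattern.search(seq):
            return name
    return None
```

With this sketch, a leader ending in CTACGAG, the CRISPR3 leader end quoted in the text, and one ending in CTGCGAG both fall under Group 2, since either purine satisfies R.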


