Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome.

Amid C, Rehaume LM, Brown KL, Gilbert JG, Dougan G, Hancock RE, Harrow JL - BMC Genomics (2009)

Bottom Line: Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets.


Affiliation: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ca1@sanger.ac.uk

ABSTRACT

Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.

Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci, of which 53 are likely active defensin genes and 22 are defensin pseudogenes. Several TATA box motifs that likely impact gene expression were found for human and mouse defensin genes. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated, and unusual splice variants were identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets, Ensembl, Mouse Genome Informatics (MGI) and NCBI RefSeq, reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.

Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.


Figure 3: The polymorphic Defcr5 locus. A protein alignment between all potential Defcr5 copies and P28312.2. Variation in amino acids is highlighted in red.

Mentions: Twenty-six apparently intact defensin-related cryptdin genes and 22 related pseudogenes were observed within the mouse alpha-defensin cluster (Figures 1 and 2A). Six MYM-type zinc finger protein pseudogenes and three ribosomal protein pseudogenes are also located in this region. Within the alpha-defensin gene cluster there is a region containing several genes very similar to Defcr5, but none is an identical match to the Swiss-Prot entry P28312.2 for Defcr5, which is derived from the genomic sequence of the 129 mouse strain. Two of these loci, OTTMUSG00000019785 and OTTMUSG00000018259, each differ from P28312.2 by a single amino acid in the signal peptide (Figure 3). Locus OTTMUSG00000018258 differs from P28312.2 by one amino acid in its pro-segment, and locus OTTMUSG00000019924 differs by one amino acid in the signal peptide and one in the pro-segment. The mature peptides of all these genes are identical to that of the P28312.2 Defcr5 sequence, and the genes have therefore been tagged as 'novel protein similar to defensin related cryptdin 5'. Questions arise as to whether a common mature peptide sequence qualifies these genes to share the name of a published sequence, whether they have the same functionality, and how differences in the signal peptide and/or pro-segment might affect their expression. These Defcr5 loci might be the result of chromosomal duplications or be involved in copy number variation, similar to a number of defensin genes where we observed 100% identity throughout the entire sequence (see below). Locus OTTMUSG00000019786 also has a best match to Defcr5, but differs from P28312.2 by three amino acids: one in the signal peptide, one in the pro-segment and one in the mature peptide. This locus has therefore been annotated as a novel defensin related cryptdin without comment on any similarity to Defcr5, since there are clear precedents for applying different names to defensins with small sequence changes.

In some cases 100% identical copies of a gene were identified. One example is Defcr23, which has two copies. To clarify this situation we have tagged one copy as Defcr23 and the other as 'novel defensin related cryptdin identical to Defcr23'. Two other alpha-defensin genes, Defcr3 and Defcr20, also have duplications in the mouse genome. Genes with duplicated copies are ideal candidates for copy number variation.
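The region-by-region comparison described above can be illustrated with a minimal Python sketch. It is not the annotation pipeline used by the authors: the sequences are toy placeholders rather than real Defcr5 data, the region boundaries in `REGIONS` are hypothetical, and the names `diffs_by_region` and `suggest_tag` are invented for illustration. The sketch assumes the two precursor sequences are already aligned to equal length with no gaps.

```python
# Hypothetical region boundaries (0-based, end-exclusive) in an aligned
# precursor. Real signal/pro-segment/mature coordinates would come from
# the reference entry's feature annotation; these are placeholders.
REGIONS = {"signal": (0, 5), "pro": (5, 10), "mature": (10, 16)}


def diffs_by_region(reference, candidate, regions=REGIONS):
    """Count amino acid mismatches per region between two aligned sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to equal length")
    counts = {}
    for name, (start, end) in regions.items():
        pairs = zip(reference[start:end], candidate[start:end])
        counts[name] = sum(a != b for a, b in pairs)
    return counts


def suggest_tag(counts, gene="Defcr5"):
    """Map mismatch counts to a label, loosely following the text's reasoning."""
    if sum(counts.values()) == 0:
        return f"identical to {gene}"
    if counts["mature"] == 0:
        # Differences are confined to the signal peptide and/or pro-segment.
        return f"novel protein similar to {gene}"
    # At least one change in the mature peptide.
    return "novel defensin related cryptdin"


# Toy sequences (NOT real defensin sequences): signal + pro + mature.
REF = "MKTLV" + "DEPLQ" + "CRKGHC"
CAND_SIGNAL = "MKTLI" + "DEPLQ" + "CRKGHC"  # one change, in the signal peptide
CAND_THREE = "MKSLV" + "DEPLK" + "CRKGRC"   # one change in each region

for label, seq in [("signal-only", CAND_SIGNAL), ("three-changes", CAND_THREE)]:
    counts = diffs_by_region(REF, seq)
    print(label, counts, suggest_tag(counts))
```

In this sketch a candidate whose mature peptide matches the reference is labelled as similar to the named gene, while any change in the mature peptide yields the generic label. That mirrors the distinction drawn in the text between OTTMUSG00000019785 and OTTMUSG00000019786, although the actual naming decisions were made by manual curation.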

