Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome.

Amid C, Rehaume LM, Brown KL, Gilbert JG, Dougan G, Hancock RE, Harrow JL - BMC Genomics (2009)

Bottom Line: Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets.


Affiliation: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ca1@sanger.ac.uk

ABSTRACT

Background: Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.

Results: The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci, of which 53 are likely active defensin genes and 22 are defensin pseudogenes. Several TATA box motifs that likely affect gene expression were found for human and mouse defensin genes. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated, and unusual splice variants were identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets (Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq) reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.

Conclusions: Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.

Figure 4: Novel coding and non-coding variants. Vega display of the regions for Defb30 and Defb42, where three new variants per locus were annotated. Defb30: Variant 1 is a known variant with known CDS; variant 2 is a novel variant with the same CDS as variant 1 but an alternative 3' UTR; variants 3 and 4 are novel variants with putative CDS and different 3' UTRs. Defb42: Variant 1 represents a non-coding transcript; variant 2 is a novel variant with the same CDS as the known transcript (variant 3) but with an alternative 5' UTR; variant 3 is a known variant with known CDS; and variant 4 is an NMD candidate.

Mentions: To complete the defensin gene set in mouse, all other loci on Chromosomes 1, 2 and 14 were also annotated. The beta-defensin cluster on Chromosome 2, consisting of 11 gene loci, is the largest among them. Interestingly, novel splice variants were annotated for Defb30 and Defb42 on Chromosome 14, in contrast to the family members on Chromosome 8. Defb30 has four different splice variants, one of which was previously known; three variants have been tagged as "putative coding" as they have a different first exon compared to the known variant. Two pairs of variants share the same 5' exon but differ in the 3' exons. In each pair, one variant consists of three exons and the other of two (Figure 4). For Defb42, two coding and two non-coding variants have been identified and annotated. One of the transcripts that seems to lack coding properties has been tagged as likely to be subject to nonsense-mediated mRNA decay (NMD). All four Defb42 variants have a differentially spliced 5' first exon, and only one was previously known in other gene sets. Tissue-specific and species-specific alternative splicing has previously been shown for primate SPAG11 [27]. The beta-defensin Defb42 has been discovered and characterized in mice, and its expression has been shown to be epididymis-specific [32]. Looking at the origin of the manually annotated splice variants for Defb42, it is notable that all cDNA clones representing the main coding variant are derived from the adult male reproductive tract, specifically the epididymis. However, there is one coding cDNA with an alternative 5' UTR exon compared to the main variant that was derived from the spleen of a four-week-old male mouse. The potential NMD splice variant is based on a two-cell egg cDNA, and another overlapping non-coding transcript is based on an 11-day embryo whole-body cDNA. This observation suggests that alternative splicing for Defb42 is likely to be developmental-stage specific as well.
An unusual feature was observed for Defb17 and Defb41 on Chromosome 1. These genes share the same start and first exon but differ in their second exon, which is crucial since it encodes the mature peptide. According to our general annotation guidelines, these two genes would normally be merged, and the two transcripts would represent splice variants of the same gene, since they share the first coding exon. Differential splicing seems to be a rare event for defensin genes; however, the examples observed here indicate potential functional differences for the affected transcripts.
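The merging guideline described for Defb17 and Defb41 can be illustrated with a minimal sketch. It is not the authors' annotation pipeline: the `Transcript` type, the `group_by_first_exon` helper, and all exon coordinates below are hypothetical, chosen only to show how transcripts that share a first coding exon would be flagged as candidate splice variants of one gene.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    # Exons as (start, end) pairs in transcription order.
    # All coordinates used below are invented for illustration.
    exons: tuple


def group_by_first_exon(transcripts):
    """Group transcript names by the coordinates of their first exon."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[t.exons[0]].append(t.name)
    return dict(groups)


# Hypothetical models: two transcripts share exon 1 but differ in exon 2,
# mirroring the situation described for Defb17 and Defb41.
transcripts = [
    Transcript("Defb17", ((100, 160), (900, 1100))),
    Transcript("Defb41", ((100, 160), (2400, 2600))),
    Transcript("other_locus", ((5000, 5060), (5800, 6000))),
]

groups = group_by_first_exon(transcripts)

# Any group with more than one transcript is a candidate for merging
# into a single gene with multiple splice variants.
merge_candidates = [names for names in groups.values() if len(names) > 1]
print(merge_candidates)
```

A shared first exon is only the trigger for review in this sketch; as the text notes, the differing second exon encodes the mature peptide, which is why such a pair still needs manual judgment.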

