A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data.

Mo F, Hong X, Gao F, Du L, Wang J, Omenn GS, Lin B - BMC Bioinformatics (2008)

Bottom Line: It is estimated that about 74% of multi-exon human genes have alternative splicing. We wrote scripts in Perl, BioPerl, MySQL and the Ensembl API and built a theoretical exon-exon junction protein database to account for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations) from the Ensembl Core Database. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.


Affiliation: Systems Biology Division, Zhejiang-California Nanosystems Institute (ZCNI) of Zhejiang University, Zhejiang University Huajiachi Campus, Hangzhou, PR China. mofan.hz@gmail.com

ABSTRACT

Background: Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes undergo alternative splicing. High-throughput tandem mass spectrometry (MS/MS) provides valuable information for rapidly identifying potentially novel alternatively spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.

Results: We wrote scripts in Perl, BioPerl, MySQL and the Ensembl API and built a theoretical exon-exon junction protein database from the Ensembl Core Database. The database accounts for all possible combinations of exons for a gene while keeping the frame of translation (i.e., keeping only in-phase exon-exon combinations). Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events.
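
As an illustration of what "keeping only in-phase exon-exon combinations" could mean in practice, the following is a minimal Python sketch, not the authors' Perl implementation. It assumes Ensembl-style exon records with a start phase and an end phase (0, 1 or 2, with -1 marking a non-coding boundary) and treats a pair as in-phase when the upstream exon's end phase equals the downstream exon's start phase. The `Exon` class, the function name and the toy gene are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    rank: int       # 1-based order of the exon within the gene (hypothetical field)
    phase: int      # assumed Ensembl-style start phase: 0, 1, 2, or -1 if non-coding
    end_phase: int  # assumed Ensembl-style end phase: 0, 1, 2, or -1 if non-coding


def in_phase_junctions(exons):
    """Return (upstream_rank, downstream_rank, n_skipped) for frame-preserving pairs."""
    ordered = sorted(exons, key=lambda e: e.rank)
    junctions = []
    for up, down in combinations(ordered, 2):
        # Skip pairs whose shared boundary is non-coding.
        if up.end_phase == -1 or down.phase == -1:
            continue
        # Assumption: the reading frame is kept when the phases match.
        if up.end_phase == down.phase:
            junctions.append((up.rank, down.rank, down.rank - up.rank - 1))
    return junctions


# Toy gene with made-up phases, for illustration only.
toy_gene = [
    Exon(rank=1, phase=0, end_phase=1),
    Exon(rank=2, phase=1, end_phase=0),
    Exon(rank=3, phase=0, end_phase=1),
    Exon(rank=4, phase=1, end_phase=2),
]
pairs = in_phase_junctions(toy_gene)
```

In this toy gene, joining exon 1 directly to exon 4 keeps the frame and skips two exons, whereas joining exon 1 to exon 3 does not. A real pipeline would then translate the sequence spanning each retained junction to produce the searchable protein entries.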

Conclusion: Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.


Figure 2: Illustration of splicing of exons 3 and 7 of the proto-oncogene c-CBL that generates the peptide KAFGENYLFPDGR we identified by MS/MS. Translated amino acid residues (one letter code) were aligned with the combined exon 3 and 7 DNA sequences. The double forward-slash indicates the exon/exon junction. The underlined amino acid residues indicate the peptide corresponding to the mass spectrum we identified.

Mentions: Figure 1 (top panel) shows the spectrum of a novel junction peptide, LDEEVKIQR, which had not previously been identified in either the ECgene or the human non-redundant database. This peptide was observed in both the liver cancer and normal datasets. The peptide matched gene ENSG00000143375, which encodes cingulin. The cingulin gene has 28 exons and is transcribed into a 5142-nucleotide mRNA (NM_020770). The b6 ion for LDEEVK and the b7 ion for LDEEVKI are clearly identified (Figure 1, top panel). The junction lies between amino acids K and I (LDEEVK/IQR, where "/" marks the junction position). The peptide is derived from joining ENSE00000959672 (exon 6) and ENSE00000959682 (exon 21) of the cingulin gene. Cingulin is a protein with a modular coiled-coil domain; it is localized in tight junctions and may play a role in regulating paracellular permeability [11,12]. The significance and function of this spliced isoform remain to be investigated. The bottom panel of Figure 1 shows the spectrum of the peptide KAFGENYLFPDGR. This peptide is derived from splicing of ENSE00001240795 (exon 3) to ENSE00001128289 (exon 7) of ENSG00000110395, which encodes the proto-oncogene c-CBL (E3 ubiquitin-protein ligase CBL). Figure 2 illustrates the splicing event involving exons 3 and 7 of the gene. In our analysis, about 40% of the exon skipping events skipped 7 exons and about 25% of them skipped 14 exons (Additional file 2). Our two examples skip 14 exons (exons 7 to 20 of cingulin) and 3 exons (exons 4 to 6 of c-CBL), respectively, and are typical of the observed skips.
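
To make concrete why the b6 and b7 ions bracket the junction in LDEEVK/IQR, the sketch below computes singly charged b-ion m/z values for the peptide. It is an illustration only and is not taken from the article: the residue masses are standard monoisotopic values, the function name is hypothetical, and no modifications or higher charge states are considered.

```python
# Standard monoisotopic residue masses (Da) for the residues in LDEEVKIQR.
RESIDUE_MASS = {
    "L": 113.08406, "I": 113.08406, "D": 115.02694, "E": 129.04259,
    "V": 99.06841, "K": 128.09496, "Q": 128.05858, "R": 156.10111,
}
PROTON = 1.00728  # mass of a proton (Da)


def b_ions(peptide):
    """Return singly charged b-ion m/z values, b1 through b(n-1)."""
    total = 0.0
    ions = {}
    # A b-ion is the N-terminal fragment: summed residue masses plus one proton.
    for i, residue in enumerate(peptide[:-1], start=1):
        total += RESIDUE_MASS[residue]
        ions[f"b{i}"] = total + PROTON
    return ions


ions = b_ions("LDEEVKIQR")
# b6 covers LDEEVK (upstream exon side); b7 covers LDEEVKI (first residue past the junction).
junction_gap = ions["b7"] - ions["b6"]
```

The difference between b7 and b6 equals the residue mass of isoleucine, the first amino acid contributed by the downstream exon, so observing both ions supports the K/I junction assignment.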

