Automatic detection of exonic splicing enhancers (ESEs) using SVMs.

Mersch B, Gepperth A, Suhai S, Hotz-Wagenblatt A - BMC Bioinformatics (2008)

Bottom Line: Once the classifier is trained, every potential ESE sequence can be passed to the SVM for verification. Using SVMs with the combined oligo kernel yields a high accuracy of about 90 percent and readily interpretable parameters. The best results were obtained using an SVM with the combined oligo kernel, while oligo kernels with oligomers of a certain length could be used to extract relevant features.

Affiliation: Department of Molecular Biophysics, German Cancer Research Center DKFZ, Im Neuenheimer Feld 580, Heidelberg, Germany. b.mersch@dkfz.de

ABSTRACT

Background: Exonic splicing enhancers (ESEs) activate nearby splice sites and promote the inclusion (vs. exclusion) of the exons in which they reside; they act as binding sites for SR proteins. To study the impact of ESEs on alternative splicing, it would be useful to be able to detect them in exons. Identifying SR protein-binding sites in human DNA sequences by machine learning techniques is a formidable task, since exon sequences are also constrained by their functional role in coding for proteins.

Results: Choosing the training examples needed for machine learning approaches is difficult, since only a few exact locations of human ESEs that could serve as positive examples are described in the literature. Additionally, it is unclear which sequences are suitable as negative examples. Therefore, we developed a motif-oriented data-extraction method that extracts exon sequences around experimentally or theoretically determined ESE patterns. Positive examples are restricted by heuristics based on known properties of ESEs, e.g. location in the vicinity of a splice site, whereas negative examples are taken in the same way from the middle of long exons. We show that a suitably chosen SVM using optimized sequence kernels (e.g., combined oligo kernel) can extract meaningful properties from these training examples. Once the classifier is trained, every potential ESE sequence can be passed to the SVM for verification. Using SVMs with the combined oligo kernel yields a high accuracy of about 90 percent and readily interpretable parameters.
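The motif-oriented extraction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the motif `GAAGAA`, the flank size, the "near a splice site" distance and the "long exon" threshold are all hypothetical placeholders chosen for illustration, and exon ends are used as a stand-in for splice-site positions.

```python
from typing import List, Sequence, Tuple


def find_all(seq: str, motif: str) -> List[int]:
    """Return every 0-based start index of motif in seq (overlaps included)."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def extract_examples(
    exons: Sequence[str],
    motifs: Sequence[str],
    flank: int = 5,       # hypothetical: context kept on each side of the motif
    near: int = 30,       # hypothetical: max distance counted as "near" an exon end
    min_long: int = 300,  # hypothetical: minimum length of a "long" exon
) -> Tuple[List[str], List[str]]:
    """Collect motif-centred windows as positive or negative examples."""
    positives, negatives = [], []
    for exon in exons:
        n = len(exon)
        for motif in motifs:
            for pos in find_all(exon, motif):
                lo, hi = pos - flank, pos + len(motif) + flank
                if lo < 0 or hi > n:
                    continue  # window would run off the exon
                window = exon[lo:hi]
                # Distance from the motif to the closer exon end.
                dist_to_end = min(pos, n - (pos + len(motif)))
                # Distance from the motif centre to the exon centre.
                dist_to_mid = abs((pos + len(motif) / 2) - n / 2)
                if dist_to_end <= near:
                    positives.append(window)
                elif n >= min_long and dist_to_mid <= near:
                    negatives.append(window)
    return positives, negatives


# Toy data: one short exon and one long exon, each containing the motif once.
short_exon = "A" * 10 + "GAAGAA" + "C" * 10
long_exon = "T" * 200 + "GAAGAA" + "T" * 200
pos_examples, neg_examples = extract_examples([short_exon, long_exon], ["GAAGAA"])
```

Because positives and negatives are cut out around the same motifs, a classifier trained on them has to rely on the surrounding sequence context and not on the motif alone, which is the point of taking negative examples "in the same way".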

Conclusion: The motif-oriented data-extraction method appears to produce consistent training and test data, leading to good classification rates, and thus allows potential ESE motifs to be verified. The best results were obtained using an SVM with the combined oligo kernel, while oligo kernels with oligomers of a certain length could be used to extract relevant features.
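The abstract does not define the oligo kernel, so the following Python sketch rests on an assumption: it uses a common formulation in which two sequences are compared by summing a Gaussian of the position difference over all shared oligomer occurrences, and it treats the "combined" kernel as a plain sum over several oligomer lengths. The constant normalization factor is omitted, and the function names, the smoothing parameter `sigma` and the chosen lengths are illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def oligo_positions(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer in seq to the list of its 0-based start positions."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def oligo_kernel(s: str, t: str, k: int, sigma: float = 1.0) -> float:
    """Position-smoothed similarity of s and t for oligomers of length k.

    Assumed form: sum over shared k-mers and over their occurrence pairs (p, q)
    of exp(-(p - q)^2 / (4 * sigma^2)). A small sigma demands matching
    positions; a large sigma approaches plain k-mer co-occurrence counting.
    """
    pos_s, pos_t = oligo_positions(s, k), oligo_positions(t, k)
    total = 0.0
    for oligo, s_hits in pos_s.items():
        for p in s_hits:
            for q in pos_t.get(oligo, ()):
                total += math.exp(-((p - q) ** 2) / (4 * sigma ** 2))
    return total


def combined_oligo_kernel(
    s: str, t: str, lengths: Sequence[int] = (1, 2, 3), sigma: float = 1.0
) -> float:
    """Assumed combination: sum of single-length oligo kernels."""
    return sum(oligo_kernel(s, t, k, sigma) for k in lengths)


# Toy comparison of two short sequences.
similarity = combined_oligo_kernel("ACGTAC", "TACGTA")
```

A Gram matrix filled with such values could then be handed to any SVM implementation that accepts precomputed kernels. Restricting `lengths` to a single value corresponds to the single-length oligo kernels mentioned in the conclusion.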

Figure 1: Recognition signals and proteins for splicing. The consensus sequences that occur in most eukaryotes are shown. Y = pyrimidine, R = purine, N = any nucleotide.

Mentions: In eukaryotes, after transcription from DNA to messenger RNA (mRNA), the mRNA is initially present as a precursor messenger RNA (pre-mRNA). This pre-mRNA still comprises the exons and introns of the gene. At this stage it is not yet known which exons will eventually be included in the mature mRNA. This decision is made during a process called splicing, in which the introns are cut out and the exons are reconnected in different ways to yield various mRNAs. The signals necessary to define the splice sites are located at the exon/intron boundaries and in the introns (see Figure 1). The AG and GU dinucleotides at the ends of introns are especially highly conserved. For some proteins, the exons are joined in the order in which they appear on the pre-mRNA. However, mostly this is not the case. Different variants of a protein can occur when single exons are skipped or when an exon contains two splice sites of which only one is used. This is called alternative splicing. Some exons are present in every mRNA (constitutive exons); others are alternatively spliced and thus present in only some of the transcripts. The resulting mRNAs are then translated into proteins, which can differ greatly in their function.

