Stochastic noise in splicing machinery.

Melamud E, Moult J - Nucleic Acids Res. (2009)

Bottom Line: In this paper, we propose that most alternative splicing events are the result of noise in the splicing process. The results strongly support the hypothesis that most alternative splicing is a consequence of stochastic noise in the splicing machinery, and has no functional significance. The results are also consistent with error rates tuned to ensure that an adequate level of functional product is produced and to reduce the toxic effect of accumulation of misfolding proteins.


Affiliation: Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA. melamud@umbi.umd.edu

ABSTRACT
The number of known alternative human isoforms has been increasing steadily with the amount of available transcription data. To date, over 100 000 isoforms have been detected in EST libraries, and at least 75% of human genes have at least one alternative isoform. In this paper, we propose that most alternative splicing events are the result of noise in the splicing process. We show that the number of isoforms and their abundance can be predicted by a simple stochastic noise model that takes into account two factors: the number of introns in a gene and the expression level of a gene. The results strongly support the hypothesis that most alternative splicing is a consequence of stochastic noise in the splicing machinery, and has no functional significance. The results are also consistent with error rates tuned to ensure that an adequate level of functional product is produced and to reduce the toxic effect of accumulation of misfolding proteins. Based on simulation of sampling of virtual cDNA libraries, we estimate that error rates range from 1 to 10% depending on the number of introns and the expression level of a gene.


© Copyright Policy - creative-commons

Figure 5: Binary isoforms. Simulation and sampling of the isoform composition of a gene with 10 virtual transcripts and 6 introns. Exons are shown as rectangles. Alternative splicing events are indicated by red intron bridges. The binary intron representation is shown above each bridge, with the symbol ‘1’ indicating an alternative splicing event, and the symbol ‘0’ representing a major splicing event. In the set of 10 there are a total of six alternative transcripts (those with at least one ‘1’: transcripts 2, 4, 5, 7, 8 and 9) with five unique alternative isoforms (one pattern occurs twice, in transcripts 4 and 5). In this example, we assume that partial message sequencing only included the colored exons. With this particular sequencing, three alternative transcripts are selected (4, 5 and 7), containing two of the five unique alternative isoforms (represented by the patterns 01 and 010). If an EST sequence contains zero introns, it is truncated to an empty string, illustrated with transcripts 6, 8 and 10.
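The counts quoted in the caption can be checked with a short sketch. The full-length strings are the ones listed in the accompanying text; the `WINDOWS` table is hypothetical, since the caption only states the sampled patterns for transcripts 4, 5 and 7 and the empty reads for transcripts 6, 8 and 10. The remaining windows are made up here so that the reads miss the alternative intron. The function names `summarize` and `fragment` are also illustrative, not taken from the paper.

```python
# Full-length binary strings for the 10 virtual transcripts in the example.
TRANSCRIPTS = {
    1: "000000", 2: "01000", 3: "000000", 4: "00001", 5: "00001",
    6: "000000", 7: "000010", 8: "000001", 9: "001000", 10: "000000",
}

# Hypothetical (start, stop) intron windows covered by each partial read.
# Only the outcomes for transcripts 4-8 and 10 are stated in the caption;
# the windows for 1, 2, 3 and 9 are assumptions for illustration.
WINDOWS = {
    1: (0, 3), 2: (2, 5), 3: (0, 2), 4: (3, 5), 5: (3, 5),
    6: (0, 0), 7: (3, 6), 8: (0, 0), 9: (3, 6), 10: (0, 0),
}


def summarize(patterns):
    """Return (number of alternative transcripts, number of unique alternative isoforms)."""
    alternative = [p for p in patterns if "1" in p]
    return len(alternative), len(set(alternative))


def fragment(pattern, start, stop):
    """Return the part of the binary string covered by a partial read (may be empty)."""
    return pattern[start:stop]


full_counts = summarize(TRANSCRIPTS.values())
reads = {t: fragment(p, *WINDOWS[t]) for t, p in TRANSCRIPTS.items()}
sampled_counts = summarize(reads.values())
print(full_counts, sampled_counts)
```

Under these assumed windows, the full set gives six alternative transcripts with five unique isoforms, and the partial reads recover three alternative transcripts with two unique patterns, in line with the caption.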

Mentions: Using the intron counts, error rate and numbers of transcripts per cell, we simulate the intron structure of a set of transcripts for each gene, with as many transcripts as are present in a single cell. Figure 5 gives an illustrative example, for a gene with six introns and 10 transcripts. The intron structure of a given transcript is encoded as a binary string of length equal to the number of introns in the major isoform. The alternative introns (introns that differ from the major isoform in the location of the 5′ or 3′ splice site) are represented by the symbol ‘1’, while introns with the same genomic coordinates as the major isoform are represented by the symbol ‘0’. In this schema, transcripts 1, 3, 6 and 10 encode the major isoform of the gene, producing the string ‘000000’. Transcripts 2, 4 and 5 contain exon skips that are different from the major isoform for two introns, thus producing ‘01000’, ‘00001’ and ‘00001’ strings. Transcripts 7–9 contain alternative 5′ and 3′ splicing events that modify only one intron, thus producing ‘000010’, ‘000001’ and ‘001000’ strings respectively. In generating the strings, exon indels are chosen ∼49% of the time, alternative 5′ splice sites are chosen ∼25% and alternative 3′ splice sites are chosen the remaining ∼26% of the time, in accordance with the overall ratio found in 56 419 completely sequenced cDNAs (7). There are two drawbacks to the binary representation of isoforms. First, events that modify both the 3′ and 5′ ends of an intron are not taken into account. Second, the binary representation cannot distinguish between an alternative 3′ isoform and an alternative 5′ isoform of the same intron. As a consequence, we occasionally undercount the number of unique isoforms for a given gene. We tested a number of alternative alphabet representations, and found no significant difference in results, as the frequency of these events tends to be low.
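A minimal sketch of how such binary strings might be generated is given below. It is not the authors' implementation. It assumes that each intron is mis-spliced independently with a fixed per-intron `error_rate`, and that every exon indel is an exon skip that fuses two neighboring introns into a single ‘1’ (which is why the exon-skip strings in the example are one symbol shorter). The names `EVENT_WEIGHTS`, `simulate_transcript` and `simulate_gene` are hypothetical; only the 49/25/26 split comes from the text.

```python
import random

# Event-type proportions quoted in the text (exon indel, alternative 5', alternative 3').
EVENT_WEIGHTS = {"exon_indel": 0.49, "alt_5prime": 0.25, "alt_3prime": 0.26}


def simulate_transcript(n_introns, error_rate, rng):
    """Return one binary intron string; '1' marks an alternative event, '0' a major one."""
    kinds = list(EVENT_WEIGHTS)
    weights = list(EVENT_WEIGHTS.values())
    symbols = []
    i = 0
    while i < n_introns:
        if rng.random() < error_rate:
            symbols.append("1")
            event = rng.choices(kinds, weights=weights)[0]
            # Assumption: an exon skip consumes this intron and the next one,
            # so two major introns collapse into a single alternative intron.
            if event == "exon_indel" and i + 1 < n_introns:
                i += 2
                continue
        else:
            symbols.append("0")
        i += 1
    return "".join(symbols)


def simulate_gene(n_introns, error_rate, n_transcripts, seed=0):
    """Simulate the intron strings for all transcripts of one gene in one cell."""
    rng = random.Random(seed)
    return [simulate_transcript(n_introns, error_rate, rng) for _ in range(n_transcripts)]


# Example matching the figure's dimensions: six introns, 10 transcripts.
example = simulate_gene(n_introns=6, error_rate=0.05, n_transcripts=10, seed=1)
print(example)
```

Because 5′ and 3′ events both map to ‘1’, the sketch inherits the second drawback described above: two different events at the same intron yield the same string.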

