Coding potential of the products of alternative splicing in human.

Leoni G, Le Pera L, Ferrè F, Raimondo D, Tramontano A - Genome Biol. (2011)

Bottom Line: A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein. The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity with a moderate cost in terms of sensitivity.


Affiliation: Dipartimento di Scienze Biochimiche, Sapienza Università di Roma, P.le A. Moro, 5 - 00185 Rome, Italy.

ABSTRACT

Background: Analysis of the human genome has revealed that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized genes. A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein.

Results: In this study we analyze alternative splicing isoforms of human gene products that are unambiguously identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence of uninterrupted functional domains, active sites as well as the plausibility of their predicted structure. We report how well each of these strategies and their combination can correctly identify translated isoforms and derive a lower limit for their specificity, that is, their ability to correctly identify non-translated products.

Conclusions: The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity with a moderate cost in terms of sensitivity.
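
The three measures named above can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions used here are assumptions: coverage is the fraction of isoforms to which a strategy can be applied at all, sensitivity is the fraction of MS-confirmed isoforms that the strategy calls translated, and specificity is the fraction of isoforms without peptide evidence that the strategy calls non-translated. Because an isoform with no detected peptide may still be translated, a specificity computed this way can only be read as a lower limit. The function name `evaluate` and the toy isoform labels are hypothetical.

```python
def evaluate(predictions, confirmed):
    """Score one prediction strategy against MS evidence.

    predictions: dict mapping isoform id -> True (predicted translated),
                 False (predicted non-translated) or None (strategy not applicable).
    confirmed:   set of isoform ids supported by at least one specific peptide.
    All definitions below are assumptions made for illustration.
    """
    # Keep only the isoforms the strategy could actually be applied to.
    applicable = {iso: call for iso, call in predictions.items() if call is not None}

    tp = sum(1 for iso, call in applicable.items() if call and iso in confirmed)
    fn = sum(1 for iso, call in applicable.items() if not call and iso in confirmed)
    fp = sum(1 for iso, call in applicable.items() if call and iso not in confirmed)
    tn = sum(1 for iso, call in applicable.items() if not call and iso not in confirmed)

    return {
        "coverage": len(applicable) / len(predictions),
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        # Unconfirmed isoforms are treated as negatives, although some may be
        # translated but undetected, hence "lower bound".
        "specificity_lower_bound": tn / (tn + fp) if tn + fp else None,
    }


# Toy data, not taken from the paper.
predictions = {"iso1": True, "iso2": False, "iso3": True, "iso4": False, "iso5": None}
confirmed = {"iso1", "iso2"}
scores = evaluate(predictions, confirmed)
print(scores)
```

Returning `None` for an undefined ratio keeps an empty class from raising a division error; a real analysis would also need to decide how to treat isoforms of genes with no MS evidence at all.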


Figure 1: Scheme of possible scenarios for comparing different isoforms. Only peptides mapping in the products of shaded regions are considered specific.

Mentions: Of the 22,320 Ensembl57 [15] protein coding genes, 15,914 produce more than one isoform, and are therefore subject to AS. We did not include in this dataset those isoforms annotated as non-protein coding by Ensembl, and those differing only in their UTRs at the 5' or 3' end (therefore having identical coding regions), and ended up with 60,568 isoforms. In this group of alternative transcripts, we identified all regions (whole exons or exon portions) of each gene that are included in only one isoform (Figure 1). The detection of peptides mapping to such specific regions in MS experiments allows the unambiguous identification of the translation of the corresponding transcripts. PeptideAtlas human build peptides (May 2010) were mapped to the exons of these isoforms and classified as specific or unspecific accordingly. A total of 1,124 isoforms (from 1,025 genes) are identified by at least one specific peptide, and represent the set of isoforms whose existence is confirmed at the protein level. This figure is somewhat different from that reported in [14], where specific transcripts for 3,059 human alternatively spliced genes were identified using PeptideAtlas peptides, but this was expected since we used a more up-to-date release of PeptideAtlas in which the peptide mapping criteria were more stringent.
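
The idea of isoform-specific regions and specific peptides described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: coding exons are given as half-open genomic intervals `(start, end)`, a peptide is represented by a single genomic span (junction-spanning peptides are ignored), and the names `unique_regions` and `classify_peptide` are hypothetical.

```python
from collections import Counter


def _to_intervals(sorted_positions):
    """Merge a sorted list of positions into half-open (start, end) intervals."""
    intervals = []
    for pos in sorted_positions:
        if intervals and intervals[-1][1] == pos:
            intervals[-1] = (intervals[-1][0], pos + 1)
        else:
            intervals.append((pos, pos + 1))
    return intervals


def unique_regions(isoforms):
    """Return, for each isoform of one gene, the regions found in that isoform only.

    isoforms: dict mapping isoform name -> list of (start, end) coding intervals.
    """
    # Count how many isoforms cover each position (each isoform counted once).
    coverage = Counter()
    for exons in isoforms.values():
        coverage.update({p for start, end in exons for p in range(start, end)})

    regions = {}
    for name, exons in isoforms.items():
        own = sorted({p for start, end in exons for p in range(start, end)
                      if coverage[p] == 1})
        regions[name] = _to_intervals(own)
    return regions


def classify_peptide(span, regions):
    """Return the isoform a peptide is specific to, or None if it is unspecific."""
    start, end = span
    hits = [name for name, intervals in regions.items()
            if any(s < end and start < e for s, e in intervals)]
    return hits[0] if len(hits) == 1 else None


# Toy gene with three isoforms; coordinates are invented for illustration.
gene = {
    "A": [(0, 10), (20, 30)],
    "B": [(0, 10), (20, 30), (40, 50)],  # extra exon
    "C": [(0, 10), (15, 30)],            # extended exon portion
}
regions = unique_regions(gene)
print(regions)
print(classify_peptide((42, 48), regions))  # maps to the exon found only in B
print(classify_peptide((2, 8), regions))    # maps to an exon shared by all isoforms
```

In this toy gene, isoform A has no region of its own, so under this simplified criterion no peptide could confirm it unambiguously. Working position by position is adequate for a sketch but would be slow on whole-genome data, where an interval-based sweep would be preferable.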

