Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder.

Hegyi H, Kalmar L, Horvath T, Tompa P - Nucleic Acids Res. (2010)

Bottom Line: However, for 4000 human proteins in PDB, only 14 human proteins have structures of at least two alternative isoforms. We found that strict rules govern the selection of alternative splice variants aimed to preserve the integrity of globular domains: alternative splice sites (i) tend to avoid globular domains or (ii) affect them only marginally or (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal or (iv) the protein is disordered. These observations provide the basis for a prediction method (currently under development) to predict the viability of splice variants.


Affiliation: Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, PO Box 7, 1518 Budapest, Hungary. hegyi@enzim.hu

ABSTRACT
According to current estimates, ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, for 4000 human proteins in PDB, only 14 human proteins have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the maximum insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules, aimed at preserving the integrity of globular domains, govern the selection of alternative splice variants: alternative splice sites (i) tend to avoid globular domains or (ii) affect them only marginally or (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal or (iv) the protein is disordered. We also observed an inverse correlation between the domain fraction lost and the full length of the minor isoform containing the domain, possibly indicating that the rest of the isoform buffers the effect of the domain truncation. These observations provide the basis for a prediction method (currently under development) to predict the viability of splice variants.



Figure 1: ‘Retained portion’ of the truncated domains (i.e. the remaining part divided by the full length of the domain) versus their ‘relative length’ with respect to the containing minor isoform (i.e. the remaining part of the domain divided by the full length of the protein). (A) Each truncated domain is indicated by a dot, shown for the ‘verified’ group. (B) Same data as in (A), with the population of the four quadrants given as percentages. (C) Data for the ‘named’ group, same representation as in (B). (D) Data for all alternative splice variants in Swissprot.

Mentions: First, we determined the relative length of the truncated domains (i.e. the remaining part divided by the full length of the domain) and related these values to the full length of the containing proteins. Note that all sequence analysis was carried out using the Pfam domain annotations, whereas all structural domain analysis was done with SCOP domains. The results are shown in Figure 1. In Figure 1A, the results for the ‘verified’ group are shown, each truncated domain indicated with a dot, whereas in Figure 1B–D data are shown in terms of actual numbers, for the ‘verified’ group, the ‘named’ group and the total number of alternative splice variants (in Swissprot), respectively. (For the definition of the ‘verified’, ‘named’ and ‘random’ groups of splice variants, see the ‘Materials and Methods’ section.) For the ‘verified’ group, all the truncated domains satisfy at least one of the following two criteria: truncated domain size/original domain size > 0.6 OR truncated domain size/protein length < 0.3, i.e. the upper left quadrant of the rectangle is empty (Figure 1A and B). However, for the ‘named’ group and the total of Swissprot, this area is increasingly populated (6 and 10%, respectively). According to a χ²-test, the difference is significant both between the ‘verified’ and the ‘named’ group (P = 0.011) and between ‘named’ and Swissprot (P = 0.0002).
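The two criteria quoted above can be written as a small check. The following is a minimal sketch, not the authors' implementation: the class `TruncatedDomain`, the function names and the example lengths are hypothetical, and "protein length" is taken to mean the full length of the containing minor isoform, as in the Figure 1 caption. Only the cutoffs 0.6 and 0.3 come from the text.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text; everything else in this sketch is illustrative.
RETAINED_CUTOFF = 0.6
RELATIVE_CUTOFF = 0.3


@dataclass(frozen=True)
class TruncatedDomain:
    """Hypothetical record for one truncated domain in a minor isoform."""
    remaining_len: int    # residues of the domain still present in the isoform
    full_domain_len: int  # length of the intact domain
    isoform_len: int      # full length of the containing minor isoform

    @property
    def retained_portion(self) -> float:
        # Remaining part divided by the full length of the domain.
        return self.remaining_len / self.full_domain_len

    @property
    def relative_length(self) -> float:
        # Remaining part divided by the full length of the protein.
        return self.remaining_len / self.isoform_len


def satisfies_criteria(domain: TruncatedDomain) -> bool:
    """True if at least one of the two criteria from the text holds."""
    return (domain.retained_portion > RETAINED_CUTOFF
            or domain.relative_length < RELATIVE_CUTOFF)


def fraction_violating(domains: list[TruncatedDomain]) -> float:
    """Share of domains failing both criteria (the quadrant that is empty
    for the 'verified' group)."""
    if not domains:
        return 0.0
    return sum(not satisfies_criteria(d) for d in domains) / len(domains)


# Made-up example lengths, for illustration only.
examples = [
    TruncatedDomain(remaining_len=90, full_domain_len=100, isoform_len=200),
    TruncatedDomain(remaining_len=40, full_domain_len=100, isoform_len=400),
    TruncatedDomain(remaining_len=50, full_domain_len=100, isoform_len=100),
]
flags = [satisfies_criteria(d) for d in examples]
```

In this sketch the first example passes on the retained portion, the second passes on the relative length, and the third fails both, so it would fall in the region reported as empty for the ‘verified’ group.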

