Identifying and characterising key alternative splicing events in Drosophila development.

Lees JG, Ranea JA, Orengo CA - BMC Genomics (2015)

Bottom Line: We have identified a subset of protein isoforms which appear to have high functional significance, particularly in regulation. The methods and analyses we present here represent important first steps in the development of tools to address the near-complete lack of isoform-specific function annotation. In turn, the tools allow us to characterise the regulatory functions of alternative splicing in more detail.


Affiliation: Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK. ucbcjle@live.ucl.ac.uk.

ABSTRACT

Background: In complex metazoans a given gene frequently codes for multiple protein isoforms, through processes such as alternative splicing. Large-scale functional annotation of these isoforms is a key challenge for functional genomics. This annotation gap is widening as technologies such as RNA-Seq identify large numbers of multi-transcript genes. Furthermore, attempts to characterise the functions of splicing in an organism are complicated by the difficulty of distinguishing functional isoforms from those produced by splicing errors or transcription noise. Tools to help prioritise candidate isoforms for testing are largely absent.

Results: In this study we implement a Time-course Switch (TS) score for ranking isoforms by their likelihood of producing additional functions, based on their developmental expression profiles as reported by modENCODE. The TS score allows us to better investigate the functional roles of different isoforms expressed in multi-transcript genes. From this analysis, we find that isoforms with high TS scores have sequence feature changes consistent with more deterministic splicing and functional changes, and tend to gain domains or whole exons which could carry additional functions. Furthermore, these functions appear to be particularly important for essential regulatory roles, establishing functional isoform switching as key for regulatory processes. Based on the TS score, we develop a Transcript Annotations Pipeline for Alternative Splicing (TAPAS) that identifies functional neighbourhoods of potentially interesting isoforms.

Conclusions: We have identified a subset of protein isoforms which appear to have high functional significance, particularly in regulation. This has been made possible through the development of novel methods that make use of transcript expression profiles. The methods and analyses we present here represent important first steps in the development of tools to address the near-complete lack of isoform-specific function annotation. In turn, the tools allow us to characterise the regulatory functions of alternative splicing in more detail.



Fig. 6: Explanation and validation of the TAPAS algorithm filtering step. For a given query isoform (orange node), a set of other genes (blue nodes) is identified whose isoforms have correlated expression levels. The GO terms of these genes are compared with one another (excluding the query isoform's gene) to obtain an average GOSS score (see Methods). In example a, isoform-P belongs to a cluster with low GOSS similarity and this cluster is discarded. In example b, isoform-Q belongs to a cluster with high average GOSS similarity. The cluster is treated as valid and can be used to help characterise the functional neighbourhood of the query isoform. In c, the link width represents GOSS scores between genes; the red links are used in the TAPAS filtering step. We find in the validation d that the average GOSS score of a cluster to the query isoform's parent gene (blue links in c) is significantly higher for filtered clusters. Note that the filtering was applied using only the similarities between the non-query members of the cluster (red links in c). The 'random' plot is a control in which the clusters have been generated randomly, to show the background GOSS similarity expected between a cluster and the query isoform.

Mentions: This led us to develop our Transcript Annotation Pipeline for Alternative Splicing (TAPAS) (Fig. 6; for details of the algorithm see Methods). In brief, the algorithm can be summarised as follows. Firstly, for a query isoform, it builds a cluster of isoforms from other genes (different from the query isoform's parent gene) whose expression patterns correlate with the query isoform's, and which are hence possibly part of the same functional module. If data are available, TAPAS then builds a network between the members of the cluster using a combination of experimental protein interactions [43] and high-confidence predicted functional interactions [44]. However, if a cluster lacks annotated interactions, we apply an optional filtering step that checks whether the cluster of isoforms has an average GO semantic similarity (GOSS) score above a user-specified cut-off (see Methods), ensuring that the cluster is coherent in both expression and function when other network data are not available.
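
The cluster-building and filtering steps summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the excerpt does not state which correlation measure or cut-offs TAPAS uses, so Pearson correlation, the 0.9 correlation threshold and the 0.5 similarity cut-off are assumptions, and a Jaccard overlap of GO term sets stands in for the GOSS score defined in the paper's Methods. All function names, isoform names and GO labels are hypothetical toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_cluster(query, profiles, gene_of, min_corr=0.9):
    """Isoforms of OTHER genes whose profile correlates with the query's.

    min_corr is an assumed threshold, not a value taken from the paper.
    """
    return [
        iso for iso, prof in profiles.items()
        if gene_of[iso] != gene_of[query]
        and pearson(profiles[query], prof) >= min_corr
    ]


def jaccard(a, b):
    """Stand-in for GOSS: overlap of two GO term sets (not the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_similarity(genes, go_terms, sim=jaccard):
    """Average similarity over all pairs of genes in the cluster."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(sim(go_terms[g], go_terms[h]) for g, h in pairs) / len(pairs)


def passes_filter(cluster, gene_of, go_terms, cutoff=0.5):
    """Keep a cluster only if its non-query genes are functionally coherent."""
    genes = {gene_of[iso] for iso in cluster}
    return mean_pairwise_similarity(genes, go_terms) >= cutoff


# Toy data: developmental expression profiles for five isoforms of four genes.
profiles = {
    "Q-RA": [1, 2, 3, 4],   # query isoform
    "Q-RB": [1, 2, 3, 4],   # same gene as the query, so it must be excluded
    "A-RA": [2, 4, 6, 8],
    "B-RA": [1, 2, 3, 5],
    "C-RA": [5, 1, 4, 2],   # anti-correlated with the query
}
gene_of = {"Q-RA": "Q", "Q-RB": "Q", "A-RA": "A", "B-RA": "B", "C-RA": "C"}
go_terms = {
    "A": {"GO:toy1", "GO:toy2"},
    "B": {"GO:toy1", "GO:toy2", "GO:toy3"},
    "C": {"GO:toy9"},
}

cluster = build_cluster("Q-RA", profiles, gene_of)
keep = passes_filter(cluster, gene_of, go_terms)
print(cluster, keep)
```

Because the cluster is built only from isoforms of other genes, the query isoform's own gene never contributes to the average similarity, which mirrors the exclusion described in the Fig. 6 caption. Swapping `jaccard` for a proper semantic similarity function only requires passing a different `sim` argument to `mean_pairwise_similarity`.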

