CoSREM: a graph mining algorithm for the discovery of combinatorial splicing regulatory elements.

Badr E, Heath LS - BMC Bioinformatics (2015)

Bottom Line: Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. We show that our results intersect with previous results, including some that are experimental. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.


Affiliation: Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA.

ABSTRACT

Background: Alternative splicing (AS) is a post-transcriptional regulatory mechanism for gene expression regulation. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. CoSREM is able to identify sets of SREs and is not limited to SRE pairs as are current approaches.

Results: We identified 37 SRE sets that include both enhancer and silencer elements. We show that our results intersect with previous results, including some that are experimental. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs which may be responsible for exon inclusion or exclusion across tissues.

Conclusion: The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size or limiting the algorithm to find only pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.


Fig. 6: An example of generating SRE sets. The MCS collection here contains three subgraphs representing enhancers. Applying a modified depth-first traversal yields the longest sequence from each subgraph. The last step is to locate the three sequences in the associated exon set. If any of the sequences overlap in an exon, they are merged into one longer sequence, which results in new SRE sets. We then count the number of exons in which each new SRE set resides. An SRE set that resides in at least 100 exons is included in the final result, as is the set highlighted with a red rectangle.

Mentions: Therefore, for each MCS collection M, the corresponding sequences of each subgraph are generated. This is performed by applying a depth-first traversal as in [20]. We eliminate the generated sequences that are subsumed by other sequences. Then, we check the first 50 nucleotides of each exon in the corresponding exon set T(M) to locate these sequences in the exon, and we generate a new SRE set if some of them overlap. For example, one of our MCS collections contains these four ESEs: CCCGGA, CCGGAG, CGGAGC, and GGAGCC. In some of the exons in the associated exon set, all four sequences overlap, forming one 9-mer element, CCCGGAGCC. In this case, we consider it a single ESE, and we do not include it in the final results. In other exons, only the first three ESEs overlap, forming the 8-mer sequence CCCGGAGC. This results in a new SRE set with two ESEs (CCCGGAGC, GGAGCC). It is included in the final result if the number of exons in which this SRE set resides meets the original threshold for generating the MCS collection (θ ≥ 100). Several other SRE sets are generated as well, depending on the exons under investigation, such as (CCCGGAG, CGGAGCC) and (CCCGGA, CCGGAGCC). As a result, multiple SRE sets can be generated from one MCS collection, provided each meets the specified threshold. Figure 6 illustrates an example of the filtering process.
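
The filtering step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the half-open interval representation, and the choice to count an exon toward exactly one SRE set (the set of merged sequences found in its first 50 nucleotides) are all assumptions made for the example. Two occurrences are treated as overlapping only when they share at least one nucleotide, which is also an assumption.

```python
from collections import Counter


def drop_subsumed(seqs):
    """Remove sequences that are substrings of another sequence in the list."""
    unique = sorted(set(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def find_occurrences(region, kmers):
    """Return sorted half-open (start, end) intervals of every k-mer hit."""
    hits = []
    for kmer in kmers:
        start = region.find(kmer)
        while start != -1:
            hits.append((start, start + len(kmer)))
            start = region.find(kmer, start + 1)
    return sorted(hits)


def merge_overlapping(intervals):
    """Merge sorted intervals that share at least one position."""
    merged = []
    for start, end in intervals:
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sre_set_for_exon(exon, kmers, window=50):
    """Merged sequences found in the first `window` nucleotides of one exon."""
    region = exon[:window]
    merged = merge_overlapping(find_occurrences(region, kmers))
    return frozenset(region[s:e] for s, e in merged)


def filter_sre_sets(exons, kmers, theta=100, window=50):
    """Count exons per merged SRE set; keep sets of 2+ elements in >= theta exons."""
    counts = Counter(sre_set_for_exon(exon, kmers, window) for exon in exons)
    return {s: n for s, n in counts.items() if len(s) >= 2 and n >= theta}


# Hypothetical toy exons built around the four ESEs from the text.
eses = ["CCCGGA", "CCGGAG", "CGGAGC", "GGAGCC"]
all_four_overlap = "AACCCGGAGCCTT"            # merges into one 9-mer
three_overlap = "AACCCGGAGCTTTTGGAGCCAA"      # 8-mer plus a separate GGAGCC
result = filter_sre_sets([three_overlap, three_overlap, all_four_overlap],
                         eses, theta=2)
print(result)
```

With the toy threshold `theta=2`, only the two-element set (CCCGGAGC, GGAGCC) survives: the exon in which all four ESEs overlap collapses to a single 9-mer and is therefore discarded, mirroring the two cases described in the text.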

