Limits...
CoSREM: a graph mining algorithm for the discovery of combinatorial splicing regulatory elements.

Badr E, Heath LS - BMC Bioinformatics (2015)

Bottom Line: Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy.We show that our results intersect with previous results, including some that are experimental.Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA.

ABSTRACT

Background: Alternative splicing (AS) is a post-transcriptional regulatory mechanism for gene expression regulation. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. CoSREM is able to identify sets of SREs and is not limited to SRE pairs as are current approaches.

Results: We identified 37 SRE sets that include both enhancer and silencer elements. We show that our results intersect with previous results, including some that are experimental. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs which may be responsible for exon inclusion or exclusion across tissues.

Conclusion: The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size or limiting the algorithm to find only pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.

No MeSH data available.


Related in: MedlinePlus

An example of an MCStree. The example shows a part of the tree where θ=100. The dotted boxes means that this MCS set does not satisfy the user threshold T(M)≥θ, where T(M) is the number of shared exons between the MCSs, and this branch will be pruned. all vertices with distance from the root ≥β threshold will be considered as potential MCS collection
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4559876&req=5

Fig5: An example of an MCStree. The example shows a part of the tree where θ=100. The dotted boxes means that this MCS set does not satisfy the user threshold T(M)≥θ, where T(M) is the number of shared exons between the MCSs, and this branch will be pruned. all vertices with distance from the root ≥β threshold will be considered as potential MCS collection

Mentions: It takes the MCStable as an input. MCStable is a hash table where the MCS IDs are the keys and the exon set of each MCS is the value. Each vertex at the first level of the tree represents one of the already calculated MCSs as an initial M. Therefore, the exon set T(M) is the exon set of the corresponding MCS (line 6). A child vertex u of vertex v is generated by extending Mv with one of the remaining MCSs and T(Mu) is then calculated as depicted in Fig. 4. As we build the MCStree as an ordered tree, Mv is extended by adding an MCS whose ID is only bigger than the largest MCS ID in the collection. Different pruning strategies are applied to reduce the search time and space. However, they do not affect the accuracy of the produced results, as we prune only the branches that do not satisfy the user constraint. This follows from utilizing the set enumeration tree structure, where all set combinations are generated and tree branches are only pruned if constraints are violated. For example, one pruning strategy is that we extend the tree branches in a depth-first manner as long as the generated M in the current vertex has shared exons with size /T(M)/≥θ. Once this constraint is violated, this branch is pruned (see Fig. 4). This is analogous to the subset-infrequency pruning strategy utilized in Max-Miner algorithm [35]. Another strategy is to prune the branch if the generated M has been generated in a previous part of the tree with the same exon set. However, if the generated M is a subset of a previously generated MCS collection but with a different exon set, the new M will still be included. Figure 5 illustrates an example of an MCStree.Fig. 4


CoSREM: a graph mining algorithm for the discovery of combinatorial splicing regulatory elements.

Badr E, Heath LS - BMC Bioinformatics (2015)

An example of an MCStree. The example shows a part of the tree where θ=100. The dotted boxes means that this MCS set does not satisfy the user threshold T(M)≥θ, where T(M) is the number of shared exons between the MCSs, and this branch will be pruned. all vertices with distance from the root ≥β threshold will be considered as potential MCS collection
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4559876&req=5

Fig5: An example of an MCStree. The example shows a part of the tree where θ=100. The dotted boxes means that this MCS set does not satisfy the user threshold T(M)≥θ, where T(M) is the number of shared exons between the MCSs, and this branch will be pruned. all vertices with distance from the root ≥β threshold will be considered as potential MCS collection
Mentions: It takes the MCStable as an input. MCStable is a hash table where the MCS IDs are the keys and the exon set of each MCS is the value. Each vertex at the first level of the tree represents one of the already calculated MCSs as an initial M. Therefore, the exon set T(M) is the exon set of the corresponding MCS (line 6). A child vertex u of vertex v is generated by extending Mv with one of the remaining MCSs and T(Mu) is then calculated as depicted in Fig. 4. As we build the MCStree as an ordered tree, Mv is extended by adding an MCS whose ID is only bigger than the largest MCS ID in the collection. Different pruning strategies are applied to reduce the search time and space. However, they do not affect the accuracy of the produced results, as we prune only the branches that do not satisfy the user constraint. This follows from utilizing the set enumeration tree structure, where all set combinations are generated and tree branches are only pruned if constraints are violated. For example, one pruning strategy is that we extend the tree branches in a depth-first manner as long as the generated M in the current vertex has shared exons with size /T(M)/≥θ. Once this constraint is violated, this branch is pruned (see Fig. 4). This is analogous to the subset-infrequency pruning strategy utilized in Max-Miner algorithm [35]. Another strategy is to prune the branch if the generated M has been generated in a previous part of the tree with the same exon set. However, if the generated M is a subset of a previously generated MCS collection but with a different exon set, the new M will still be included. Figure 5 illustrates an example of an MCStree.Fig. 4

Bottom Line: Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy.We show that our results intersect with previous results, including some that are experimental.Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA.

ABSTRACT

Background: Alternative splicing (AS) is a post-transcriptional regulatory mechanism for gene expression regulation. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. CoSREM is able to identify sets of SREs and is not limited to SRE pairs as are current approaches.

Results: We identified 37 SRE sets that include both enhancer and silencer elements. We show that our results intersect with previous results, including some that are experimental. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs which may be responsible for exon inclusion or exclusion across tissues.

Conclusion: The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size or limiting the algorithm to find only pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.

No MeSH data available.


Related in: MedlinePlus