An Algorithm of Association Rule Mining for Microbial Energy Prospection


ABSTRACT

The presence of hydrocarbons beneath the earth's surface produces microbiological anomalies in soils and sediments. Detecting such microbial populations involves purely biochemical processes that are specialized, expensive and time-consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of a previously developed algorithm that was designed for spatial databases only. It is applied to mine context-based association rules from a microbial database in order to extract interesting and useful associations between microbial attributes and the existence of a hydrocarbon reserve. The surface and soil manifestations caused by the presence of hydrocarbon-oxidizing microbes are selected from the existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability that hydrocarbons exist. The numerical evaluation shows better accuracy on non-spatial data than conventional algorithms at generating reliable and robust rules.


Figure 6: Minimum support and average confidence of Apriori, PNARM and CBPNARM.

Mentions: As stated in ref. 14, we again studied the results of our algorithm with respect to the number of rules, the average confidence of the rules and the total time taken to extract association rules (execution time). The comparison by itself may not demonstrate the significance of the approach; it is in fact made to compare existing association rule mining approaches (Apriori and the positive/negative association rule mining algorithm, PNARM) with the proposed technique on non-spatial data. In Fig. 5, the algorithms are compared on the basis of the number of rules. The proposed algorithm never produces more rules than PNARM. Two cases of the proposed algorithm are considered in this plot: (1) the algorithm is applied without taking the context variable into consideration, and (2) the algorithm is executed with an abnormal value of the context variable. In case (1) the algorithm produces the same number of rules as PNARM, hence its line in the plot is not visible, whereas in case (2) it produces comparatively fewer rules because the value of the context variable exceeds the upper limit of the range. The plot in Fig. 5 can be read from several angles. The greater the number of association rules, the more patterns are extracted from the database. As the support value increases, the results of the algorithms converge towards a single point, because real datasets contain very few co-occurrences at high support. The number of rules produced by the Apriori algorithm is lower in most cases because Apriori does not consider negative rules. The PNARM curve lies above the curve of CBPNARM with an abnormal context value.
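The relationship between minimum support, negative rules and the number of rules can be illustrated with a minimal sketch. It is a brute-force enumeration over hypothetical toy data, not an implementation of Apriori, PNARM or CBPNARM; the transactions, the item labels `A`, `B`, `C` and the helper names `holds`, `support` and `mine_rules` are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical toy transactions: each set lists the indicators present in one record.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
    {"A"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def holds(literal, transaction):
    """A literal is (item, True) for presence or (item, False) for absence."""
    item, present = literal
    return (item in transaction) == present


def support(literals, transactions):
    """Fraction of transactions in which every literal holds."""
    hits = sum(all(holds(lit, t) for lit in literals) for t in transactions)
    return hits / len(transactions)


def mine_rules(transactions, min_support, min_confidence, allow_negative):
    """Enumerate single-item rules X -> Y, optionally with negated items."""
    polarities = (True, False) if allow_negative else (True,)
    rules = []
    for x, y in permutations(ITEMS, 2):
        for px in polarities:
            for py in polarities:
                antecedent, consequent = (x, px), (y, py)
                s_rule = support([antecedent, consequent], transactions)
                s_ante = support([antecedent], transactions)
                if s_ante == 0 or s_rule < min_support:
                    continue
                conf = s_rule / s_ante  # confidence = supp(X and Y) / supp(X)
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, s_rule, conf))
    return rules


# Rule counts at increasing minimum support, with and without negative rules.
for min_sup in (0.1, 0.3, 0.5, 0.7):
    n_pos = len(mine_rules(TRANSACTIONS, min_sup, 0.0, allow_negative=False))
    n_all = len(mine_rules(TRANSACTIONS, min_sup, 0.0, allow_negative=True))
    print(min_sup, n_pos, n_all)
```

Because the positive-only rules are a subset of the rules that admit negated items, the positive-only count can never exceed the combined count, and both counts can only shrink as the minimum support rises.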
On average, PNARM appears to produce more rules than CBPNARM, but the significance of the CBPNARM algorithm becomes apparent when the plots of Figs 5 and 6 are analyzed together. In Fig. 6, the confidence of the rules extracted by our algorithm is the highest of all at various support values. The greater the confidence, the greater the accuracy of the rules, subject to the dependence between the variables. Confidence gives a reliable picture only up to a certain support value, because higher support returns very few co-occurrences from the data, which limits how broadly the algorithm can be evaluated. The confidence values of Apriori rules are lower than those of both PNARM and CBPNARM. CBPNARM produced rules with greater confidence than PNARM up to a support of 0.5. The rules produced by PNARM have lower confidence than those of CBPNARM, which demonstrates the significance of considering contextual information, especially for positive and negative spatial association rule mining, since PNARM produced the largest number of rules but with smaller confidence values. The algorithm is an extended form of PNARM, inheriting its full capability and adding context-based mining. The execution time of the proposed algorithm is, as expected, somewhat higher. The additional module takes some extra time, which becomes negligible as the minimum support increases. In Fig. 7, the three algorithms are compared on the basis of execution time. The evaluation shows that the algorithm is more accurate in terms of the granularity of the output rules and their confidence. Although its execution time is higher than that of the previous algorithms, the extracted patterns are decision-oriented, specific and clear. The increase in execution time is due to the inclusion of an external factor, i.e. context, which is not part of the adopted procedural approach and can be considered an external influencing factor.
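The caveat about variable dependence can be made concrete with a small sketch. It contrasts confidence with lift, a standard measure that is not used in the text above and is brought in here only for illustration; the counts and rule names are hypothetical.

```python
def confidence(n_xy, n_x):
    """Confidence of X -> Y: share of records containing X that also contain Y."""
    return n_xy / n_x


def lift(n_xy, n_x, n_y, n_total):
    """Lift of X -> Y: 1.0 means X and Y co-occur as often as independence predicts."""
    return (n_xy * n_total) / (n_x * n_y)


def average_confidence(rules):
    """Mean confidence over a rule set, the quantity plotted against support."""
    return sum(confidence(r["n_xy"], r["n_x"]) for r in rules) / len(rules)


N_TOTAL = 100  # hypothetical number of records

# Hypothetical co-occurrence counts for two rules.
RULES = [
    {"name": "X1 -> Y1", "n_xy": 45, "n_x": 50, "n_y": 90},  # Y1 is simply very common
    {"name": "X2 -> Y2", "n_xy": 27, "n_x": 50, "n_y": 30},  # Y2 is rare but tied to X2
]

for r in RULES:
    print(
        r["name"],
        confidence(r["n_xy"], r["n_x"]),
        lift(r["n_xy"], r["n_x"], r["n_y"], N_TOTAL),
    )
print("average confidence:", average_confidence(RULES))
```

In this toy data the first rule has the higher confidence even though its antecedent and consequent are statistically independent, while the second rule has lower confidence but a genuine association. This is one reading of why a high average confidence should be interpreted together with how the variables depend on each other.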

