A distribution-free multi-factorial profiler for harvesting information from high-density screenings.

Besseris GJ - PLoS ONE (2013)

Bottom Line: Partial effects are sliced off systematically from the investigated response to form individual contrasts using simple robust measures. Main benefits of the method are: 1) easy to grasp, 2) well-explained test-power properties, 3) distribution-free, 4) sparsity-free, 5) calibration-free, 6) simulation-free, 7) easy to implement, and 8) expanded usability to any type and size of multi-factorial screening designs. The method is elucidated with a benchmarked profiling effort for a water filtration process.


Affiliation: Department of Mechanical Engineering, Advanced Industrial & Manufacturing Systems Program, Technological Educational Institute of Piraeus, Aegaleo, Greece. besseris@teipir.gr

ABSTRACT
Data screening is an indispensable phase in initiating the scientific discovery process. Fractional factorial designs offer quick and economical options for engineering highly-dense structured datasets. Maximum information content is harvested when a selected fractional factorial scheme is driven to saturation while data gathering is suppressed to no replication. A novel multi-factorial profiler is presented that allows screening of saturated-unreplicated designs by decomposing the examined response to its constituent contributions. Partial effects are sliced off systematically from the investigated response to form individual contrasts using simple robust measures. By isolating each time the disturbance attributed solely to a single controlling factor, the Wilcoxon-Mann-Whitney rank stochastics are employed to assign significance. We demonstrate that the proposed profiler possesses its own self-checking mechanism for detecting a potential influence due to fluctuations attributed to the remaining unexplainable error. Main benefits of the method are: 1) easy to grasp, 2) well-explained test-power properties, 3) distribution-free, 4) sparsity-free, 5) calibration-free, 6) simulation-free, 7) easy to implement, and 8) expanded usability to any type and size of multi-factorial screening designs. The method is elucidated with a benchmarked profiling effort for a water filtration process.
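The abstract's core idea — splitting a saturated, unreplicated two-level design's single response column by each factor's low/high setting and assigning significance with Wilcoxon-Mann-Whitney rank statistics — can be illustrated with a minimal sketch. This is not the paper's exact contrast-slicing profiler; the design matrix is a standard 2^(7-4) array, and the response values are assumed for illustration only.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Illustrative sketch only: for each factor of a saturated, unreplicated
# two-level design, split the response by that factor's low/high level and
# apply the Wilcoxon-Mann-Whitney test. NOT the paper's exact algorithm.

# A standard saturated 2^(7-4) (8-run, 7-factor) design, coded -1/+1.
design = np.array([
    [-1, -1, -1,  1,  1,  1, -1],
    [ 1, -1, -1, -1, -1,  1,  1],
    [-1,  1, -1, -1,  1, -1,  1],
    [ 1,  1, -1,  1, -1, -1, -1],
    [-1, -1,  1,  1, -1, -1,  1],
    [ 1, -1,  1, -1,  1, -1, -1],
    [-1,  1,  1, -1, -1,  1, -1],
    [ 1,  1,  1,  1,  1,  1,  1],
])

# Hypothetical single-replicate response (e.g. a filtration time per run).
y = np.array([68.4, 77.7, 66.4, 81.0, 78.6, 41.2, 68.7, 38.7])

pvals = {}
for j in range(design.shape[1]):
    low = y[design[:, j] == -1]   # the 4 runs at the low level
    high = y[design[:, j] == +1]  # the 4 runs at the high level
    _, p = mannwhitneyu(low, high, alternative="two-sided")
    pvals[f"F{j + 1}"] = p
    print(f"F{j + 1}: p = {p:.3f}")
```

With only four observations per level, the attainable p-values are coarse, which is one reason the paper's profiler builds contrasts rather than testing raw splits directly.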


pone-0073275-g004: Main effects plot for the means of the filtration time.

Mentions: To illustrate the capabilities of our technique, we tested it against a benchmarked filtration process. We found that the three dominant effects in our study's outcomes are congruent with the practical (non-statistical) explanations offered by Box et al. [11]. In Figure 4, we provide a main effects plot showing that WS, T, and CS stir up the most variation across their two examined operating endpoints. What the main effects plot does not disclose clearly is where to draw the line for discovery in this situation. For example, a plausible question raised by Figure 4 is: “Is WS an active factor, or merely a transition between two possible states?” Another dilemma concerns how strong CS truly is, which can be rephrased as the question: “Is CS the preponderant effect?” Although the steepest slope traces to CS (Figure 3), the source of this disturbance is statistically indistinguishable from the other two active effects (Table 3). Therefore, information extracted from simple analytics ought also to be challenged stochastically. Additionally, the same group of three active effects has been confirmed by a highly specialized method based on the powerful maximum-likelihood-ratio principle [46]. However, the maximum-likelihood-ratio method has been formulated to serve only the restricted range of medium-sized unreplicated fractional factorials. Another immediate advantage worth pointing out is that our method converges blindly to the correct final prediction, without requiring an a priori refined solution search region as the Miller approach does [46]. Even so, our method clearly outperforms the maximum-likelihood-ratio test, which operates at an individual error rate of 0.075.
Moreover, the best experimentwise error rate for the likelihood-ratio test is constrained to a level of 0.25, and then only when searching specifically for four active effects. Compared with our technique (Results section), the Miller method lags in performance on both criteria: the individual and the experimentwise error rates. Our approach is also much simpler and extends to FFDs larger than the 16-run factorial design for which the maximum-likelihood-ratio technique was constructed. By the same token, our technique appears to safeguard more reliably against false discoveries when compared with the inferential capabilities of composite non-parametrics. The performance of composite non-parametrics on the same filtration case study has been reported in a past analysis [34]. It was found that composite non-parametrics differ in their final prediction of the number of important effects depending on the error rate considered. With an error rate limit of either 0.05 or 0.10, composite non-parametrics respectively underestimated (uncovering two effects) or overestimated (uncovering four effects) the size of the active group. Another gain of our proposed method over composite non-parametrics is that it makes the quantification of statistical significance explicit for each influence individually. This means that in an extreme case in which all seven factors were simultaneously significant at a hypothetical level of 0.05, our method would still enable mining such overwhelming information. By contrast, composite non-parametrics would need to degrade severely, to a p-value of 0.333, in order to detect such a rare occurrence (Table 1g in [34]).
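The contrast between individual and experimentwise error rates drawn above can be made concrete with a quick numerical sketch. This assumes k independent tests (the Šidák relation) and does not reproduce the paper's own rate calculations.

```python
# Sketch of the individual vs experimentwise error-rate trade-off, assuming
# k independent tests (Sidak relation). Illustrative only; the paper's own
# error-rate computations are not reproduced here.

def experimentwise_rate(alpha_individual: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests."""
    return 1.0 - (1.0 - alpha_individual) ** k

# Seven factors each tested at an individual significance level of 0.05:
ew = experimentwise_rate(0.05, 7)
print(f"experimentwise rate = {ew:.3f}")  # roughly 0.302
```

Even at a modest individual level of 0.05, seven simultaneous tests push the chance of at least one false discovery above 30%, which is why a screening method's experimentwise behavior matters as much as its per-factor behavior.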

