Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation.

Bauer R, Gharabaghi A - Front Neurosci (2015)

Bottom Line: For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting.


Affiliation: Division of Functional and Restorative Neurosurgery and Division of Translational Neurosurgery, Department of Neurosurgery, Eberhard Karls University Tuebingen, Tuebingen, Germany; Neuroprosthetics Research Group, Werner Reichardt Centre for Integrative Neuroscience, Eberhard Karls University Tuebingen, Tuebingen, Germany.

ABSTRACT
Restorative brain-computer interfaces (BCI) are increasingly used to provide feedback of neuronal states in a bid to normalize pathological brain activity and achieve behavioral gains. However, patients and healthy subjects alike often show a large variability, or even inability, of brain self-regulation for BCI control, known as BCI illiteracy. Although current co-adaptive algorithms are powerful for assistive BCIs, their inherent class switching clashes with the operant conditioning goal of restorative BCIs. Moreover, due to the treatment rationale, the classifier of restorative BCIs usually has a constrained feature space, thus limiting the possibility of classifier adaptation. In this context, we applied a Bayesian model of neurofeedback and reinforcement learning for different threshold selection strategies to study the impact of threshold adaptation of a linear classifier on optimizing restorative BCIs. For each feedback iteration, we first determined the thresholds that result in minimal action entropy and maximal instructional efficiency. We then used the resulting vector for the simulation of continuous threshold adaptation. We could thus show that threshold adaptation can improve reinforcement learning, particularly in cases of BCI illiteracy. Finally, on the basis of information theory, we provided an explanation for the achieved benefits of adaptive threshold setting.
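To make the simulated setup concrete, here is a minimal Python sketch of one such feedback loop: a subject with a latent action bias, a linear classifier with an adaptable threshold, and an operant-conditioning update. All names and numbers (the Gaussian feature model, d_prime, alpha, and the bias-tracking adaptation rule) are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Illustrative model (not the authors' values): the classifier output for
# each action is Gaussian with unit variance; the separation d' fixes the
# achievable classification accuracy.
d_prime = 1.0      # separability of trained vs. false action
theta = 0.0        # linear classifier threshold (the adapted quantity)
p_trained = 0.5    # subject's current preference for the trained action
alpha = 0.05       # reinforcement-learning rate

def reward_prob(mean, theta):
    """P(classifier output > theta) under N(mean, 1)."""
    return 0.5 * (1.0 - erf((theta - mean) / sqrt(2.0)))

for trial in range(500):
    trained = rng.random() < p_trained              # subject selects an action
    mean = d_prime / 2 if trained else -d_prime / 2
    rewarded = rng.random() < reward_prob(mean, theta)
    if rewarded:                                    # reward reinforces the chosen action
        p_trained += alpha * ((1.0 - p_trained) if trained else -p_trained)
    # One plausible adaptation rule (hypothetical, not the paper's exact
    # strategy): shift the threshold with the subject's evolving bias so
    # that feedback stays informative.
    theta += 0.01 * (p_trained - 0.5)

print(f"final preference for trained action: {p_trained:.2f}, threshold: {theta:.2f}")
```

A fixed-threshold baseline drops out of the same loop by deleting the last update line, which is the comparison the simulation study turns on.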

No MeSH data available.



Figure 6: Visualization of the point-wise Kullback-Leibler divergence between the reward probabilities for the trained and false actions, with threshold θ on the y-axis and classification accuracy on the x-axis. Red contour lines indicate negative values and blue lines positive values (contour spacing: 0.05). The black line depicts the threshold that yields maximum classification accuracy. (A) Reward gained by preferring the trained action. (B) Reward lost by preferring the false action.

Mentions: The point-wise Kullback-Leibler divergence at each threshold measures the relative informational content of the reward gained by preferring the trained action (see Figure 6A) or the reward lost by preferring the false action (see Figure 6B). The visualization for different classification accuracies shows that the gain information peaks at positive thresholds (see Figure 6A), while the loss information peaks at negative thresholds (see Figure 6B). As classification accuracy increases, the divergence becomes stronger and narrower without the peak locations shifting. We postulate that these two stable peaks explain not only the asymmetry and the decreased magnitude of deflection but also the narrow learning space in the expert environment (see Figure 4). In the same vein, the classification accuracy narrows and assumes a more peaked shape in the expert environment (see Figure 2). This indicates that the classification accuracy encompasses a zone in which learning may occur, while the ideal threshold within this zone would have to be selected dynamically in accordance with the subject's current bias. This perspective tallies with the theory that the classification accuracy constitutes the zone of proximal development (Schnotz and Kürschner, 2007; Bauer and Gharabaghi, 2015).
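For readers who want to reproduce a Figure 6-style map, the sketch below evaluates point-wise KL terms over a threshold-accuracy grid. The accuracy-to-separability mapping and the exact gain/loss expressions are one plausible reading of the caption under the same Gaussian feature model as above, not the paper's published equations.

```python
import numpy as np
from scipy.stats import norm

# Grid matching the figure axes: threshold theta (y) vs. classification
# accuracy (x). Mapping accuracy to a Gaussian separation d' is an
# assumption of this sketch.
thetas = np.linspace(-3.0, 3.0, 121)
accuracies = np.linspace(0.55, 0.95, 81)
d_primes = 2.0 * norm.ppf(accuracies)        # accuracy -> separability

T, D = np.meshgrid(thetas, d_primes, indexing="ij")
p = norm.sf(T, loc=+D / 2)                   # P(reward | trained action)
q = norm.sf(T, loc=-D / 2)                   # P(reward | false action)

eps = 1e-12                                  # numerical guard for log(0)
gain = p * np.log((p + eps) / (q + eps))     # cf. Figure 6A: reward gained
loss = q * np.log((q + eps) / (p + eps))     # cf. Figure 6B: reward lost

# Peak thresholds per accuracy level, illustrating the two stable peaks
# the text postulates (gain at positive, loss at negative thresholds).
print("gain peaks:", thetas[gain.argmax(axis=0)][::20])
print("loss peaks:", thetas[np.abs(loss).argmax(axis=0)][::20])
```

Contouring gain and loss over (accuracies, thetas) with spacing 0.05 would mirror the figure's red/blue level lines; the printed peak locations should stay roughly fixed as accuracy grows, which is the stability the Mentions paragraph appeals to.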

