A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.

Legenstein R, Pecevski D, Maass W - PLoS Comput. Biol. (2008)

Bottom Line: This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity.


Affiliation: Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria.

ABSTRACT
Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate learning rule that could explain how behaviorally relevant adaptive changes in complex networks of spiking neurons could be achieved in a self-organizing manner through local synaptic plasticity. However, the capabilities and limitations of this learning rule have so far been tested only through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They can also learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker, in which monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit-assignment problem. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity. Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic of cortical networks of neurons but has no analogue in currently existing artificial computing systems. In addition, our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics.
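The learning rule the abstract refers to can be stated compactly: STDP pairings are not applied to the weights directly but are accumulated in a per-synapse eligibility trace, and a global scalar reward signal gates these traces into actual weight changes, so that only recently "eligible" synapses are credited when reward arrives. Below is a minimal sketch of this scheme in Python; the exponential traces and all parameter values (tau_e, A_plus, A_minus, eta) are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

n, dt = 100, 0.001                 # number of plastic synapses, time step (s)
tau_plus, tau_minus = 0.02, 0.02   # STDP window time constants (s)
tau_e = 0.5                        # eligibility-trace time constant (s)
A_plus, A_minus = 1.0, 1.05        # STDP amplitudes (slightly depression-dominated)
eta, w_max = 1e-4, 1.0             # learning rate, hard weight bound

w = np.random.uniform(0.0, w_max, n)   # synaptic weights
x = np.zeros(n)                    # presynaptic traces, one per synapse
y = 0.0                            # postsynaptic trace
e = np.zeros(n)                    # eligibility traces

def step(pre_spikes, post_spike, r):
    """Advance one time step. pre_spikes: boolean vector of presynaptic
    spikes; post_spike: bool; r: current scalar reward."""
    global y, w
    # exponential decay of all traces
    x[:] -= dt * x / tau_plus
    y -= dt * y / tau_minus
    e[:] -= dt * e / tau_e
    # STDP pairings feed the eligibility trace, not the weight directly
    e[pre_spikes] -= A_minus * y       # pre after post -> depression
    if post_spike:
        e[:] += A_plus * x             # post after pre -> potentiation
    # register the new spikes in the pre/post traces
    x[pre_spikes] += 1.0
    if post_spike:
        y += 1.0
    # the global reward gates the eligibility trace into a weight change
    w += eta * r * e
    np.clip(w, 0.0, w_max, out=w)
```

Because the eligibility trace decays within a fraction of a second, reward must arrive soon after the spike pairings that caused it; this delay structure is what makes the credit-assignment analysis in the paper non-trivial.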


pcbi-1000180-g007: Results for reinforcement learning of exact spike times through reward-modulated STDP. (A) Synaptic weight changes of the trained LIF neuron, for 5 different runs of the experiment. The curves show the average of the synaptic weights that should converge to w_max (dashed lines) and the average of the synaptic weights that should converge to 0 (solid lines), with a different color for each simulation run. (B) Comparison of the output of the trained neuron before (top trace) and after (bottom trace) 2 hours of training; the same input spike trains and the same noise inputs were used in both cases. The second trace from the top shows the rewarded spike times S*; the third trace shows the realizable part of S* (i.e., those spikes which the trained neuron could potentially learn to reproduce, since the neuron μ* produces them without its 10 extra spike inputs). The close match between the third and fourth traces shows that the trained neuron performs very well. (C) Evolution of the spike correlation between the spike train of the trained neuron and the realizable part of the target spike train S*. (D) The angle (in radians) between the weight vector w of the trained neuron and the weight vector w* of the neuron μ* during the simulation. (E) Synaptic weights at the beginning (×) and at the end (•) of the simulation, for each plastic synapse of the trained neuron. (F) Evolution of the synaptic weights w_i/w_max during the simulation (we had chosen w_i* = w_max for i < 50 and w_i* = 0 for i ≥ 50).
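Panels C and D track learning progress with two scalar diagnostics: a smoothed spike-train correlation and the angle between the learned and target weight vectors. The functions below show how such diagnostics are typically computed; the Gaussian smoothing and its width sigma are assumptions on our part, not necessarily the kernel used in the paper.

```python
import numpy as np

def weight_angle(w, w_star):
    """Angle in radians between the trained weight vector w and the target
    weight vector w* (as plotted in panel D)."""
    cos = np.dot(w, w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def spike_correlation(s1, s2, dt=0.001, sigma=0.005):
    """Correlation between two spike trains on a common time grid (panel C):
    take binary spike-count arrays, smooth each with a Gaussian kernel of
    width sigma, then compute the Pearson correlation of the filtered
    signals."""
    t = np.arange(-4 * sigma, 4 * sigma + dt, dt)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    f1 = np.convolve(s1, kernel, mode="same")
    f2 = np.convolve(s2, kernel, mode="same")
    return np.corrcoef(f1, f2)[0, 1]
```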

Mentions: We performed 5 repetitions of the experiment, each time with different randomly generated inputs and different initial weight values for the trained neuron. In each of the 5 runs, the average synaptic weights of synapses with w_i* = w_max and with w_i* = 0 approached their target values, as shown in Figure 7A. In order to test how closely the trained neuron reproduces the target spike train S* after learning, we performed additional simulations in which the same spike input was applied to the trained neuron before and after learning. We then compared the output of the trained neuron before and after learning with the output S* of neuron μ*. Figure 7B shows that the trained neuron approximates the part of S* that is accessible to it quite well. Figure 7C–F provides more detailed analyses of the evolution of weights during learning. The computer simulations confirmed the theoretical prediction that the neuron can learn well through reward-modulated STDP only if a certain level of noise is injected into the neuron (see the preceding discussion and Figure S6).
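The protocol in this paragraph maps onto a simple simulation loop: drive the neuron with frozen Poisson inputs plus injected noise, deliver reward when an output spike lands near a target spike of S*, and let the reward-gated update from the earlier sketch move the weights. The sketch below reuses n, dt, w, and step() from that snippet; the reward convention, the crude LIF update, and all numbers (rate, eps, noise_std) are illustrative assumptions, and S_star here is just a random placeholder rather than the output of a teacher neuron μ*.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed: the same frozen inputs and
                                         # noise can be replayed before/after learning
rate, noise_std, eps = 6.0, 0.3, 0.005   # input rate (Hz), noise level, tolerance (s)
T_sec = 2 * 3600                         # 2 hours of simulated time, as in the experiment
S_star = np.sort(rng.uniform(0, T_sec, 2000))  # placeholder target spike times

v, tau_m, v_thresh = 0.0, 0.02, 1.0      # membrane state of a crude LIF neuron

def lif_step(pre, noise):
    """One Euler step of a leaky integrate-and-fire neuron (illustrative only)."""
    global v
    v += -dt * v / tau_m + 0.05 * w[pre].sum() + noise * np.sqrt(dt)
    if v >= v_thresh:
        v = 0.0
        return True
    return False

for k in range(int(T_sec / dt)):
    t_now = k * dt
    pre = rng.random(n) < rate * dt       # Poisson input spikes
    noise = rng.normal(0.0, noise_std)    # injected noise; the theory predicts
                                          # learning fails without enough of it
    post = lif_step(pre, noise)
    hit = post and np.any(np.abs(S_star - t_now) < eps)
    r = 1.0 if hit else (-0.2 if post else 0.0)
    step(pre, post, r)                    # reward-gated STDP update
```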

