Circuits that contain the Model Concept: Reinforcement Learning

(A neural network learning method where the network has among its inputs a (positive or negative) reward that depends on its behavior as it explores a solution space.)
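As a minimal illustration of this setting, the Python sketch below shows a learner whose weight updates are gated by a scalar reward returned by its environment. The two-action environment, learning rate, and exploration rate are illustrative assumptions, not taken from any model in this list.

    import random

    weights = [0.0, 0.0]       # preference for each of two actions
    alpha, explore = 0.1, 0.1  # learning rate and exploration rate (assumed)

    def reward(action):
        # hypothetical environment: action 1 pays off, action 0 does not
        return 1.0 if action == 1 else -1.0

    for trial in range(100):
        if random.random() < explore:          # explore the solution space
            action = random.randrange(2)
        else:                                  # exploit current preferences
            action = 0 if weights[0] >= weights[1] else 1
        # the (positive or negative) reward feeds back into the update
        weights[action] += alpha * (reward(action) - weights[action])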
1. A large-scale model of the functioning brain (Spaun) (Eliasmith et al. 2012)
" ... In this work, we present a 2.5-million-neuron model of the brain (called “Spaun”) that bridges this gap (between neural activity and biological function) by exhibiting many different behaviors. The model is presented only with visual image sequences, and it draws all of its responses with a physically modeled arm. Although simplified, the model captures many aspects of neuroanatomy, neurophysiology, and psychological behavior, which we demonstrate via eight diverse tasks."
2. A reinforcement learning example (Sutton and Barto 1998)
This MATLAB script demonstrates reinforcement learning functions guiding the movements of an agent (a black square) in a gridworld environment. See the comments at the top of the MATLAB script and the book for more details.
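A rough sketch of what such a gridworld script computes: tabular Q-learning with epsilon-greedy exploration. The grid size, reward values, and parameters here are illustrative assumptions, not the values used in the actual MATLAB script.

    import random

    W, H = 5, 5                  # grid dimensions (assumed)
    goal = (4, 4)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    alpha, gamma, eps = 0.1, 0.9, 0.1
    Q = {((x, y), a): 0.0 for x in range(W) for y in range(H)
         for a in range(4)}

    def step(s, a):
        # move within the grid walls; small step cost, reward at the goal
        nx = min(max(s[0] + moves[a][0], 0), W - 1)
        ny = min(max(s[1] + moves[a][1], 0), H - 1)
        ns = (nx, ny)
        return ns, (1.0 if ns == goal else -0.01)

    for episode in range(500):
        s = (0, 0)
        while s != goal:
            if random.random() < eps:                      # explore
                a = random.randrange(4)
            else:                                          # exploit
                a = max(range(4), key=lambda b: Q[(s, b)])
            ns, r = step(s, a)
            target = r + gamma * max(Q[(ns, b)] for b in range(4))
            Q[(s, a)] += alpha * (target - Q[(s, a)])      # TD update
            s = ns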
3. A spiking neural network model of model-free reinforcement learning (Nakano et al 2015)
"Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. ... In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL (partially observable reinforcement learning) problems with high-dimensional observations. ... The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. "
4. Alleviating catastrophic forgetting: context gating and synaptic stabilization (Masse et al 2018)
"Artificial neural networks can suffer from catastrophic forgetting, in which learning a new task causes the network to forget how to perform previous tasks. While previous studies have proposed various methods that can alleviate forgetting over small numbers (<10) of tasks, it is uncertain whether they can prevent forgetting across larger numbers of tasks. In this study, we propose a neuroscience-inspired scheme, called “context-dependent gating,” in which mostly nonoverlapping sets of units are active for any one task. Importantly, context-dependent gating has a straightforward implementation, requires little extra computational overhead, and when combined with previous methods to stabilize connection weights, can allow networks to maintain high performance across large numbers of sequentially presented tasks."
5. Alternative time representation in dopamine models (Rivest et al. 2009)
Combines a long short-term memory (LSTM) model of the cortex with a temporal difference (TD) learning model of the basal ganglia. Code to run simulations similar to the published data: Rivest, F., Kalaska, J.F., Bengio, Y. (2009) Alternative time representation in dopamine models. Journal of Computational Neuroscience. See http://dx.doi.org/10.1007/s10827-009-0191-1 for details.
6. Cortex learning models (Weber et al. 2006, Weber and Triesch 2006, Weber and Wermter 2006/7)
A simulator and the configuration files for three publications are provided. First, "A hybrid generative and predictive model of the motor cortex" (Weber et al. 2006), which uses reinforcement learning to set up a toy action scheme, then uses unsupervised learning to "copy" the learnt action, and an attractor network to predict the hidden code of the unsupervised network. Second, "A Self-Organizing Map of Sigma-Pi Units" (Weber and Wermter 2006/7) learns frame-of-reference transformations on population codes in an unsupervised manner. Third, "A possible representation of reward in the learning of saccades" (Weber and Triesch, 2006) implements saccade learning with two possible learning schemes for horizontal and vertical saccades, respectively.
7. Cortical model with reinforcement learning drives realistic virtual arm (Dura-Bernal et al 2015)
We developed a 3-layer sensorimotor cortical network consisting of 704 spiking model neurons, including excitatory, fast-spiking, and low-threshold-spiking interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a virtual musculoskeletal human arm, with realistic anatomical and biomechanical properties, to reach a target. Virtual arm position was used to simultaneously control a robot arm via a network interface.
8. Dynamic dopamine modulation in the basal ganglia: Learning in Parkinson (Frank et al 2004,2005)
See README file for all info on how to run models under different tasks and simulated Parkinson's and medication conditions.
9. First-Spike-Based Visual Categorization Using Reward-Modulated STDP (Mozafari et al. 2018)
"...Here, for the first time, we show that (Reinforcement Learning) RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. ..."
10. Fixed point attractor (Hasselmo et al 1995)
"... In the model, cholinergic suppression of synaptic transmission at excitatory feedback synapses is shown to determine the extent to which activity depends upon new features of the afferent input versus components of previously stored representations. ..." See paper for more and details. The MATLAB script demonstrates the model of fixed point attractors mediated by excitatory feedback with subtractive inhibition in a continuous firing rate model.
11. Hippocampal context-dependent retrieval (Hasselmo and Eichenbaum 2005)
"... The model simulates the context-sensitive firing properties of hippocampal neurons including trial-specific firing during spatial alternation and trial by trial changes in theta phase precession on a linear track. ..." See paper for more and details.
12. Motor system model with reinforcement learning drives virtual arm (Dura-Bernal et al 2017)
"We implemented a model of the motor system with the following components: dorsal premotor cortex (PMd), primary motor cortex (M1), spinal cord and musculoskeletal arm (Figure 1). PMd modulated M1 to select the target to reach, M1 excited the descending spinal cord neurons that drove the arm muscles, and received arm proprioceptive feedback (information about the arm position) via the ascending spinal cord neurons. The large-scale model of M1 consisted of 6,208 spiking Izhikevich model neurons [37] of four types: regular-firing and bursting pyramidal neurons, and fast-spiking and low-threshold-spiking interneurons. These were distributed across cortical layers 2/3, 5A, 5B and 6, with cell properties, proportions, locations, connectivity, weights and delays drawn primarily from mammalian experimental data [38], [39], and described in detail in previous work [29]. The network included 486,491 connections, with synapses modeling properties of four different receptors ..."
13. Odor supported place cell model and goal navigation in rodents (Kulvicius et al. 2008)
" ... Here we model odor supported place cells by using a simple feed-forward network and analyze the impact of olfactory cues on place cell formation and spatial navigation. The obtained place cells are used to solve a goal navigation task by a novel mechanism based on self-marking by odor patches combined with a Q-learning algorithm. We also analyze the impact of place cell remapping on goal directed behavior when switching between two environments. ..."
14. Prefrontal cortical mechanisms for goal-directed behavior (Hasselmo 2005)
".. a model of prefrontal cortex function emphasizing the influence of goal-related activity on the choice of the next motor output. ... Different neocortical minicolumns represent distinct sensory input states and distinct motor output actions. The dynamics of each minicolumn include separate phases of encoding and retrieval. During encoding, strengthening of excitatory connections forms forward and reverse associations between each state, the following action, and a subsequent state, which may include reward. During retrieval, activity spreads from reward states throughout the network. The interaction of this spreading activity with a specific input state directs selection of the next appropriate action. Simulations demonstrate how these mechanisms can guide performance in a range of goal directed tasks, and provide a functional framework for some of the neuronal responses previously observed in the medial prefrontal cortex during performance of spatial memory tasks in rats."
15. Reinforcement learning of targeted movement (Chadderdon et al. 2012)
"Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. ..."
16. Reward modulated STDP (Legenstein et al. 2008)
"... This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They also can learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker. In this experiment monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit assignment problem. ... In addition our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics."
17. Roles of subthalamic nucleus and DBS in reinforcement conflict-based decision making (Frank 2006)
Deep brain stimulation (DBS) of the subthalamic nucleus dramatically improves the motor symptoms of Parkinson's disease, but causes cognitive side effects such as impulsivity. This model from Frank (2006) simulates the role of the subthalamic nucleus (STN) within the basal ganglia circuitry in decision making. The STN dynamically modulates network decision thresholds in proportion to decision conflict. The STN "hold your horses" signal adaptively allows the system more time to settle on the best choice when multiple options are valid. The model also replicates effects in Parkinson's patients on and off DBS in experiments designed to test the model (Frank et al, 2007).
18. Sensorimotor cortex reinforcement learning of 2-joint virtual arm reaching (Neymotin et al. 2013)
"... We developed a model of sensory and motor neocortex consisting of 704 spiking model-neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA, and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a 2-joint virtual arm to reach to a fixed target. ... "
19. Striatal dopamine ramping: an explanation by reinforcement learning with decay (Morita & Kato, 2014)
Incorporating decay of learned values into temporal-difference (TD) learning (Sutton & Barto, 1998, Reinforcement Learning (MIT Press)) causes ramping of the TD reward prediction error (RPE). Given the hypothesis that dopamine represents the TD RPE (Montague et al., 1996, J Neurosci 16:1936; Schultz et al., 1997, Science 275:1593), this could explain the reported ramping of dopamine concentration in the striatum during a reward-associated spatial navigation task (Howe et al., 2013, Nature 500:575).
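A toy version of this mechanism, assuming a linear track of states with a terminal reward: decaying the learned values on every step leaves the TD error persistently positive and increasing toward the reward, i.e., a ramp. The decay rate and other constants are illustrative, not the paper's values.

    import numpy as np

    N, alpha, gamma, decay = 10, 0.5, 0.97, 0.01
    V = np.zeros(N + 1)          # V[N] is the terminal state, fixed at 0

    for episode in range(2000):
        for s in range(N):
            r = 1.0 if s == N - 1 else 0.0
            delta = r + gamma * V[s + 1] - V[s]   # TD reward prediction error
            V[s] += alpha * delta
            V *= 1.0 - decay                      # decay of learned values

    deltas = [(1.0 if s == N - 1 else 0.0) + gamma * V[s + 1] - V[s]
              for s in range(N)]
    print(np.round(deltas, 3))   # increases toward the reward: a ramp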
