Citations for A reinforcement learning example (Sutton and Barto 1998)

Sutton RS, Barto AG (2002) Reinforcement learning: An introduction (2nd ed)

References and models that cite this paper

Grüning A (2007) Elman backpropagation as reinforcement for simple recurrent networks. Neural Comput 19:3108-31 [Journal] [PubMed]
Porr B, Wörgötter F (2006) Strongly improved stability and faster convergence of temporal sequence learning by using input correlations only. Neural Comput 18:1380-412 [Journal] [PubMed]
(2 refs)

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction

References and models that cite this paper

Anastasio TJ, Gad YP (2007) Sparse cerebellar innervation can morph the dynamics of a model oculomotor neural integrator. J Comput Neurosci 22:239-54 [Journal] [PubMed]
Baras D, Meir R (2007) Reinforcement learning, spike-time-dependent plasticity, and the BCM rule. Neural Comput 19:2245-79 [Journal] [PubMed]
Bogacz R, Gurney K (2007) The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Comput 19:442-77 [Journal] [PubMed]
Brzosko Z, Zannone S, Schultz W, Clopath C, Paulsen O (2017) Sequential neuromodulation of Hebbian plasticity offers mechanism for effective reward-based navigation. Elife [Journal] [PubMed]
   Sequential neuromodulation of Hebbian plasticity in reward-based navigation (Brzosko et al 2017) [Model]
Chadderdon GL, Neymotin SA, Kerr CC, Lytton WW (2012) Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex. PLoS One 7:e47251 [Journal] [PubMed]
   Reinforcement learning of targeted movement (Chadderdon et al. 2012) [Model]
Clopath C, Ziegler L, Vasilaki E, Büsing L, Gerstner W (2008) Tag-trigger-consolidation: a model of early and late long-term-potentiation and depression. PLoS Comput Biol 4:e1000248 [Journal] [PubMed]
   Tag Trigger Consolidation (Clopath and Ziegler et al. 2008) [Model]
Daw ND, Courville AC, Touretzky DS (2006) Representation and timing in theories of the dopamine system. Neural Comput 18:1637-77 [Journal] [PubMed]
Florian RV (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19:1468-502 [Journal] [PubMed]
Fujita H, Ishii S (2007) Model-based reinforcement learning for partially observable games with sampling-based state estimation. Neural Comput 19:3051-87 [Journal] [PubMed]
Gutkin BS, Dehaene S, Changeux JP (2006) A neurocomputational hypothesis for nicotine addiction. Proc Natl Acad Sci U S A 103:1106-11 [Journal] [PubMed]
Hasselmo ME (2005) A model of prefrontal cortical mechanisms for goal-directed behavior. J Cogn Neurosci 17:1115-29 [Journal] [PubMed]
   Prefrontal cortical mechanisms for goal-directed behavior (Hasselmo 2005) [Model]
Hasselmo ME, Eichenbaum H (2005) Hippocampal mechanisms for the context-dependent retrieval of episodes. Neural Netw 18:1172-90 [Journal] [PubMed]
   Hippocampal context-dependent retrieval (Hasselmo and Eichenbaum 2005) [Model]
Hazy TE, Frank MJ, O'Reilly RC (2007) Towards an executive without a homunculus: computational models of the prefrontal cortex/basal ganglia system. Philos Trans R Soc Lond B Biol Sci 362:1601-13 [Journal] [PubMed]
Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443-52 [Journal] [PubMed]
   Linking STDP and Dopamine action to solve the distal reward problem (Izhikevich 2007) [Model]
Low KH, Leow WK, Ang MH Jr (2005) An ensemble of cooperative extended Kohonen maps for complex robot motion tasks. Neural Comput 17:1411-45
Morimoto J, Doya K (2007) Reinforcement learning state estimator. Neural Comput 19:730-56 [Journal] [PubMed]
Morita K, Kato A (2014) Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front Neural Circuits 8:36 [Journal] [PubMed]
   Striatal dopamine ramping: an explanation by reinforcement learning with decay (Morita & Kato, 2014) [Model]
Moustafa AA, Cohen MX, Sherman SJ, Frank MJ (2008) A role for dopamine in temporal decision making and reward maximization in parkinsonism. J Neurosci 28:12294-304 [Journal] [PubMed]
Nakano T, Otsuka M, Yoshimoto J, Doya K (2015) A spiking neural network model of model-free reinforcement learning with high-dimensional sensory input and perceptual ambiguity. PLoS One 10:e0115620 [Journal] [PubMed]
   A spiking neural network model of model-free reinforcement learning (Nakano et al 2015) [Model]
O'Reilly RC, Frank MJ (2006) Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Comput 18:283-328 [Journal] [PubMed]
Richmond P, Buesing L, Giugliano M, Vasilaki E (2011) Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations. PLoS One 6:e18539 [Journal] [PubMed]
   Democratic population decisions result in robust policy-gradient learning (Richmond et al. 2011) [Model]
Rivest F, Kalaska JF, Bengio Y (2010) Alternative time representation in dopamine models. J Comput Neurosci 28:107-30 [Journal] [PubMed]
   Alternative time representation in dopamine models (Rivest et al. 2009) [Model]
Roelfsema PR, van Ooyen A (2005) Attention-gated reinforcement learning of internal representations for classification. Neural Comput 17:2176-214 [Journal] [PubMed]
Sakai Y, Fukai T (2008) The actor-critic learning is behind the matching law: matching versus optimal behaviors. Neural Comput 20:227-51 [Journal] [PubMed]
Smith AJ, Becker S, Kapur S (2005) A computational model of the functional role of the ventral-striatal D2 receptor in the expression of previously acquired behaviors. Neural Comput 17:361-95 [Journal] [PubMed]
Soltani A, Wang XJ (2006) A biophysically based neural model of matching law behavior: melioration by stochastic synapses. J Neurosci 26:3731-44 [Journal] [PubMed]
Todorov E (2005) Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput 17:1084-108 [Journal] [PubMed]
Toussaint M (2006) A sensorimotor map: modulating lateral interactions for anticipation and planning. Neural Comput 18:1132-55 [Journal] [PubMed]
Triesch J (2007) Synergies between intrinsic and synaptic plasticity mechanisms. Neural Comput 19:885-909 [Journal] [PubMed]
Troyer TW, Doupe AJ (2000) An associational model of birdsong sensorimotor learning I. Efference copy and the learning of song syllables. J Neurophysiol 84:1204-23 [Journal] [PubMed]
Troyer TW, Doupe AJ (2000) An associational model of birdsong sensorimotor learning II. Temporal hierarchies and the learning of song sequence. J Neurophysiol 84:1224-39 [Journal] [PubMed]
Wörgötter F, Porr B (2005) Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural Comput 17:245-319 [Journal] [PubMed]
(32 refs)