Incorporation of decay of learned values into temporal-difference (TD) learning (Sutton & Barto, 1998, Reinforcement Learning (MIT Press)) causes ramping of TD reward prediction error (RPE), which could explain, given the hypothesis that dopamine represents TD RPE (Montague et al., 1996, J Neurosci 16:1936; Schultz et al., 1997, Science 275:1593), the reported ramping of the dopamine concentration in the striatum in a reward-associated spatial navigation task (Howe et al., 2013, Nature 500:575).
Reference:
1 .
Morita K, Kato A (2014) Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits Front. Neural Circuits 8:36
|