Advances in Neural Information Processing Systems
Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
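As a rough illustration of how a bonus can produce a TD error on a non-rewarding transition, the following minimal sketch combines the standard TD(0) error with a potential-based shaping bonus of the form F(s, s') = γΦ(s') − Φ(s). The function names, the potential values, and the discount factor are assumptions chosen for illustration; this is not the authors' model of DA activity.

```python
def td_error(reward, value_s, value_next, gamma=0.9):
    """Standard TD(0) prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def shaping_bonus(phi_s, phi_next, gamma=0.9):
    """Potential-based shaping bonus: F(s, s') = gamma * phi(s') - phi(s).

    phi is a hypothetical potential function over states, supplied by the caller.
    """
    return gamma * phi_next - phi_s


def td_error_with_bonus(reward, value_s, value_next, phi_s, phi_next, gamma=0.9):
    """TD error computed on the bonus-augmented reward r + F(s, s')."""
    augmented = reward + shaping_bonus(phi_s, phi_next, gamma)
    return td_error(augmented, value_s, value_next, gamma)


def discounted_bonus_sum(potentials, gamma=0.9):
    """Discounted sum of shaping bonuses along a trajectory of potentials.

    The sum telescopes to gamma**T * phi(s_T) - phi(s_0), so it depends only
    on the endpoints of the trajectory, not on the path taken between them.
    """
    return sum(
        (gamma ** t) * shaping_bonus(potentials[t], potentials[t + 1], gamma)
        for t in range(len(potentials) - 1)
    )


if __name__ == "__main__":
    # A transition with zero reward and zero learned values still yields a
    # positive error when the next state has a higher (assumed) potential.
    print(td_error_with_bonus(0.0, 0.0, 0.0, phi_s=0.0, phi_next=1.0))
    print(discounted_bonus_sum([0.2, 1.0, 0.5, 0.0]))
```

Because the discounted bonuses telescope to a term that depends only on the start and end states, a bonus of this form shifts the return of every path between the same endpoints by the same amount, which is the sense in which such shaping is described as non-distorting.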
Kakade, S., & Dayan, P. (2000). Dopamine Bonuses. Advances in Neural Information Processing Systems, 13. Retrieved from https://repository.upenn.edu/statistics_papers/472
Date Posted: 27 November 2017