Reinforcement learning of targeted movement (Chadderdon et al. 2012)

 Download zip file   Auto-launch 
Help downloading and running models
Accession:144538
"Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. ..."
Reference:
1 . Chadderdon GL, Neymotin SA, Kerr CC, Lytton WW (2012) Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex PLoS ONE 2012 7(10):e47251
Model Information (Click on a link to find other models with that property)
Model Type: Realistic Network;
Brain Region(s)/Organism: Neocortex;
Cell Type(s): Neocortex fast spiking (FS) interneuron; Neocortex spiking regular (RS) neuron; Neocortex spiking low threshold (LTS) neuron;
Channel(s):
Gap Junctions:
Receptor(s): GabaA; AMPA; NMDA;
Gene(s):
Transmitter(s): Dopamine; Gaba; Glutamate;
Simulation Environment: NEURON;
Model Concept(s): Simplified Models; Synaptic Plasticity; Long-term Synaptic Plasticity; Reinforcement Learning; Reward-modulated STDP;
Implementer(s): Neymotin, Sam [samn at neurosim.downstate.edu]; Chadderdon, George [gchadder3 at gmail.com];
Search NeuronDB for information about:  GabaA; AMPA; NMDA; Dopamine; Gaba; Glutamate;
/
arm1d
README
drspk.mod *
infot.mod *
intf6_.mod *
intfsw.mod *
misc.mod *
nstim.mod *
stats.mod *
updown.mod *
vecst.mod *
arm.hoc
basestdp.hoc
col.hoc *
colors.hoc *
declist.hoc *
decmat.hoc *
decnqs.hoc *
decvec.hoc *
default.hoc *
drline.hoc *
filtutils.hoc *
geom.hoc
grvec.hoc *
hinton.hoc *
infot.hoc *
init.hoc
intfsw.hoc *
labels.hoc *
local.hoc *
misc.h *
mosinit.hoc
network.hoc
nload.hoc
nqs.hoc *
nqsnet.hoc *
nrnoc.hoc *
params.hoc
run.hoc
samutils.hoc *
sense.hoc *
setup.hoc *
sim.hoc
simctrl.hoc *
stats.hoc *
stim.hoc
syncode.hoc *
units.hoc *
xgetargs.hoc *
                            
: $Id: updown.mod,v 1.16 2009/02/16 22:56:52 billl Exp $

NEURON {
  SUFFIX nothing
  : BVBASE is bit vector base number (typically 0 or -1)
  GLOBAL UPDOWN_INSTALLED, SHM_UPDOWN, NOV_UPDOWN, DEBUG_UPDOWN
}

PARAMETER {
  UPDOWN_INSTALLED=0
  DEBUG_UPDOWN=0
  SHM_UPDOWN=4   : used in updown() for measuring sharpness
  NOV_UPDOWN=1   : used in updown() to eliminate overlap of spikes
  CREEP_UPDOWN=0 : used in updown() to allow left/right "creep" to local minima
}

VERBATIM
#include <stdlib.h>
#include <math.h>
#include <limits.h> // contains LONG_MAX 
#include <sys/time.h> 
extern double* hoc_pgetarg();
extern double hoc_call_func(Symbol*, int narg);
extern FILE* hoc_obj_file_arg(int narg);
extern Object** hoc_objgetarg();
extern void vector_resize();
extern int vector_instance_px();
extern void* vector_arg();
extern double* vector_vec();
extern double hoc_epsilon;
extern double chkarg();
extern void set_seed();
extern int ivoc_list_count(Object*);
extern Object* ivoc_list_item(Object*, int);
extern int hoc_is_double_arg(int narg);
extern char* hoc_object_name(Object*);
char ** hoc_pgargstr();
int list_vector_px();
int list_vector_px2();
int list_vector_px3();
double *list_vector_resize();
int ismono1();
static void hxe() { hoc_execerror("",0); }
static void hxf(void *ptr) { free(ptr); hoc_execerror("",0); }
ENDVERBATIM

:* src.updown(thresh,dlist,nqslist)
:  dest.updown(src)  -- default thresh=0; returns indices
: look for multiple threshold crossings to define peaks
: creates multiple parallel vectors for an NQS db
: counts peaks pointing upward -- should be all pos
: see eg decnqs.hoc:fudup() for usage
VERBATIM

  //** declarations
#define UDSL 500
#define UDNQ 11
// nq=new NQS("LOC","PEAK","WIDTH","BASE","HEIGHT","START","SLICES","SHARP","INDEX","FILE","NESTED")
#define LOC     nq[0] // loc of peak of spike
#define PEAK  	nq[1] // value at peak (absolute height)
#define WIDTH  	nq[2] // rt flank - lt flanks (? isn't it rt flank - LOC ?)
#define BASE  	nq[3] // height at base
#define HEIGHT  nq[4] // peak - base
#define START  	nq[5] // left flank of spike?
#define SLICES  nq[6] // how many slices found this spike
#define SHARP  	nq[7] // 2nd deriv at peak
#define INDEX  	nq[8] // consecutive numbering of spikes
//        	nq[9] // will use to fill in trace's file name at hoc level
#define NESTED  nq[10] // how many bumps are nested within this one
  //** procedure updown()
static double updown (void* vv) {
  int i, k, m, n, nqsz, nsrc, jj[UDSL], f[UDSL], lc, dsz[UDSL], nqmax, thsz, lc2, done, dbn;
  double *src, *tvec, *th, *dest[UDSL], *nq[UDNQ], *tmp, *dbx, lt, thdist;
  Object *ob, *ob2;
  void *vvd[UDSL], *vvth, *vnq[UDNQ];
  //** read in vectors and verify sizes, etc
  nsrc = vector_instance_px(vv, &src); // trace to analyze
  thsz = vector_arg_px(1, &th);        // vector of thresholds to check
  ob =  *hoc_objgetarg(2);             // storage for values for each threshold
  ob2 = *hoc_objgetarg(3);             // list of NQS vectors for returning values
  tmp = (double *)ecalloc(nsrc, sizeof(double));  // tmp is size of trace
  lc =  ivoc_list_count(ob);
  lc2 = ivoc_list_count(ob2);
  if (lc>UDSL) {printf("updown ERRF mismatch: max slice list:%d %d\n",UDSL,lc); hxf(tmp);}
  if (lc2!=UDNQ){printf("updown ERRB mismatch: NQS sz is %d (%d in list)\n",UDNQ,lc2);hxf(tmp);}
  if (nsrc<lc) {printf("updown ERRC mismatch: %d %d\n",lc,nsrc); hxf(tmp);} // ??
  if (lc!=thsz) {printf("updown ERRA mismatch: %d %d\n",lc,thsz); hxf(tmp);}
  if (!ismono1(th,thsz,-1)) {printf("updown ERRD: not mono dec %g %d\n",th[0],thsz); hxf(tmp);}
  // thdist=(th[thsz-2]-th[thsz-1])/2; // NOT BEING USED: the smallest spike we will accept
  for (k=0;k <lc;k++)  dsz[k] =list_vector_px3(ob , k, &dest[k], &vvd[k]);
  for (k=0;k<lc2;k++) {
    i=list_vector_px3(ob2, k, &nq[k],   &vnq[k]);
    if (k==0) nqmax=i; else if (i!=nqmax) { // all NQ vecs same size
      printf("updown ERRE mismatch: %d %d %d\n",k,i,nqmax); hxf(tmp); }
  }
  //** store crossing points and midpoints in dest[k]
  // dest vectors dest[k] will store crossing points and midpoints at each th[k] slice location
  // as triplets: up/max/down
  for (k=0; k<lc; k++) {   // iterate thru thresholds
    jj[k]=f[k]=0; // jj[k] is ind into dest[k]; f[k] is flag for threshold  crossings
    for (i=0;i<nsrc && src[i]>th[k];i++) {} // start somewhere below this thresh th[k]
    for (; i<nsrc; i++) { // iterate through trace
      if (src[i]>th[k]) { 
        if (f[k]==0) { // ? passing thresh 
          if (jj[k]>=dsz[k]){printf("(%d,%d,%d) :: ",k,jj[k],dsz[k]);
            hoc_execerror("Dest vec too small in updown ", 0); }
          dest[k][jj[k]++] = (i-1) + (th[k]-src[i-1])/(src[i]-src[i-1]); // interpolate
          f[k]=1; 
          tmp[k]=-1e9; dest[k][jj[k]]=-1.; // flag in tmp says that a thresh found here
        }
        if (f[k]==1 && src[i]>tmp[k]) { // use tmp[] even more temporarily
          tmp[k]=src[i]; // pick out max
          dest[k][jj[k]] = (double)i; // location of this peak
        }
      } else {          // below thresh 
        if (f[k]==1) {  // just passed going down 
          jj[k]++;      // triplet will be indices of cross-up/peak/cross-down
          dest[k][jj[k]++] = (i-1) + (src[i-1]-th[k])/(src[i-1]-src[i]);
          f[k]=0; 
        }
      }
    }
  }
  //** truncate dest vectors to multiples of 3:
  for (k=0;k<lc;k++) vector_resize(vvd[k],(int)(floor((double)jj[k]/3.)*3.));
  for (i=0; i<nsrc; i++) tmp[i]=0.; // clear temp space
  //** go through all the slices to find identical peaks and save widths and locations
  // tmp[] uses triplets centered around a location corresponding to a max loc in the
  // original vector; the widest flanks for each are then on either side of this loc
  for (k=0;k<lc;k++) { // need to go from top to bottom to widen flanks
    for (i=1;i<jj[k];i+=3) { // through centers (peaks)
      m=(int)dest[k][i]; // hash: place center at location
      if (tmp[m-2]<0 || tmp[m-1]<0 || tmp[m+1]<0 || tmp[m+2]<0) continue; // ignore; too crowded
      tmp[m]--;  // count how many slices have found this peak (use negative)
      tmp[m-1]=dest[k][i-1]; tmp[m+1]=dest[k][i+1]; // flanks
    }
  }
  //** 1st (of 2) loops through tmp[] -- pick up flanks
  // step through tmp[] looking for negatives which indicate the slice count and pick up 
  // flanks from these
  // nq=new NQS("LOC","PEAK","WIDTH","BASE","HEIGHT","START","SLICES","SHARP","INDEX","FILE")
  for (i=0,k=0; i<nsrc; i++) if (tmp[i]<0.) { // tmp holds neg of count of slices
    if (k>=nqmax) { printf("updown ERRG OOR in NQ db: %d %d\n",k,nqmax); hxf(tmp); }
    LOC[k]=(double)i;  // approx location of the peak of the spike
    WIDTH[k]=tmp[i+1]; // location of right side -- temp storage
    START[k]=tmp[i-1]; // start of spike (left side)
    SLICES[k]=-tmp[i];  // # of slices
    k++;
  }
  nqsz=k;   // k ends up as size of NQS db
  if (DEBUG_UPDOWN && ifarg(4)) { dbn=vector_arg_px(4, &dbx); // DEBUG -- save tmp vector
    if (dbn<nsrc) printf("updown ERRH: Insufficient room in debug vec (%d<%d)\n",dbn,nsrc); 
    else for (i=0;i<nsrc;i++) dbx[i]=tmp[i]; 
  }
  //** adjust flanks to handle nested bumps
  // 3 ways to handle spike nested in a spike or elongated base:
  // NB always using same slice for both L and R flanks; NOV_UPDOWN flag: (no-overlap)
  //   0. nested spike(s) share flanks determined by shared base
  //   1. nested spike(s) have individual bases, 1st and last use flanks from base
  //   2. nested spike(s) have individual bases, base flanks listed separately w/out peak
  // here use 
  // search nq vecs to compare flanks to neighboring centers
  // if flanks overlap the centers on LT or RT side,
  // correct them by going back to original slice loc info (in dest[])
  //*** look at left side -- is this flank to left of center of another bump?
  if (NOV_UPDOWN) for (i=0;i<nqsz;i++) { // iterate through NQS db
    if ((i-1)>0 && START[i] < LOC[i-1]) { // flank is to left of prior center
      if (DEBUG_UPDOWN) printf("LT problem %d %g %g<%g\n",i,LOC[i],START[i],LOC[i-1]);
      for (m=lc-1,done=0;m>=0 && !done;m--) { // m:go from bottom (widest) to top
        for (n=1;n<jj[m] && !done;n+=3) {     // n:through centers
          // pick out lowest slice with this peak LOC whose flank is to RT of prior peak
          if (floor(dest[m][n])==LOC[i] && dest[m][n-1]>LOC[i-1]) {
            // ??[i]=START[i]; // temp storage for L end of this overlap
            // replace both left and right flanks at this level -- #1 above
            START[i]=dest[m][n-1]; WIDTH[i]=dest[m][n+1]; done=1; 
          }
        }
      }
    }
    //*** now look at RT side
    if ((i+1)<nqsz && WIDTH[i]>LOC[i+1]) {
      if (DEBUG_UPDOWN) printf("RT problem %d %g %g>%g\n",i,LOC[i],WIDTH[i],LOC[i+1]);
      for (m=lc-1,done=0;m>=0 && !done;m--) { // m: go from bottom to top
        for (n=1;n<jj[m] && !done;n+=3) {     // n: through centers
          // pick out lowest slice with this peak LOC whose flank is to LT of next peak
          if (floor(dest[m][n])==LOC[i] && dest[m][n+1]<LOC[i+1]) {
            // ??[i]=WIDTH[i]; // end of overlap
            START[i]=dest[m][n-1]; WIDTH[i]=dest[m][n+1]; done=1;
          }
        }
      }        
    }
  }

  //make sure left and right sides of bump occur at local minima
  //shouldn't creeping be before NOV_UPDOWN=1 overlap check???
  //creeping can result only in equal borders btwn two bumps
  //on one side, so it should be ok here...
  if(CREEP_UPDOWN) for(i=0,k=0;i<nsrc;i++) if(tmp[i]<0.){

    //move left side to local minima
    int idx = (int)START[k];
    while(idx >= 1 && src[idx] >= src[idx-1]) idx--;
    START[k] = idx;

    //move right side to local minima
    idx = (int)WIDTH[k];
    while(idx < nsrc-1 && src[idx] >= src[idx+1]) idx++;
    WIDTH[k] = idx;

    k++;
  }

  //** 2nd loop through tmp[] used to fill in the rest of NQS
  // needed to split into 2 loops so that could check for overlaps and correct those
  // before filling in the rest of nq
  for (i=0,k=0; i<nsrc; i++) if (tmp[i]<0.) { // tmp holds neg of count of slices
    // calculate a base voltage lt as interpolated value on left side
    lt=src[(int)floor(START[k])]+(START[k]-floor(START[k]))*\
      (src[(int)floor(START[k]+1.)]-src[(int)floor(START[k])]);
    BASE[k]=lt;         // base voltage
    PEAK[k]=src[i];     // peak voltage
    WIDTH[k] = WIDTH[k] - START[k]; // width = RT_flank-LT_flank
    HEIGHT[k]=PEAK[k]-BASE[k]; // redund measure -- can eliminate
    // measure of sharpness diff of 1st derivs btwn peak and SHM_UPDOWN dist from peak
    // to get 2nd deriv would be normalized by 2*SHM_UPDOWN*tstep
    // ??could take an ave. or max first deriv for certain distance on either side
    SHARP[k]=(src[i]-src[i-(int)SHM_UPDOWN])-(src[i+(int)SHM_UPDOWN]-src[i]);
    INDEX[k]=(double)k;
    k++;
  }
  
  int iNumBumps = k;

  //count # of other bumps nested within each bump
  if(!NOV_UPDOWN){
    for(i=0; i<iNumBumps; i++){
      NESTED[i] = 0;
      int j = 0;
      for(;j<iNumBumps;j++){
        if(i!=j && LOC[j] >= START[i] && LOC[j] <= START[i]+WIDTH[i]){
          NESTED[i]+=1.0;
        }
      }
    }
  } else for(i=0;i<iNumBumps;i++) NESTED[i]=0.0;

  //** finish up
  for (i=0;i<lc2;i++) vector_resize(vnq[i], nqsz);
  if (k!=nqsz) { printf("updown ERRI INT ERR: %d %d\n",k,nqsz); hxf(tmp); }
  free(tmp);
  return jj[0];
}


ENDVERBATIM

:* PROCEDURE install_updown()
PROCEDURE install_updown () {
  if (UPDOWN_INSTALLED==1) {
    printf("$Id: updown.mod,v 1.16 2009/02/16 22:56:52 billl Exp $\n")
  } else {
  UPDOWN_INSTALLED=1
  VERBATIM {
  install_vector_method("updown", updown);
  }
  ENDVERBATIM
  }
}

Chadderdon GL, Neymotin SA, Kerr CC, Lytton WW (2012) Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex PLoS ONE 2012 7(10):e47251

References and models cited by this paper

References and models that cite this paper

Almassy N, Edelman GM, Sporns O (1998) Behavioral constraints in the development of neuronal properties: a cortical model embedded in a real-world device. Cereb Cortex 8:346-61 [PubMed]

Baker SN, Kilner JM, Pinches EM, Lemon RN (1999) The role of synchrony and oscillations in the motor output. Exp Brain Res 128:109-17 [PubMed]

Bush G, Vogt BA, Holmes J, Dale AM, Greve D, Jenike MA, Rosen BR (2002) Dorsal anterior cingulate cortex: a role in reward-based decision making. Proc Natl Acad Sci U S A 99:523-8

Carnevale NT, Hines ML (2006) The NEURON Book

Chadderdon G (2009) A neurocomputational model of the functional role of dopamine in stimulus-response task learning and performance Ph.D. thesis, Indiana University [Journal]

Cools R (2006) Dopaminergic modulation of cognitive function-implications for L-DOPA treatment in Parkinson's disease. Neurosci Biobehav Rev 30:1-23 [PubMed]

Dan Y, Poo MM (2004) Spike timing-dependent plasticity of neural circuits. Neuron 44:23-30 [PubMed]

Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441:876-9 [PubMed]

Demiris Y, Dearden A (2005) From motor babbling to hierarchical learning by imitation: a robot developmental pathway From Animals To Animats

Der R, Martius G (2006) (2006) From motor babbling to purposive actions: Emerging self-exploration in a dynamical systems approach to early robot development From Animals to Animats 9:406-421

Edelman GM (1987) Neural Darwinism: The Theory of Neural Group Selection

Farries MA, Fairhall AL (2007) Reinforcement learning with modulated spike timing dependent synaptic plasticity. J Neurophysiol 98:3648-65 [PubMed]

Faure A, Haberland U, Conde F, El Massioui N (2005) Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J Neurosci 25:2771-80 [PubMed]

Florian RV (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19:1468-502 [PubMed]

Frank MJ, O'reilly RC (2006) A mechanistic account of striatal dopamine function in human cognition: psychopharmacological studies with cabergoline and haloperidol. Behav Neurosci 120:497-517 [PubMed]

Frank MJ, Seeberger LC, O`Reilly RC (2004) By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306:1940-3 [Journal] [PubMed]

   Dynamic dopamine modulation in the basal ganglia: Learning in Parkinson (Frank et al 2004,2005) [Model]

Hecht-nielsen R (1989) Theory of the backpropagation neural network Neural Networks IJCNN., International Joint Conference on. IEEE :593-605

Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1:304-9 [PubMed]

Hosp JA, Pekanovic A, Rioult-Pedotti MS, Luft AR (2011) Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning. J Neurosci 31:2481-7 [PubMed]

Izhikevich EM (2007) Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling. Cereb Cortex 17(10):2443-2452 [Journal] [PubMed]

   Linking STDP and Dopamine action to solve the distal reward problem (Izhikevich 2007) [Model]

Joel D, Niv Y, Ruppin E (2005) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw 15:535-47 [PubMed]

Kao MH, Doupe AJ, Brainard MS (2005) Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. Nature 433:638-43 [PubMed]

Koechlin E, Hyafil A (2007) Anterior prefrontal function and the limits of human decision-making. Science 318:594-8 [PubMed]

Kohonen T (1990) The self-organizing map Proc IEEE 78:1464-1480

Luft AR, Schwarz S (2009) Dopaminergic signals in primary motor cortex. Int J Dev Neurosci 27:415-21 [PubMed]

Lytton WW, Neymotin SA, Hines ML (2008) The virtual slice setup. J Neurosci Methods 171:309-15 [Journal] [PubMed]

   The virtual slice setup (Lytton et al. 2008) [Model]

Lytton WW, Omurtag A (2007) Tonic-clonic transitions in computer simulation. J Clin Neurophysiol 24:175-81 [PubMed]

   Tonic-clonic transitions in a seizure simulation (Lytton and Omurtag 2007) [Model]

Lytton WW, Omurtag A, Neymotin SA, Hines ML (2008) Just in time connectivity for large spiking networks Neural Comput 20(11):2745-56 [Journal] [PubMed]

   JitCon: Just in time connectivity for large spiking networks (Lytton et al. 2008) [Model]

Lytton WW, Stewart M (2005) A rule-based firing model for neural networks Int J Bioelectromagn 7:47-50

Lytton WW, Stewart M (2006) Rule-based firing for network simulations. Neurocomputing 69:1160-1164

Magee JC, Johnston D (1995) Synaptic activation of voltage-gated channels in the dendrites of hippocampal pyramidal neurons. Science 268:301-4 [PubMed]

Molina-Luna K, Pekanovic A, Rohrich S, Hertler B, Schubring-Giese M, Rioult-Pedotti MS, Luft (2009) Dopamine in motor cortex is necessary for skill learning and synaptic plasticity. PLoS One 4:e7082-21 [PubMed]

Mufson EJ, Pandya DN (1984) Some observations on the course and composition of the cingulum bundle in the rhesus monkey. J Comp Neurol 225:31-43 [PubMed]

Neymotin SA, Lee H, Park E, Fenton AA, Lytton WW (2011) Emergence of physiological oscillation frequencies in a computer model of neocortex. Front Comput Neurosci 5:19-75 [Journal] [PubMed]

   Emergence of physiological oscillation frequencies in neocortex simulations (Neymotin et al. 2011) [Model]

O'Neill M, Brown V (2007) The effect of striatal dopamine depletion and the adenosine A2A antagonist KW-6002 on reversal learning in rats Neurobiology of Learning and Memory 88:75-81

Parkinson JA, Dalley JW, Cardinal RN, Bamford A, Fehnert B, Lachenal G, Rudarakanchana N, Hal (2002) Nucleus accumbens dopamine depletion impairs both acquisition and performance of appetitive Pavlovian approach behaviour: implications for mesoaccumbens dopamine function. Behav Brain Res 137:149-63 [PubMed]

Potjans W, Morrison A, Diesmann M (2009) A spiking neural network model of an actor-critic learning agent. Neural Comput 21:301-39 [PubMed]

Reynolds JN, Wickens JR (2005) Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw 15:507-21 [PubMed]

Robbins TW, Giardini V, Jones GH, Reading P, Sahakian BJ (1990) Effects of dopamine depletion from the caudate-putamen and nucleus accumbens septi on the acquisition and performance of a conditional discrimination task. Behav Brain Res 38:243-61 [PubMed]

Roberts PD, Bell CC (2002) Spike timing dependent synaptic plasticity in biological systems. Biol Cybern 87:392-403 [PubMed]

Rumelhart D, Mccleland J (1986) Parallel Distributed Processing

Sanchez J, Tarigoppula A, Choi J, Marsh B, Chhatbar P (2011) Control of a center-out reaching task using a reinforcement learning brain-machine interface Neural Engineering (NER), 2011 5th International IEEE-EMBS Conference on. IEEE :525-528

Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1-27 [Journal] [PubMed]

Seung HS (2003) Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron 40:1063-73 [PubMed]

Shen W, Flajolet M, Greengard P, Surmeier DJ (2008) Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321:848-51 [PubMed]

Shima K, Tanji J (1998) Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282:1335-8 [PubMed]

Singer W (2003) Synchronization, binding and expectancy The handbook of brain theory and neural networks, Arbib MA, ed. pp.1136

Smith-Roe SL, Kelley AE (2000) Coincident activation of NMDA and dopamine D1 receptors within the nucleus accumbens core is required for appetitive instrumental learning. J Neurosci 20:7737-42 [PubMed]

Sober SJ, Brainard MS (2009) Adult birdsong is actively maintained by error correction. Nat Neurosci 12:927-31 [PubMed]

Song S, Miller KD, Abbott LF (2000) Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci 3:919-26 [PubMed]

Sporns O, Alexander WH (2005) Neuromodulation and plasticity in an autonomous robot. Neural Netw 15:761-74 [PubMed]

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction [Journal]

   A reinforcement learning example (Sutton and Barto 1998) [Model]

Takechi H, Eilers J, Konnerth A (2000) A new class of synaptic response involving calcium release in dendritic spines. Nature 396:757-60 [PubMed]

Tesauro G (1995) Temporal difference learning and TD-Gammon Comm ACM 38:58-68

Thorndike E (1911) Animal intelligence

Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520-2 [PubMed]

Tumer EC, Brainard MS (2007) Performance variability enables adaptive plasticity of 'crystallized' adult birdsong. Nature 450:1240-4 [PubMed]

Ungless MA, Magill PJ, Bolam JP (2004) Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303:2040-2 [PubMed]

VanRullen R, Guyonneau R, Thorpe SJ (2005) Spike times make sense. Trends Neurosci 28:1-4 [PubMed]

Wanjerkhede SM, Bapi RS (2007) Modeling the sub-cellular signaling pathways involved in reinforcement learning at the striatum. Prog Brain Res 168:193-206 [PubMed]

Neymotin SA, Chadderdon GL, Kerr CC, Francis JT, Lytton WW (2013) Reinforcement learning of 2-joint virtual arm reaching in a computer model of sensorimotor cortex Neural Computation 25(12):3263-93 [Journal] [PubMed]

   Sensorimotor cortex reinforcement learning of 2-joint virtual arm reaching (Neymotin et al. 2013) [Model]

(61 refs)