Citation Relationships

Fujita H, Ishii S (2007) Model-based reinforcement learning for partially observable games with sampling-based state estimation. Neural Comput 19:3051-87 [PubMed]

References and models cited by this paper

References and models that cite this paper

Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning Discrete Event Dynamic Systems 13:341-379

Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems IEEE Trans Systems Man Cybern 13:834-846

Boutilier C, Poole D (1996) Computing optimal policies for partially observable decision processes using compact representations Proc 13th Natl Conf Art Intel :9-16

Bowling M, Veloso M (2000) An analysis of stochastic game theory for multiagent reinforcement learning Tech Rep No CMU-CS-00-165, Carnegie Mellon University

Bradtke S, Barto A (1996) Linear least-squares algorithms for temporal difference learning Mach Learn 22:33-57

Brafman RI (1997) A heuristic variable grid solution method for POMDPs Proc 14th Natl Conf Art Intel :727-733

Chang YH, Ho T, Kaelbling LP (2003) All learning is local: Multi-agent learning in global reward games Advances in neural information processing systems, Thrun S:Saul LK:Scholkopf B, ed. pp.807

Chrisman L (1992) Reinforcement learning with perceptual aliasing: The perceptual distinctions approach Proceedings of the Tenth National Conference on Artificial Intelligence :183-188

Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems Proc 15th Natl Conf Art Intel :746-752

Crites RH, Barto AG (1998) Elevator group control using multiple reinforcement learning agents Mach Learn 33:235-262

Dahl FA (2002) The lagging anchor algorithm: Reinforcement learning in two player zero-sum games with imperfect information Mach Learn 49:5-37

Schraudolph NN, Dayan P, Sejnowski TJ (2001) Learning to evaluate Go positions via temporal difference methods (Tech. Rep.)

Doya K, Samejima K, Katagiri K, Kawato M (2002) Multiple model-based reinforcement learning. Neural Comput 14:1347-69 [Journal] [PubMed]

Emery-Montemerlo R, Gordon G, Schneider J, Thrun S (2004) Approximate solutions for partially observable stochastic games with common payoffs Proc 3rd Intl Joint Conf Autonomous Agents and Multi-Agent Systems :136-143

Freeverse Software (2004) 3D Hearts deluxe

Fudenberg D, Tirole J (1991) Game Theory

Gilks WR, Richardson S, Spiegelhalter DJ (1996) Markov chain Monte Carlo in practice

Hansen EA, Bernstein DS, Zilberstein S (2004) Dynamic programming for partially observable stochastic games Proc 19th Natl Conf Art Intel :709-715

Hauskrecht M (2000) Value-function approximations for partially observable Markov decision processes J Artif Intell Res 13:33-94

Hu J, Wellman MP (2003) Nash Q-learning for general-sum stochastic games J Mach Learn Res 4:1039-1069

Ishii S, Fujita H, Mitsutake M, Yamazaki T, Matsuda J, Matsuno Y (2005) A reinforcement learning scheme for a partially-observable multi-agent game Mach Learn 59:31-54

Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains Art Intell 101:99-134

Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey J Art Intell Res 4:237-285

Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning Proceedings of the Eleventh International Conference on Machine Learning :157-163

Littman ML, Cassandra AR, Kaelbling LP (1995) Learning policies for partially observable environments: Scaling up Proc 12th Intl Conf Mach Learn :363-370

Littman ML, Majercik SM (1997) Large-scale planning under uncertainty: A survey Paper presented at the NASA Workshop on Planning and Scheduling in Space

Loch J, Singh SP (1998) Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes Proc 15th Intl Conf Mach Learn :323-331

McCallum A (1993) Overcoming incomplete perception with utile distinction memory Proc 10th Intl Conf Mach Learn :190-196

Meuleau N, Peshkin L, Kim KE, Kaelbling LP (1999) Learning finite state controllers for partially observable environments Proc 15th Ann Conf Uncertainty in Artificial Intelligence :427-436

Moore AW, Atkeson CG (1993) Prioritized sweeping: Reinforcement learning with less data and less real time Mach Learn 13:103-130

Mori T, Nakamura Y, Ishii S (2004) Reinforcement learning for a CPG-driven biped robot Proc 19th Natl Conf Art Intel :623-630

Morimoto J, Doya K (2001) Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning Robotics And Autonomous Systems 36:37-51

Nair R, Marsella S, Tambe M, Pynadath D, Yokoo M (2003) Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings Proc 18th Intl Joint Conf Art Intel :705-711

Nikovski D, Nourbakhsh I (2000) Learning probabilistic models for decision theoretic navigation of mobile robots Proc 17th Intl Conf Mach Learn :671-678

Perkins T (1998) Two search techniques for imperfect information games and application to hearts (Tech. Rep.)

Peshkin L, Meuleau N, Kaelbling LP (1999) Learning policies with external memory Proc 16th Intl Conf Mach Learn :307-314

Pfahringer B, Kaindl H, Kramer S, Furnkranz J (1999) Learning to make good use of operational advice Paper presented at the International Conference on Machine Learning, Workshop on Machine Learning in Game Playing

Pineau J, Gordon G, Thrun S (2003) Point-based value iteration: An anytime algorithm for POMDPs Proc 18th Intl Joint Conf Art Intel :1025-1032

Sato M, Ishii S (2000) On-line EM algorithm for the normalized Gaussian network. Neural Comput 12:407-32 [PubMed]

Shani G (2004) A survey of model-based and model-free methods for resolving perceptual aliasing Tech Rep Department of Computer Science Ben-Gurion University

Shoham Y, Powers R, Grenager T (2004) Multi-agent reinforcement learning: A critical survey Paper presented at the Proceedings of AAAI Fall Symposium on Artificial Multi-Agent Learning

Singh SP, Bertsekas D (1996) Reinforcement learning for dynamic channel allocation in cellular telephone systems Advances In Neural Information Processing Systems 9:974-980

Singh SP, Jaakkola T, Jordan MI (1994) Learning without state-estimation in partially observable Markovian decision processes Proc 11th Intl Conf Mach Learn :284-292

Smallwood RD, Sondik EJ (1973) The optimal control of partially observable processes over a finite horizon Operations Res 21:1071-1088

Stone P, Veloso MM (2000) Multiagent systems: A survey from a machine learning perspective Auto Rob 8:345-383

Sturtevant NR (2003) Multi-player games: Algorithms and approaches Unpublished doctoral dissertation, University of California, Los Angeles

Sturtevant NR, White AM (2006) Feature construction for reinforcement learning in hearts Proc 5th Intl Conf Learn Games

Suematsu N, Hayashi A (2002) A multiagent reinforcement learning algorithm using extended optimal response Proc 1st Intl Joint Conf Auto Agents and Multi-Agent Systems :370-377

Sutton RS (1988) Learning to predict by the methods of temporal differences Mach Learn 3:9-44

Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming Proceedings of the Seventh International Conference on Machine Learning :216-224

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction [Journal]

   A reinforcement learning example (Sutton and Barto 1998) [Model]

Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play Neural Comput 6:215-219

Theocharous G, Mahadevan S (2002) Approximate planning with hierarchical partially observable Markov decision process models for robot navigation Proc IEEE Intl Conf Robotics and Automation :1347-1352

Thrun S (2000) Monte Carlo POMDPs Advances in neural information processing systems, Solla SA:Leen TK:Muller KR, ed. pp.1064

Wang X, Sandholm T (2003) Reinforcement learning to play an optimal Nash equilibrium in team Markov games Advances in neural information processing systems, Becker S:Thrun S:Obermayer K, ed. pp.554

Watkins C, Dayan P (1992) Q-learning Mach Learn 8:279-292

Whitehead SD, Lin LJ (1995) Reinforcement learning of non-Markov decision processes Artificial Intel 73:271-306

Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning Mach Learn 8:229-256

Yoshimoto J, Ishii S, Sato M (2003) System identification based on on-line variational Bayes method and its application to reinforcement learning Proc Intl Conf Art Neural Netw Neural Inform Process 2714:123-131

(59 refs)