Citation Relationships



Fujita H, Ishii S (2007) Model-based reinforcement learning for partially observable games with sampling-based state estimation. Neural Comput 19:3051-3087 [PubMed]

References and models cited by this paper

Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning Discrete Event Dynamic Systems 13:341-379
Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems IEEE Trans Systems Man Cybern 13:834-846
Boutilier C, Poole D (1996) Computing optimal policies for partially observable decision processes using compact representations Proc 13th Natl Conf Art Intel :9-16
Bowling M, Veloso M (2000) An analysis of stochastic game theory for multiagent reinforcement learning Tech Rep No CMU-CS-00-165, Carnegie Mellon University
Bradtke S, Barto A (1996) Linear least-squares algorithms for temporal difference learning Mach Learn 22:33-57
Brafman RI (1997) A heuristic variable grid solution method for POMDPs Proc 14th Natl Conf Art Intel :727-733
Chang YH, Ho T, Kaelbling LP (2003) All learning is local: Multi-agent learning in global reward games Advances in neural information processing systems, Thrun S:Saul LK:Scholkopf B, ed. pp.807
Chrisman L (1992) Reinforcement learning with perceptual aliasing: The perceptual distinctions approach Proc 10th Natl Conf Art Intel :183-188
Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems Proc 15th Natl Conf Art Intel :746-752
Crites RH, Barto AG (1998) Elevator group control using multiple reinforcement learning agents Mach Learn 33:235-262
Dahl FA (2002) The lagging anchor algorithm: Reinforcement learning in two-player zero-sum games with imperfect information Mach Learn 49:5-37
Schraudolph NN, Dayan P, Sejnowski TJ (2001) Learning to evaluate Go positions via temporal difference methods (Tech. Rep.)
Doya K, Samejima K, Katagiri K, Kawato M (2002) Multiple model-based reinforcement learning. Neural Comput 14:1347-1369 [Journal] [PubMed]
Emery-Montemerlo R, Gordon G, Schneider J (2004) Approximate solutions for partially observable stochastic games with common payoffs Proc 3rd Intl Joint Conf Autonomous Agents and Multi-Agent Systems :136-143
Freeverse Software (2004) 3D Hearts Deluxe
Gilks WR, Richardson S, Spiegelhalter DJ (1996) Markov chain Monte Carlo in practice
Hansen EA, Bernstein DS, Zilberstein S (2004) Dynamic programming for partially observable stochastic games Proc 19th Natl Conf Art Intel :709-715
Hauskrecht M (2000) Value-function approximations for partially observable Markov decision processes J Artif Intell Res 13:33-94
Hu J, Wellman MP (2003) Nash Q-learning for general-sum stochastic games J Mach Learn Res 4:1039-1069
Ishii S, Fujita H, Mitsutake M, Yamazaki T, Matsuda J, Matsuno Y (2005) A reinforcement learning scheme for a partially-observable multi-agent game Mach Learn 59:31-54
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains Artif Intell 101:99-134
Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey J Artif Intell Res 4:237-285
Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning Proc 11th Intl Conf Mach Learn :157-163
Littman ML, Cassandra AR, Kaelbling LP (1995) Learning policies for partially observable environments: Scaling up Proc 12th Intl Conf Mach Learn :363-370
Littman ML, Majercik SM (1997) Large-scale planning under uncertainty: A survey Paper presented at the NASA Workshop on Planning and Scheduling for Space
Loch J, Singh SP (1998) Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes Proc 15th Intl Conf Mach Learn :323-331
McCallum A (1993) Overcoming incomplete perception with utile distinction memory Proc 10th Intl Conf Mach Learn :190-196
Meuleau N, Peshkin L, Kim KE, Kaelbling LP (1999) Learning finite-state controllers for partially observable environments Proc 15th Ann Conf Uncertainty in Artificial Intelligence :427-436
Moore AW, Atkeson CG (1993) Prioritized sweeping: Reinforcement learning with less data and less real time Mach Learn 13:103-130
Mori T, Nakamura Y, Ishii S (2004) Reinforcement learning for a CPG-driven biped robot Proc 19th Natl Conf Art Intel :623-630
Morimoto J, Doya K (2001) Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning Robotics And Autonomous Systems 36:37-51
Nair R, Marsella S, Tambe M, Pynadath D, Yokoo M (2003) Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings Proc 18th Intl Joint Conf Art Intel :705-711
Nikovski D, Nourbakhsh I (2000) Learning probabilistic models for decision theoretic navigation of mobile robots Proc 17th Intl Conf Mach Learn :671-678
Perkins T (1998) Two search techniques for imperfect information games and application to hearts (Tech. Rep.)
Peshkin L, Meuleau N, Kaelbling LP (1999) Learning policies with external memory Proc 16th Intl Conf Mach Learn :307-314
Pfahringer B, Kaindl H, Kramer S, Furnkranz J (1999) Learning to make good use of operational advice Paper presented at the International Conference on Machine Learning, Workshop on Machine Learning in Game Playing
Pineau J, Gordon G, Thrun S (2003) Point-based value iteration: An anytime algorithm for POMDPs Proc 18th Intl Joint Conf Art Intel :1025-1032
Sato M, Ishii S (2000) On-line EM algorithm for the normalized Gaussian network. Neural Comput 12:407-432 [PubMed]
Shani G (2004) A survey of model-based and model-free methods for resolving perceptual aliasing Tech Rep, Department of Computer Science, Ben-Gurion University
Shoham Y, Powers R, Grenager T (2004) Multi-agent reinforcement learning: A critical survey Paper presented at the AAAI Fall Symposium on Artificial Multi-Agent Learning
Singh SP, Bertsekas D (1996) Reinforcement learning for dynamic channel allocation in cellular telephone systems Advances in Neural Information Processing Systems 9:974-980
Singh SP, Jaakkola T, Jordan MI (1994) Learning without state-estimation in partially observable Markovian decision processes Proc 11th Intl Conf Mach Learn :284-292
Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon Operations Res 21:1071-1088
Stone P, Veloso MM (2000) Multiagent systems: A survey from a machine learning perspective Auton Robots 8:345-383
Sturtevant NR (2003) Multi-player games: Algorithms and approaches Unpublished doctoral dissertation, University of California, Los Angeles
Sturtevant NR, White AM (2006) Feature construction for reinforcement learning in hearts Proc 5th Intl Conf Computers and Games
Suematsu N, Hayashi A (2002) A multiagent reinforcement learning algorithm using extended optimal response Proc 1st Intl Joint Conf Autonomous Agents and Multi-Agent Systems :370-377
Sutton RS (1988) Learning to predict by the methods of temporal differences Mach Learn 3:9-44
Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming Proc 7th Intl Conf Mach Learn :216-224
Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play Neural Comput 6:215-219
Theocharous G, Mahadevan S (2002) Approximate planning with hierarchical partially observable Markov decision process models for robot navigation Proc IEEE Intl Conf Robotics and Automation :1347-1352
Thrun S (2000) Monte Carlo POMDPs Advances in neural information processing systems, Solla SA:Leen TK:Muller KR, ed. pp.1064
Wang X, Sandholm T (2003) Reinforcement learning to play an optimal Nash equilibrium in team Markov games Advances in neural information processing systems, Becker S:Thrun S:Obermayer K, ed. pp.554
Watkins C, Dayan P (1992) Q-learning Mach Learn 8:279-292
Whitehead SD, Lin LJ (1995) Reinforcement learning of non-Markov decision processes Artif Intell 73:271-306
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning Mach Learn 8:229-256
Yoshimoto J, Ishii S, Sato M (2003) System identification based on on-line variational Bayes method and its application to reinforcement learning Proc Intl Conf Art Neural Netw Neural Inform Process 2714:123-131
(59 refs)