References

1: Watkins, C. J. C. H.: Learning from delayed rewards, PhD Thesis, Cambridge University (1989).
2: Doya, K.: Reinforcement Learning, Proceedings of the 7th Annual Conference of the Japanese Neural Network Society, pp. 158--162 (1996) (in Japanese).
3: Sutton, R. S.: Learning to Predict by the Methods of Temporal Differences, Machine Learning, Vol. 3, pp. 9--44 (1988).
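4: Asada, Noda, Tawaratsumida, Hosoda: Robot Behavior Learning by Vision-Based Reinforcement Learning, Journal of the Robotics Society of Japan, Vol. 13, pp. 68--74 (1995) (in Japanese).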
5: Barto, A. G.: Adaptive critics and the basal ganglia, Models of Information Processing in the Basal Ganglia, pp. 215--232 (1994).

A Tic-Tac-Toe program that learns by Q-learning