van Otterlo, M.; Wiering, M. Reinforcement Learning and Markov Decision Processes. Reinforcement Learning. Adaptation, Learning, and Optimization 12. 2012: 3–42. ISBN 978-3-642-27644-6. doi:10.1007/978-3-642-27645-3_1.
Tokic, Michel; Palm, Günther, Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax, KI 2011: Advances in Artificial Intelligence (PDF), Lecture Notes in Computer Science 7006, Springer: 335–346, 2011 [2018-09-03], ISBN 978-3-642-24455-1 (archived from the original (PDF) on 2018-11-23)