Tokic, Michel; Palm, Günther, Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax, KI 2011: Advances in Artificial Intelligence(PDF), Lecture Notes in Computer Science 7006, Springer: 335–346, 2011 [2018-09-03], ISBN 978-3-642-24455-1, (原始内容存档(PDF)于2018-11-23)
Tokic, Michel; Palm, Günther, Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax, KI 2011: Advances in Artificial Intelligence(PDF), Lecture Notes in Computer Science 7006, Springer: 335–346, 2011 [2018-09-03], ISBN 978-3-642-24455-1, (原始内容存档(PDF)于2018-11-23)