Multi-armed bandit (English Wikipedia)

Analysis of the information sources cited in the references of the English-language Wikipedia article "Multi-armed bandit".

Refs  Website              Global rank / English rank
31    doi.org              2nd place
22    arxiv.org            69th place / 59th place
16    harvard.edu          18th place / 17th place
13    semanticscholar.org  11th place / 8th place
10    nih.gov              4th place
9     jmlr.org             low place
8     web.archive.org      1st place
6     nips.cc              low place / 7,050th place
5     psu.edu              207th place / 136th place
3     jstor.org            26th place / 20th place
3     mlr.press            low place
2     worldcat.org         5th place
2     tokic.com            low place
1     aaai.org             9,352nd place / 5,696th place
1     sourceforge.net      1,669th place / 1,290th place
1     ams.org              451st place / 277th place
1     auai.org             low place
1     neurips.cc           low place
1     scitepress.org       low place
1     academia.edu         121st place / 142nd place
1     jair.org             low place

aaai.org

Shen, Weiwei; Wang, Jun; Jiang, Yu-Gang; Zha, Hongyuan (2015), "Portfolio Choices with Orthogonal Bandit Learning", Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI2015), archived from the original on 2021-12-04, retrieved 2016-03-20

academia.edu

Gai, Y.; Krishnamachari, B.; Jain, R. (2010), "Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation", 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (PDF), pp. 1–9 [dead link]

ams.org

mathscinet.ams.org

Whittle, Peter (1988), "Restless bandits: Activity allocation in a changing world", Journal of Applied Probability, 25A: 287–298, doi:10.2307/3214163, JSTOR 3214163, MR 0974588, S2CID 202109695

arxiv.org

Bubeck, Sébastien (2012). "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". Foundations and Trends in Machine Learning. 5: 1–122. arXiv:1204.5721. doi:10.1561/2200000024.
Soare, Marta; Lazaric, Alessandro; Munos, Rémi (2014). "Best-Arm Identification in Linear Bandits". arXiv:1409.6110 [cs.LG].
Brochu, Eric; Hoffman, Matthew W.; de Freitas, Nando (September 2010). "Portfolio Allocation for Bayesian Optimization". arXiv:1009.5419 [cs.LG].
Aurelien Garivier; Emilie Kaufmann (2016). "Optimal Best Arm Identification with Fixed Confidence". arXiv:1602.04589 [math.ST].
Honda, J.; Takemura, A. (2011). "An asymptotically optimal policy for finite support models in the multi-armed bandit problem". Machine Learning. 85 (3): 361–391. arXiv:0905.2776. doi:10.1007/s10994-011-5257-4. S2CID 821462.
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN 9781605587998, S2CID 207178795
Rigollet, Philippe; Zeevi, Assaf (2010), Nonparametric Bandits with Covariates, Conference on Learning Theory, COLT 2010, arXiv:1003.1630, Bibcode:2010arXiv1003.1630R
Perchet, Vianney; Rigollet, Philippe (2013), "The multi-armed bandit problem with covariates", Annals of Statistics, 41 (2): 693–721, arXiv:1110.6084, doi:10.1214/13-aos1101, S2CID 14258665
Lihong Li; Yu Lu; Dengyong Zhou (2017), "Provably optimal algorithms for generalized linear contextual bandits", Proceedings of the 34th International Conference on Machine Learning: 2071–2080, arXiv:1703.00048, Bibcode:2017arXiv170300048L
Kwang-Sung Jun; Aniruddha Bhargava; Robert D. Nowak; Rebecca Willett (2017), "Scalable generalized linear bandits: Online computation and hashing", Advances in Neural Information Processing Systems, 30, Curran Associates: 99–109, arXiv:1706.00136, Bibcode:2017arXiv170600136J
Branislav Kveton; Manzil Zaheer; Csaba Szepesvári; Lihong Li; Mohammad Ghavamzadeh; Craig Boutilier (2020), "Randomized exploration in generalized linear bandits", Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), arXiv:1906.08947, Bibcode:2019arXiv190608947K
Michal Valko; Nathan Korda; Rémi Munos; Ilias Flaounas; Nello Cristianini (2013), Finite-Time Analysis of Kernelised Contextual Bandits, 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013) and (JFPDA 2013), arXiv:1309.6869, Bibcode:2013arXiv1309.6869V
Alekh Agarwal; Daniel J. Hsu; Satyen Kale; John Langford; Lihong Li; Robert E. Schapire (2014), "Taming the monster: A fast and simple algorithm for contextual bandits", Proceedings of the 31st International Conference on Machine Learning: 1638–1646, arXiv:1402.0555, Bibcode:2014arXiv1402.0555A
Wu, Huasen; Srikant, R.; Liu, Xin; Jiang, Chong (2015), "Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits", The 29th Annual Conference on Neural Information Processing Systems (NIPS), 28, Curran Associates: 433–441, arXiv:1504.06937, Bibcode:2015arXiv150406937W
Burtini, Giuseppe; Loeppky, Jason; Lawrence, Ramon (2015). "A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit". arXiv:1510.00757 [stat.ML].
Garivier, Aurélien; Moulines, Eric (2008). "On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems". arXiv:0805.3415 [math.ST].
Zoghi, Masrour; Karnin, Zohar S; Whiteson, Shimon; Rijke, Maarten D (2015), "Copeland Dueling Bandits", Advances in Neural Information Processing Systems, NIPS'15, arXiv:1506.00312, Bibcode:2015arXiv150600312Z
Wu, Huasen; Liu, Xin (2016), "Double Thompson Sampling for Dueling Bandits", The 30th Annual Conference on Neural Information Processing Systems (NIPS), arXiv:1604.07101, Bibcode:2016arXiv160407101W
Cesa-Bianchi, Nicolo; Gentile, Claudio; Zappella, Giovanni (2013), A Gang of Bandits, Advances in Neural Information Processing Systems 26, NIPS 2013, arXiv:1306.0811
Gentile, Claudio; Li, Shuai; Zappella, Giovanni (2014), "Online Clustering of Bandits", The 31st International Conference on Machine Learning, Journal of Machine Learning Research (ICML 2014), arXiv:1401.8257, Bibcode:2014arXiv1401.8257G
Li, Shuai; Karatzoglou, Alexandros; Gentile, Claudio (2016), "Collaborative Filtering Bandits", The 39th International ACM SIGIR Conference on Information Retrieval (SIGIR 2016), arXiv:1502.03473, Bibcode:2015arXiv150203473L
Santiago Ontañón (2017), "Combinatorial Multi-armed Bandits for Real-Time Strategy Games", Journal of Artificial Intelligence Research, 58: 665–702, arXiv:1710.04805, Bibcode:2017arXiv171004805O, doi:10.1613/jair.5398, S2CID 8517525

auai.org

Gimelfarb, Michel; Sanner, Scott; Lee, Chi-Guhn (2019), "ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning" (PDF), Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, p. 162.

doi.org

Auer, P.; Cesa-Bianchi, N.; Fischer, P. (2002). "Finite-time Analysis of the Multiarmed Bandit Problem". Machine Learning. 47 (2/3): 235–256. doi:10.1023/A:1013689704352.
Katehakis, Michael N.; Veinott, Jr., Arthur F. (1987). "The Multi-Armed Bandit Problem: Decomposition and Computation". Mathematics of Operations Research. 12 (2): 262–268. doi:10.1287/moor.12.2.262. S2CID 656323.
Bubeck, Sébastien (2012). "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". Foundations and Trends in Machine Learning. 5: 1–122. arXiv:1204.5721. doi:10.1561/2200000024.
Weber, Richard (1992), "On the Gittins index for multiarmed bandits", Annals of Applied Probability, 2 (4): 1024–1033, doi:10.1214/aoap/1177005588, JSTOR 2959678
Robbins, H. (1952). "Some aspects of the sequential design of experiments". Bulletin of the American Mathematical Society. 58 (5): 527–535. doi:10.1090/S0002-9904-1952-09620-8.
J. C. Gittins (1979). "Bandit Processes and Dynamic Allocation Indices". Journal of the Royal Statistical Society. Series B (Methodological). 41 (2): 148–177. doi:10.1111/j.2517-6161.1979.tb01068.x. JSTOR 2985029. S2CID 17724147.
Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC 2793317, PMID 20018711.
Farias, Vivek F; Madan, Ritesh (2011), "The irrevocable multiarmed bandit problem", Operations Research, 59 (2): 383–399, CiteSeerX 10.1.1.380.6983, doi:10.1287/opre.1100.0891
Whittle, Peter (1979), "Discussion of Dr Gittins' paper", Journal of the Royal Statistical Society, Series B, 41 (2): 148–177, doi:10.1111/j.2517-6161.1979.tb01069.x
Whittle, Peter (1988), "Restless bandits: Activity allocation in a changing world", Journal of Applied Probability, 25A: 287–298, doi:10.2307/3214163, JSTOR 3214163, MR 0974588, S2CID 202109695
Whittle, Peter (1981), "Arm-acquiring bandits", Annals of Probability, 9 (2): 284–292, doi:10.1214/aop/1176994469
Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R. E. (2002). "The Nonstochastic Multiarmed Bandit Problem". SIAM J. Comput. 32 (1): 48–77. CiteSeerX 10.1.1.130.158. doi:10.1137/S0097539701398375. S2CID 13209702.
Lai, T.L.; Robbins, H. (1985). "Asymptotically efficient adaptive allocation rules". Advances in Applied Mathematics. 6 (1): 4–22. doi:10.1016/0196-8858(85)90002-8.
Katehakis, M.N.; Robbins, H. (1995). "Sequential choice from several populations". Proceedings of the National Academy of Sciences of the United States of America. 92 (19): 8584–5. Bibcode:1995PNAS...92.8584K. doi:10.1073/pnas.92.19.8584. PMC 41010. PMID 11607577.
Burnetas, A.N.; Katehakis, M.N. (1996). "Optimal adaptive policies for sequential allocation problems". Advances in Applied Mathematics. 17 (2): 122–142. doi:10.1006/aama.1996.0007.
Burnetas, Apostolos N.; Katehakis, Michael N. (1997). "Optimal adaptive policies for Markov decision processes". Mathematics of Operations Research. 22 (1): 222–255. doi:10.1287/moor.22.1.222.
Ortner, R. (2010). "Online regret bounds for Markov decision processes with deterministic transitions". Theoretical Computer Science. 411 (29): 2684–2695. doi:10.1016/j.tcs.2010.04.005.
Honda, J.; Takemura, A. (2011). "An asymptotically optimal policy for finite support models in the multi-armed bandit problem". Machine Learning. 85 (3): 361–391. arXiv:0905.2776. doi:10.1007/s10994-011-5257-4. S2CID 821462.
Pilarski, Sebastian; Pilarski, Slawomir; Varró, Dániel (February 2021). "Optimal Policy for Bernoulli Bandits: Computation and Algorithm Gauge". IEEE Transactions on Artificial Intelligence. 2 (1): 2–17. doi:10.1109/TAI.2021.3074122. ISSN 2691-4581. S2CID 235475602.
Pilarski, Sebastian; Pilarski, Slawomir; Varro, Daniel (2021). "Delayed Reward Bernoulli Bandits: Optimal Policy and Predictive Meta-Algorithm PARDI". IEEE Transactions on Artificial Intelligence. 3 (2): 152–163. doi:10.1109/TAI.2021.3117743. ISSN 2691-4581. S2CID 247682940.
Averbeck, B.B. (2015). "Theory of choice in bandit, information sampling, and foraging tasks". PLOS Computational Biology. 11 (3): e1004164. Bibcode:2015PLSCB..11E4164A. doi:10.1371/journal.pcbi.1004164. PMC 4376795. PMID 25815510.
Costa, V.D.; Averbeck, B.B. (2019). "Subcortical Substrates of Explore-Exploit Decisions in Primates". Neuron. 103 (3): 533–535. doi:10.1016/j.neuron.2019.05.017. PMC 6687547. PMID 31196672.
Tokic, Michel (2010), "Adaptive ε-greedy exploration in reinforcement learning based on value differences" (PDF), KI 2010: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 6359, Springer-Verlag, pp. 203–210, CiteSeerX 10.1.1.458.464, doi:10.1007/978-3-642-16111-7_23, ISBN 978-3-642-16110-0.
Scott, S.L. (2010), "A modern Bayesian look at the multi-armed bandit", Applied Stochastic Models in Business and Industry, 26 (2): 639–658, doi:10.1002/asmb.874, S2CID 573750
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN 9781605587998, S2CID 207178795
Auer, P. (2000). "Using upper confidence bounds for online learning". Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc. pp. 270–279. doi:10.1109/sfcs.2000.892116. ISBN 978-0769508504. S2CID 28713091.
Hong, Tzung-Pei; Song, Wei-Ping; Chiu, Chu-Tien (November 2011). "Evolutionary Composite Attribute Clustering". 2011 International Conference on Technologies and Applications of Artificial Intelligence. IEEE. pp. 305–308. doi:10.1109/taai.2011.59. ISBN 9781457721748. S2CID 14125100.
Perchet, Vianney; Rigollet, Philippe (2013), "The multi-armed bandit problem with covariates", Annals of Statistics, 41 (2): 693–721, arXiv:1110.6084, doi:10.1214/13-aos1101, S2CID 14258665
Cavenaghi, Emanuele; Sottocornola, Gabriele; Stella, Fabio; Zanker, Markus (2021). "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm". Entropy. 23 (3): 380. Bibcode:2021Entrp..23..380C. doi:10.3390/e23030380. PMC 8004723. PMID 33807028.
Yue, Yisong; Broder, Josef; Kleinberg, Robert; Joachims, Thorsten (2012), "The K-armed dueling bandits problem", Journal of Computer and System Sciences, 78 (5): 1538–1556, CiteSeerX 10.1.1.162.2764, doi:10.1016/j.jcss.2011.12.028
Santiago Ontañón (2017), "Combinatorial Multi-armed Bandits for Real-Time Strategy Games", Journal of Artificial Intelligence Research, 58: 665–702, arXiv:1710.04805, Bibcode:2017arXiv171004805O, doi:10.1613/jair.5398, S2CID 8517525

harvard.edu

ui.adsabs.harvard.edu

Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC 2793317, PMID 20018711.
Katehakis, M.N.; Robbins, H. (1995). "Sequential choice from several populations". Proceedings of the National Academy of Sciences of the United States of America. 92 (19): 8584–5. Bibcode:1995PNAS...92.8584K. doi:10.1073/pnas.92.19.8584. PMC 41010. PMID 11607577.
Averbeck, B.B. (2015). "Theory of choice in bandit, information sampling, and foraging tasks". PLOS Computational Biology. 11 (3): e1004164. Bibcode:2015PLSCB..11E4164A. doi:10.1371/journal.pcbi.1004164. PMC 4376795. PMID 25815510.
Rigollet, Philippe; Zeevi, Assaf (2010), Nonparametric Bandits with Covariates, Conference on Learning Theory, COLT 2010, arXiv:1003.1630, Bibcode:2010arXiv1003.1630R
Lihong Li; Yu Lu; Dengyong Zhou (2017), "Provably optimal algorithms for generalized linear contextual bandits", Proceedings of the 34th International Conference on Machine Learning: 2071–2080, arXiv:1703.00048, Bibcode:2017arXiv170300048L
Kwang-Sung Jun; Aniruddha Bhargava; Robert D. Nowak; Rebecca Willett (2017), "Scalable generalized linear bandits: Online computation and hashing", Advances in Neural Information Processing Systems, 30, Curran Associates: 99–109, arXiv:1706.00136, Bibcode:2017arXiv170600136J
Branislav Kveton; Manzil Zaheer; Csaba Szepesvári; Lihong Li; Mohammad Ghavamzadeh; Craig Boutilier (2020), "Randomized exploration in generalized linear bandits", Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), arXiv:1906.08947, Bibcode:2019arXiv190608947K
Michal Valko; Nathan Korda; Rémi Munos; Ilias Flaounas; Nello Cristianini (2013), Finite-Time Analysis of Kernelised Contextual Bandits, 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013) and (JFPDA 2013), arXiv:1309.6869, Bibcode:2013arXiv1309.6869V
Alekh Agarwal; Daniel J. Hsu; Satyen Kale; John Langford; Lihong Li; Robert E. Schapire (2014), "Taming the monster: A fast and simple algorithm for contextual bandits", Proceedings of the 31st International Conference on Machine Learning: 1638–1646, arXiv:1402.0555, Bibcode:2014arXiv1402.0555A
Wu, Huasen; Srikant, R.; Liu, Xin; Jiang, Chong (2015), "Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits", The 29th Annual Conference on Neural Information Processing Systems (NIPS), 28, Curran Associates: 433–441, arXiv:1504.06937, Bibcode:2015arXiv150406937W
Cavenaghi, Emanuele; Sottocornola, Gabriele; Stella, Fabio; Zanker, Markus (2021). "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm". Entropy. 23 (3): 380. Bibcode:2021Entrp..23..380C. doi:10.3390/e23030380. PMC 8004723. PMID 33807028.
Zoghi, Masrour; Karnin, Zohar S; Whiteson, Shimon; Rijke, Maarten D (2015), "Copeland Dueling Bandits", Advances in Neural Information Processing Systems, NIPS'15, arXiv:1506.00312, Bibcode:2015arXiv150600312Z
Wu, Huasen; Liu, Xin (2016), "Double Thompson Sampling for Dueling Bandits", The 30th Annual Conference on Neural Information Processing Systems (NIPS), arXiv:1604.07101, Bibcode:2016arXiv160407101W
Gentile, Claudio; Li, Shuai; Zappella, Giovanni (2014), "Online Clustering of Bandits", The 31st International Conference on Machine Learning, Journal of Machine Learning Research (ICML 2014), arXiv:1401.8257, Bibcode:2014arXiv1401.8257G
Li, Shuai; Karatzoglou, Alexandros; Gentile, Claudio (2016), "Collaborative Filtering Bandits", The 39th International ACM SIGIR Conference on Information Retrieval (SIGIR 2016), arXiv:1502.03473, Bibcode:2015arXiv150203473L
Santiago Ontañón (2017), "Combinatorial Multi-armed Bandits for Real-Time Strategy Games", Journal of Artificial Intelligence Research, 58: 665–702, arXiv:1710.04805, Bibcode:2017arXiv171004805O, doi:10.1613/jair.5398, S2CID 8517525

jair.org

Santiago Ontañón (2017), "Combinatorial Multi-armed Bandits for Real-Time Strategy Games", Journal of Artificial Intelligence Research, 58: 665–702, arXiv:1710.04805, Bibcode:2017arXiv171004805O, doi:10.1613/jair.5398, S2CID 8517525

jmlr.org

Slivkins, Aleksandrs (2011), Contextual bandits with similarity information. (PDF), Conference on Learning Theory, COLT 2011
Féraud, Raphaël; Allesiardo, Robin; Urvoy, Tanguy; Clérot, Fabrice (2016). "Random Forest for the Contextual Bandit Problem". AISTATS: 93–101. Archived from the original on 2016-08-10. Retrieved 2016-06-10.
Badanidiyuru, A.; Langford, J.; Slivkins, A. (2014), "Resourceful contextual bandits" (PDF), Proceeding of Conference on Learning Theory (COLT) [permanent dead link]
Hutter, M. and Poland, J., 2005. Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6 (Apr), pp.639–660.
Urvoy, Tanguy; Clérot, Fabrice; Féraud, Raphaël; Naamane, Sami (2013), "Generic Exploration and K-armed Voting Bandits" (PDF), Proceedings of the 30th International Conference on Machine Learning (ICML-13), archived from the original (PDF) on 2016-10-02, retrieved 2016-04-29
Zoghi, Masrour; Whiteson, Shimon; Munos, Remi; Rijke, Maarten D (2014), "Relative Upper Confidence Bound for the $K$-Armed Dueling Bandit Problem" (PDF), Proceedings of the 31st International Conference on Machine Learning (ICML-14), archived from the original (PDF) on 2016-03-26, retrieved 2016-04-27
Gajane, Pratik; Urvoy, Tanguy; Clérot, Fabrice (2015), "A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits" (PDF), Proceedings of the 32nd International Conference on Machine Learning (ICML-15), archived from the original (PDF) on 2015-09-08, retrieved 2016-04-29
Komiyama, Junpei; Honda, Junya; Kashima, Hisashi; Nakagawa, Hiroshi (2015), "Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem" (PDF), Proceedings of the 28th Conference on Learning Theory, archived from the original (PDF) on 2016-06-17, retrieved 2016-04-27
Chen, Wei; Wang, Yajun; Yuan, Yang (2013), "Combinatorial multi-armed bandit: General framework and applications", Proceedings of the 30th International Conference on Machine Learning (ICML 2013) (PDF), pp. 151–159, archived from the original (PDF) on 2016-11-19, retrieved 2019-06-14

jstor.org

Weber, Richard (1992), "On the Gittins index for multiarmed bandits", Annals of Applied Probability, 2 (4): 1024–1033, doi:10.1214/aoap/1177005588, JSTOR 2959678
J. C. Gittins (1979). "Bandit Processes and Dynamic Allocation Indices". Journal of the Royal Statistical Society. Series B (Methodological). 41 (2): 148–177. doi:10.1111/j.2517-6161.1979.tb01068.x. JSTOR 2985029. S2CID 17724147.
Whittle, Peter (1988), "Restless bandits: Activity allocation in a changing world", Journal of Applied Probability, 25A: 287–298, doi:10.2307/3214163, JSTOR 3214163, MR 0974588, S2CID 202109695

mlr.press

proceedings.mlr.press

Wei Chu; Lihong Li; Lev Reyzin; Robert E. Schapire (2011), "Contextual bandits with linear payoff functions" (PDF), Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS): 208–214
Lihong Li; Yu Lu; Dengyong Zhou (2017), "Provably optimal algorithms for generalized linear contextual bandits", Proceedings of the 34th International Conference on Machine Learning: 2071–2080, arXiv:1703.00048, Bibcode:2017arXiv170300048L
Alekh Agarwal; Daniel J. Hsu; Satyen Kale; John Langford; Lihong Li; Robert E. Schapire (2014), "Taming the monster: A fast and simple algorithm for contextual bandits", Proceedings of the 31st International Conference on Machine Learning: 1638–1646, arXiv:1402.0555, Bibcode:2014arXiv1402.0555A

neurips.cc

proceedings.neurips.cc

Besbes, O.; Gur, Y.; Zeevi, A. Stochastic multi-armed-bandit problem with non-stationary rewards. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 199–207 <https://proceedings.neurips.cc/paper/2014/file/903ce9225fca3e988c2af215d4e544d3-Paper.pdf>

nih.gov

ncbi.nlm.nih.gov

Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC 2793317, PMID 20018711.
Katehakis, M.N.; Robbins, H. (1995). "Sequential choice from several populations". Proceedings of the National Academy of Sciences of the United States of America. 92 (19): 8584–5. Bibcode:1995PNAS...92.8584K. doi:10.1073/pnas.92.19.8584. PMC 41010. PMID 11607577.
Averbeck, B.B. (2015). "Theory of choice in bandit, information sampling, and foraging tasks". PLOS Computational Biology. 11 (3): e1004164. Bibcode:2015PLSCB..11E4164A. doi:10.1371/journal.pcbi.1004164. PMC 4376795. PMID 25815510.
Costa, V.D.; Averbeck, B.B. (2019). "Subcortical Substrates of Explore-Exploit Decisions in Primates". Neuron. 103 (3): 533–535. doi:10.1016/j.neuron.2019.05.017. PMC 6687547. PMID 31196672.
Cavenaghi, Emanuele; Sottocornola, Gabriele; Stella, Fabio; Zanker, Markus (2021). "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm". Entropy. 23 (3): 380. Bibcode:2021Entrp..23..380C. doi:10.3390/e23030380. PMC 8004723. PMID 33807028.

pubmed.ncbi.nlm.nih.gov

Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC 2793317, PMID 20018711.
Katehakis, M.N.; Robbins, H. (1995). "Sequential choice from several populations". Proceedings of the National Academy of Sciences of the United States of America. 92 (19): 8584–5. Bibcode:1995PNAS...92.8584K. doi:10.1073/pnas.92.19.8584. PMC 41010. PMID 11607577.
Averbeck, B.B. (2015). "Theory of choice in bandit, information sampling, and foraging tasks". PLOS Computational Biology. 11 (3): e1004164. Bibcode:2015PLSCB..11E4164A. doi:10.1371/journal.pcbi.1004164. PMC 4376795. PMID 25815510.
Costa, V.D.; Averbeck, B.B. (2019). "Subcortical Substrates of Explore-Exploit Decisions in Primates". Neuron. 103 (3): 533–535. doi:10.1016/j.neuron.2019.05.017. PMC 6687547. PMID 31196672.
Cavenaghi, Emanuele; Sottocornola, Gabriele; Stella, Fabio; Zanker, Markus (2021). "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm". Entropy. 23 (3): 380. Bibcode:2021Entrp..23..380C. doi:10.3390/e23030380. PMC 8004723. PMID 33807028.

nips.cc

papers.nips.cc

Olivier Chapelle; Lihong Li (2011), "An empirical evaluation of Thompson sampling", Advances in Neural Information Processing Systems, 24, Curran Associates: 2249–2257
Langford, John; Zhang, Tong (2008), "The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits", Advances in Neural Information Processing Systems, vol. 20, Curran Associates, Inc., pp. 817–824
Sarah Filippi; Olivier Cappé; Aurélien Garivier; Csaba Szepesvári (2010), "Parametric Bandits: The Generalized Linear Case", Advances in Neural Information Processing Systems, 23, Curran Associates: 586–594
Kwang-Sung Jun; Aniruddha Bhargava; Robert D. Nowak; Rebecca Willett (2017), "Scalable generalized linear bandits: Online computation and hashing", Advances in Neural Information Processing Systems, 30, Curran Associates: 99–109, arXiv:1706.00136, Bibcode:2017arXiv170600136J
Wu, Huasen; Srikant, R.; Liu, Xin; Jiang, Chong (2015), "Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits", The 29th Annual Conference on Neural Information Processing Systems (NIPS), 28, Curran Associates: 433–441, arXiv:1504.06937, Bibcode:2015arXiv150406937W

books.nips.cc

Tewari, A.; Bartlett, P.L. (2008). "Optimistic linear programming gives logarithmic regret for irreducible MDPs" (PDF). Advances in Neural Information Processing Systems. 20. CiteSeerX 10.1.1.69.5482. Archived from the original (PDF) on 2012-05-25. Retrieved 2012-10-12.

psu.edu

citeseerx.ist.psu.edu

Farias, Vivek F; Madan, Ritesh (2011), "The irrevocable multiarmed bandit problem", Operations Research, 59 (2): 383–399, CiteSeerX 10.1.1.380.6983, doi:10.1287/opre.1100.0891
Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R. E. (2002). "The Nonstochastic Multiarmed Bandit Problem". SIAM J. Comput. 32 (1): 48–77. CiteSeerX 10.1.1.130.158. doi:10.1137/S0097539701398375. S2CID 13209702.
Tewari, A.; Bartlett, P.L. (2008). "Optimistic linear programming gives logarithmic regret for irreducible MDPs" (PDF). Advances in Neural Information Processing Systems. 20. CiteSeerX 10.1.1.69.5482. Archived from the original (PDF) on 2012-05-25. Retrieved 2012-10-12.
Tokic, Michel (2010), "Adaptive ε-greedy exploration in reinforcement learning based on value differences" (PDF), KI 2010: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 6359, Springer-Verlag, pp. 203–210, CiteSeerX 10.1.1.458.464, doi:10.1007/978-3-642-16111-7_23, ISBN 978-3-642-16110-0.
Yue, Yisong; Broder, Josef; Kleinberg, Robert; Joachims, Thorsten (2012), "The K-armed dueling bandits problem", Journal of Computer and System Sciences, 78 (5): 1538–1556, CiteSeerX 10.1.1.162.2764, doi:10.1016/j.jcss.2011.12.028

scitepress.org

Improving Online Marketing Experiments with Drifting Multi-armed Bandits, Giuseppe Burtini, Jason Loeppky, Ramon Lawrence, 2015 <http://www.scitepress.org/DigitalLibrary/PublicationsDetail.aspx?ID=Dx2xXEB0PJE=&t=1>

semanticscholar.org

api.semanticscholar.org

Katehakis, Michael N.; Veinott, Jr., Arthur F. (1987). "The Multi-Armed Bandit Problem: Decomposition and Computation". Mathematics of Operations Research. 12 (2): 262–268. doi:10.1287/moor.12.2.262. S2CID 656323.
J. C. Gittins (1979). "Bandit Processes and Dynamic Allocation Indices". Journal of the Royal Statistical Society. Series B (Methodological). 41 (2): 148–177. doi:10.1111/j.2517-6161.1979.tb01068.x. JSTOR 2985029. S2CID 17724147.
Whittle, Peter (1988), "Restless bandits: Activity allocation in a changing world", Journal of Applied Probability, 25A: 287–298, doi:10.2307/3214163, JSTOR 3214163, MR 0974588, S2CID 202109695
Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R. E. (2002). "The Nonstochastic Multiarmed Bandit Problem". SIAM J. Comput. 32 (1): 48–77. CiteSeerX 10.1.1.130.158. doi:10.1137/S0097539701398375. S2CID 13209702.
Honda, J.; Takemura, A. (2011). "An asymptotically optimal policy for finite support models in the multi-armed bandit problem". Machine Learning. 85 (3): 361–391. arXiv:0905.2776. doi:10.1007/s10994-011-5257-4. S2CID 821462.
Pilarski, Sebastian; Pilarski, Slawomir; Varró, Dániel (February 2021). "Optimal Policy for Bernoulli Bandits: Computation and Algorithm Gauge". IEEE Transactions on Artificial Intelligence. 2 (1): 2–17. doi:10.1109/TAI.2021.3074122. ISSN 2691-4581. S2CID 235475602.
Pilarski, Sebastian; Pilarski, Slawomir; Varro, Daniel (2021). "Delayed Reward Bernoulli Bandits: Optimal Policy and Predictive Meta-Algorithm PARDI". IEEE Transactions on Artificial Intelligence. 3 (2): 152–163. doi:10.1109/TAI.2021.3117743. ISSN 2691-4581. S2CID 247682940.
Scott, S.L. (2010), "A modern Bayesian look at the multi-armed bandit", Applied Stochastic Models in Business and Industry, 26 (2): 639–658, doi:10.1002/asmb.874, S2CID 573750
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN 9781605587998, S2CID 207178795
Auer, P. (2000). "Using upper confidence bounds for online learning". Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc. pp. 270–279. doi:10.1109/sfcs.2000.892116. ISBN 978-0769508504. S2CID 28713091.
Hong, Tzung-Pei; Song, Wei-Ping; Chiu, Chu-Tien (November 2011). "Evolutionary Composite Attribute Clustering". 2011 International Conference on Technologies and Applications of Artificial Intelligence. IEEE. pp. 305–308. doi:10.1109/taai.2011.59. ISBN 9781457721748. S2CID 14125100.
Perchet, Vianney; Rigollet, Philippe (2013), "The multi-armed bandit problem with covariates", Annals of Statistics, 41 (2): 693–721, arXiv:1110.6084, doi:10.1214/13-aos1101, S2CID 14258665
Santiago Ontañón (2017), "Combinatorial Multi-armed Bandits for Real-Time Strategy Games", Journal of Artificial Intelligence Research, 58: 665–702, arXiv:1710.04805, Bibcode:2017arXiv171004805O, doi:10.1613/jair.5398, S2CID 8517525

sourceforge.net

bandit.sourceforge.net

Vermorel, Joannes; Mohri, Mehryar (2005), Multi-armed bandit algorithms and empirical evaluation (PDF), In European Conference on Machine Learning, Springer, pp. 437–448

tokic.com

Tokic, Michel (2010), "Adaptive ε-greedy exploration in reinforcement learning based on value differences" (PDF), KI 2010: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 6359, Springer-Verlag, pp. 203–210, CiteSeerX 10.1.1.458.464, doi:10.1007/978-3-642-16111-7_23, ISBN 978-3-642-16110-0.
Tokic, Michel; Palm, Günther (2011), "Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax" (PDF), KI 2011: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 7006, Springer-Verlag, pp. 335–346, ISBN 978-3-642-24455-1.

web.archive.org

Shen, Weiwei; Wang, Jun; Jiang, Yu-Gang; Zha, Hongyuan (2015), "Portfolio Choices with Orthogonal Bandit Learning", Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI2015), archived from the original on 2021-12-04, retrieved 2016-03-20
Tewari, A.; Bartlett, P.L. (2008). "Optimistic linear programming gives logarithmic regret for irreducible MDPs" (PDF). Advances in Neural Information Processing Systems. 20. CiteSeerX 10.1.1.69.5482. Archived from the original (PDF) on 2012-05-25. Retrieved 2012-10-12.
Féraud, Raphaël; Allesiardo, Robin; Urvoy, Tanguy; Clérot, Fabrice (2016). "Random Forest for the Contextual Bandit Problem". AISTATS: 93–101. Archived from the original on 2016-08-10. Retrieved 2016-06-10.
Urvoy, Tanguy; Clérot, Fabrice; Féraud, Raphaël; Naamane, Sami (2013), "Generic Exploration and K-armed Voting Bandits" (PDF), Proceedings of the 30th International Conference on Machine Learning (ICML-13), archived from the original (PDF) on 2016-10-02, retrieved 2016-04-29
Zoghi, Masrour; Whiteson, Shimon; Munos, Remi; Rijke, Maarten D (2014), "Relative Upper Confidence Bound for the $K$-Armed Dueling Bandit Problem" (PDF), Proceedings of the 31st International Conference on Machine Learning (ICML-14), archived from the original (PDF) on 2016-03-26, retrieved 2016-04-27
Gajane, Pratik; Urvoy, Tanguy; Clérot, Fabrice (2015), "A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits" (PDF), Proceedings of the 32nd International Conference on Machine Learning (ICML-15), archived from the original (PDF) on 2015-09-08, retrieved 2016-04-29
Komiyama, Junpei; Honda, Junya; Kashima, Hisashi; Nakagawa, Hiroshi (2015), "Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem" (PDF), Proceedings of the 28th Conference on Learning Theory, archived from the original (PDF) on 2016-06-17, retrieved 2016-04-27
Chen, Wei; Wang, Yajun; Yuan, Yang (2013), "Combinatorial multi-armed bandit: General framework and applications", Proceedings of the 30th International Conference on Machine Learning (ICML 2013) (PDF), pp. 151–159, archived from the original (PDF) on 2016-11-19, retrieved 2019-06-14

worldcat.org

search.worldcat.org

Pilarski, Sebastian; Pilarski, Slawomir; Varró, Dániel (February 2021). "Optimal Policy for Bernoulli Bandits: Computation and Algorithm Gauge". IEEE Transactions on Artificial Intelligence. 2 (1): 2–17. doi:10.1109/TAI.2021.3074122. ISSN 2691-4581. S2CID 235475602.
Pilarski, Sebastian; Pilarski, Slawomir; Varro, Daniel (2021). "Delayed Reward Bernoulli Bandits: Optimal Policy and Predictive Meta-Algorithm PARDI". IEEE Transactions on Artificial Intelligence. 3 (2): 152–163. doi:10.1109/TAI.2021.3117743. ISSN 2691-4581. S2CID 247682940.