Shen, Weiwei; Wang, Jun; Jiang, Yu-Gang; Zha, Hongyuan (2015), "Portfolio Choices with Orthogonal Bandit Learning", Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI2015), archived from the original on 2021-12-04, retrieved 2016-03-20
Bubeck, Sébastien (2012). "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". Foundations and Trends in Machine Learning. 5: 1–122. arXiv:1204.5721. doi:10.1561/2200000024.
Soare, Marta; Lazaric, Alessandro; Munos, Rémi (2014). "Best-Arm Identification in Linear Bandits". arXiv:1409.6110 [cs.LG].
Brochu, Eric; Hoffman, Matthew W.; de Freitas, Nando (September 2010). "Portfolio Allocation for Bayesian Optimization". arXiv:1009.5419 [cs.LG].
Aurelien Garivier; Emilie Kaufmann (2016). "Optimal Best Arm Identification with Fixed Confidence". arXiv:1602.04589 [math.ST].
Honda, J.; Takemura, A. (2011). "An asymptotically optimal policy for finite support models in the multi-armed bandit problem". Machine Learning. 85 (3): 361–391. arXiv:0905.2776. doi:10.1007/s10994-011-5257-4. S2CID821462.
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN9781605587998, S2CID207178795
Rigollet, Philippe; Zeevi, Assaf (2010), Nonparametric Bandits with Covariates, Conference on Learning Theory, COLT 2010, arXiv:1003.1630, Bibcode:2010arXiv1003.1630R
Branislav Kveton; Manzil Zaheer; Csaba Szepesvári; Lihong Li; Mohammad Ghavamzadeh; Craig Boutilier (2020), "Randomized exploration in generalized linear bandits", Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), arXiv:1906.08947, Bibcode:2019arXiv190608947K
Michal Valko; Nathan Korda; Rémi Munos; Ilias Flaounas; Nello Cristianini (2013), Finite-Time Analysis of Kernelised Contextual Bandits, 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013) and (JFPDA 2013)., arXiv:1309.6869, Bibcode:2013arXiv1309.6869V
Burtini, Giuseppe; Loeppky, Jason; Lawrence, Ramon (2015). "A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit". arXiv:1510.00757 [stat.ML].
Garivier, Aurélien; Moulines, Eric (2008). "On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems". arXiv:0805.3415 [math.ST].
Zoghi, Masrour; Karnin, Zohar S; Whiteson, Shimon; Rijke, Maarten D (2015), "Copeland Dueling Bandits", Advances in Neural Information Processing Systems, NIPS'15, arXiv:1506.00312, Bibcode:2015arXiv150600312Z
Wu, Huasen; Liu, Xin (2016), "Double Thompson Sampling for Dueling Bandits", The 30th Annual Conference on Neural Information Processing Systems (NIPS), arXiv:1604.07101, Bibcode:2016arXiv160407101W
Cesa-Bianchi, Nicolo; Gentile, Claudio; Zappella, Giovanni (2013), A Gang of Bandits, Advances in Neural Information Processing Systems 26, NIPS 2013, arXiv:1306.0811
Gentile, Claudio; Li, Shuai; Zappella, Giovanni (2014), "Online Clustering of Bandits", The 31st International Conference on Machine Learning, Journal of Machine Learning Research (ICML 2014), arXiv:1401.8257, Bibcode:2014arXiv1401.8257G
Li, Shuai; Alexandros, Karatzoglou; Gentile, Claudio (2016), "Collaborative Filtering Bandits", The 39th International ACM SIGIR Conference on Information Retrieval (SIGIR 2016), arXiv:1502.03473, Bibcode:2015arXiv150203473L
Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC2793317, PMID20018711.
Scott, S.L. (2010), "A modern Bayesian look at the multi-armed bandit", Applied Stochastic Models in Business and Industry, 26 (2): 639–658, doi:10.1002/asmb.874, S2CID573750
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN9781605587998, S2CID207178795
Auer, P. (2000). "Using upper confidence bounds for online learning". Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc. pp. 270–279. doi:10.1109/sfcs.2000.892116. ISBN978-0769508504. S2CID28713091.
Hong, Tzung-Pei; Song, Wei-Ping; Chiu, Chu-Tien (November 2011). "Evolutionary Composite Attribute Clustering". 2011 International Conference on Technologies and Applications of Artificial Intelligence. IEEE. pp. 305–308. doi:10.1109/taai.2011.59. ISBN9781457721748. S2CID14125100.
Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC2793317, PMID20018711.
Branislav Kveton; Manzil Zaheer; Csaba Szepesvári; Lihong Li; Mohammad Ghavamzadeh; Craig Boutilier (2020), "Randomized exploration in generalized linear bandits", Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), arXiv:1906.08947, Bibcode:2019arXiv190608947K
Michal Valko; Nathan Korda; Rémi Munos; Ilias Flaounas; Nello Cristianini (2013), Finite-Time Analysis of Kernelised Contextual Bandits, 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013) and (JFPDA 2013)., arXiv:1309.6869, Bibcode:2013arXiv1309.6869V
Zoghi, Masrour; Karnin, Zohar S; Whiteson, Shimon; Rijke, Maarten D (2015), "Copeland Dueling Bandits", Advances in Neural Information Processing Systems, NIPS'15, arXiv:1506.00312, Bibcode:2015arXiv150600312Z
Wu, Huasen; Liu, Xin (2016), "Double Thompson Sampling for Dueling Bandits", The 30th Annual Conference on Neural Information Processing Systems (NIPS), arXiv:1604.07101, Bibcode:2016arXiv160407101W
Gentile, Claudio; Li, Shuai; Zappella, Giovanni (2014), "Online Clustering of Bandits", The 31st International Conference on Machine Learning, Journal of Machine Learning Research (ICML 2014), arXiv:1401.8257, Bibcode:2014arXiv1401.8257G
Li, Shuai; Alexandros, Karatzoglou; Gentile, Claudio (2016), "Collaborative Filtering Bandits", The 39th International ACM SIGIR Conference on Information Retrieval (SIGIR 2016), arXiv:1502.03473, Bibcode:2015arXiv150203473L
Urvoy, Tanguy; Clérot, Fabrice; Féraud, Raphaël; Naamane, Sami (2013), "Generic Exploration and K-armed Voting Bandits"(PDF), Proceedings of the 30th International Conference on Machine Learning (ICML-13), archived from the original(PDF) on 2016-10-02, retrieved 2016-04-29
Wei Chu; Lihong Li; Lev Reyzin; Robert E. Schapire (2011), "Contextual bandits with linear payoff functions"(PDF), Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS): 208–214
Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC2793317, PMID20018711.
Press, William H. (2009), "Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research", Proceedings of the National Academy of Sciences, 106 (52): 22387–22392, Bibcode:2009PNAS..10622387P, doi:10.1073/pnas.0912378106, PMC2793317, PMID20018711.
Sarah Filippi; Olivier Cappé; Aurélien Garivier; Csaba Szepesvári (2010), "Parametric Bandits: The Generalized Linear Case", Advances in Neural Information Processing Systems, 23, Curran Associates: 586–594
Scott, S.L. (2010), "A modern Bayesian look at the multi-armed bandit", Applied Stochastic Models in Business and Industry, 26 (2): 639–658, doi:10.1002/asmb.874, S2CID573750
Lihong Li; Wei Chu; John Langford; Robert E. Schapire (2010), "A contextual-bandit approach to personalized news article recommendation", Proceedings of the 19th international conference on World wide web, pp. 661–670, arXiv:1003.0146, doi:10.1145/1772690.1772758, ISBN9781605587998, S2CID207178795
Auer, P. (2000). "Using upper confidence bounds for online learning". Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc. pp. 270–279. doi:10.1109/sfcs.2000.892116. ISBN978-0769508504. S2CID28713091.
Hong, Tzung-Pei; Song, Wei-Ping; Chiu, Chu-Tien (November 2011). "Evolutionary Composite Attribute Clustering". 2011 International Conference on Technologies and Applications of Artificial Intelligence. IEEE. pp. 305–308. doi:10.1109/taai.2011.59. ISBN9781457721748. S2CID14125100.
Shen, Weiwei; Wang, Jun; Jiang, Yu-Gang; Zha, Hongyuan (2015), "Portfolio Choices with Orthogonal Bandit Learning", Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI2015), archived from the original on 2021-12-04, retrieved 2016-03-20
Urvoy, Tanguy; Clérot, Fabrice; Féraud, Raphaël; Naamane, Sami (2013), "Generic Exploration and K-armed Voting Bandits"(PDF), Proceedings of the 30th International Conference on Machine Learning (ICML-13), archived from the original(PDF) on 2016-10-02, retrieved 2016-04-29