Reinforcement learning from human feedback (English Wikipedia)

Analysis of information sources in references of the Wikipedia article "Reinforcement learning from human feedback" in English language version.

refsWebsite

Global rank English rank

27arxiv.org

69^th place

59^th place

6openreview.net

low place

4doi.org

2^nd place

3mlr.press

low place

3neurips.cc

low place

2openai.com

1,559^th place

1,155^th place

2springer.com

274^th place

309^th place

2nips.cc

low place

7,050^th place

2arstechnica.com

388^th place

265^th place

2venturebeat.com

616^th place

430^th place

2technologyreview.com

1,943^rd place

1,253^rd place

2deepmind.com

low place

2nih.gov

4^th place

1huggingface.co

low place

1semanticscholar.org

11^th place

8^th place

1acm.org

1,185^th place

840^th place

1techcrunch.com

187^th place

146^th place

1blog.google

2,218^th place

1,391^st place

1time.com

61^st place

54^th place

1alignmentforum.org

low place

acm.org

dl.acm.org

MacGlashan, James; Ho, Mark K.; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (6 August 2017). "Interactive learning from policy-dependent human feedback". Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org: 2285–2294. arXiv:1701.06049.

alignmentforum.org

Christiano, Paul (25 January 2023). "Thoughts on the impact of RLHF research". Retrieved 4 March 2023.

arstechnica.com

Edwards, Benj (1 December 2022). "OpenAI invites everyone to test ChatGPT, a new AI-powered chatbot—with amusing results". Ars Technica. Retrieved 4 March 2023.
Edwards, Benj (2023-05-09). "AI gains "values" with Anthropic's new Constitutional AI chatbot approach". Ars Technica. Retrieved 2024-04-27.

arxiv.org

Ziegler, Daniel M.; Stiennon, Nisan; Wu, Jeffrey; Brown, Tom B.; Radford, Alec; Amodei, Dario; Christiano, Paul; Irving, Geoffrey (2019). "Fine-Tuning Language Models from Human Preferences". arXiv:1909.08593 [cs.CL].
Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). "Proximal Policy Optimization Algorithms". arXiv:1707.06347 [cs.LG].
Tuan, Yi-Lin; Zhang, Jinzhi; Li, Yujia; Lee, Hung-yi (2018). "Proximal Policy Optimization and its Dynamic Version for Sequence Generation". arXiv:1808.07982 [cs.CL].
Zheng, Rui; Dou, Shihan; Gao, Songyang; Hua, Yuan; Shen, Wei; Wang, Binghai; Liu, Yan; Jin, Senjie; Liu, Qin; Zhou, Yuhao; Xiong, Limao; Chen, Lu; Xi, Zhiheng; Xu, Nuo; Lai, Wenbin; Zhu, Minghao; Chang, Cheng; Yin, Zhangyue; Weng, Rongxiang; Cheng, Wensen; Huang, Haoran; Sun, Tianxiang; Yan, Hang; Gui, Tao; Zhang, Qi; Qiu, Xipeng; Huang, Xuanjing (2023). "Secrets of RLHF in Large Language Models Part I: PPO". arXiv:2307.04964 [cs.CL].
Akrour, Riad; Schoenauer, Marc; Sebag, Michèle (2012). "APRIL: Active Preference Learning-Based Reinforcement Learning". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7524. Springer. pp. 116–131. arXiv:1208.0984. doi:10.1007/978-3-642-33486-3_8. ISBN 978-3-642-33485-6. Retrieved 26 February 2024.
Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Proceedings of the AAAI Conference on Artificial Intelligence. 32 (1). arXiv:1709.10163. doi:10.1609/aaai.v32i1.11485. S2CID 4130751.
MacGlashan, James; Ho, Mark K.; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (6 August 2017). "Interactive learning from policy-dependent human feedback". Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org: 2285–2294. arXiv:1701.06049.
Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Gray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan (31 October 2022). Training language models to follow instructions with human feedback. Thirty-Sixth Conference on Neural Information Processing Systems: NeurIPS 2022. arXiv:2203.02155.
Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv:2204.05862 [cs.CL].
Fernandes, Patrick; Madaan, Aman; Liu, Emmy; Farinhas, António; Pedro Henrique Martins; Bertsch, Amanda; de Souza, José G. C.; Zhou, Shuyan; Wu, Tongshuang; Neubig, Graham; Martins, André F. T. (2023). "Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation". arXiv:2305.00955 [cs.CL].
Xie, Tengyang; Jiang, Nan; Wang, Huan; Xiong, Caiming; Bai, Yu (2021). "Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning". Advances in Neural Information Processing Systems. 34. Curran Associates, Inc.: 27395–27407. arXiv:2106.04895. Retrieved 10 March 2024.
Pacchiano, Aldo; Saha, Aadirupa; Lee, Jonathan (2023-03-03). "Dueling RL: Reinforcement Learning with Trajectory Preferences". Proceedings of the 26th International Conference on Artificial Intelligence and Statistics. PMLR: 6263–6289. arXiv:2111.04850.
Zhu, Banghua; Jordan, Michael; Jiao, Jiantao (2023-07-03). "Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons". Proceedings of the 40th International Conference on Machine Learning. PMLR: 43037–43067. arXiv:2301.11270.
Li, Zihao; Yang, Zhuoran; Wang, Mengdi (20 June 2023). "Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism". ILHF Workshop ICML 2023. arXiv:2305.18438. Retrieved 10 March 2024.
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155 [cs.CL].
Glaese, Amelia; McAleese, Nat; Trębacz, Maja; Aslanides, John; Firoiu, Vlad; Ewalds, Timo; Rauh, Maribeth; Weidinger, Laura; Chadwick, Martin; Thacker, Phoebe; Campbell-Gillingham, Lucy; Uesato, Jonathan; Huang, Po-Sen; Comanescu, Ramona; Yang, Fan; See, Abigail; Dathathri, Sumanth; Greig, Rory; Chen, Charlie; Fritz, Doug; Elias, Jaume Sanchez; Green, Richard; Mokrá, Soňa; Fernando, Nicholas; Wu, Boxi; Foley, Rachel; Young, Susannah; Gabriel, Iason; Isaac, William; Mellor, John; Hassabis, Demis; Kavukcuoglu, Koray; Hendricks, Lisa Anne; Irving, Geoffrey (2022). "Improving alignment of dialogue agents via targeted human judgements". arXiv:2209.14375 [cs.LG].
Fan, Ying; Watkins, Olivia; Du, Yuqing; Liu, Hao; Ryu, Moonkyung; Boutilier, Craig; Abbeel, Pieter; Ghavamzadeh, Mohammad; Lee, Kangwook; Lee, Kimin (2 November 2023). "DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models". NeurIPS 2023. arXiv:2305.16381. Retrieved 1 March 2024.
Xu, Jiazheng; Liu, Xiao; Wu, Yuchen; Tong, Yuxuan; Li, Qinkai; Ding, Ming; Tang, Jie; Dong, Yuxiao (15 December 2023). "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation". Advances in Neural Information Processing Systems. 36: 15903–15935. arXiv:2304.05977. Retrieved 1 March 2024.
Lee, Kimin; Liu, Hao; Ryu, Moonkyung; Watkins, Olivia; Du, Yuqing; Boutilier, Craig; Abbeel, Pieter; Ghavamzadeh, Mohammad; Gu, Shixiang Shane (2023). "Aligning Text-to-Image Models using Human Feedback". arXiv:2302.12192 [cs.LG].
Christiano, Paul F; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. arXiv:1706.03741. Retrieved 4 March 2023.
Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, Jérémy; Rando, Javier; Freedman, Rachel; Korbak, Tomasz; Lindner, David; Freire, Pedro; Wang, Tony Tong; Marks, Samuel; Segerie, Charbel-Raphael; Carroll, Micah; Peng, Andi; Christoffersen, Phillip; Damani, Mehul; Slocum, Stewart; Anwar, Usman; Siththaranjan, Anand; Nadeau, Max; Michaud, Eric J.; Pfau, Jacob; Krasheninnikov, Dmitrii; Chen, Xin; Langosco, Lauro; Hase, Peter; Biyik, Erdem; Dragan, Anca; Krueger, David; Sadigh, Dorsa; Hadfield-Menell, Dylan (18 September 2023). "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback". Transactions on Machine Learning Research. arXiv:2307.15217.
Rafailov, Rafael; Chittepu, Yaswanth; Park, Ryan; Sikchi, Harshit; Hejna, Joey; Knox, Bradley; Finn, Chelsea; Niekum, Scott (2024). "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms". arXiv:2406.02900 [cs.LG].
Shi, Zhengyan; Land, Sander; Locatelli, Acyr; Geist, Matthieu; Bartolo, Max (2024). "Understanding Likelihood Over-optimisation in Direct Alignment Algorithms". arXiv:2410.11677 [cs.CL].
Rafailov, Rafael; Sharma, Archit; Mitchell, Eric; Ermon, Stefano; Manning, Christopher D.; Finn, Chelsea (2023). "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". arXiv:2305.18290 [cs.LG].
Wang, Zhilin; Dong, Yi; Zeng, Jiaqi; Adams, Virginia; Sreedhar, Makesh Narsimhan; Egert, Daniel; Delalleau, Olivier; Scowcroft, Jane Polak; Kant, Neel; Swope, Aidan; Kuchaiev, Oleksii (2023). "HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM". arXiv:2311.09528 [cs.CL].
Mohammad Gheshlaghi Azar; Rowland, Mark; Piot, Bilal; Guo, Daniel; Calandriello, Daniele; Valko, Michal; Munos, Rémi (2023). "A General Theoretical Paradigm to Understand Learning from Human Preferences". arXiv:2310.12036 [cs.AI].
Ethayarajh, Kawin; Xu, Winnie; Muennighoff, Niklas; Jurafsky, Dan; Kiela, Douwe (2024). "KTO: Model Alignment as Prospect Theoretic Optimization". arXiv:2402.01306 [cs.LG].

blog.google

Pinchai, Sundar; Hassabis, Demis (6 December 2023). "Introducing Gemini: our largest and most capable AI model". Google. Retrieved 29 February 2024.

deepmind.com

The Sparrow team (22 September 2022). "Building safer dialogue agents". www.deepmind.com. Retrieved 4 March 2023.
Leike, Jan; Martic, Miljan; Legg, Shane (12 June 2017). "Learning through human feedback". www.deepmind.com. Retrieved 4 March 2023.

doi.org

Knox, W. Bradley; Stone, Peter; Breazeal, Cynthia (2013). "Training a Robot via Human Feedback: A Case Study". Social Robotics. Lecture Notes in Computer Science. Vol. 8239. Springer International Publishing. pp. 460–470. doi:10.1007/978-3-319-02675-6_46. ISBN 978-3-319-02674-9. Retrieved 26 February 2024.
Akrour, Riad; Schoenauer, Marc; Sebag, Michèle (2012). "APRIL: Active Preference Learning-Based Reinforcement Learning". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7524. Springer. pp. 116–131. arXiv:1208.0984. doi:10.1007/978-3-642-33486-3_8. ISBN 978-3-642-33485-6. Retrieved 26 February 2024.
Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Proceedings of the AAAI Conference on Artificial Intelligence. 32 (1). arXiv:1709.10163. doi:10.1609/aaai.v32i1.11485. S2CID 4130751.
Belenguer, Lorenzo (2022). "AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry". AI and Ethics. 2 (4). AI Ethics: 771–787. doi:10.1007/s43681-022-00138-8. PMC 8830968. PMID 35194591.

huggingface.co

Lambert, Nathan; Castricato, Louis; von Werra, Leandro; Havrilla, Alex. "Illustrating Reinforcement Learning from Human Feedback (RLHF)". huggingface.co. Retrieved 4 March 2023.

mlr.press

proceedings.mlr.press

Schoenauer, Marc; Akrour, Riad; Sebag, Michele; Souplet, Jean-Christophe (18 June 2014). "Programming by Feedback". Proceedings of the 31st International Conference on Machine Learning. PMLR: 1503–1511. Retrieved 26 February 2024.
Pacchiano, Aldo; Saha, Aadirupa; Lee, Jonathan (2023-03-03). "Dueling RL: Reinforcement Learning with Trajectory Preferences". Proceedings of the 26th International Conference on Artificial Intelligence and Statistics. PMLR: 6263–6289. arXiv:2111.04850.
Zhu, Banghua; Jordan, Michael; Jiao, Jiantao (2023-07-03). "Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons". Proceedings of the 40th International Conference on Machine Learning. PMLR: 43037–43067. arXiv:2301.11270.

neurips.cc

proceedings.neurips.cc

Nisan Stiennon; Long Ouyang; Jeffrey Wu; Daniel Ziegler; Ryan Lowe; Chelsea Voss; Alec Radford; Dario Amodei; Paul F. Christiano (2020). "Learning to summarize with human feedback". Advances in Neural Information Processing Systems. 33.
Xie, Tengyang; Jiang, Nan; Wang, Huan; Xiong, Caiming; Bai, Yu (2021). "Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning". Advances in Neural Information Processing Systems. 34. Curran Associates, Inc.: 27395–27407. arXiv:2106.04895. Retrieved 10 March 2024.
Xu, Jiazheng; Liu, Xiao; Wu, Yuchen; Tong, Yuxuan; Li, Qinkai; Ding, Ming; Tang, Jie; Dong, Yuxiao (15 December 2023). "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation". Advances in Neural Information Processing Systems. 36: 15903–15935. arXiv:2304.05977. Retrieved 1 March 2024.

nih.gov

ncbi.nlm.nih.gov

Belenguer, Lorenzo (2022). "AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry". AI and Ethics. 2 (4). AI Ethics: 771–787. doi:10.1007/s43681-022-00138-8. PMC 8830968. PMID 35194591.

pubmed.ncbi.nlm.nih.gov

Belenguer, Lorenzo (2022). "AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry". AI and Ethics. 2 (4). AI Ethics: 771–787. doi:10.1007/s43681-022-00138-8. PMC 8830968. PMID 35194591.

nips.cc

papers.nips.cc

Wilson, Aaron; Fern, Alan; Tadepalli, Prasad (2012). "A Bayesian Approach for Policy Learning from Trajectory Preference Queries". Advances in Neural Information Processing Systems. 25. Curran Associates, Inc. Retrieved 26 February 2024.
Christiano, Paul F; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. arXiv:1706.03741. Retrieved 4 March 2023.

openai.com

Amodei, Dario; Christiano, Paul; Ray, Alex (13 June 2017). "Learning from human preferences". openai.com. Retrieved 4 March 2023.
Clark, Jack; Amodei, Dario (21 December 2016). "Faulty reward functions in the wild". OpenAI.

openreview.net

Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Gray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe, Ryan (31 October 2022). Training language models to follow instructions with human feedback. Thirty-Sixth Conference on Neural Information Processing Systems: NeurIPS 2022. arXiv:2203.02155.
Li, Zihao; Yang, Zhuoran; Wang, Mengdi (20 June 2023). "Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism". ILHF Workshop ICML 2023. arXiv:2305.18438. Retrieved 10 March 2024.
Fan, Ying; Watkins, Olivia; Du, Yuqing; Liu, Hao; Ryu, Moonkyung; Boutilier, Craig; Abbeel, Pieter; Ghavamzadeh, Mohammad; Lee, Kangwook; Lee, Kimin (2 November 2023). "DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models". NeurIPS 2023. arXiv:2305.16381. Retrieved 1 March 2024.
Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, Jérémy; Rando, Javier; Freedman, Rachel; Korbak, Tomasz; Lindner, David; Freire, Pedro; Wang, Tony Tong; Marks, Samuel; Segerie, Charbel-Raphael; Carroll, Micah; Peng, Andi; Christoffersen, Phillip; Damani, Mehul; Slocum, Stewart; Anwar, Usman; Siththaranjan, Anand; Nadeau, Max; Michaud, Eric J.; Pfau, Jacob; Krasheninnikov, Dmitrii; Chen, Xin; Langosco, Lauro; Hase, Peter; Biyik, Erdem; Dragan, Anca; Krueger, David; Sadigh, Dorsa; Hadfield-Menell, Dylan (18 September 2023). "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback". Transactions on Machine Learning Research. arXiv:2307.15217.
Zhang, Chiyuan; Bengio, Samy; Hardt, Moritz; Recht, Benjamin; Vinyals, Oriol (4 November 2016). "Understanding deep learning requires rethinking generalization". International Conference on Learning Representations.
Lee, Harrison; Phatale, Samrat; Mansoor, Hassan; Lu, Kellie Ren; Mesnard, Thomas; Ferret, Johan; Bishop, Colton; Hall, Ethan; Carbune, Victor; Rastogi, Abhinav (2023-10-13). "RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback". ICLR.

semanticscholar.org

api.semanticscholar.org

Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Proceedings of the AAAI Conference on Artificial Intelligence. 32 (1). arXiv:1709.10163. doi:10.1609/aaai.v32i1.11485. S2CID 4130751.

springer.com

link.springer.com

Knox, W. Bradley; Stone, Peter; Breazeal, Cynthia (2013). "Training a Robot via Human Feedback: A Case Study". Social Robotics. Lecture Notes in Computer Science. Vol. 8239. Springer International Publishing. pp. 460–470. doi:10.1007/978-3-319-02675-6_46. ISBN 978-3-319-02674-9. Retrieved 26 February 2024.
Akrour, Riad; Schoenauer, Marc; Sebag, Michèle (2012). "APRIL: Active Preference Learning-Based Reinforcement Learning". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7524. Springer. pp. 116–131. arXiv:1208.0984. doi:10.1007/978-3-642-33486-3_8. ISBN 978-3-642-33485-6. Retrieved 26 February 2024.

techcrunch.com

Wiggers, Kyle (24 February 2023). "Can AI really be protected from text-based attacks?". TechCrunch. Retrieved 4 March 2023.

technologyreview.com

Heikkilä, Melissa (21 February 2023). "How OpenAI is trying to make ChatGPT safer and less biased". MIT Technology Review. Retrieved 4 March 2023.
Douglas Heaven, Will (30 November 2022). "ChatGPT is OpenAI's latest fix for GPT-3. It's slick but still spews nonsense". MIT Technology Review. Retrieved 4 March 2023.

time.com

Henshall, Will (18 July 2023). "What to Know About Claude 2, Anthropic's Rival to ChatGPT". TIME. Retrieved 6 March 2024.

venturebeat.com

Abhishek, Gupta (5 February 2023). "Getting stakeholder engagement right in responsible AI". VentureBeat. Retrieved 4 March 2023.
Goldman, Sharon (23 September 2022). "Why DeepMind isn't deploying its new AI chatbot — and what it means for responsible AI". VentureBeat. Retrieved 4 March 2023.