Aprendizado por reforço com feedback humano (Portuguese Wikipedia)

Analysis of the information sources cited in the references of the Portuguese-language Wikipedia article "Aprendizado por reforço com feedback humano" (Reinforcement learning from human feedback).

Refs  Website              Global rank     Portuguese rank

7     arxiv.org            69th place      195th place
4     huggingface.co       low place
2     openreview.net       low place
2     arstechnica.com      388th place     592nd place
2     openai.com           1,559th place   1,438th place
1     acm.org              1,185th place   1,301st place
1     venturebeat.com      616th place     783rd place
1     techcrunch.com       187th place     352nd place
1     forbes.com           54th place      82nd place
1     deepmind.com         low place
1     nips.cc              low place
1     alignmentforum.org   low place
1     springer.com         274th place     339th place
1     nih.gov              4th place       8th place
1     doi.org              2nd place       4th place
1     princeton.edu        741st place     748th place

acm.org

dl.acm.org

MacGlashan, James; Ho, Mark K; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (6 August 2017). «Interactive learning from policy-dependent human feedback». JMLR.org. Proceedings of the 34th International Conference on Machine Learning - Volume 70: 2285–2294. arXiv:1701.06049

alignmentforum.org

Christiano, Paul. «Thoughts on the impact of RLHF research» (in English). Retrieved 4 March 2023

arstechnica.com

Edwards, Benj (1 December 2022). «OpenAI invites everyone to test ChatGPT, a new AI-powered chatbot—with amusing results». Ars Technica (in English). Retrieved 4 March 2023

arxiv.org

Ziegler, Daniel M.; Stiennon, Nisan (2019). «Fine-Tuning Language Models from Human Preferences». arXiv:1909.08593 [cs.CL]
MacGlashan, James; Ho, Mark K; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (6 August 2017). «Interactive learning from policy-dependent human feedback». JMLR.org. Proceedings of the 34th International Conference on Machine Learning - Volume 70: 2285–2294. arXiv:1701.06049
Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina (31 October 2022). «Training language models to follow instructions with human feedback» (in English). arXiv:2203.02155
Fernandes, Patrick; Madaan, Aman (2023). «Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation». arXiv:2305.00955 [cs.CL]
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina (2022). «Training language models to follow instructions with human feedback». arXiv:2203.02155
Glaese, Amelia; McAleese, Nat (2022). «Improving alignment of dialogue agents via targeted human judgements». arXiv:2209.14375 [cs.LG]
Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, Jérémy; Rando, Javier; Freedman, Rachel; Korbak, Tomasz; Lindner, David; Freire, Pedro; Wang, Tony; Marks, Samuel; Segerie, Charbel-Raphaël; Carroll, Micah; Peng, Andi; Christoffersen, Phillip; Damani, Mehul; Slocum, Stewart; Anwar, Usman; Siththaranjan, Anand; Nadeau, Max; Michaud, Eric J.; Pfau, Jacob; Krasheninnikov, Dmitrii; Chen, Xin; Langosco, Lauro; Hase, Peter; Bıyık, Erdem; Dragan, Anca; Krueger, David; Sadigh, Dorsa; Hadfield-Menell, Dylan (2023). «Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback». arXiv:2307.15217 [cs.AI]

deepmind.com

«Learning through human feedback». www.deepmind.com (in English). Retrieved 4 March 2023

doi.org

dx.doi.org

Belenguer, Lorenzo (2022). «AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry». AI Ethics. AI and Ethics. 2 (4): 771–787. PMC 8830968. doi:10.1007/s43681-022-00138-8

forbes.com

Farseev, Aleks. «Council Post: Is Bigger Better? Why The ChatGPT Vs. GPT-3 Vs. GPT-4 'Battle' Is Just A Family Chat». Forbes (in English). Retrieved 4 March 2023

huggingface.co

Lambert, Nathan; Castricato, Louis; von Werra, Leandro; Havrilla, Alex. «Illustrating Reinforcement Learning from Human Feedback (RLHF)». huggingface.co. Retrieved 4 March 2023
«Illustrating Reinforcement Learning from Human Feedback (RLHF)». Hugging Face
«Paper page - Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback». huggingface.co. 31 July 2023. Retrieved 31 July 2023

nih.gov

ncbi.nlm.nih.gov

Belenguer, Lorenzo (2022). «AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry». AI Ethics. AI and Ethics. 2 (4): 771–787. PMC 8830968. doi:10.1007/s43681-022-00138-8

nips.cc

papers.nips.cc

Christiano, Paul F; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). «Deep Reinforcement Learning from Human Preferences». Curran Associates, Inc. Advances in Neural Information Processing Systems. 30. Retrieved 4 March 2023

openai.com

«Learning from human preferences». openai.com. Retrieved 4 March 2023
«Faulty reward functions in the wild». OpenAI

openreview.net

Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina (31 October 2022). «Training language models to follow instructions with human feedback» (in English). arXiv:2203.02155
Zhang, Chiyuan; Bengio, Samy; Hardt, Moritz; Recht, Benjamin; Vinyals, Oriol (4 November 2016). «Understanding deep learning requires rethinking generalization». International Conference on Learning Representations

princeton.edu

cs.princeton.edu

Wang, Austin. «Training Language Models to Follow Instructions with Human Feedback» (PDF). Princeton

springer.com

link.springer.com

Belenguer, Lorenzo (2022). «AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry». AI Ethics. AI and Ethics. 2 (4): 771–787. PMC 8830968. doi:10.1007/s43681-022-00138-8

techcrunch.com

Wiggers, Kyle (24 February 2023). «Can AI really be protected from text-based attacks?». TechCrunch. Retrieved 4 March 2023

venturebeat.com

Gupta, Abhishek (5 February 2023). «Getting stakeholder engagement right in responsible AI». VentureBeat. Retrieved 4 March 2023