Aprendizado por reforço com feedback humano (Portuguese Wikipedia)

Analysis of the information sources cited in the references of the Portuguese-language Wikipedia article "Aprendizado por reforço com feedback humano" (reinforcement learning from human feedback).

Website ranks (Global rank / Portuguese rank):
69th place / 195th place
low place / low place
low place / low place
388th place / 592nd place
1,559th place / 1,438th place
1,185th place / 1,301st place
616th place / 783rd place
187th place / 352nd place
54th place / 82nd place
low place / low place
low place / low place
low place / low place
274th place / 339th place
4th place / 8th place
2nd place / 4th place
741st place / 748th place

acm.org

dl.acm.org

alignmentforum.org

arstechnica.com

arxiv.org

  • Ziegler, Daniel M.; Stiennon, Nisan (2019). «Fine-Tuning Language Models from Human Preferences». arXiv:1909.08593 (freely accessible) [cs.CL]
  • MacGlashan, James; Ho, Mark K; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (August 6, 2017). «Interactive learning from policy-dependent human feedback». JMLR.org. Proceedings of the 34th International Conference on Machine Learning - Volume 70: 2285–2294. arXiv:1701.06049 (freely accessible)
  • Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina (October 31, 2022). «Training language models to follow instructions with human feedback» (in English). arXiv:2203.02155 (freely accessible)
  • Fernandes, Patrick; Madaan, Aman (2023). «Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation». arXiv:2305.00955 (freely accessible) [cs.CL]
  • Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina (2022). «Training language models to follow instructions with human feedback». arXiv:2203.02155 (freely accessible)
  • Glaese, Amelia; McAleese, Nat (2022). «Improving alignment of dialogue agents via targeted human judgements». arXiv:2209.14375 (freely accessible) [cs.LG]
  • Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, Jérémy; Rando, Javier; Freedman, Rachel; Korbak, Tomasz; Lindner, David; Freire, Pedro; Wang, Tony; Marks, Samuel; Segerie, Charbel-Raphaël; Carroll, Micah; Peng, Andi; Christoffersen, Phillip; Damani, Mehul; Slocum, Stewart; Anwar, Usman; Siththaranjan, Anand; Nadeau, Max; Michaud, Eric J.; Pfau, Jacob; Krasheninnikov, Dmitrii; Chen, Xin; Langosco, Lauro; Hase, Peter; Bıyık, Erdem; Dragan, Anca; Krueger, David; Sadigh, Dorsa; Hadfield-Menell, Dylan (2023). «Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback». arXiv:2307.15217 (freely accessible) [cs.AI]

deepmind.com

doi.org

dx.doi.org

forbes.com

huggingface.co

nih.gov

ncbi.nlm.nih.gov

nips.cc

papers.nips.cc

openai.com

openreview.net

princeton.edu

cs.princeton.edu

springer.com

link.springer.com

techcrunch.com

venturebeat.com