人間のフィードバックによる強化学習 (Japanese Wikipedia)

Analysis of the information sources cited in the references of the Wikipedia article "人間のフィードバックによる強化学習" ("Reinforcement learning from human feedback") in the Japanese-language version.

Global rank        Japanese rank
69th place         227th place
low place          low place
low place          low place
187th place        440th place
1,559th place      1,682nd place
1,185th place      2,667th place
2nd place          6th place
388th place        1,331st place
616th place        2,168th place
low place          low place
low place          low place
low place          low place
low place          low place
274th place        596th place
741st place        1,856th place
1,131st place      838th place

acm.org

dl.acm.org

alignmentforum.org

arstechnica.com

arxiv.org

  • Ziegler, Daniel M.; Stiennon, Nisan; Wu, Jeffrey; Brown, Tom B.; Radford, Alec; Amodei, Dario; Christiano, Paul; Irving, Geoffrey (2019). Fine-Tuning Language Models from Human Preferences. arXiv:1909.08593. 
  • MacGlashan, James; Ho, Mark K; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (6 August 2017). “Interactive learning from policy-dependent human feedback”. Proceedings of the 34th International Conference on Machine Learning - Volume 70 (JMLR.org): 2285–2294. arXiv:1701.06049. https://dl.acm.org/doi/10.5555/3305890.3305917. 
  • Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). doi:10.1609/aaai.v32i1.11485. 
  • Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini et al. (31 October 2022) (in English). Training language models to follow instructions with human feedback. arXiv:2203.02155. https://openreview.net/forum?id=TG8KACxEON. 
  • Fernandes, Patrick; Madaan, Aman; Liu, Emmy; Farinhas, António; Martins, Pedro Henrique; Bertsch, Amanda; de Souza, José G. C.; Zhou, Shuyan; Wu, Tongshuang; Neubig, Graham; Martins, André F. T. “Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation”. arXiv:2305.00955. 
  • Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155. 

deepmind.com

doi.org

huggingface.co

ibm.com

neurips.cc

proceedings.neurips.cc

nips.cc

papers.nips.cc

openai.com

openreview.net

princeton.edu

cs.princeton.edu

springer.com

link.springer.com

techcrunch.com

venturebeat.com