Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”. Proceedings of the AAAI Conference on Artificial Intelligence32 (1). doi:10.1609/aaai.v32i1.11485.
Ziegler, Daniel M.; Stiennon, Nisan; Wu, Jeffrey; Brown, Tom B.; Radford, Alec; Amodei, Dario; Christiano, Paul; Irving, Geoffrey (2019). Fine-Tuning Language Models from Human Preferences. arXiv:1909.08593.
Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”. Proceedings of the AAAI Conference on Artificial Intelligence32 (1). doi:10.1609/aaai.v32i1.11485.
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins. “Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation”. arXiv:2305.00955.{{cite arXiv}}: CS1メンテナンス: authors引数 (カテゴリ)
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155.
Warnell, Garrett; Waytowich, Nicholas; Lawhern, Vernon; Stone, Peter (25 April 2018). “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”. Proceedings of the AAAI Conference on Artificial Intelligence32 (1). doi:10.1609/aaai.v32i1.11485.
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155.