Andreas, Jacob, Andreas Vlachos, and Stephen Clark (2013). "Semantic parsing as machine translation"Archived 15 August 2020 at the Wayback Machine. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal; Janvin, Christian (1 March 2003). "A neural probabilistic language model". The Journal of Machine Learning Research. 3: 1137–1155 – via ACM Digital Library.
Mikolov, Tomas; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Efficient estimation of word representations in vector space". arXiv:1301.3781 [cs.CL].
Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Matthew; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma (2021). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258 [cs.LG].
Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (December 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). "Language Models are Few-Shot Learners"(PDF). Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 1877–1901. arXiv:2005.14165. doi:10.1145/3582269.3615599. Archived(PDF) from the original on 17 November 2023. Retrieved 14 March 2023.
Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). "Scaling Laws for Neural Language Models". arXiv:2001.08361 [cs.LG].
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need". arXiv:1706.03762 [cs.CL].
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805 [cs.CL].
Christiano, Paul; Leike, Jan; Brown, Tom B.; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". arXiv:1706.03741 [stat.ML].
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155 [cs.CL].
Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R. (2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". arXiv:1804.07461 [cs.CL].
Hendrycks, Dan; Burns, Collin; Basart, Steven; Zou, Andy; Mazeika, Mantas; Song, Dawn; Steinhardt, Jacob (2025). "Expressing stigma and inappropriate responses prevents LLMS from safely replacing mental health providers". Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. pp. 599–627. arXiv:2009.03300. doi:10.1145/3715275.3732039. ISBN979-8-4007-1482-5.
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (10 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805 [cs.CL].
Ponte, Jay M.; Croft, W. Bruce (1998). A language modeling approach to information retrieval. Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.
Hiemstra, Djoerd (1998). A linguistically motivated probabilistically model of information retrieval. Proceedings of the 2nd European conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.
Hendrycks, Dan; Burns, Collin; Basart, Steven; Zou, Andy; Mazeika, Mantas; Song, Dawn; Steinhardt, Jacob (2025). "Expressing stigma and inappropriate responses prevents LLMS from safely replacing mental health providers". Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. pp. 599–627. arXiv:2009.03300. doi:10.1145/3715275.3732039. ISBN979-8-4007-1482-5.
Karlgren, Jussi; Schutze, Hinrich (2015), "Evaluating Learning Language Representations", International Conference of the Cross-Language Evaluation Forum, Lecture Notes in Computer Science, Springer International Publishing, pp. 254–260, doi:10.1007/978-3-319-64206-2_8, ISBN978-3-319-64205-5
Aghaebrahimian, Ahmad (2017), "Quora Question Answer Dataset", Text, Speech, and Dialogue, Lecture Notes in Computer Science, vol. 10415, Springer International Publishing, pp. 66–73, doi:10.1007/978-3-319-64206-2_8, ISBN978-3-319-64205-5
Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing(PDF) (3rd edition draft ed.). Retrieved 24 May 2022.
Sammons, V.G.Vinod Vydiswaran, Dan Roth, Mark; Vydiswaran, V.G.; Roth, Dan. "Recognizing Textual Entailment"(PDF). Archived from the original(PDF) on 9 August 2017. Retrieved 24 February 2019.{{cite web}}: CS1 maint: multiple names: authors list (link)
Andreas, Jacob, Andreas Vlachos, and Stephen Clark (2013). "Semantic parsing as machine translation"Archived 15 August 2020 at the Wayback Machine. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Sammons, V.G.Vinod Vydiswaran, Dan Roth, Mark; Vydiswaran, V.G.; Roth, Dan. "Recognizing Textual Entailment"(PDF). Archived from the original(PDF) on 9 August 2017. Retrieved 24 February 2019.{{cite web}}: CS1 maint: multiple names: authors list (link)