Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (May 24, 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Proceedings of NAACL-HLT 2019. Association for Computational Linguistics. arXiv:1810.04805.
Radford, Alec; Jozefowicz, Rafal; Sutskever, Ilya (April 6, 2017). "Learning to Generate Reviews and Discovering Sentiment". arXiv:1704.01444 [cs.LG].
Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Ponde de Oliveira Pinto, Henrique; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Khlaaf, Heidy (July 1, 2021). "Evaluating Large Language Models Trained on Code". arXiv:2107.03374 [cs.LG].
Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (December 6, 2022). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744. arXiv:2203.02155. Archived from the original on June 28, 2023. Retrieved June 24, 2023.
Bommasani, Rishi; et al. (July 12, 2022). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258 [cs.LG].
Luo, Renqian; et al. (April 3, 2023). "BioGPT: Generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). arXiv:2210.10341. doi:10.1093/bib/bbac409. PMID 36156661.
Solaiman, Irene; Brundage, Miles; Clark, Jack; Askell, Amanda; Herbert-Voss, Ariel; Wu, Jeff; Radford, Alec; Krueger, Gretchen; Kim, Jong Wook; Kreps, Sarah; McCain, Miles; Newhouse, Alex; Blazakis, Jason; McGuffie, Kris; Wang, Jasmine (November 12, 2019). "Release Strategies and the Social Impacts of Language Models". arXiv:1908.09203 [cs.CL].
Nakano, Reiichiro; Hilton, Jacob; Balaji, Suchir; Wu, Jeff; Ouyang, Long; Kim, Christina; Hesse, Christopher; Jain, Shantanu; Kosaraju, Vineet; Saunders, William; Jiang, Xu; Cobbe, Karl; Eloundou, Tyna; Krueger, Gretchen; Button, Kevin (December 1, 2021). "WebGPT: Browser-assisted question-answering with human feedback". CoRR. arXiv:2112.09332. Archived from the original on July 2, 2023. Retrieved July 2, 2023.
Erhan, Dumitru; Courville, Aaron; Bengio, Yoshua; Vincent, Pascal (March 31, 2010). "Why Does Unsupervised Pre-training Help Deep Learning?". Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings: 201–208. Archived from the original on January 24, 2024. Retrieved January 24, 2024.
Ver Meer, Dave (June 1, 2023). "ChatGPT Statistics". NamePepper. Archived from the original on June 5, 2023. Retrieved June 9, 2023.
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Archived (PDF) from the original on February 21, 2024. Retrieved January 28, 2024.