Polosukhin, Illia; Kaiser, Lukasz (12 June 2017). "Attention Is All You Need". arXiv:1706.03762 [cs.CL].
Kitaev, Nikita; Kaiser, Łukasz. "Reformer: The Efficient Transformer". arXiv:2001.04451 [cs.LG].
Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel (2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics): 353–355. arXiv:1804.07461. doi:10.18653/v1/w18-5446.
Noever, David; Ciolino, Matt (21 August 2020). "The Chess Transformer: Mastering Play using Generative Language Models". arXiv:2008.04057 [cs.AI].
Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (19 June 2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". OCLC 1106350082.