Devlin, Jacob; Chang, Ming-Wei (October 11, 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL]
Zhu, Yukun; Kiros, Ryan (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". arXiv:1506.06724 [cs.CV]
Rajpurkar, Pranav; Zhang, Jian (October 10, 2016). "SQuAD: 100,000+ Questions for Machine Comprehension of Text". arXiv:1606.05250 [cs.CL]
Zellers, Rowan; Bisk, Yonatan (August 15, 2018). "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference". arXiv:1808.05326 [cs.CL]
Dai, Andrew; Le, Quoc (November 4, 2015). "Semi-supervised Sequence Learning". arXiv:1511.01432 [cs.LG]
Peters, Matthew; Neumann, Mark (February 15, 2018). "Deep contextualized word representations". arXiv:1802.05365v2 [cs.CL]
Howard, Jeremy; Ruder, Sebastian (January 18, 2018). "Universal Language Model Fine-tuning for Text Classification". arXiv:1801.06146v5 [cs.CL]
Montti, Roger (December 10, 2019). "Google's BERT Rolls Out Worldwide". Search Engine Journal. Retrieved December 10, 2019