Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Florence, Italy: Association for Computational Linguistics). August 2019: 276–286 [2022-06-08]. doi:10.18653/v1/W19-4828. (Archived from the original on 2020-10-21).
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia. Attention Is All You Need. 2017-06-12. arXiv:1706.03762 [cs.CL].
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018-10-11. arXiv:1810.04805v2 [cs.CL].
Tay, Yi; et al. Long Range Arena: A Benchmark for Efficient Transformers. 2020. arXiv:2011.04006.
Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 353–355. S2CID 5034059. arXiv:1804.07461. doi:10.18653/v1/w18-5446.
Bertasius, Gedas; Wang, Heng; Torresani, Lorenzo. Is Space-Time Attention All You Need for Video Understanding? 2021. arXiv:2102.05095 [cs.CV].
Noever, David; Ciolino, Matt; Kalin, Josh. The Chess Transformer: Mastering Play using Generative Language Models. 2020-08-21. arXiv:2008.04057 [cs.AI].
Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob; Houlsby, Neil. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2020. arXiv:2010.11929 [cs.CV].
Rives, Alexander; Goyal, Siddharth. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv 10.1101/622803.
Nambiar, Ananthan; Heflin, Maeve; Liu, Simon; Maslov, Sergei; Hopkins, Mark; Ritz, Anna. Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks. 2020. S2CID 226283020. doi:10.1145/3388440.3412467.
Rao, Roshan; Bhattacharya, Nicholas. Evaluating Protein Transfer Learning with TAPE. bioRxiv 10.1101/676825.
He, Cheng. Transformer in CV. Towards Data Science. 31 December 2021 [2022-06-08]. (Archived from the original on 2023-04-16).
Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2019-06-19. OCLC 1106350082.