Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (November 2019). «Revealing the Dark Secrets of BERT». Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 4364-4373. doi:10.18653/v1/D19-1445.
Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan (2018). «Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context». Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics): 284-294. Bibcode:2018arXiv180504623K. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco (2018). «Colorless Green Recurrent Networks Dream Hierarchically». Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics): 1195-1205. Bibcode:2018arXiv180311138G. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem (2018). «Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information». Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics): 240-248. Bibcode:2018arXiv180808079G. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
Dai, Andrew M.; Le, Quoc V. (4 November 2015). «Semi-supervised Sequence Learning». arXiv:1511.01432 [cs]. Retrieved 28 July 2020.
Peters, Matthew E.; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke (22 March 2018). «Deep contextualized word representations». arXiv:1802.05365 [cs]. Retrieved 28 July 2020.
Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. (2019). «What Does BERT Look at? An Analysis of BERT's Attention». Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics): 276-286. doi:10.18653/v1/w19-4828.
Zhang, Kelly; Bowman, Samuel (2018). «Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis». Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics): 359-361. doi:10.18653/v1/w18-5448.
Montti, Roger (10 December 2019). «Google's BERT Rolls Out Worldwide». Search Engine Journal. Retrieved 10 December 2019.