Xiong, Ruibin, et al. "On layer normalization in the transformer architecture." International Conference on Machine Learning. PMLR, 2020. https://proceedings.mlr.press/v119/xiong20b
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. "Attention Is All You Need." (PDF) Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017. Retrieved 29 April 2024.