Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG].
Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2018). "How Does Batch Normalization Help Optimization?". Advances in Neural Information Processing Systems. 31. Curran Associates, Inc.