Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG].
Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies". In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. doi:10.1109/9780470544037.ch14. ISBN0-7803-5369-2.
Bengio, Y.; Frasconi, P.; Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. IEEE International Conference on Neural Networks. IEEE. pp. 1183–1188. doi:10.1109/ICNN.1993.298725. ISBN978-0-7803-0999-9.
Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2018). "How Does Batch Normalization Help Optimization?". Advances in Neural Information Processing Systems. 31. Curran Associates, Inc.