Goodfellow, Bengio & Courville 2016, p. 200: "The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Backpropagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."
Tan, Hong Hui; Lim, King Han (2019). "Review of second-order optimization techniques in artificial neural networks backpropagation". IOP Conference Series: Materials Science and Engineering. 495 (1): 012003. Bibcode:2019MS&E..495a2003T. doi:10.1088/1757-899X/495/1/012003.
Dreyfus, Stuart (1962). "The numerical solution of variational problems". Journal of Mathematical Analysis and Applications. 5 (1): 30–45. doi:10.1016/0022-247x(62)90004-5.
Dreyfus, Stuart (1973). "The computational solution of optimal control problems with time lag". IEEE Transactions on Automatic Control. 18 (4): 383–385. doi:10.1109/tac.1973.1100330.
Bryson, Arthur E. "A gradient method for optimizing multi-stage allocation processes". Proceedings of the Harvard Univ. Symposium on digital computers and their applications, 3–6 April 1961. Cambridge: Harvard University Press. OCLC 498866871.
Hertz, John; Krogh, Anders; Palmer, Richard G. Introduction to the theory of neural computation. Redwood City, Calif.: Addison-Wesley. p. 8. ISBN 0-201-50395-6. OCLC 21522159.