Backpropagation (English Wikipedia)

Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-27). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE].
Misra, Diganta (2019-08-23). "Mish: A Self Regularized Non-Monotonic Activation Function". arXiv:1908.08681 [cs.LG].
Martens, James (August 2020). "New Insights and Perspectives on the Natural Gradient Method". Journal of Machine Learning Research (21). arXiv:1412.1193.
Schmidhuber, Jürgen (2022). "Annotated History of Modern AI and Deep Learning". arXiv:2212.11279 [cs.NE].
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.

auburn.edu (Global: 7,446^th place; English: 5,101^st place)

eng.auburn.edu

Wiliamowski, Bogdan; Yu, Hao (June 2010). "Improved Computation for Levenberg–Marquardt Training" (PDF). IEEE Transactions on Neural Networks and Learning Systems. 21 (6): 930. Bibcode:2010ITNN...21..930W. doi:10.1109/TNN.2010.2045657.

books.google.com (Global: 3^rd place; English: 3^rd place)

Leibniz, Gottfried Wilhelm Freiherr von (1920). The Early Mathematical Manuscripts of Leibniz: Translated from the Latin Texts Published by Carl Immanuel Gerhardt with Critical and Historical Notes (Leibniz published the chain rule in a 1676 memoir). Open court publishing Company. ISBN 978-0-598-81846-1. {{cite book}}: ISBN / Date incompatibility (help)
Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition. SIAM. ISBN 978-0-89871-776-1.
Alpaydin, Ethem (2010). Introduction to Machine Learning. MIT Press. ISBN 978-0-262-01243-0.

deeplearningbook.org (Global: low place; English: low place)

Goodfellow, Bengio & Courville 2016, p. 214, "This table-filling strategy is sometimes called dynamic programming." Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.5 Back-Propagation and Other Differentiation Algorithms". Deep Learning. MIT Press. pp. 200–220. ISBN 978-0-262-03561-3.
Goodfellow, Bengio & Courville 2016, p. 200, "The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Backpropagation refers only to the method for computing the gradient, while other algorithms, such as stochastic gradient descent, is used to perform learning using this gradient." Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.5 Back-Propagation and Other Differentiation Algorithms". Deep Learning. MIT Press. pp. 200–220. ISBN 978-0-262-03561-3.
Goodfellow, Bengio & Courville (2016, p. 217–218), "The back-propagation algorithm described here is only one approach to automatic differentiation. It is a special case of a broader class of techniques called reverse mode accumulation." Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.5 Back-Propagation and Other Differentiation Algorithms". Deep Learning. MIT Press. pp. 200–220. ISBN 978-0-262-03561-3.

doi.org (Global: 2^nd place; English: 2^nd place)

Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ARS Journal. 30 (10): 947–954. doi:10.2514/8.5282.
Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986a). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
Tan, Hong Hui; Lim, King Han (2019). "Review of second-order optimization techniques in artificial neural networks backpropagation". IOP Conference Series: Materials Science and Engineering. 495 (1) 012003. Bibcode:2019MS&E..495a2003T. doi:10.1088/1757-899X/495/1/012003. S2CID 208124487.
Wiliamowski, Bogdan; Yu, Hao (June 2010). "Improved Computation for Levenberg–Marquardt Training" (PDF). IEEE Transactions on Neural Networks and Learning Systems. 21 (6): 930. Bibcode:2010ITNN...21..930W. doi:10.1109/TNN.2010.2045657.
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
Rodríguez, Omar Hernández; López Fernández, Jorge M. (2010). "A Semiotic Reflection on the Didactics of the Chain Rule". The Mathematics Enthusiast. 7 (2): 321–332. doi:10.54870/1551-3440.1191. S2CID 29739148. Retrieved 2019-08-04.
Robbins, H.; Monro, S. (1951). "A Stochastic Approximation Method". The Annals of Mathematical Statistics. 22 (3): 400. doi:10.1214/aoms/1177729586.
Dreyfus, Stuart (1962). "The numerical solution of variational problems". Journal of Mathematical Analysis and Applications. 5 (1): 30–45. doi:10.1016/0022-247x(62)90004-5.
Dreyfus, Stuart E. (1990). "Artificial Neural Networks, Back Propagation, and the Kelley-Bryson Gradient Procedure". Journal of Guidance, Control, and Dynamics. 13 (5): 926–928. Bibcode:1990JGCD...13..926D. doi:10.2514/3.25422.
Dreyfus, Stuart (1973). "The computational solution of optimal control problems with time lag". IEEE Transactions on Automatic Control. 18 (4): 383–385. doi:10.1109/tac.1973.1100330.
Linnainmaa, Seppo (1976). "Taylor expansion of the accumulated rounding error". BIT Numerical Mathematics. 16 (2): 146–160. doi:10.1007/bf01931367. S2CID 122357351.
Anderson, James A.; Rosenfeld, Edward, eds. (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. doi:10.7551/mitpress/6626.003.0016. ISBN 978-0-262-26715-1.
P. J. Werbos, "Backpropagation through time: what it does and how to do it," in Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990, doi:10.1109/5.58337
Rumelhart; Hinton; Williams (1986). "Learning representations by back-propagating errors" (PDF). Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Chang, Franklin; Dell, Gary S.; Bock, Kathryn (2006). "Becoming syntactic". Psychological Review. 113 (2): 234–272. doi:10.1037/0033-295x.113.2.234. PMID 16637761.
Janciauskas, Marius; Chang, Franklin (2018). "Input and Age-Dependent Variation in Second Language Learning: A Connectionist Account". Cognitive Science. 42 (Suppl Suppl 2): 519–554. doi:10.1111/cogs.12519. PMC 6001481. PMID 28744901.
Fitz, Hartmut; Chang, Franklin (2019). "Language ERPs reflect learning through prediction error propagation". Cognitive Psychology. 111: 15–52. doi:10.1016/j.cogpsych.2019.03.002. hdl:21.11116/0000-0003-474D-8. PMID 30921626. S2CID 85501792.

hal.science (Global: low place; English: low place)

LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.

handle.net (Global: 102^nd place; English: 76^th place)

hdl.handle.net

Fitz, Hartmut; Chang, Franklin (2019). "Language ERPs reflect learning through prediction error propagation". Cognitive Psychology. 111: 15–52. doi:10.1016/j.cogpsych.2019.03.002. hdl:21.11116/0000-0003-474D-8. PMID 30921626. S2CID 85501792.

harvard.edu (Global: 18^th place; English: 17^th place)

ui.adsabs.harvard.edu

Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986a). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
Tan, Hong Hui; Lim, King Han (2019). "Review of second-order optimization techniques in artificial neural networks backpropagation". IOP Conference Series: Materials Science and Engineering. 495 (1) 012003. Bibcode:2019MS&E..495a2003T. doi:10.1088/1757-899X/495/1/012003. S2CID 208124487.
Wiliamowski, Bogdan; Yu, Hao (June 2010). "Improved Computation for Levenberg–Marquardt Training" (PDF). IEEE Transactions on Neural Networks and Learning Systems. 21 (6): 930. Bibcode:2010ITNN...21..930W. doi:10.1109/TNN.2010.2045657.
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
Dreyfus, Stuart E. (1990). "Artificial Neural Networks, Back Propagation, and the Kelley-Bryson Gradient Procedure". Journal of Guidance, Control, and Dynamics. 13 (5): 926–928. Bibcode:1990JGCD...13..926D. doi:10.2514/3.25422.
Rumelhart; Hinton; Williams (1986). "Learning representations by back-propagating errors" (PDF). Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.

ieee.org (Global: 652^nd place; English: 515^th place)

spectrum.ieee.org

"Photonic Chips Curb AI Training's Energy Appetite - IEEE Spectrum". IEEE. Retrieved 2023-05-25.

incompleteideas.net (Global: low place; English: low place)

Sutton, Richard S.; Barto, Andrew G. (2018). "11.1 TD-Gammon". Reinforcement Learning: An Introduction (2nd ed.). Cambridge, MA: MIT Press.

janbasktraining.com (Global: low place; English: low place)

"Decoding the Power of Backpropagation: A Deep Dive into Advanced Neural Network Techniques". janbasktraining.com. 30 January 2024.

mit.edu (Global: 415^th place; English: 327^th place)

direct.mit.edu

Anderson, James A.; Rosenfeld, Edward, eds. (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. doi:10.7551/mitpress/6626.003.0016. ISBN 978-0-262-26715-1.

neuralnetworksanddeeplearning.com (Global: low place; English: low place)

This follows Nielsen (2015), and means (left) multiplication by the matrix $W^{l}$ corresponds to converting output values of layer $l-1$ to input values of layer $l$ : columns correspond to input coordinates, rows correspond to output coordinates. Nielsen, Michael A. (2015). "How the backpropagation algorithm works". Neural Networks and Deep Learning. Determination Press.
This section largely follows and summarizes Nielsen (2015). Nielsen, Michael A. (2015). "How the backpropagation algorithm works". Neural Networks and Deep Learning. Determination Press.
Nielsen (2015), "[W]hat assumptions do we need to make about our cost function ... in order that backpropagation can be applied? The first assumption we need is that the cost function can be written as an average ... over cost functions ... for individual training examples ... The second assumption we make about the cost is that it can be written as a function of the outputs from the neural network ..." Nielsen, Michael A. (2015). "How the backpropagation algorithm works". Neural Networks and Deep Learning. Determination Press.

neurips.cc (Global: low place; English: low place)

proceedings.neurips.cc

Pomerleau, Dean A. (1988). "ALVINN: An Autonomous Land Vehicle in a Neural Network". Advances in Neural Information Processing Systems. 1. Morgan-Kaufmann.

nih.gov (Global: 4^th place; English: 4^th place)

pubmed.ncbi.nlm.nih.gov

LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Chang, Franklin; Dell, Gary S.; Bock, Kathryn (2006). "Becoming syntactic". Psychological Review. 113 (2): 234–272. doi:10.1037/0033-295x.113.2.234. PMID 16637761.
Janciauskas, Marius; Chang, Franklin (2018). "Input and Age-Dependent Variation in Second Language Learning: A Connectionist Account". Cognitive Science. 42 (Suppl Suppl 2): 519–554. doi:10.1111/cogs.12519. PMC 6001481. PMID 28744901.
Fitz, Hartmut; Chang, Franklin (2019). "Language ERPs reflect learning through prediction error propagation". Cognitive Psychology. 111: 15–52. doi:10.1016/j.cogpsych.2019.03.002. hdl:21.11116/0000-0003-474D-8. PMID 30921626. S2CID 85501792.

ncbi.nlm.nih.gov

Janciauskas, Marius; Chang, Franklin (2018). "Input and Age-Dependent Variation in Second Language Learning: A Connectionist Account". Cognitive Science. 42 (Suppl Suppl 2): 519–554. doi:10.1111/cogs.12519. PMC 6001481. PMID 28744901.

nobelprize.org (Global: 301^st place; English: 478^th place)

"The Nobel Prize in Physics 2024". NobelPrize.org. Retrieved 2024-10-13.

semanticscholar.org (Global: 11^th place; English: 8^th place)

api.semanticscholar.org

Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986a). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
Tan, Hong Hui; Lim, King Han (2019). "Review of second-order optimization techniques in artificial neural networks backpropagation". IOP Conference Series: Materials Science and Engineering. 495 (1) 012003. Bibcode:2019MS&E..495a2003T. doi:10.1088/1757-899X/495/1/012003. S2CID 208124487.
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
Rodríguez, Omar Hernández; López Fernández, Jorge M. (2010). "A Semiotic Reflection on the Didactics of the Chain Rule". The Mathematics Enthusiast. 7 (2): 321–332. doi:10.54870/1551-3440.1191. S2CID 29739148. Retrieved 2019-08-04.
Linnainmaa, Seppo (1976). "Taylor expansion of the accumulated rounding error". BIT Numerical Mathematics. 16 (2): 146–160. doi:10.1007/bf01931367. S2CID 122357351.
Griewank, Andreas (2012). "Who Invented the Reverse Mode of Differentiation?". Optimization Stories. Documenta Mathematica, Extra Volume ISMP. pp. 389–400. S2CID 15568746.
Rumelhart; Hinton; Williams (1986). "Learning representations by back-propagating errors" (PDF). Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Wan, Eric A. (1994). "Time Series Prediction by Using a Connectionist Network with Internal Delay Lines". In Weigend, Andreas S.; Gershenfeld, Neil A. (eds.). Time Series Prediction: Forecasting the Future and Understanding the Past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis. Vol. 15. Reading: Addison-Wesley. pp. 195–217. ISBN 0-201-62601-2. S2CID 12652643.
Fitz, Hartmut; Chang, Franklin (2019). "Language ERPs reflect learning through prediction error propagation". Cognitive Psychology. 111: 15–52. doi:10.1016/j.cogpsych.2019.03.002. hdl:21.11116/0000-0003-474D-8. PMID 30921626. S2CID 85501792.

sudoc.fr (Global: 4,537^th place; English: low place)

Le Cun, Yann (1987). Modèles connexionnistes de l'apprentissage (Thèse de doctorat d'état thesis). Paris, France: Université Pierre et Marie Curie.

toronto.edu (Global: low place; English: 8,821^st place)

cs.toronto.edu

Rumelhart; Hinton; Williams (1986). "Learning representations by back-propagating errors" (PDF). Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.

umt.edu (Global: 9,893^rd place; English: 6,252^nd place)

scholarworks.umt.edu

Rodríguez, Omar Hernández; López Fernández, Jorge M. (2010). "A Semiotic Reflection on the Didactics of the Chain Rule". The Mathematics Enthusiast. 7 (2): 321–332. doi:10.54870/1551-3440.1191. S2CID 29739148. Retrieved 2019-08-04.

web.archive.org (Global: 1^st place; English: 1^st place)

Werbos, Paul (1982). "Applications of advances in nonlinear sensitivity analysis" (PDF). System modeling and optimization. Springer. pp. 762–770. Archived (PDF) from the original on 14 April 2016. Retrieved 2 July 2017.
Olazaran Rodriguez, Jose Miguel. A historical sociology of neural network research. PhD Dissertation. University of Edinburgh, 1991.

werbos.com (Global: low place; English: low place)

Werbos, Paul (1982). "Applications of advances in nonlinear sensitivity analysis" (PDF). System modeling and optimization. Springer. pp. 762–770. Archived (PDF) from the original on 14 April 2016. Retrieved 2 July 2017.

worldcat.org (Global: 5^th place; English: 5^th place)

search.worldcat.org

Bryson, Arthur E. (1962). "A gradient method for optimizing multi-stage allocation processes". Proceedings of the Harvard Univ. Symposium on digital computers and their applications, 3–6 April 1961. Cambridge: Harvard University Press. OCLC 498866871.
Hertz, John (1991). Introduction to the theory of neural computation. Krogh, Anders., Palmer, Richard G. Redwood City, Calif.: Addison-Wesley. p. 8. ISBN 0-201-50395-6. OCLC 21522159.

wpengine.com (Global: low place; English: low place)

coeieor.wpengine.com

Mizutani, Eiji; Dreyfus, Stuart; Nishio, Kenichi (July 2000). "On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application" (PDF). Proceedings of the IEEE International Joint Conference on Neural Networks.

Backpropagation (English Wikipedia)

archive.org (Global: 6th place; English: 6th place)

arxiv.org (Global: 69th place; English: 59th place)

auburn.edu (Global: 7,446th place; English: 5,101st place)

eng.auburn.edu

books.google.com (Global: 3rd place; English: 3rd place)

deeplearningbook.org (Global: low place; English: low place)

doi.org (Global: 2nd place; English: 2nd place)

hal.science (Global: low place; English: low place)

handle.net (Global: 102nd place; English: 76th place)

hdl.handle.net

harvard.edu (Global: 18th place; English: 17th place)

ui.adsabs.harvard.edu

ieee.org (Global: 652nd place; English: 515th place)

spectrum.ieee.org

incompleteideas.net (Global: low place; English: low place)

janbasktraining.com (Global: low place; English: low place)

mit.edu (Global: 415th place; English: 327th place)

direct.mit.edu

neuralnetworksanddeeplearning.com (Global: low place; English: low place)

neurips.cc (Global: low place; English: low place)

proceedings.neurips.cc

nih.gov (Global: 4th place; English: 4th place)

pubmed.ncbi.nlm.nih.gov

ncbi.nlm.nih.gov

nobelprize.org (Global: 301st place; English: 478th place)

semanticscholar.org (Global: 11th place; English: 8th place)

api.semanticscholar.org

sudoc.fr (Global: 4,537th place; English: low place)

toronto.edu (Global: low place; English: 8,821st place)

cs.toronto.edu

umt.edu (Global: 9,893rd place; English: 6,252nd place)

scholarworks.umt.edu

web.archive.org (Global: 1st place; English: 1st place)

werbos.com (Global: low place; English: low place)

worldcat.org (Global: 5th place; English: 5th place)

search.worldcat.org

wpengine.com (Global: low place; English: low place)

coeieor.wpengine.com

archive.org (Global: 6^th place; English: 6^th place)

arxiv.org (Global: 69^th place; English: 59^th place)

auburn.edu (Global: 7,446^th place; English: 5,101^st place)

books.google.com (Global: 3^rd place; English: 3^rd place)

doi.org (Global: 2^nd place; English: 2^nd place)

handle.net (Global: 102^nd place; English: 76^th place)

harvard.edu (Global: 18^th place; English: 17^th place)

ieee.org (Global: 652^nd place; English: 515^th place)

mit.edu (Global: 415^th place; English: 327^th place)

nih.gov (Global: 4^th place; English: 4^th place)

nobelprize.org (Global: 301^st place; English: 478^th place)

semanticscholar.org (Global: 11^th place; English: 8^th place)

sudoc.fr (Global: 4,537^th place; English: low place)

toronto.edu (Global: low place; English: 8,821^st place)

umt.edu (Global: 9,893^rd place; English: 6,252^nd place)

web.archive.org (Global: 1^st place; English: 1^st place)

worldcat.org (Global: 5^th place; English: 5^th place)