Vanishing gradient problem (English Wikipedia)

Analysis of information sources in references of the Wikipedia article "Vanishing gradient problem" in English language version.

refsWebsite

Global rank English rank

13doi.org

2^nd place

8arxiv.org

69^th place

59^th place

8semanticscholar.org

11^th place

8^th place

6nih.gov

4^th place

4ieee.org

652^nd place

515^th place

3worldcat.org

5^th place

2idsia.ch

low place

2harvard.edu

18^th place

17^th place

2mlr.press

low place

1neurips.cc

low place

1toronto.edu

low place

8,821^st place

1psu.edu

207^th place

136^th place

1sciencedirect.com

149^th place

178^th place

1handle.net

102^nd place

76^th place

1uni-bonn.de

6,912^th place

8,465^th place

arxiv.org

Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG].
Ioffe, Sergey; Szegedy, Christian (1 June 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". International Conference on Machine Learning. PMLR: 448–456. arXiv:1502.03167.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
Veit, Andreas; Wilber, Michael; Belongie, Serge (20 May 2016). "Residual Networks Behave Like Ensembles of Relatively Shallow Networks". arXiv:1605.06431 [cs.CV].
Kumar, Siddharth Krishna. "On weight initialization in deep neural networks." arXiv preprint arXiv:1704.08863 (2017).

doi.org

Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies". In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. doi:10.1109/9780470544037.ch14. ISBN 0-7803-5369-2.
Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
Bengio, Y.; Frasconi, P.; Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. IEEE International Conference on Neural Networks. IEEE. pp. 1183–1188. doi:10.1109/ICNN.1993.298725. ISBN 978-0-7803-0999-9.
Doya, K. (1992). "Bifurcations in the learning of recurrent neural networks". [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. IEEE. pp. 2777–2780. doi:10.1109/iscas.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.
Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
Hinton, G. E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527–1554. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.
Hinton, G. (2009). "Deep belief networks". Scholarpedia. 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.

dx.doi.org

Doya, K. (1992). "Bifurcations in the learning of recurrent neural networks". [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. IEEE. pp. 2777–2780. doi:10.1109/iscas.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.

handle.net

hdl.handle.net

Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.

harvard.edu

ui.adsabs.harvard.edu

Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
Hinton, G. (2009). "Deep belief networks". Scholarpedia. 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.

idsia.ch

people.idsia.ch

Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.
"Sepp Hochreiter's Fundamental Deep Learning Problem (1991)". people.idsia.ch. Retrieved 7 January 2017.

ieee.org

ieeexplore.ieee.org

Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies". In Kremer, S. C.; Kolen, J. F. (eds.). A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. doi:10.1109/9780470544037.ch14. ISBN 0-7803-5369-2.
Bengio, Y.; Frasconi, P.; Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. IEEE International Conference on Neural Networks. IEEE. pp. 1183–1188. doi:10.1109/ICNN.1993.298725. ISBN 978-0-7803-0999-9.
Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.

mlr.press

proceedings.mlr.press

Ioffe, Sergey; Szegedy, Christian (1 June 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". International Conference on Machine Learning. PMLR: 448–456. arXiv:1502.03167.
Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (14 June 2011). "Deep Sparse Rectifier Neural Networks". PMLR: 315–323.

neurips.cc

proceedings.neurips.cc

Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2018). "How Does Batch Normalization Help Optimization?". Advances in Neural Information Processing Systems. 31. Curran Associates, Inc.

nih.gov

pubmed.ncbi.nlm.nih.gov

Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
Hinton, G. E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527–1554. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.

psu.edu

citeseerx.ist.psu.edu

Hinton, G. E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527–1554. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.

sciencedirect.com

Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.

semanticscholar.org

api.semanticscholar.org

Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
Doya, K. (1992). "Bifurcations in the learning of recurrent neural networks". [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. IEEE. pp. 2777–2780. doi:10.1109/iscas.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.
Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
Hinton, G. E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527–1554. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.
Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.

toronto.edu

cs.toronto.edu

Hinton, G. E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527–1554. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.

uni-bonn.de

ais.uni-bonn.de

Sven Behnke (2003). Hierarchical Neural Networks for Image Interpretation (PDF). Lecture Notes in Computer Science. Vol. 2766. Springer.

worldcat.org

search.worldcat.org

Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. hdl:11492/6392. ISSN 0893-6080. PMID 35714424. S2CID 249487697.