Vanishing gradient problem (English Wikipedia)

Analysis of the information sources cited in the references of the English-language Wikipedia article "Vanishing gradient problem".

[Summary table: number of references per cited website, with each website's global and English Wikipedia popularity rank (ranging from 2nd place to "low place"); the website labels did not survive extraction. The per-website reference listings follow below.]

arxiv.org

  • Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
  • Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry. 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
  • Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG].
  • Ioffe, Sergey; Szegedy, Christian (1 June 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". International Conference on Machine Learning. PMLR: 448–456. arXiv:1502.03167.
  • Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  • He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
  • Veit, Andreas; Wilber, Michael; Belongie, Serge (20 May 2016). "Residual Networks Behave Like Ensembles of Relatively Shallow Networks". arXiv:1605.06431 [cs.CV].
  • Kumar, Siddharth Krishna (2017). "On weight initialization in deep neural networks". arXiv:1704.08863 [cs.LG].

doi.org

doi.org

dx.doi.org

harvard.edu

ui.adsabs.harvard.edu

idsia.ch

people.idsia.ch

ieee.org

ieeexplore.ieee.org

mlr.press

proceedings.mlr.press

neurips.cc

proceedings.neurips.cc

nih.gov

pubmed.ncbi.nlm.nih.gov

psu.edu

citeseerx.ist.psu.edu

sciencedirect.com

semanticscholar.org

api.semanticscholar.org

toronto.edu

cs.toronto.edu

uni-bonn.de

ais.uni-bonn.de

worldcat.org

search.worldcat.org

  • Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics. 3 (3): 198. arXiv:2006.10560. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
  • Bengio, Y.; Simard, P.; Frasconi, P. (March 1994). "Learning long-term dependencies with gradient descent is difficult". IEEE Transactions on Neural Networks. 5 (2): 157–166. doi:10.1109/72.279181. ISSN 1941-0093. PMID 18267787. S2CID 206457500.
  • Yilmaz, Ahmet; Poli, Riccardo (1 September 2022). "Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean". Neural Networks. 153: 87–103. doi:10.1016/j.neunet.2022.05.030. ISSN 0893-6080. PMID 35714424. S2CID 249487697.