Bottou, Léon; Bousquet, Olivier (2008). “The Tradeoffs of Large Scale Learning”. Advances in Neural Information Processing Systems 20: 161–168.
Ferguson, Thomas S. (1982). “An inconsistent maximum likelihood estimate”. Journal of the American Statistical Association 77 (380): 831–834. doi:10.1080/01621459.1982.10477894. JSTOR 2287314.
Kiwiel, Krzysztof C. (2001). “Convergence and efficiency of subgradient methods for quasiconvex minimization”. Mathematical Programming (Series A) (Berlin, Heidelberg: Springer) 90 (1): 1–25. doi:10.1007/PL00011414. ISSN 0025-5610.
Robbins, Herbert; Monro, Sutton (1951). “A Stochastic Approximation Method”. Ann. Math. Statist. 22 (3): 400–407. doi:10.1214/aoms/1177729586.
Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). “Learning representations by back-propagating errors”. Nature 323 (6088): 533–536. doi:10.1038/323533a0.