Kiwiel, Krzysztof C. (2001). «Convergence and efficiency of subgradient methods for quasiconvex minimization». Mathematical Programming, Series A (Berlin, Heidelberg: Springer) 90 (1): 1–25. ISSN 0025-5610. doi:10.1007/PL00011414.
Darken, Christian; Moody, John (1990). «Fast adaptive k-means clustering: some empirical results». Int'l Joint Conf. on Neural Networks (IJCNN). doi:10.1109/IJCNN.1990.137720.
Spall, J. C. (2000). «Adaptive Stochastic Approximation by the Simultaneous Perturbation Method». IEEE Transactions on Automatic Control 45 (10): 1839–1853. doi:10.1109/TAC.2000.880982.
Spall, J. C. (2009). «Feedback and Weighting Mechanisms for Improving Jacobian Estimates in the Adaptive Simultaneous Perturbation Algorithm». IEEE Transactions on Automatic Control 54 (6): 1216–1229. doi:10.1109/TAC.2009.2019793.
Sutskever, Ilya; Martens, James; Dahl, George; Hinton, Geoffrey E. (June 2013). «On the importance of initialization and momentum in deep learning». In Dasgupta, Sanjoy; McAllester, David, eds. Proceedings of the 30th International Conference on Machine Learning (ICML-13) (Atlanta, GA) 28: 1139–1147. Retrieved 14 January 2016.