For insight on choosing the parameter, see, e.g., Wahba, Grace; Wang, Yonghua (1990). "When is the optimal regularization parameter insensitive to the choice of the loss function". Communications in Statistics – Theory and Methods. 19 (5): 1685–1700. doi:10.1080/03610929008830285.
Schölkopf, Bernhard; Herbrich, Ralf; Smola, Alexander J. (2001). "A generalized representer theorem". In Helmbold, David P.; Williamson, Robert C. (eds.). Computational Learning Theory, 14th Annual Conference on Computational Learning Theory, COLT 2001 and 5th European Conference on Computational Learning Theory, EuroCOLT 2001, Amsterdam, The Netherlands, July 16–19, 2001, Proceedings. Lecture Notes in Computer Science. Vol. 2111. Springer. pp. 416–426. doi:10.1007/3-540-44581-1_27. ISBN 978-3-540-42343-0.