Cireşan, Dan Claudiu; Meier, Ueli; Gambardella, Luca Maria; Schmidhuber, Jürgen (21 September 2010). “Deep, Big, Simple Neural Nets for Handwritten Digit Recognition”. Neural Computation 22 (12): 3207–3220. arXiv:1003.0358. doi:10.1162/neco_a_00052. ISSN 0899-7667. PMID 20858131.
Ciresan, D.; Meier, U.; Schmidhuber, J. (2012). “Multi-column deep neural networks for image classification”. 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642–3649. arXiv:1202.2745. doi:10.1109/cvpr.2012.6248110. ISBN 978-1-4673-1228-8.
Simonyan, Karen; Zisserman, Andrew (2014). “Very Deep Convolutional Networks for Large-Scale Image Recognition”. arXiv:1409.1556 [cs.CV].
Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. (26 February 2018). “Progressive Growing of GANs for Improved Quality, Stability, and Variation”. arXiv:1710.10196 [cs.NE].
"starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it." PGGAN paper (Karras et al.).
"making normalization a part of the model architecture and performing the normalization for each training mini-batch." Sergey Ioffe, et. al.. (2015)
Farley, B.G.; W.A. Clark (1954). “Simulation of Self-Organizing Systems by Digital Computer”. IRE Transactions on Information Theory 4 (4): 76–84. doi:10.1109/TIT.1954.1057468.
Rochester, N.; J.H. Holland; L.H. Haibt; W.L. Duda (1956). “Tests on a cell assembly theory of the action of the brain, using a large digital computer”. IRE Transactions on Information Theory 2 (3): 80–93. doi:10.1109/TIT.1956.1056810.
Rosenblatt, F. (1958). “The Perceptron: A Probabilistic Model For Information Storage And Organization in the Brain”. Psychological Review 65 (6): 386–408. doi:10.1037/h0042519. PMID 13602029.
Olazaran, Mikel (1996). “A Sociological Study of the Official History of the Perceptrons Controversy”. Social Studies of Science 26 (3): 611–659. doi:10.1177/030631296026003005. JSTOR 285702.
Fukushima, K. (1969). “Visual feature extraction by a multilayered network of analog threshold elements”. IEEE Transactions on Systems Science and Cybernetics 5 (4): 322–333. doi:10.1109/TSSC.1969.300225.
Sonoda, Sho; Murata, Noboru (2017). “Neural network with unbounded activation functions is universal approximator”. Applied and Computational Harmonic Analysis 43 (2): 233–268. arXiv:1505.03654. doi:10.1016/j.acha.2015.12.005.
Fukushima, K. (1979). “Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron”. Trans. IECE (in Japanese) J62-A (10): 658–665.
Fukushima, K. (1980). “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position”. Biol. Cybern. 36 (4): 193–202. doi:10.1007/bf00344251. PMID 7370364.
Gers, Felix; Schmidhuber, Jürgen; Cummins, Fred (1999). “Learning to forget: Continual prediction with LSTM”. 9th International Conference on Artificial Neural Networks: ICANN '99. pp. 850–855. doi:10.1049/cp:19991218. ISBN 0-85296-721-7.
Ciresan, D.; Giusti, A.; Gambardella, L.M.; Schmidhuber, J. (2013). “Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks”. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Lecture Notes in Computer Science. 7908. pp. 411–418. doi:10.1007/978-3-642-40763-5_51. ISBN 978-3-642-38708-1. PMID 24579167.
Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Pierric; Rault, Tim et al. (2020). “Transformers: State-of-the-Art Natural Language Processing”. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45. doi:10.18653/v1/2020.emnlp-demos.6.
Waibel, Alex (December 1987). Phoneme Recognition Using Time-Delay Neural Networks (PDF). Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE). Tokyo, Japan. Archived (PDF) from the original on 17 September 2024. Retrieved 20 June 2025.
"TensorRT can optimize and deploy applications to the data center, as well as embedded and automotive environments. It powers key NVIDIA solutions" NVIDIA TensorRT. NVIDIA.
"Quantization performance gain comes in 2 part: instruction and cache." Quantize ONNX Models. ONNX Runtime.
"Old hardware doesn’t have or has few instruction support for byte computation. And quantization has overhead (quantize and dequantize), so it is not rare to get worse performance on old devices." Quantize ONNX Models. ONNX Runtime.
"Performance improvement depends on your model and hardware." Quantize ONNX Models. ONNX Runtime.
"There are 2 ways to represent quantized ONNX models: ... Tensor Oriented, aka Quantize and DeQuantize (QDQ)." Quantize ONNX Models. ONNX RUNTIME. 2022-03-15閲覧.
"Quantizing a network means converting it to use a reduced precision integer representation for the weights and/or activations." DYNAMIC QUANTIZATION. PyTorch.
"Static quantization quantizes the weights and activations of the model. ... It requires calibration with a representative dataset to determine optimal quantization parameters for activations." QUANTIZATION. PyTorch.
"with dynamic quantization ... determine the scale factor for activations dynamically based on the data range observed at runtime." DYNAMIC QUANTIZATION. PyTorch.
"The model parameters ... are converted ahead of time and stored in INT8 form." DYNAMIC QUANTIZATION. PyTorch.
"Simulate the quantize and dequantize operations in training time." FAKEQUANTIZE. PyTorch. 2022-03-15閲覧.
"Quantization works by reducing the precision of the numbers used to represent a model's parameters, which by default are 32-bit floating point numbers." Model optimization. TensorFlow.
"Less memory usage: Smaller models use less RAM when they are run, which frees up memory for other parts of your application to use, and can translate to better performance and stability." Model optimization. TensorFlow.
Bozinovski, S. (1995). “Neuro genetic agents and structural theory of self-reinforcement learning systems”. CMPSCI Technical Report 95-107, University of Massachusetts at Amherst. [1] Archived 2024-10-08 at the Wayback Machine.