Raihan, Md Aamir; Goli, Negar; Aamodt, Tor (2018). "Modeling Deep Learning Accelerator Enabled GPUs". arXiv:1811.08309 [cs.MS].
Jia, Zhe; Maggioni, Marco; Smith, Jeffrey; Daniele Paolo Scarpazza (2019). "Dissecting the NVidia Turing T4 GPU via Microbenchmarking". arXiv:1903.07486 [cs.DC].
Jia, Zhe; Maggioni, Marco; Staiger, Benjamin; Scarpazza, Daniele P. (2018). "Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking". arXiv:1804.06826 [cs.DC].
Jia, Zhe; Maggioni, Marco; Smith, Jeffrey; Daniele Paolo Scarpazza (2019). "Dissecting the NVidia Turing T4 GPU via Microbenchmarking". arXiv:1903.07486 [cs.DC].
Note that Jia, Zhe; Maggioni, Marco; Smith, Jeffrey; Daniele Paolo Scarpazza (2019). "Dissecting the NVidia Turing T4 GPU via Microbenchmarking". arXiv:1903.07486 [cs.DC]. disagrees and states 2 KiB L0 instruction cache per SM partition and 16 KiB L1 instruction cache per SM