CUDA (English Wikipedia)

anandtech.com (Global: 1,383rd place; English: 878th place)

Shimpi, Anand Lal; Wilson, Derek (November 8, 2006). "Nvidia's GeForce 8800 (G80): GPUs Re-architected for DirectX 10". AnandTech. Archived from the original on April 24, 2010. Retrieved May 16, 2015.

arxiv.org (Global: 69th place; English: 59th place)

Luo, Weile; Fan, Ruibo; Li, Zeyu; Du, Dayou; Wang, Qiang; Chu, Xiaowen (2024). "Benchmarking and Dissecting the Nvidia Hopper GPU Architecture". arXiv:2402.13499v1 [cs.AR].
Sun, Wei; Li, Ang; Geng, Tong; Stuijk, Sander; Corporaal, Henk (2023). "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors". IEEE Transactions on Parallel and Distributed Systems. 34 (1): 246–261. arXiv:2206.02874. Bibcode:2023ITPDS..34..246S. doi:10.1109/tpds.2022.3217824. S2CID 249431357.
Raihan, Md Aamir; Goli, Negar; Aamodt, Tor (2018). "Modeling Deep Learning Accelerator Enabled GPUs". arXiv:1811.08309 [cs.MS].
Jia, Zhe; Maggioni, Marco; Smith, Jeffrey; Scarpazza, Daniele Paolo (2019). "Dissecting the NVidia Turing T4 GPU via Microbenchmarking". arXiv:1903.07486 [cs.DC].
Jia, Zhe; Maggioni, Marco; Staiger, Benjamin; Scarpazza, Daniele P. (2018). "Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking". arXiv:1804.06826 [cs.DC].
Note: Jia, Zhe; Maggioni, Marco; Smith, Jeffrey; Scarpazza, Daniele Paolo (2019), "Dissecting the NVidia Turing T4 GPU via Microbenchmarking", arXiv:1903.07486 [cs.DC], disagrees and states 2 KiB of L0 instruction cache per SM partition and 16 KiB of L1 instruction cache per SM.

berkeley.edu (Global: 580th place; English: 462nd place)

boinc.berkeley.edu

"Use your Nvidia GPU for scientific computing". boinc.berkeley.edu. Berkeley Open Infrastructure for Network Computing (BOINC). 2008-12-18. Archived from the original on 2008-12-28. Retrieved 2017-08-08.

biocentric.nl (Global: low place; English: low place)

"nVidia CUDA Bioinformatics: BarraCUDA". BioCentric. 2019-07-19. Retrieved 2019-10-15.

businessinsider.com (Global: 140th place; English: 115th place)

Cosgrove, Emma. "Ian Buck built Nvidia's secret weapon. He may spend the rest of his career defending it". Business Insider. Retrieved 2025-07-24.

code.google.com (Global: 4,942nd place; English: 4,061st place)

"Pyrit – Google Code".

cupy.dev (Global: low place; English: low place)

"CuPy". cupy.dev. Retrieved 2025-09-23.

doi.org (Global: 2nd place; English: 2nd place)

Vasiliadis, Giorgos; Antonatos, Spiros; Polychronakis, Michalis; Markatos, Evangelos P.; Ioannidis, Sotiris (September 2008). "Gnort: High Performance Network Intrusion Detection Using Graphics Processors" (PDF). Recent Advances in Intrusion Detection. Lecture Notes in Computer Science. Vol. 5230. pp. 116–134. doi:10.1007/978-3-540-87403-4_7. ISBN 978-3-540-87402-7.
Schatz, Michael C.; Trapnell, Cole; Delcher, Arthur L.; Varshney, Amitabh (2007). "High-throughput sequence alignment using Graphics Processing Units". BMC Bioinformatics. 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658. PMID 18070356.
Manavski, Svetlin A.; Valle, Giorgio (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment". BMC Bioinformatics. 9 (Suppl 2): S10. doi:10.1186/1471-2105-9-S2-S10. PMC 2323659. PMID 18387198.
Silberstein, Mark; Schuster, Assaf; Geiger, Dan; Patney, Anjul; Owens, John D. (2008). "Efficient computation of sum-products on GPUs through software-managed cache" (PDF). Proceedings of the 22nd annual international conference on Supercomputing – ICS '08. pp. 309–318. doi:10.1145/1375527.1375572. ISBN 978-1-60558-158-3.
Sun, Wei; Li, Ang; Geng, Tong; Stuijk, Sander; Corporaal, Henk (2023). "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors". IEEE Transactions on Parallel and Distributed Systems. 34 (1): 246–261. arXiv:2206.02874. Bibcode:2023ITPDS..34..246S. doi:10.1109/tpds.2022.3217824. S2CID 249431357.
Burgess, John (2019). "RTX ON – The NVIDIA TURING GPU". 2019 IEEE Hot Chips 31 Symposium (HCS). pp. 1–27. doi:10.1109/HOTCHIPS.2019.8875651. ISBN 978-1-7281-2089-8. S2CID 204822166.
2 clock cycles/instruction for each SM partition: Burgess, John (2019). "RTX ON – The NVIDIA TURING GPU". 2019 IEEE Hot Chips 31 Symposium (HCS). pp. 1–27. doi:10.1109/HOTCHIPS.2019.8875651. ISBN 978-1-7281-2089-8. S2CID 204822166.
Wong, Henry; Papadopoulou, Misel-Myrto; Sadooghi-Alvandi, Maryam; Moshovos, Andreas (March 2010). Demystifying GPU Microarchitecture through Microbenchmarking (PDF). 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS). White Plains, NY, USA: IEEE Computer Society. doi:10.1109/ISPASS.2010.5452013. ISBN 978-1-4244-6023-6.

escholarship.org (Global: 1,523rd place; English: 976th place)

Silberstein, Mark; Schuster, Assaf; Geiger, Dan; Patney, Anjul; Owens, John D. (2008). "Efficient computation of sum-products on GPUs through software-managed cache" (PDF). Proceedings of the 22nd annual international conference on Supercomputing – ICS '08. pp. 309–318. doi:10.1145/1375527.1375572. ISBN 978-1-60558-158-3.

forth.gr (Global: low place; English: low place)

ics.forth.gr

Vasiliadis, Giorgos; Antonatos, Spiros; Polychronakis, Michalis; Markatos, Evangelos P.; Ioannidis, Sotiris (September 2008). "Gnort: High Performance Network Intrusion Detection Using Graphics Processors" (PDF). Recent Advances in Intrusion Detection. Lecture Notes in Computer Science. Vol. 5230. pp. 116–134. doi:10.1007/978-3-540-87403-4_7. ISBN 978-3-540-87402-7.

github.com (Global: 383rd place; English: 320th place)

"hughperkins/coriander: Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices". GitHub. May 6, 2019.
"GitHub – vosen/ZLUDA". GitHub.
"GitHub – chip-spv/chipStar". GitHub.
"asfermi Opcode". GitHub.
"Question: What does ROCm stand for? · Issue #1628 · RadeonOpenCompute/ROCm". Github.com. Retrieved January 18, 2022.

harvard.edu (Global: 18th place; English: 17th place)

ui.adsabs.harvard.edu

Sun, Wei; Li, Ang; Geng, Tong; Stuijk, Sander; Corporaal, Henk (2023). "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors". IEEE Transactions on Parallel and Distributed Systems. 34 (1): 246–261. arXiv:2206.02874. Bibcode:2023ITPDS..34..246S. doi:10.1109/tpds.2022.3217824. S2CID 249431357.

iwocl.org (Global: low place; English: low place)

Perkins, Hugh (2017). "cuda-on-cl" (PDF). IWOCL. Retrieved August 8, 2017.

kered.org (Global: low place; English: low place)

"pycublas". Archived from the original on 2009-04-20. Retrieved 2017-08-08.

llvm.org (Global: low place; English: 7,917th place)

"Compiling CUDA with clang – LLVM 22.0.0git documentation". llvm.org.
"User Guide for NVPTX Back-end — LLVM 22.0.0git documentation". llvm.org.

mercurynews.com (Global: 701st place; English: 439th place)

"John Nickolls Obituary – Los Altos, CA". The Mercury News. 2011-09-29. Retrieved 2025-11-23. John Richard Nickolls, who passed away in Los Altos, California on August 13, 2011 after a courageous battle against cancer. He was born on March 6, 1950 to Kenneth and Kathryn Nickolls and grew up in Wilbraham, Massachusetts.

newyorker.com (Global: 146th place; English: 110th place)

Witt, Stephen (2023-11-27). "How Jensen Huang's Nvidia Is Powering the A.I. Revolution". The New Yorker. ISSN 0028-792X. Retrieved 2023-12-10.

nih.gov (Global: 4th place; English: 4th place)

ncbi.nlm.nih.gov

Schatz, Michael C.; Trapnell, Cole; Delcher, Arthur L.; Varshney, Amitabh (2007). "High-throughput sequence alignment using Graphics Processing Units". BMC Bioinformatics. 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658. PMID 18070356.
Manavski, Svetlin A.; Valle, Giorgio (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment". BMC Bioinformatics. 9 (Suppl 2): S10. doi:10.1186/1471-2105-9-S2-S10. PMC 2323659. PMID 18387198.

pubmed.ncbi.nlm.nih.gov

Schatz, Michael C.; Trapnell, Cole; Delcher, Arthur L.; Varshney, Amitabh (2007). "High-throughput sequence alignment using Graphics Processing Units". BMC Bioinformatics. 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658. PMID 18070356.
Manavski, Svetlin A.; Valle, Giorgio (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment". BMC Bioinformatics. 9 (Suppl 2): S10. doi:10.1186/1471-2105-9-S2-S10. PMC 2323659. PMID 18387198.

nvidia.com (Global: 2,503rd place; English: 1,760th place)

images.nvidia.com

"NVIDIA Ampere GA102 GPU Architecture" (PDF). nvidia.com. Retrieved 5 September 2023.
"Datasheet NVIDIA A40" (PDF). nvidia.com. Retrieved 27 April 2024.
"NVIDIA Turing Architecture Whitepaper" (PDF). nvidia.com. Retrieved 5 September 2023.

news.developer.nvidia.com

"CUDA 1.1 – Now on Mac OS X". February 14, 2008. Archived from the original on November 22, 2008.

devtalk.nvidia.com

"NVCC forces c++ compilation of .cu files". 29 November 2011.

devblogs.nvidia.com

Durant, Luke; Giroux, Olivier; Harris, Mark; Stam, Nick (May 10, 2017). "Inside Volta: The World's Most Advanced Data Center GPU". Nvidia developer blog.

oneapi.io (Global: low place; English: low place)

"oneAPI Programming Model". oneAPI.io. Retrieved 2024-07-27.
"Specifications | oneAPI". oneAPI.io. Retrieved 2024-07-27.

phoronix.com (Global: 4,683rd place; English: 3,096th place)

"Coriander Project: Compile CUDA Codes To OpenCL, Run Everywhere". Phoronix.
Larabel, Michael (2024-02-12), "AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source", Phoronix, retrieved 2024-02-12
Larabel, Michael (March 29, 2017). "NVIDIA Rolls Out Tegra X2 GPU Support In Nouveau". Phoronix. Retrieved August 8, 2017.
"NVIDIA Bringing up Open-Source Volta GPU Support for Their Xavier SoC".

reuters.com (Global: 49th place; English: 47th place)

Cherney, Max A. (26 March 2024). "Exclusive: Behind the plot to break Nvidia's grip on AI by targeting software". Reuters. Retrieved 2024-04-05.

semanticscholar.org (Global: 11th place; English: 8th place)

api.semanticscholar.org

Sun, Wei; Li, Ang; Geng, Tong; Stuijk, Sander; Corporaal, Henk (2023). "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors". IEEE Transactions on Parallel and Distributed Systems. 34 (1): 246–261. arXiv:2206.02874. Bibcode:2023ITPDS..34..246S. doi:10.1109/tpds.2022.3217824. S2CID 249431357.
Burgess, John (2019). "RTX ON – The NVIDIA TURING GPU". 2019 IEEE Hot Chips 31 Symposium (HCS). pp. 1–27. doi:10.1109/HOTCHIPS.2019.8875651. ISBN 978-1-7281-2089-8. S2CID 204822166.
2 clock cycles/instruction for each SM partition: Burgess, John (2019). "RTX ON – The NVIDIA TURING GPU". 2019 IEEE Hot Chips 31 Symposium (HCS). pp. 1–27. doi:10.1109/HOTCHIPS.2019.8875651. ISBN 978-1-7281-2089-8. S2CID 204822166.

stuffedcow.net (Global: low place; English: low place)

Wong, Henry; Papadopoulou, Misel-Myrto; Sadooghi-Alvandi, Maryam; Moshovos, Andreas (March 2010). Demystifying GPU Microarchitecture through Microbenchmarking (PDF). 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS). White Plains, NY, USA: IEEE Computer Society. doi:10.1109/ISPASS.2010.5452013. ISBN 978-1-4244-6023-6.

techpowerup.com (Global: 4,960th place; English: 3,052nd place)

"NVIDIA Quadro NVS 420 Specs". TechPowerUp GPU Database. 25 August 2023.
Nvidia Xavier Specs on TechPowerUp (preliminary)

theregister.com (Global: 3,700th place; English: 2,360th place)

Shah, Agam. "Nvidia not totally against third parties making CUDA chips". www.theregister.com. Retrieved 2024-04-25.

tician.de (Global: low place; English: low place)

mathema.tician.de

"PyCUDA".

tomshardware.com (Global: 2,976th place; English: 1,939th place)

Abi-Chahla, Fedy (June 18, 2008). "Nvidia's CUDA: The End of the CPU?". Tom's Hardware. Retrieved May 17, 2015.
"New SCALE tool enables CUDA applications to run on AMD GPUs". Tom's Hardware. July 17, 2024.

uxlfoundation.org (Global: low place; English: low place)

oneapi-spec.uxlfoundation.org

"oneAPI Specification – oneAPI Specification 1.3-rev-1 documentation". oneapi-spec.uxlfoundation.org. Retrieved 2024-07-27.

videomaker.com (Global: low place; English: low place)

Zunitch, Peter (2018-01-24). "CUDA vs. OpenCL vs. OpenGL". Videomaker. Retrieved 2018-09-16.

vt.edu (Global: 2,431st place; English: 1,607th place)

chrec.cs.vt.edu

"CU2CL Documentation". chrec.cs.vt.edu.

web.archive.org (Global: 1st place; English: 1st place)

"NVIDIA® CUDA™ Unleashes Power of GPU Computing - Press Release". nvidia.com. Archived from the original on 29 March 2007. Retrieved 26 January 2025.
Shimpi, Anand Lal; Wilson, Derek (November 8, 2006). "Nvidia's GeForce 8800 (G80): GPUs Re-architected for DirectX 10". AnandTech. Archived from the original on April 24, 2010. Retrieved May 16, 2015.
"Use your Nvidia GPU for scientific computing". boinc.berkeley.edu. Berkeley Open Infrastructure for Network Computing (BOINC). 2008-12-18. Archived from the original on 2008-12-28. Retrieved 2017-08-08.
"Nvidia CUDA Software Development Kit (CUDA SDK) – Release Notes Version 2.0 for MAC OS X". Archived from the original on 2009-01-06.
"CUDA 1.1 – Now on Mac OS X". February 14, 2008. Archived from the original on November 22, 2008.
"pycublas". Archived from the original on 2009-04-20. Retrieved 2017-08-08.

widen.net (Global: low place; English: low place)

nvdam.widen.net

NVIDIA H100 Tensor Core GPU Architecture

worldcat.org (Global: 5th place; English: 5th place)

search.worldcat.org

Witt, Stephen (2023-11-27). "How Jensen Huang's Nvidia Is Powering the A.I. Revolution". The New Yorker. ISSN 0028-792X. Retrieved 2023-12-10.

youtube.com (Global: 9th place; English: 13th place)

Jones, Stephen (2025-04-22). What is CUDA? (Video). Computerphile. Retrieved 2025-07-24 – via YouTube.
First OpenCL demo on a GPU on YouTube
DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU on YouTube
