Existing University Courses
This page lists online courses to help you get started programming or teaching CUDA, as well as links to universities that teach CUDA.
This page is organized into three sections to get you started.
Introductory CUDA Technical Training Courses
Udacity: CS344 Intro To Parallel Programming
- Volume I: Introduction to CUDA Programming
- Exercises (for Linux and Mac)
- Visual Studio Exercises (for Windows)
- Instructions for Exercises
- Volume II: CUDA Case Studies
Check out our CUDAcasts playlist on YouTube.
CUDA University Courses
University of Illinois: Current Course: ECE408/CS483
Taught by Professor Wen-mei W. Hwu and David Kirk, NVIDIA CUDA Scientist.
- Introduction to GPU Computing (60.2 MB)
- CUDA Programming Model (75.3 MB)
- CUDA API (32.4 MB)
- Simple Matrix Multiplication in CUDA (46.0 MB)
- CUDA Memory Model (109 MB)
- Shared Memory Matrix Multiplication (81.4 MB)
- Additional CUDA API Features (22.4 MB)
- Useful Information on CUDA Tools (15.7 MB)
- Threading Hardware (140 MB)
- Memory Hardware (85.8 MB)
- Memory Bank Conflicts (115 MB)
- Parallel Thread Execution (32.6 MB)
- Control Flow (96.6 MB)
- Precision (137 MB)
Each of these classes is a downloadable CUDAcast, with video pre-scaled for compatibility with major players.
All PowerPoint class presentations can be found on the Fall 2014 webpage: ECE408/CS483
Stanford University: CS 193G: Programming Massively Parallel Processors with CUDA
Taught by Jared Hoberock and David Tarjan
University of Oxford: CUDA Programming on NVIDIA GPUs
Taught by Mike Giles, Professor
UC Davis: EE171: Parallel Computer Architecture
Taught by John Owens, Associate Professor
University of Sheffield: COM4521: Parallel Computing with GPUs
Taught by Paul Richmond
CUDA Seminars and Tutorials
- GPU Technology Conference: search for recordings
- SC10
- SC09
- SC08 Tutorial: High Performance Computing with CUDA
- SC07 Tutorial: High Performance Computing with CUDA
Dr. Dobb's Article Series
- CUDA, Supercomputing for the Masses: Part 1 : CUDA lets you work with familiar programming concepts...
- CUDA, Supercomputing for the Masses: Part 2 : A first kernel
- CUDA, Supercomputing for the Masses: Part 3 : Error handling and global memory performance limitations
- CUDA, Supercomputing for the Masses: Part 4 : Understanding and using shared memory (1)
- CUDA, Supercomputing for the Masses: Part 5 : Understanding and using shared memory (2)
- CUDA, Supercomputing for the Masses: Part 6 : Global memory and the CUDA profiler
- CUDA, Supercomputing for the Masses: Part 7 : Double the fun with next-generation CUDA hardware
- CUDA, Supercomputing for the Masses: Part 8 : Using libraries with CUDA
- CUDA, Supercomputing for the Masses: Part 9 : Extending High-level Languages with CUDA
- CUDA, Supercomputing for the Masses: Part 10 : CUDPP, a powerful data-parallel CUDA library
- CUDA, Supercomputing for the Masses: Part 11 : Revisiting CUDA memory spaces
- CUDA, Supercomputing for the Masses: Part 12 : CUDA 2.2 changes the data movement paradigm
- CUDA, Supercomputing for the Masses: Part 13 : Using texture memory in CUDA
- CUDA, Supercomputing for the Masses: Part 14 : Debugging CUDA and using CUDA-GDB
- CUDA, Supercomputing for the Masses: Part 15 : Using Pixel Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 16 : CUDA 3.0 provides expanded capabilities
- CUDA, Supercomputing for the Masses: Part 17 : CUDA 3.0 provides expanded capabilities and makes development easier
- CUDA, Supercomputing for the Masses: Part 18 : Using Vertex Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 19 : Parallel Nsight Part 1: Configuring and Debugging Applications
- CUDA, Supercomputing for the Masses: Part 20 : Parallel Nsight Part 2: Using the Parallel Nsight Analysis capabilities
- CUDA, Supercomputing for the Masses: Part 21 : The Fermi architecture and CUDA
- Unified Memory in CUDA 6: A Brief Overview