Introduction to CUDA

CUDA, the native programming model for NVIDIA GPUs, gives much finer-grained control over parallel execution than higher-level programming models such as OpenMP offloading, and that control helps when optimizing performance.

This module introduces CUDA, the programming language used to write fast numerical algorithms for NVIDIA GPUs. It focuses on basic usage of the language, error handling, and understanding kernel functions.

Prerequisites

  • Programming experience in any of C, C++, or Fortran

  • Basic experience in working with a Linux shell (e.g. see here)

  • Prior knowledge of parallel programming is beneficial but not mandatory

Software setup

Learning outcomes

This material is for researchers, engineers, and students who want to learn how to program NVIDIA GPUs using CUDA for high-performance computing workloads.

After completing this module, learners will be able to:

  • Explain the host / device programming model and identify which parts of a program run on the CPU vs the GPU

  • Write a CUDA kernel using the __global__ qualifier and launch it from host code using the triple-chevron syntax (<<<...>>>) in C/C++ and the equivalent syntax in CUDA Fortran

  • Map thread and block indices to data elements using the built-in threadIdx / blockIdx / blockDim / gridDim variables, and apply the grid-stride loop pattern to handle arbitrary data sizes

  • Apply the function execution-space qualifiers (__global__, __device__, __host__ __device__) correctly and respect the restrictions that apply to device code

  • Detect and handle CUDA errors using error-checking macros, and understand the asynchronous nature of kernel-launch error reporting
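
To give a feel for what these outcomes look like in practice, the following is a minimal CUDA C++ sketch, not code taken from the module itself. It combines a __global__ kernel, a __host__ __device__ helper, a grid-stride loop, and an error-checking macro. The names CUDA_CHECK, axpy, saxpy, and run_saxpy are hypothetical and chosen only for illustration, as are the launch configuration (64 blocks of 256 threads) and the problem size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error and abort if a runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Compiled for both execution spaces: callable from host and device code.
__host__ __device__ inline float axpy(float a, float x, float y) {
    return a * x + y;
}

// Kernel: executes on the device, launched from the host.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int start  = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid
    // Grid-stride loop: each thread handles several elements, so any n works
    // regardless of how many blocks were launched.
    for (int i = start; i < n; i += stride) {
        y[i] = axpy(a, x[i], y[i]);
    }
}

// Host driver: runs the kernel on n elements and returns the number of
// elements that differ from the host-side reference.
int run_saxpy(int n) {
    const float a = 3.0f;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *dx = nullptr, *dy = nullptr;
    CUDA_CHECK(cudaMalloc(&dx, bytes));
    CUDA_CHECK(cudaMalloc(&dy, bytes));
    CUDA_CHECK(cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice));

    saxpy<<<64, 256>>>(n, a, dx, dy);
    // A launch returns immediately. cudaGetLastError reports launch-time
    // errors; errors during execution surface at the next synchronizing call.
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dx));
    CUDA_CHECK(cudaFree(dy));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hy[i] != axpy(a, 1.0f, 2.0f)) ++mismatches;  // same helper, run on the host
    }
    return mismatches;
}

int main() {
    // Deliberately not a multiple of the block size.
    std::printf("mismatches: %d\n", run_saxpy((1 << 20) + 3));
    return 0;
}
```

Assuming the CUDA toolkit is installed and the file is saved as saxpy.cu (a hypothetical name), it can be built with nvcc saxpy.cu -o saxpy. Checking both cudaGetLastError and cudaDeviceSynchronize after the launch is one common way to deal with asynchronous error reporting; production code often synchronizes less aggressively.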

See also

Credit

FIXME

Don’t forget to check out additional course materials from …

License

Note

To module authors: For code you may use any OSI-approved license as mentioned in https://spdx.org/licenses/, such as Apache License 2.0, GNU GPLv3, MIT. Please make sure to update the deed above and LICENSE.code file accordingly.