CUDA Training

Acceleware offers advanced CUDA training courses for NVIDIA GPUs, delivered by the industry’s most experienced instructors. Since 2008, Acceleware has delivered detailed instruction to hundreds of programmers who need to achieve maximum performance from GPUs used for compute tasks. This is the industry-leading course on GPU programming!

Clients gain access to our top-rated training in parallel programming with CUDA, OpenCL, MPI, Microsoft HPC Server, Visual Studio, and many other technologies. Acceleware's training consists of classroom lectures and several practical hands-on exercises using supplied laptops equipped with NVIDIA GPUs.

We recommend that attendees have a background in C/C++ (two or more years) in order to get the most out of the course. Contact services@acceleware.com if you are interested in a beginner-level CUDA course.

Attendees should be familiar with the following C/C++ concepts:

  • Pointers and pointers to pointers (*, **)
  • Taking the address of a variable (&)
  • Writing functions, for loops, if/else statements
  • Printing to standard output (printf, cout)
  • Memory allocation and deallocation
  • Arrays and indexing
  • Structures
  • General debugging

Optional but helpful experience:

  • Multithreading
  • Optimization of programs
  • Low level programming (e.g., assembly languages)
  • Familiarity with computer architectures

4-Day Course Syllabus

  • Day 1:
    • Overview of GPU computing
    • Data-parallel architectures and the GPU programming model
    • GPU memory model & thread cooperation
    • Hands-on exercises: GPU memory management, simple CUDA kernels, shared memory, and constant memory
  • Day 2:
    • Asynchronous operations
    • Advanced CUDA features
    • Libraries
    • Debugging GPU programs
    • Hands-on exercises: Asynchronous operations, CUDA features, experience with cuFFT, cuBLAS, and Thrust, debugging
  • Day 3:
    • Introduction to optimization
    • Resource management, latency and occupancy
    • Memory performance optimizations
    • Profiling applications
    • Hands-on exercises: Arithmetic optimizations, occupancy calculator, profiling, and memory access patterns
  • Day 4:
    • CUDA compiler and user-defined libraries
    • OpenACC
    • Hands-on exercises: Case study exercise and OpenACC
    • Case study: Finite difference stencil algorithm or Monte Carlo simulations

Your fee includes:

  • Use of a laptop equipped with a CUDA-capable GPU
  • Choice of Linux or Windows operating system
  • Printed manual of all lectures
  • Electronic copy of lab exercises
  • Certificate of Completion

Contact us for pricing information and to schedule your training session.