CUDA Training
I recently finished teaching an Acceleware CUDA training course, so this seemed like a good time to share some of my experiences and some of the students’ thoughts as they progressed through the training material. When you are immersed in GPU technology on a daily basis, it is easy to take the fundamental concepts of CUDA for granted. Teaching the course forces me to revisit the foundations of GPU programming and gives me insight into how people think when they approach the GPU for the first time.
Learning the GPU – the First Probable Outcomes
When people first start working with the GPU, they invariably experience one of the following outcomes:
- I have no idea where to begin.
- The performance on the GPU is slower.
- The performance on the GPU is marginally better than the CPU.
- The performance on the GPU crushes the CPU!
I Have No Idea Where to Begin!
This is not uncommon. The GPU does not work like a normal CPU and it requires a completely different mindset to program. By way of example, I’m going to blatantly steal (I mean borrow) Mike’s analogy of home renovations. Imagine that you are building a house using the following (simplified) process:
- Pour the foundation
- Add the frame to the home
- Build the roof
- Plumbing and electrical
- Apply the finishing touches
What if I told you the electricians were doing the wiring before the foundation was poured? Doesn’t make sense? It might if you are building 10000 homes at the same time. Home #7756 may have already completed Steps 1-3, when home #2212 hasn’t even started Step 1. This logic also applies when running an algorithm on the GPU, except that the subtleties can get even more... well, subtle.
Will training help?
Definitely. The course covers everything from high-level concepts, such as basic parallelism, to extremely detailed optimization strategies, such as avoiding shared memory bank conflicts.
The GPU is Slower
It is possible, and actually quite easy, to make the GPU perform worse than the CPU. For example, improper thread use, uncoalesced memory accesses, or inefficient data transfers across the PCI Express bus can serialize your operations and grind performance to a halt.
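To illustrate one of these pitfalls, here is a minimal sketch contrasting coalesced and uncoalesced global memory reads. The kernel names and the `stride` parameter are hypothetical, chosen only for this example. Both kernels copy the same number of elements, but in the second, neighbouring threads read addresses that are far apart, which typically costs many more memory transactions.

```cuda
#include <cuda_runtime.h>

// Coalesced: consecutive threads read consecutive addresses,
// so a warp's reads can be served by a small number of memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Uncoalesced: consecutive threads read addresses `stride` elements apart,
// so a warp's reads are scattered across many memory segments.
__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 64-bit intermediate avoids overflow of i * stride for large arrays
        int j = static_cast<int>((static_cast<long long>(i) * stride) % n);
        out[i] = in[j];
    }
}
```

Timing the two kernels on the same array is a simple way to see how much the access pattern alone can matter on a given card.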
Time to seek an alternative to the online documentation?
Possibly. If you are investigating the GPU, you have probably concluded that your problem exhibits some level of parallelism that can be exploited. In most cases this is true and a speedup over the CPU can ultimately be obtained.
I sometimes see snippets of code like this in GPU kernels:
[code]for (int i = 0; i < arraySize; i++) arr[i] += 3;[/code]
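Operations with no data dependencies between elements, like the one above, are exactly what the GPU is good at! But when the loop is written inside the kernel like this, a single thread walks the entire array on its own, and any additional threads simply repeat the same work instead of sharing it. If you implement your algorithm with the code above, I would expect the GPU to be slower than the CPU.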
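A minimal sketch of the usual alternative, assuming `arr` is an array of `int` that already lives in GPU memory: the loop disappears and each thread updates exactly one element. The names `addThree` and `launchAddThree` and the block size of 256 are illustrative choices, not taken from the course material.

```cuda
#include <cuda_runtime.h>

// One thread per element: the grid of threads replaces the for loop.
__global__ void addThree(int *arr, int arraySize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < arraySize)      // guard: the last block may extend past the array
        arr[i] += 3;
}

// Hypothetical launcher. devArr must point to device memory and
// arraySize is assumed to be greater than zero.
cudaError_t launchAddThree(int *devArr, int arraySize)
{
    const int threadsPerBlock = 256;
    // Round up so every element is covered by some thread.
    int blocks = (arraySize + threadsPerBlock - 1) / threadsPerBlock;

    addThree<<<blocks, threadsPerBlock>>>(devArr, arraySize);

    cudaError_t err = cudaGetLastError();   // catches launch configuration errors
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();         // waits for the kernel and reports runtime errors
}
```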
The GPU is Marginally Better
Most of the pitfalls above apply, except that there may still be enough performance to show parity with, or an improvement over, the CPU. Unfortunately, programmers in this window often prematurely dismiss the GPU as a technology that is not suitable for their class of problem, particularly if someone else has struggled with the same problem.
“I read that the GPU was not good for application X.”
That heavily depends on application X. Why is the GPU not good for this application? Are you implementing a variant of application X? Sometimes variations on the algorithm have properties that can be exploited on the GPU. Sometimes the reference implementation is poor. And yes, sometimes the GPU is not suited for a specific algorithm.
Before we abandon the GPU, we need to understand whether our problem is compute limited or memory-bandwidth limited, and analyze the maximum performance we can expect from the GPU. And before we can do that, we need to understand the fundamentals and architecture of the GPU.
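One rough way to start that analysis is to measure the memory bandwidth a kernel actually achieves and compare it with the peak figure on the card's specification sheet. The sketch below does this for a simple add-3 kernel. The helper name `measureBandwidthGBs` is hypothetical, and the factor of two assumes each element is read once and written once.

```cuda
#include <cuda_runtime.h>

__global__ void addThreeKernel(int *arr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        arr[i] += 3;
}

// Hypothetical helper: returns the achieved bandwidth in GB/s for n > 0 elements,
// or a negative value if something went wrong.
double measureBandwidthGBs(int n)
{
    int *devArr = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (cudaMalloc(&devArr, bytes) != cudaSuccess)
        return -1.0;
    cudaMemset(devArr, 0, bytes);

    const int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup costs are not included in the timing.
    addThreeKernel<<<blocks, threads>>>(devArr, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    addThreeKernel<<<blocks, threads>>>(devArr, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);             // wait until the timed kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop); // elapsed time in milliseconds
    cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devArr);

    if (err != cudaSuccess || ms <= 0.0f)
        return -1.0;

    double bytesMoved = 2.0 * static_cast<double>(bytes);  // one read + one write per element
    return bytesMoved / (ms * 1.0e-3) / 1.0e9;
}
```

If the measured figure is already close to the card's peak bandwidth, the kernel is likely memory-bandwidth limited, and further gains would have to come from moving less data rather than from doing less arithmetic.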
The GPU Performance is Amazing!
“I’m seeing a 10x multiplier so I must be a legendary GPU programmer!”
Maybe. But is that the best you can do? What is the maximum performance of the hardware? How do you find out?
Can I still learn anything from the course?
Almost certainly! We had a student who was seeing a 3x improvement over the CPU with a naïve implementation on the GPU. Within a day after the course, they were seeing closer to 10x for their algorithm with the possibility of even more performance.
We share with you our years of optimization experience, so that you can see the maximum performance from your code. Sometimes even small changes can result in big performance improvements!
The Students Sign Up!
Before coming to the course, some students have limited GPU experience, while others have spent several weeks working with the GPU. In almost all cases, they have had experiences similar to the ones above and are looking to understand more about the GPU and how they can speed up their applications. A background in C/C++ is very helpful for the course.
Misconceptions
Students commented on the complete shift in understanding before and after attending the course. Some common thoughts prior to the course:
- I can just recompile my code for the GPU with nvcc
- I can replace all my malloc calls with cudaMalloc
- I can access GPU memory however I want
While all of the items above may compile, they are almost guaranteed to produce an algorithm that is slower than the CPU version. Towards the end of the course, I was hearing comments such as “no wonder my code was slow” and “I can’t believe my program even worked.” I was pleased to hear this feedback as it demonstrated a shift in thinking towards a GPU mentality.
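As a sketch of why the `malloc` to `cudaMalloc` swap is not a drop-in replacement: `cudaMalloc` returns an address in GPU memory, so the host has to copy data across explicitly before and after the kernel runs. The names `scaleByTwo` and `doubleOnGpu` are hypothetical and exist only for this example.

```cuda
#include <cuda_runtime.h>

__global__ void scaleByTwo(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Hypothetical helper: doubles n > 0 host values on the GPU.
// Returns true if every CUDA call succeeded.
bool doubleOnGpu(float *hostData, int n)
{
    float *devData = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMalloc(&devData, bytes) != cudaSuccess)
        return false;

    // devData is a device address. Host code must not dereference it
    // (for example, devData[0] = 1.0f here would be invalid).
    bool ok = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleByTwo<<<blocks, threads>>>(devData, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // The copy back also waits for the kernel to finish.
    ok = ok && cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(devData);
    return ok;
}
```

Each of these copies crosses the bus between host and device, which is one reason a naive port can end up slower than the original CPU code.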
Closing Thoughts
I am always pleased to hear comments from students who have taken our course. Without exception, they all agree that they have a much better understanding of the GPU architecture and how they can make their applications faster. If you think it is time to ask for directions, let us know!
Comments
Hello Acceleware,
Can you please include a thread which discusses the current financial results and the outlook for 2010?
Thanks,
John
# Posted By john | 11/26/09 5:32 AM
Hi John,
Please see our interim financial statements and related management’s discussion and analysis for the three months ended September 30, 2009 on SEDAR at http://www.sedar.com/DisplayCompanyDocuments.do?la... for a discussion of current financial results. Acceleware currently does not provide public guidance on possible future financial results.
# Posted By admin | 11/26/09 4:21 PM