As I mentioned in my last post, Acceleware has been doing GPU programming for 5+ years now, which makes us seasoned veterans in NVIDIA’s ‘GPU computing ecosystem’. This fact might cause some to wonder why we officially released the CUDA-based version of our FDTD libraries only a few months ago. The short answer is that it took a long time to port a code base that took 3+ years to build. The more interesting story, however, is the benefits and results of doing that port to CUDA. This is what I want to focus on in this post.
The two foremost benefits go hand in hand: performance and robustness. Before CUDA we were basically hijacking OpenGL, the graphics API, to do GPU computing. While this worked, many workarounds and kludges were required to make sure things ran smoothly. We were actually quite proud of what we accomplished in terms of performance, but there was still some performance left on the table that OpenGL just couldn’t get to. The other downside was that we didn’t make any friends at NVIDIA when we reported OpenGL computing bugs. “OpenGL isn’t made for computing,” they would remind us each time, a fact that couldn’t have been more obvious to our developers, or more painful for me.
CUDA, or ‘C for CUDA’ as NVIDIA is now calling it, has allowed us to get that last bit of performance out of the G80 series of GPUs. This is great, because existing users of GPU acceleration will get a significant performance boost from a simple library upgrade. The real performance miracle, however, came on the GT200 hardware, where we were free from the limitations our OpenGL workarounds imposed on us (texture size limit, anyone?). Here we got a 50%+ boost in speed and a 2X improvement in simulation size going from OpenGL-based to CUDA-based libraries.
Getting back to robustness: now that we are using CUDA, any compute bugs we submit to NVIDIA get top attention and are resolved promptly. It also helps that we’re now in NVIDIA’s ‘good books’ and have a good rapport with their driver team. The end result is a more stable hardware+driver platform for us and all our end users, as well as full support for Tesla hardware on Windows, something we couldn’t do with our OpenGL libraries. Overall the CUDA port has been a big win for us and for our partners, who are now actively pushing CUDA+Tesla out to all their users. But wait, there are more benefits…
The other big win for Acceleware is the developer experience gained in the months that we’ve spent upgrading our code. We now have no fewer than six new battle-hardened CUDA developers in house, which is already starting to benefit our other products and services teams as we disseminate both knowledge and people into those projects.
The other great part about getting CUDA knowledge and experience more firmly embedded into the teams is that, in our opinion, the future transition to OpenCL should be that much easier. I was initially worried that the port of our FDTD library to CUDA would be all for naught, given that OpenCL was due out and was hyped to be the next great savior of parallel programming. Learning more about OpenCL since then has reassured me. The similarities between the CUDA Driver API and OpenCL are apparently striking, which should make any future port significantly easier. Given the now three-way momentum behind GPUs from NVIDIA, AMD/ATI, and Intel, such a port may well be a likely outcome.
I am quite confident that, five years from now, we will look back and say that the investment in both GPUs and CUDA was worthwhile, not only for the immediate performance and robustness reasons I mentioned above, but also for the knowledge and experience that will allow us to succeed as parallel computing architectures evolve to 100s and 1000s of cores from a variety of vendors.