The Oil and Gas team just delivered on a project with Saudi Aramco, and it's my job to acknowledge (immortalize) the team's success by blogging about it. The project objective was to redevelop the customer's production Kirchhoff Time Migration codes (KTM, the "workhorse" algorithm in the seismic processing industry) to exploit heterogeneous multi-core hardware, or in plain language, computer clusters equipped with GPUs and modern CPUs. Note that I didn't say port.
While some may be tempted to apply the "port" label to what we do in our Professional Services projects, it couldn't be more misrepresentative of our development process. It's well known that algorithms must be substantially re-worked to be effective on parallel hardware (especially on the GPU). What some organizations may not realize is that when end-to-end production software is migrated, a lot more than the core computational kernels has to be reconsidered. In particular, load balancing becomes both more significant (drastically faster hardware elements can lead to drastically worse imbalances) and more elaborate (nodes, cores, GPUs and network/disk/bus IO all play together simultaneously). Amdahl's Law rears its performance-damning head (small contributors to overall runtime suddenly become major bottlenecks). And maximally exploiting all available compute hardware becomes interesting: does the CPU have spare cycles that we can harness to take some of the load off the GPU, a question tangential to the usual notion of load balancing between like resources?
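To put a number on the Amdahl's Law point, here is a minimal Python sketch. The 95% kernel fraction and the speedup factors are made-up illustrative values, not measurements from the KTM project, and the function names are my own.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only `accelerated_fraction` of the original
    runtime is made `kernel_speedup` times faster (Amdahl's Law)."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


def untouched_share(accelerated_fraction, kernel_speedup):
    """Share of the NEW runtime spent in the code that was not accelerated."""
    untouched = 1.0 - accelerated_fraction
    new_runtime = untouched + accelerated_fraction / kernel_speedup
    return untouched / new_runtime


if __name__ == "__main__":
    # Assumption for illustration: the migration kernel is 95% of the
    # original runtime; IO, setup, and bookkeeping make up the other 5%.
    kernel_fraction = 0.95
    for kernel_speedup in (10, 50, 100):
        overall = amdahl_speedup(kernel_fraction, kernel_speedup)
        share = untouched_share(kernel_fraction, kernel_speedup)
        print(f"kernel {kernel_speedup:>3}x faster -> "
              f"overall {overall:.1f}x, untouched code is {share:.0%} of runtime")
```

With those assumed numbers, a 50x faster kernel gives only about a 14.5x overall speedup, and the 5% of the code nobody touched grows to roughly 72% of the new runtime. That is the "small contributors suddenly become major bottlenecks" effect in miniature.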