Inside AXE Professional Services: Success in Seismic Processing

The Oil and Gas team just delivered on a project with Saudi Aramco, and it's my job to acknowledge (immortalize) the team's success by blogging about it. The project objective was to redevelop the customer's production Kirchhoff Time Migration codes (KTM, the "workhorse" algorithm in the seismic processing industry) to exploit heterogeneous multi-core hardware, or in plain language, computer clusters equipped with GPUs and modern CPUs. Note that I didn't say port.

While some may be tempted to apply the "port" label to what we do in our Professional Services projects, it couldn't misrepresent our development process more. It's well known that algorithms must be substantially reworked to be effective on parallel hardware (especially on the GPU). What some organizations may not realize is that when end-to-end production software is migrated, a lot more than the core computational kernels has to be reconsidered. In particular, load balancing becomes both more significant (drastically faster hardware elements can lead to drastically worse imbalances) and more elaborate (nodes, cores, GPUs and network/disk/bus I/O all play together simultaneously). Amdahl's Law rears its performance-damning head (small contributors to overall runtime suddenly become major bottlenecks). And maximally exploiting all available compute hardware becomes interesting: does the CPU have spare cycles that we can harness to take some of the load off the GPU, a question tangential to the usual notion of load balancing between like resources?
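To put rough numbers on the Amdahl's Law point, here is a minimal Python sketch. The 95%/5% runtime split and the 50x kernel speedup are hypothetical figures chosen for illustration, not measurements from this project, and the function names are mine.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's Law)."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


def untouched_share_after(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Fraction of the new, shorter runtime spent in code that was not accelerated."""
    untouched = 1.0 - accelerated_fraction
    return untouched / (untouched + accelerated_fraction / kernel_speedup)


# Hypothetical profile: 95% of runtime in the migration kernel, 5% in I/O and setup.
# Hypothetical GPU result: the kernel alone runs 50x faster.
overall = amdahl_speedup(0.95, 50.0)
share = untouched_share_after(0.95, 50.0)
print(f"overall speedup: {overall:.1f}x")
print(f"share of runtime now in untouched code: {share:.0%}")
```

Under those assumed numbers the application as a whole speeds up by only about 14.5x, and the code that used to be 5% of the runtime now accounts for roughly 72% of it. That is the sense in which small contributors suddenly become the major bottlenecks.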

The KTM project was no different, but hey, if that weren't the case then our jobs wouldn't be so much fun! Drawing on our past experience with the KTM algorithm, the team dissected the original Fortran codes to understand not only the software mechanics (and quirks stemming from legacy Fortran, like rounding by adding miscellaneous magic numbers and truncating) but, more importantly, the semantics of the theory, so that we could safely apply both theory-level and mathematical/arithmetic optimizations without breaking the customer's approach. (KTM is a highly mature algorithm, and every implementation we have seen has dozens of "secret sauce" differentiating features to improve its imaging capabilities.) Armed with a line-level understanding, the team produced a high-performance system capable of exploiting the massive concurrency available on heterogeneous clustered CPU-GPU hardware. The software is limited in throughput only by the underlying hardware, and is well poised to take advantage of next-generation GPUs and quad-socket (+4-8 core) CPUs.
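For readers who haven't met the add-a-magic-number-and-truncate idiom, here is a small Python sketch of why it matters when you need to match a legacy code's output. The constant 0.5 is an assumption for illustration; I'm not reproducing the customer's actual constants, and the function names are hypothetical.

```python
import math


def legacy_round(x: float) -> int:
    # Add a constant, then truncate toward zero.
    # Python's int() truncates toward zero, as Fortran's INT() does.
    return int(x + 0.5)


def nearest_round(x: float) -> int:
    # Round half up using floor, which behaves consistently for negative inputs.
    return math.floor(x + 0.5)


for sample in (2.7, -2.7):
    print(sample, legacy_round(sample), nearest_round(sample))
```

The two agree for positive inputs but not for negative ones: for -2.7 the truncating version gives -2 while the floor version gives -3. "Cleaning up" the rounding would therefore shift results, so a rewrite has to either reproduce the quirk or show that it never sees the inputs where the two differ.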

During the project handoff, the customer was highly enthusiastic about our results, praising the performance ("You've really managed to squeeze all of [the performance] out!"), system design ("Really impressive"), scalability (positing that it will enable the efficient use of their incoming 64 and 128 CPU core machines) and numerical accuracy ("It's amazing how close the answer is").

Congratulations team!

Written by guest blogger and Software Development Team Lead (Oil and Gas) Danny Eaton.