GPU

Slidecast: GPU Technology and Usage Models

GPUs and their applications can be a complex topic. This week I came across an insideHPC article featuring an NVIDIA introduction to GPU technology and usage models. So, if you are interested in an introduction to what GPUs can do for you, look no further!

NVIDIA Introduction into GPU applications by Sumit Gupta
Picture from NVIDIA slidecast: GPU Technology and Usage Models

External Views on Parallel Processing

Several pieces of information caught my attention over the last few weeks and seemed worth sharing. As one of the few non-technical people here at Acceleware, what I appreciated about all of these snippets was how clearly they affirmed the value of the technologies that we are working on. Two of these pieces have a connection with NVIDIA, but the third is from Intel, so that provides a good balance.

The first came on May 18th, when IBM and NVIDIA announced that Big Blue would start incorporating GPU technology in its iDataPlex servers. It is another great endorsement for using GPUs as part of the processing engines in modern data centers. Check out the video with Scott Denham, who gives a very concise overview of the multiple benefits of GPUs.

(via TGDaily)

Exxon Technical SW Development Conference

This past Thursday, April Fools!, I had the honor of speaking at Exxon Mobil’s Technical Software Development conference in Houston. We were one of four external vendors invited to present and later do demos at this all-Exxon event. It was all proprietary, so they put a black hood on me and took me to the room after many spins and twists and turns. I gave my presentation, then they hooded me again and escorted me back out of the building. My talk was right after Dr. Bjarne Stroustrup on C++ and C++0x; definitely a tough act to follow, but I did my best.

“One part agile software development, two parts parallel, many-core: Helping Exxon Mobil to reach for higher performance.”

Ni hao! (Mandarin Chinese)

January 31: So I ended up in Zhuozhou City about 120km from Beijing this week as a guest of the Geophysical Research Institute (GRI). I have trouble following the connection chart, and it changes, but to make a long, convoluted story seem short: it seems like the path is China National Petroleum Company (CNPC), which is then connected to PetroChina, which is then or also connected to the Bureau of Geophysical Processing (BGP), which is then the parent of GRI. GRI is the seismic processing workhorse and I think they do most of the seismic processing for the Chinese oilfields/companies.

Video: SEMCAD-X and Acceleware Deliver Seamless GPU Acceleration

 

AxIsland’s Ferry Service

Back when I competed in programming contests, problems would be wrapped up in “real-world” stories, with the goal of making the problem easier to comprehend and more digestible for larger audiences. Following such tradition (hopefully), I present a story about a common problem developers deal with here at Acceleware.

In recent years, AxIsland has become quite a tourist destination, and hotel companies have been flourishing. Sixteen different hotel chains exist on the island, each with multiple accommodation offerings and a courtesy taxi service. The high cost of beach property forced the local airport onto a nearby island, so arriving tourists must first hop into a taxi vehicle (run by their choice of hotel chain) before taking a small ferry across to AxIsland. The ferry service is run jointly by the sixteen hotel chains, and has exactly sixteen spots for taxi-vehicles, each reserved exclusively for its respective hotel chain. It used to be that single-passenger taxi-cabs were the only taxi-vehicles on AxIsland, but recently taxi-vans capable of holding up to sixteen passengers (all headed to the same hotel) were introduced to accommodate large families and business-retreat visitors. Unfortunately, the introduction of taxi-vans also introduced logistical problems. The ferry can still carry sixteen taxi-vehicles at once; however, due to the ferry’s size and slot restrictions, there is only room for at most one taxi-van. Because of this constraint, hotel chains began bickering over who could operate the taxi-van, before it was finally decided that the taxi-van's destination hotel would be that of the first tourist waiting in line for transportation. This solution is far from ideal, and can lead to tourists sitting around waiting a long time for their turn on the ferry.

Given a list of tourists and their destination hotels, your job is to assign them to one of the daily flights headed to AxIsland in a way that minimizes the worst-case total number of ferry transfers needed. (Worst-case because you have no control over which tourist gets in line for the ferry first.)

In the examples below, sixteen passengers have lined up for the ferry. Color is used to symbolize the hotel chain they are staying at, while the shape represents the destination hotel. The ferry has sixteen smaller squares for taxi-cabs with their respective hotel-chain color pattern, along with one large square to accommodate a single taxi-van. In the first configuration, two ferry transfers are required. The second configuration has the same passengers as the first, but due to the way they lined up for the ferry, it will take four transfers.

Start:



Inside AXE Professional Services: Success in Seismic Processing

The Oil and Gas team just delivered on a project with Saudi Aramco, and it's my job to acknowledge (immortalize) the team's success by blogging about it. The project objective was to redevelop the customer's production Kirchhoff Time Migration codes (KTM, the "workhorse" algorithm in the seismic processing industry) to exploit heterogeneous multi-core hardware, or in plain language, computer clusters equipped with GPUs and modern CPUs. Note that I didn't say port.

While some may be tempted to apply the "port" label to what we do in our Professional Services projects, it couldn't be more misrepresentative of our development process. It's well known that algorithms must be substantially re-worked to be effective on parallel hardware (especially on the GPU). What some organizations may not realize is that when end-to-end production software is migrated, a lot more than the core computational kernels has to be reconsidered. In particular, load balancing becomes more significant (drastically faster hardware elements can lead to drastically worse imbalances) and more elaborate (nodes, cores, GPUs and network/disk/bus IO all play together simultaneously). Amdahl's Law rears its performance-damning head (small contributors to overall runtime suddenly become major bottlenecks). And maximally exploiting all available compute hardware becomes interesting: does the CPU have spare cycles that we can harness to take some of the load off the GPU, a question tangential to the usual notion of load balancing between like resources?
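To put a number on the Amdahl's Law point, here is a small Python sketch. The figures are invented for illustration and are not measurements from the project: it assumes the core kernels account for 95% of the original runtime and that the GPU makes them 50 times faster. The function names are likewise just for this example.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated (Amdahl's Law)."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


def untouched_share(accelerated_fraction, kernel_speedup):
    """Share of the NEW runtime spent in the code that was not accelerated."""
    untouched = 1.0 - accelerated_fraction
    return untouched / (untouched + accelerated_fraction / kernel_speedup)


# Illustrative assumption: kernels are 95% of runtime and get 50x faster.
overall = amdahl_speedup(0.95, 50.0)
share = untouched_share(0.95, 50.0)
print(f"overall speedup: {overall:.1f}x")
print(f"untouched code is now {share:.0%} of the runtime")
```

Under these assumptions the application as a whole speeds up by only about 14.5x, and the 5% of runtime that nobody worried about before now makes up roughly 72% of the total. That is why file IO, setup and other "small contributors" need attention once the kernels are fast.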

Re: Australia boots up GPU supercomputer

Interesting TGDaily article, "Australia boots up GPU supercomputer":

Australia's national science agency has fired up a massive GPU supercomputer capable of delivering 256 Teraflops of peak performance.

The CSIRO supercomputer - which is powered by 64 Nvidia Tesla S1070 GPUs - includes 128 Dual Xeon E5462 compute nodes (or 1024 2.8GHz compute cores), 500 GB of SATA storage, a 144 port DDR InfiniBand Switch and an 80 Terabyte Hitachi NAS file system. [...]

 

 


SuperComputing 2009 Wrap Up

Just back from my third SC conference, SC09, in Portland, where I was able to confirm Doug's observation that the word 'Supercomputing' no longer figures on any of the signs, banners and conference material. It's like when Kentucky Fried Chicken went to KFC, but I'm still trying to figure out what's so bad about the words 'super' or 'computing'. Regardless, two words that were anything but denigrated at the show were 'cloud' and 'GPU', which I'd argue were the two most prevalent themes of this year's show. I'll stay more down to earth, i.e. not in the clouds, with this blog, and focus on the GPU side of things, which is obviously of greater interest to Acceleware, its customers, and hopefully its blog readers.

Acceleware Booth at SuperComputing 2009 (SC09)

Is it time to ask for directions?

CUDA Training

I recently finished teaching an Acceleware CUDA training course, so the timing seemed right to share some of my experiences from the course, along with some of the students’ thoughts as they progressed through the training material. When you are immersed in GPU technology on a daily basis, it is easy to take the fundamental concepts of CUDA for granted. Teaching the course forces me to revisit the foundations of GPU programming and gives me some insight into how people think when they approach the GPU for the first time.

Learning the GPU – the First Probable Outcomes

When people first start working with the GPU, they invariably experience one of the following outcomes:

  1. I have no idea where to begin.
  2. The performance on the GPU is slower than the CPU.
  3. The performance on the GPU is marginally better than the CPU.
  4. The performance on the GPU crushes the CPU!

I Have No Idea Where to Begin!

This is not uncommon. The GPU does not work like a normal CPU, and it requires a completely different mindset to program. By way of example, I’m going to blatantly steal, I mean borrow, from Mike’s analogy of home renovations. Imagine that you are building a house using the following (simplified) process:

  1. Pour the foundation
  2. Add the frame to the home
  3. Build the roof
  4. Install the plumbing and electrical
  5. Apply the finishing touches