GPGPU and the Law of New Features
Publication date: 31 December 2008
Originally published 2008 in Atomic: Maximum Power Computing
Last modified 03-Dec-2011.
It Is Written that when a new, much-ballyhooed feature shows up in cutting-edge, expensive graphics cards, you shouldn't expect that feature to actually amount to anything for at least another couple of hardware generations.
This Law of New Features applies to everything. Remember hardware transform and lighting? How about full-scene antialiasing, and anisotropic filtering?
Every feature is introduced with great fanfare and gold-embossed text on the box, but it's not actually good for anything until a few years later. That's partly because it's not a good business plan to write software for hardware that not many people yet own, but it's also because the early versions of new features are often underpowered and incomplete.
Which brings us to GPGPU.
GPGPU stands, rather untidily, for "General Purpose computing on Graphics Processing Units". It's using graphics-card hardware to do things other than fling pixels at the screen, and it's something that people have been talking about since... heck, probably last century.
Back then, graphics processors were too specialised to be useful for much besides... graphics. They didn't, for instance, have the ability to quickly process numbers with enough precision for many serious-computation tasks.
In brief, an old graphics card that could only do 16-bit colour could only quickly process 16-bit numbers, and that isn't enough precision for many interesting applications. You can use software to make just about any hardware process numbers with as much precision as you like, but this can hugely reduce performance, which defeats the purpose of GPGPU.
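Here's a trivial illustration of the precision problem - a toy example of my own, nothing to do with any actual graphics driver. A 16-bit fixed-point number (eight bits either side of the point, in this case) moves in steps of 1/256, so a small increment can simply vanish, while 32-bit floating point hangs onto it:

    // Toy example: 8.8 fixed point (value = raw / 256) versus 32-bit float.
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t raw  = (uint16_t)(100.0 * 256);   // represents 100.0
        uint16_t tiny = (uint16_t)(0.001 * 256);   // 0.256 truncates to 0
        raw += tiny;                               // the increment is lost

        float f = 100.0f;
        f += 0.001f;                               // survives in 32-bit float

        printf("16-bit fixed point: %f\n", raw / 256.0);  // still 100.000000
        printf("32-bit float:       %f\n", f);            // about 100.000999
        return 0;
    }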
Early on, people were also trying to shoehorn GPGPU tasks into the graphics Application Programming Interfaces (APIs), because that was all that was available. There was no other way to tell a graphics card to do anything, unless you hit the hardware directly and wrote new versions of your software for every graphics card, like in the bad old DOS days.
Both Nvidia and ATI now have pretty thorough GPU programming toolkits available, though. Nvidia's is called CUDA; ATI's is Stream. (Or, at least, that's what they were calling it the last time I looked.) There are third parties making GPU programming languages, too.
So GPGPU coders no longer have to trick the graphics hardware into doing non-graphics tasks by sending it peculiar graphics instructions. There are even, now, specialised cards that have GPGPU-capable graphics processors on them, without any actual graphics connectors on the back. Nvidia will sell you a whole "Tesla Personal Supercomputer", which is a PC with a bunch of these sorts of cards in it.
People are so excited about all this because modern video cards are actually specialised parallel computers, which happen to usually run graphics software. Give them different software, and they can perform different tasks, sometimes much faster than any CPU. Each individual "processor" in a modern graphics chip isn't impressive when compared with a desktop CPU - but you get hundreds of these processors per graphics card. So they can be very fast indeed, provided you're asking them to do a highly "parallelisable" task - a job that can be broken up into many independent streams, and in which the input for one stream doesn't depend on the output from another.
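If you've never seen what that looks like in code, here's a minimal CUDA sketch - my own example, not anything Nvidia ships - of the classic parallelisable job: adding two big arrays together. Every thread handles one element, and no thread needs to know what any other thread is doing:

    // Each thread adds one pair of elements; no thread depends on another.
    __global__ void add_arrays(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
        if (i < n)
            c[i] = a[i] + b[i];
    }

    // Host side, assuming dev_a, dev_b and dev_c have already been set up
    // with cudaMalloc and cudaMemcpy: launch enough 256-thread blocks to
    // cover all n elements.
    //
    //     add_arrays<<<(n + 255) / 256, 256>>>(dev_a, dev_b, dev_c, n);

The card's scheduler then sprays those blocks of threads across however many processors the chip happens to have, which is why the same code scales from a cheap card to an expensive one.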
Unfortunately, GPGPU development hasn't quite made it to critical mass yet, so it's still quite hard for end users to find interesting things for their graphics processors to do, besides putting textured triangles on the screen.
Browse through GPGPU.org and you'll find quite a lot of people coding away at whatever signal-processing, scientific computation or fluid dynamics software takes their fancy. You'll also find occasional people building honest-to-goodness, though very specialised, supercomputers that're basically just normal PCs stuffed full of dual-GPU graphics cards. But you won't find much ready-to-run software that an ordinary non-programmer can use to demonstrate the power of GPGPU code.
Many GPGPU developers are, at the moment, making stuff that's only interesting to other GPGPU developers - you're lucky if you can download anything but source code from their Web sites. Take the "screen capture" tool this guy wrote, for instance. It allows Mac users to route anything any program puts on the screen, including high-frame-rate video, to any other program, which is very neat if you're writing stuff in Quartz Composer. But no use at all if you aren't.
There are also several video encoders that now have, or will soon have, GPU acceleration. This is a very useful GPGPU task, and could be very helpful for people running ordinary home theatre PCs, not just video production houses and professional TV-show pirates. But most end users don't do any video encoding at all.
I think a new computing platform's worthy of attention when you can get both Conway's Life and a Mandelbrot program for it (cue the Amiga nostalgia). Nvidia-card versions of both of those have been available for some time on the Nvidia SDK-samples page here.
Neither is actually a very good example of the breed, though - I think this other Mandelbrot program, with GPGPU support for both Nvidia and ATI, is better, and plain old CPU-powered software remains even better than that, at the moment. The Demoniak3D demo list also includes some nice fractal software, which does indeed use very little CPU power as it renders. But that software doesn't seem to be very fast either, which kind of defeats the purpose of running your fractal code on a GPU instead of a CPU.
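The reason fractals keep turning up as GPGPU demos is that they're about as parallel as a problem can get: every pixel's iteration count depends only on that pixel's own coordinates. Here's a sketch of what such a kernel looks like - again my own illustration, not code from any of the programs mentioned above:

    // Every pixel is independent, so each thread computes one pixel.
    __global__ void mandelbrot(int *counts, int width, int height,
                               float x0, float y0, float step, int max_iter)
    {
        int px = blockIdx.x * blockDim.x + threadIdx.x;
        int py = blockIdx.y * blockDim.y + threadIdx.y;
        if (px >= width || py >= height)
            return;

        // Map this pixel to a point c in the complex plane.
        float cr = x0 + px * step;
        float ci = y0 + py * step;

        // Iterate z = z*z + c until it escapes or we give up.
        float zr = 0.0f, zi = 0.0f;
        int iter = 0;
        while (zr * zr + zi * zi < 4.0f && iter < max_iter) {
            float tmp = zr * zr - zi * zi + cr;
            zi = 2.0f * zr * zi + ci;
            zr = tmp;
            ++iter;
        }
        counts[py * width + px] = iter;  // turned into a colour later on
    }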
The most obvious GPGPU task for the modern nerd is public distributed computing projects, like SETI@Home or Folding@Home, where zillions of PCs each work on their own little pieces of one gigantic job. There's been a Folding client for ATI graphics cards for some time now - it's up to version 2 as I write this. And there's an Nvidia Folding client too, the beta version of which came out only a few days after the release of the 200-series cards.
(There's a pre-release CUDA client for distributed.net as well, but it only does RC5 cracking, not the Optimal Golomb ruler searching that's the main d.net project at the moment. It's not slow, though. A 3.33GHz Core 2 Duo CPU can crack about 7.8 million RC5-72 keys per second; the CUDA cracker running on my humble GeForce 8800 GT managed almost 282 megakeys per second - about 36 times as fast.)
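Key cracking is the same story: every candidate key can be tested without reference to any other, so you just hand each thread its own key. A bare skeleton of the idea - the real distributed.net client is of course far more sophisticated than this, and rc5_72_matches() below is a stand-in I've made up, not the actual cipher test:

    // Stand-in for the real RC5-72 check; a genuine client decrypts a
    // known block here. The parallel structure around it is the point.
    __device__ bool rc5_72_matches(unsigned int key_hi, unsigned long long key_lo)
    {
        return false;  // placeholder only
    }

    // Every thread tries one candidate key; none of them talk to each other.
    __global__ void search_keys(unsigned long long base_key, int *found_index)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (rc5_72_matches(0, base_key + i))
            *found_index = i;  // report which key in this batch matched
    }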
Nvidia also, around the time of the 200-series card launch, announced that they'd be adding support for PhysX physics acceleration to the drivers for various of their recent graphics cards, and making the standard open for other manufacturers too.
PhysX isn't the first thing that springs to mind when you think of GPGPU applications, since it's really only useful for games. But PhysX originally required a separate add-on card that (to a first approximation) nobody bought. It didn't seem likely to take off while it was tied to special $200 hardware - but now you can get it for free (economically, if not computationally, speaking) with your graphics card.
Nvidia's PhysX acceleration wasn't really a new idea. Havok FX, a GPU-accelerated version of the very popular Havok physics engine, has existed for well over two years now. But, following the Law of New Features, it was used by almost nobody. A pretty decent list of games now supports PhysX, though, and that points the way toward other coprocessor tasks that GPUs may be able to do in the future.
OK, so you've got physics acceleration. That's fun. And Photoshop CS4 has a couple of GPU-powered features, with more promised as free downloads in the future. GPU acceleration shows some promise for audio processing as well.
GPUs also lend themselves to data encryption (and decryption, possibly for nefarious reasons...), and database acceleration. And, surprisingly, virus-scanning as well; Kaspersky's GPU-accelerated "SafeStream" isn't a full antivirus solution, but it can apparently do its much-better-than-nothing scan at a colossal data rate, making it useful for real-time scanning of, say, a mail server with many users. None of that's very interesting to the average end user, though.
Oh, and then there's mapping video RAM as a block device and then using it as swap space in Linux, which is pretty hilarious. But it's only a GPGPU application in the broadest sense, and not actually terribly useful.
Still, I'd enjoy playing with a Windows video-card RAM-disk utility, even if I had to remember to shut it down manually before I ran a game.
That sort of thing actually shouldn't be necessary in Windows Vista. Vista's graphics performance is slower than WinXP's because Vista turns the video adapter into the same sort of virtualised device as many other parts of the PC, with multithreaded tasks and virtual memory and several other very impressive buzzwords.
(James Wang's piece from the October 2006 issue of Atomic has more on this, and GPGPU applications in general.)
Intel's trying to outflank both of the big video-card companies with their upcoming Larrabee GPU, which promises to be the first Intel video adapter that actually deserves the Extreme Ultimate Super Mega Graphics names that Intel will doubtless give it. Larrabee will be based on a bunch of shrunk-down, lower-power-consumption Pentium cores, each not unlike the current Atom CPU. So a Larrabee card will, essentially, be a horde of largely standard x86 CPUs, not weird specialised graphics processors.
Perhaps that's what it'll take for GPGPU apps to push their way past the Law of New Features.