GPGPU and the Law of New Features

Publication date: 31 December 2008
Originally published 2008 in Atomic: Maximum Power Computing
Last modified 03-Dec-2011.


It Is Written that when a new, much-ballyhooed feature shows up in expensive cutting-edge graphics cards, you shouldn't expect that feature to actually amount to anything for at least another couple of hardware generations.

This Law of New Features applies to everything. Remember hardware transform and lighting? How about full-scene antialiasing, and anisotropic filtering?

Every feature is introduced with great fanfare and gold-embossed text on the box, but it's not actually good for anything until a few years later. That's partly because it's not a good business plan to write software for hardware that not many people yet own, but it's also because the early versions of new features are often underpowered and incomplete.

Which brings us to GPGPU.

GPGPU stands, rather untidily, for "General Purpose computing on Graphics Processing Units". It's using graphics-card hardware to do things other than fling pixels at the screen, and it's something that people have been talking about since... heck, probably last century.

Back then, graphics processors were too specialised to be useful for much besides... graphics. They didn't, for instance, have the ability to quickly process numbers with enough precision for many serious-computation tasks.

In brief, an old graphics card that could only do 16-bit colour could only quickly process 16-bit numbers, and that isn't enough precision for many interesting applications. You can use software to make just about any hardware process numbers with as much precision as you like, but this can hugely reduce performance, which defeats the purpose of GPGPU.
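Here's a rough sketch of what "not enough precision" looks like in practice. It assumes the 16-bit numbers in question are IEEE 754 half-precision floats, which is my illustration and not necessarily what any particular old card used; the helper names are made up for the example. It adds up 0.1 ten thousand times, once with every intermediate result squeezed into 16 bits, and once with ordinary Python floats.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) float."""
    # Assumption: 'e' is the struct format code for a 16-bit float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def sum_in_half(values):
    """Add values up, rounding every intermediate result to 16 bits."""
    total = 0.0
    for v in values:
        total = to_half(total + to_half(v))
    return total

values = [0.1] * 10000
half_total = sum_in_half(values)   # stalls far below the true answer
full_total = sum(values)           # ordinary 64-bit floats: very close to 1000
```

Once the 16-bit running total gets big enough, adding another 0.1 no longer changes it at all, so the sum just stops growing. Emulating wider numbers in software fixes that, but every emulated operation costs several real ones.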

Early on, people were also trying to shoehorn GPGPU tasks into the graphics Application Programming Interfaces (APIs), because that was all that was available. There was no other way to tell a graphics card to do anything, unless you hit the hardware directly and wrote new versions of your software for every graphics card, like in the bad old DOS days.

Both Nvidia and ATI now have pretty thorough GPU programming toolkits available, though. Nvidia's is called CUDA; ATI's is Stream. (Or, at least, that's what they were calling it the last time I looked.) There are third parties making GPU programming languages, too.

So GPGPU coders no longer have to trick the graphics hardware into doing non-graphics tasks by sending it peculiar graphics instructions. There are even, now, specialised cards that have GPGPU-capable graphics processors on them, without any actual graphics connectors on the back. Nvidia will sell you a whole "Tesla Personal Supercomputer", which is a PC with a bunch of these sorts of cards in it.

People are so excited about all this because modern video cards are actually specialised parallel computers, which happen to usually run graphics software. Give them different software, and they can perform different tasks, sometimes much faster than any CPU. Each individual "processor" in a modern graphics chip isn't impressive when compared with a desktop CPU - but you get hundreds of these processors per graphics card. So they can be very fast indeed, provided you're asking them to do a highly "parallelisable" task - a job that can be broken up into many independent streams, and in which the input for one stream doesn't depend on the output from another.
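To make "independent streams" a bit more concrete, here's a minimal Python sketch. It uses CPU threads as a stand-in for a GPU's many little processors, so it shows the shape of the problem and not the speed; the function names and the brighten-a-pixel job are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel: int) -> int:
    """Independent work: the result depends on this one pixel only."""
    return min(pixel + 40, 255)

def process_chunk(chunk):
    return [brighten(p) for p in chunk]

def parallel_map(pixels, workers=4):
    """Split the job into chunks; any worker can take any chunk."""
    size = max(1, (len(pixels) + workers - 1) // workers)
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)  # results come back in order
    return [p for chunk in results for p in chunk]

def chained(values, modulus=65521):
    """Dependent work: every step needs the previous step's result."""
    x = 0
    for v in values:
        x = (x * x + v) % modulus
    return x
```

The first job can be handed to four workers or four hundred and the answer is the same. The second, as written, has to be done one step at a time no matter how many processors you throw at it.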

Unfortunately, GPGPU development hasn't quite made it to critical mass yet, so it's still quite hard for end users to find interesting things for their graphics processors to do, besides putting textured triangles on the screen.

Browse through and you'll find quite a lot of people coding away at whatever signal-processing, scientific computation or fluid dynamics software takes their fancy. You'll also find occasional people building honest-to-goodness, though very specialised, supercomputers that're basically just normal PCs stuffed full of dual-GPU graphics cards. But you won't find much ready-to-run software that an ordinary non-programmer can use to demonstrate the power of GPGPU code.

Many GPGPU developers are, at the moment, making stuff that's only interesting to other GPGPU developers - you're lucky if you can download anything but source code from their Web sites. Take the "screen capture" tool this guy wrote, for instance. It allows Mac users to route anything any program puts on the screen, including high-frame-rate video, to any other program, which is very neat if you're writing stuff in Quartz Composer. But no use at all if you aren't.

There are also several video encoders that now have, or will soon have, GPU acceleration. This is a very useful GPGPU task, and could be very helpful for people running ordinary Home Theater PCs, not just video production houses and professional TV-show pirates. But most end users don't do any video encoding at all.

I think a new computing platform's worthy of attention when you can get both Conway's Life and a Mandelbrot program for it (cue the Amiga nostalgia). Nvidia-card versions of both of those have been available for some time on the Nvidia SDK-samples page here.

Neither is actually a very good example of the breed, though - I think this other Mandelbrot program, with GPGPU support for both Nvidia and ATI, is better, and plain old CPU-powered software remains even better than that, at the moment. The Demoniak3D demo list also includes some nice fractal software, which does indeed use very little CPU power as it renders. But that software doesn't seem to be very fast either, which kind of defeats the purpose of running your fractal code on a GPU instead of a CPU.
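For anyone wondering why Mandelbrot programs turn up on every parallel platform, here's a bare-bones, CPU-only Python sketch of the usual escape-time calculation. It isn't taken from any of the programs mentioned above, and the view window and ASCII output are arbitrary choices of mine.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter  # never escaped: treat the point as inside the set

def render(width=60, height=24, max_iter=50):
    """Return the set as rows of text; '#' marks points that never escaped."""
    rows = []
    for j in range(height):
        im = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            re = -2.2 + 3.2 * i / (width - 1)
            inside = escape_time(complex(re, im), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows
```

Try `print("\n".join(render()))` to see it. Every call to `escape_time` depends only on its own pixel's coordinates, so in principle each pixel could be given to a different GPU processor.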

The most obvious GPGPU task for the modern nerd is public distributed computing projects, like SETI@Home or Folding@Home, where zillions of PCs each work on their own little pieces of one gigantic job. There's been a Folding client for ATI graphics cards for some time now - it's up to version 2 as I write this. And there's an Nvidia Folding client too, the beta version of which came out only a few days after the release of the 200-series cards.

(There's a pre-release CUDA client for as well, but it only does RC5 cracking, not the Optimal Golomb ruler searching that's the main project at the moment. It's not slow, though. A 3.33GHz Core 2 Duo CPU can crack about 7.8 million RC5-72 keys per second; the CUDA cracker running on my humble GeForce 8800 GT managed almost 282 megakeys per second.)
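A quick back-of-the-envelope check on those figures, in Python. The two key rates come from the paragraph above; the only thing I've added is the size of the RC5-72 keyspace, which is 2 to the power of 72 because the key is 72 bits long.

```python
CPU_KEYS_PER_SEC = 7.8e6    # 3.33GHz Core 2 Duo, from the text
GPU_KEYS_PER_SEC = 282e6    # GeForce 8800 GT, "almost 282 megakeys"
KEYSPACE = 2 ** 72          # every possible 72-bit RC5 key
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_exhaust(keys_per_sec: float) -> float:
    """Years needed to try every key at a constant rate."""
    return KEYSPACE / keys_per_sec / SECONDS_PER_YEAR

speedup = GPU_KEYS_PER_SEC / CPU_KEYS_PER_SEC
cpu_years = years_to_exhaust(CPU_KEYS_PER_SEC)
gpu_years = years_to_exhaust(GPU_KEYS_PER_SEC)
```

The graphics card comes out roughly 36 times faster than the CPU. Even so, one 8800 GT working alone would need hundreds of thousands of years to try every key, which is why this sort of job gets spread across zillions of PCs.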

Nvidia also, around the time of the 200-series card launch, announced that they'd be adding support for PhysX physics acceleration to the drivers for various of their recent graphics cards, and making the standard open for other manufacturers too.

PhysX isn't the first thing that springs to mind when you think of GPGPU applications, since it's really only useful for games. But PhysX originally required a separate add-on card that (to a first approximation) nobody bought. It didn't seem likely to take off while it was tied to special $200 hardware - but now you can get it for free (economically, if not computationally, speaking) with your graphics card.

Nvidia's PhysX acceleration wasn't really a new idea. Havok FX, a GPU-accelerated version of the very popular Havok physics engine, has existed for well over two years now. But, following the Law of New Features, it was used by almost nobody. A pretty decent list of games supports PhysX, though, and this points the way toward other coprocessor tasks that GPUs may be able to do in the future.

OK, so you've got physics acceleration. That's fun. And Photoshop CS4 has a couple of GPU-powered features, with more promised as free downloads in the future. GPU acceleration shows some promise for audio processing as well.

GPUs also lend themselves to data encryption (and decryption, possibly for nefarious reasons...), and database acceleration. And, surprisingly, virus-scanning as well; Kaspersky's GPU-accelerated "SafeStream" isn't a full antivirus solution, but it can apparently do its much-better-than-nothing scan at a colossal data rate, making it useful for real-time scanning of, say, a mail server with many users. None of that's very interesting to the average end user, though.

Oh, and then there's mapping video RAM as a block device and then using it as swap space in Linux, which is pretty hilarious. But it's only a GPGPU application in the broadest sense, and not actually terribly useful.

Still, I'd enjoy playing with a Windows video-card RAM-disk utility, even if I had to remember to shut it down manually before I ran a game.

That sort of thing actually shouldn't be necessary in Windows Vista. The reason Vista has slower graphics performance than WinXP is that Vista turns the video adapter into the same sort of virtualised device as many other parts of the PC, with multithreaded tasks and virtual memory and several other very impressive buzzwords.

(James Wang's piece from the October 2006 issue of Atomic has more on this, and GPGPU applications in general.)

Intel's trying to outflank both of the big video-card companies with their upcoming Larrabee GPU, which promises to be the first Intel video adapter that actually deserves the Extreme Ultimate Super Mega Graphics names that Intel will doubtless give it. Larrabee will be based on a bunch of shrunk-down, lower-power-consumption Pentium cores, each not unlike the current Atom CPU. So a Larrabee card will, essentially, be a horde of largely standard x86 CPUs, not weird specialised graphics processors.

Perhaps that's what it'll take for GPGPU apps to push their way past the Law of New Features.
