Not So Super
Originally published 2002 in Atomic: Maximum Power Computing. Last modified 03-Dec-2011.
Go and have a look at Nvidia's Xbox page, here. It's been up for ages, now. They seem quite proud of it.
On that page, Nvidia assert that the Microsoft Xbox (whose biggest and baddest chip is an Nvidia product, hence the puffery on their site) "...has 80 Gigaflops of computing power. That's equivalent to the power found in a Cray C94 supercomputer."
This statement is, to use a technical expression, "a big fat pile of marketing".
As in, "Look out! Don't step in the marketing!"
By the same logic used above, one could argue that the 255 kilowatt engine in a Holden Special Vehicles Series 2 ClubSport clearly makes that vehicle 14% more capable than a 224 kilowatt Panzerkampfwagen IV.
A brief examination of these two machines should persuade even the very unobservant that a 2002 sports sedan is different, albeit in subtle and hard-to-spot ways, from a 1942 medium tank.
Actually, this is too generous. The Nvidia puffery isn't even that accurate. But people keep using these dumb speed comparisons - between game consoles and supercomputers, between PCs and supercomputers (the Xbox is rather more PC than game console...), between Macintoshes and supercomputers.
Enough!
First up, Graphics Processing Units (GPUs), whether they're on a graphics card or on a mainboard like the "XGPU", aren't general purpose processors. You can't make them do anything but graphics. They may be really fast for floating point operations having to do with the 3D graphics they're made to pump, but their speed for general purpose floating point is zero.
Both a tank and a car can drive around a racetrack. Graphics processors and CPUs aren't that similar.
OK, GPUs are kind of headed that way, with seldom-used proprietary hardware extensions giving way to more generally useful programmable processor features. But GPUs still ain't CPUs.
Secondly, gigaFLOPS (billions of floating point operations per second) are not a trivially measurable thing, like storage capacity. The number of operations per second a given processor can perform depends on what sort of job it's doing.
Tank tasks are different from sports car tasks. Well, unless you're very rich and very eccentric, anyway. Supercomputer tasks are just as different from game console tasks.
Assume you've got a PC with a CPU so blindingly fast that, even though it's a general purpose processor and not a dedicated graphics device, it can render 3D video as well as the Xbox video chip can.
That CPU still isn't within a thousand miles of a supercomputer with similar processor grunt.
One of the basic features of the tasks performed by traditional supercomputers (as opposed to distributed "cluster" supercomputers made of many relatively independent nodes) is that most of the input data to the supercomputer's CPUs (there's usually more than one processor), and most of their output, is going to and from main memory at astounding speed. All the time. Non-stop.
The data pipes to and from a traditional supercomputer's main memory therefore have to be huge. And the memory all has to be blazingly fast.
Nothing based on a PC architecture, including the Xbox, will perform worth a toss for these sorts of tasks, because main memory in personal computers is miserably slow by comparison. That's why PCs are so cheap.
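To put rough numbers on that argument, here's a minimal sketch of a "roofline"-style estimate: a job can't run faster than the processor's peak rate, and it can't run faster than memory can feed it operands. Everything in it is an assumption made up for illustration. The function name `attainable_gflops`, the two machines and the two jobs are hypothetical, and none of the figures are measurements of any real Xbox, PC or Cray.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gb_s, flops_per_byte):
    """Upper bound on sustained speed: the lower of the processor's peak
    rate and the rate at which memory can supply data for the job."""
    memory_bound = mem_bandwidth_gb_s * flops_per_byte
    return min(peak_gflops, memory_bound)


# Hypothetical machines. Illustrative numbers only, not measurements.
MACHINES = {
    "cheap box, slow memory": {"peak_gflops": 8.0, "mem_bandwidth_gb_s": 2.0},
    "big iron, fast memory": {"peak_gflops": 4.0, "mem_bandwidth_gb_s": 32.0},
}

# Hypothetical jobs, described by floating point operations per byte
# that has to be moved to or from main memory.
JOBS = {
    "streaming job": 0.125,       # leans on main memory constantly
    "cache-friendly job": 16.0,   # chews on the same data over and over
}


def compare():
    """Return {(machine, job): attainable GFLOPS} for every combination."""
    results = {}
    for machine_name, machine in MACHINES.items():
        for job_name, intensity in JOBS.items():
            results[(machine_name, job_name)] = attainable_gflops(
                machine["peak_gflops"],
                machine["mem_bandwidth_gb_s"],
                intensity,
            )
    return results


if __name__ == "__main__":
    for (machine_name, job_name), gflops in compare().items():
        print(f"{machine_name:24s} {job_name:20s} {gflops:6.2f} GFLOPS")
```

With these made-up figures, the cheap box "wins" the cache-friendly job 8 to 4, and then manages 0.25 against 4 on the streaming job. Same two machines, same peak numbers, opposite conclusions, depending entirely on which job you pick.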
A PC with a decent current processor and, say, 1024MB of RAM that the processor could somehow access as fast as it can access its Level 1 cache, would actually be able to mix it up with some traditional supercomputers. It wouldn't be a whole lot cheaper than those supercomputers, though - I leave the pricing of 1GB cache-speed RAM modules with an embedded CPU as an exercise for the reader - and its external connectivity, among other things, would be pathetic in comparison.
PCs, fortunately, don't generally do supercomputer-type tasks. They don't need to pump gigabytes of data to and from main memory every second. Hence, they use relatively small but quite fast caches all over the place to take the load off main memory. Some other game consoles have very fast memory buses, and can be regarded as cacheless, supercomputer-ish sorts of design. The PS2 is a good example, and its multi-unit CPU helps the analogy. But the PS2 still, of course, doesn't have anything like enough storage, or sufficient outside-the-box bandwidth, or a CPU that actually runs very fast, by PC standards.
A PC CPU has two levels of cache built in. A PC (or Xbox) graphics processor has cache as well. The whole graphics card memory block in a PC can be regarded as a slow, but huge, cache.
3D games are not at all like supercomputer tasks. Games tend to chew on the same data over and over, without leaning on main memory all the time. Which is why PCs can get away with video cards that have a slab of on-board RAM and a relatively small pipe back to main memory, and with CPUs that have a (smaller) slab of cache, and only slightly less weedy main memory access speed than the graphics card.
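The difference between "chewing on the same data over and over" and "leaning on main memory all the time" can be sketched with a toy cache simulation. This is an illustration under stated assumptions, not a model of any real hardware: the cache is a simple least-recently-used store, and the names `hit_rate`, `game_like` and `streaming`, along with the sizes used, are all hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_lines):
    """Fraction of accesses served from a small LRU cache holding
    at most `cache_lines` distinct addresses."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True             # miss: fetch from "main memory"
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


def game_like(working_set, passes):
    """Hypothetical game-style access pattern: loop over the same small
    working set again and again."""
    for _ in range(passes):
        for addr in range(working_set):
            yield addr


def streaming(length):
    """Hypothetical supercomputer-style access pattern: touch each
    address once and never come back to it."""
    for addr in range(length):
        yield addr


if __name__ == "__main__":
    # Both patterns make 6400 accesses through a 128-entry cache.
    print(hit_rate(game_like(64, 100), 128))  # 0.99
    print(hit_rate(streaming(6400), 128))     # 0.0
```

In this toy setup the game-like pattern is served from cache 99% of the time, so slow main memory hardly matters. The streaming pattern never hits the cache at all, so every single access goes at main memory speed.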
The four-processor Cray C94 that Nvidia use for their comparison is, for genuine supercomputer tasks, generally rated at four gigaFLOPS, not 80. It can probably do 80 or more gigaFLOPS for goofy little benchmarks, but nobody cares about them. They're worse than trying to determine a tank's top speed by removing its tracks and seeing how fast it can spin its sprockets.
In the future, it's possible that bulldust about graphics hardware power will be slightly less meaningless, because we'll soon have more generally programmable graphics chips, with the ability to do things like arbitrary floating point operations. This may, just may, also mean that the special new hardware features of these video cards will actually be used by game writers, in exactly the way that hardware Transform and Lighting, and programmable vertex and pixel shaders, generally haven't been.
Heck, we may even actually see distributed computing clients that can run on 3D hardware when you're not using it, allowing you to make your SETI@home or Folding@Home or distributed.net scores more studly. Such things have been announced before, but they've all been hoaxes; they might now become real.
But. Nonetheless. For one giant serialised calculation, a standalone non-clustered single-processor supercomputer which is, for dumb benchmark tests, apparently slower than a consumer PC, will trample the PC into the dirt and urinate on it.
That's another technical expression, by the way.