Intel Pentium 4 CPU
Review date: 21 November 2000. Last modified 03-Dec-2011.
I have Held The Power. And I have Not Been Very Impressed.
Darn, I've spoiled the surprise.
If you don't have time to read this whole thing, here's the executive summary. Intel's new P4 is a speedy processor, but it's not tremendously speedy for the money. The CPU's price is high, as you'd expect for a new processor, but it's the hidden costs that kill it, primarily the cost of expensive memory. And early adopters are going to be stuck with a motherboard that can't accept a faster, better version of the P4 that should be coming along in 2001.
Pentium III or Athlon - or even Celeron or Duron - systems which give performance close to, or in some cases better than, that of the early model P4s can be had for much less money.
So don't buy a P4.
The Pentium 4 will in the future, though, probably be a pretty spiffy chip. To find out why, read on.
New silicon
The P4 is the first genuinely new Intel CPU for some time. The Pentium Pro (Intel's serious server chip back in the original Pentium days), Pentium II, Celeron and Pentium III are all based around the Intel "P6" core. The current "Coppermine" P-III version of the P6 is considerably more advanced than the old Pentium Pro version, but the basis is the same.
The Pentium 4, though, is based on the new "NetBurst" micro-architecture. The first processor core to use the new architecture is the "Willamette", and it's different from the P6 core in a large number of ways.
It's still an x86-compatible CPU, of course, with strong ties to its predecessors. But it's different enough to qualify as a completely new design. Those who care about the new NetBurst features can find them explained concisely in AnandTech.com's excellent P4 review, here.
But none of the NetBurst design changes are of any particular interest to people who aren't interested in advanced semiconductor engineering. The question is - how well does the thing perform, and is it value for money?
The test box Intel let me play with was a very expensive computer indeed, because they'd loaded it with a 64Mb GeForce2 Ultra graphics card, 256Mb of memory (of which more in a moment) and sundry other non-P4-related extras - fast ATA/100 hard drive, DVD-ROM drive, and an LS-120 drive instead of a floppy.
None of this stuff has anything to do with the new CPU. You can install monster graphics cards and all the rest of the trimmings in any current PC. The important parts are the CPU and the motherboard. And the RAM. Because the P4 doesn't use the same memory as pretty much every other computer on the market today.
The CPU was a 1.5 gigahertz (GHz) P4, the flagship processor at launch time (there's a 1.4GHz version as well at the moment, and apparently slower as well as faster P4s will be coming in the next few months). My review P4 wasn't a retail chip - well, not unless retail chips have "INTEL CONFIDENTIAL" stamped on them along with the rest of their identifying information. Its performance, though, is, according to Intel, exactly the same as that of the production processors.
The front and back of the P4. The new 423 pin chip form factor ("Socket 423", against the Socket 370 used by other current Intel socketed CPUs) seems likely to be fairly short-lived. Later model, higher clock speed P4s will come in the "mPGA478" form factor, with 55 more pins.
There might be mPGA478 Willamettes; I've pored over leaked roadmap documents, but I'm darned if I know. Around the time Intel makes the form factor switch, though, they'll also transition from the Willamette 0.18 micron manufacturing process (the same as is used by most other current CPUs) to a 0.13 micron process, making a hopped-up "Northwood" core. Northwood is an evolution of the Willamette in much the same way that the "Coppermine" core was an evolution of the original P-III's "Katmai" core. All mPGA478 CPUs, of course, will be incompatible with Socket 423 motherboards.
Intel have been coy about exactly how fast the Willamette will go before the Northwood takes over. They've talked about P4s at 2GHz coming next year, and I sincerely hope those will be Socket 423. If they have to go to Northwood to get 2GHz, and don't make Socket 423 Northwoods, the current P4 boards will be real white elephants.
Given Intel's recent history of announcing processors that nobody can buy - they've had a hard time maintaining clock speed parity with AMD of late - I take any Intel CPU roadmap rumours with a large grain of salt. Who knows what the heck they'll end up selling.
The 1.5GHz P4 is, at the moment, selling to Original Equipment Manufacturers (OEMs) for $US819 in thousand unit quantities. The 1.4GHz version's $US644. As I write this, retail bare-processor pricing's still all over the place, as the official launch hasn't happened yet and it's impossible to dig the CPU price out of the prices of pre-built P4 systems.
What's clear enough is that the P4 is following the standard pattern for new Intel CPUs; it's not a great deal faster than the fastest previous-generation CPU that it replaces, but it's substantially more expensive. The big deal about the P4 isn't that it's terribly fast, compared with a 1GHz Pentium III; it's that it will be a lot faster when you can get P4s that run at higher clock speeds than 1.5GHz. It's the potential, not the actuality.
There's nothing new about this. The original 450 and 500MHz P-IIIs weren't exciting compared with the 450MHz P-II, and the original 233 and 266MHz P-IIs didn't trample the 233MHz Pentium either.
First generation new model Intel CPUs are, generally, questionable value for money. But first generation P4 systems are really questionable value for money, because they use expensive memory. Expensive Rambus memory.
Million dollar memory
The two 128Mb memory modules from the test machine. Current Australian price, a jaw-dropping $AU900 each. Perhaps some dealer's got pricing that's closer to the current $US250 or so for PC800 128Mb modules - with the P4 launch, I'm sure someone down under's going to start shifting RIMMs in a bit more bulk - but as I write this, Aussie RDRAM prices are still preposterous.
You can spend a lot more than $AU900 for 128Mb of PC800, if it's ECC memory (of which more later) with a fancy brand name on it. Even if you opt for basic PC600 modules, you're still talking $AU650 for 128Mb.
The abovementioned $US250, by the way, is just the low price I found on Pricewatch.com, after discarding the first few really dodgy options. Big-name US dealers, as opposed to deep-discounters with unknown morals, are selling PC800 128Mb modules for more than $US300.
So even if you buy from the States, I really can't see you getting modules to match these two for any less than $AU1200.
$AU1200 will buy you six decent PC133 128Mb SDRAM DIMMs, as used by most top of the line P-IIIs, at the moment. Or two DIMMs, an 800MHz Pentium III, and a good motherboard to put it on. With change.
What's so great about Rambus memory to justify the big prices?
Well, it's got a ton of bandwidth compared with ordinary SDRAM. But that's not as big a deal as you might think, and to understand why, you'll need a quick primer on the technology involved. I know I did.
So that you can share my pain, here's Rambus Memory 101. Pay attention, people.
Biggs! I hope you brought enough of those for everyone!
RDRAM decoded
Direct Rambus DRAM (which is what people usually mean when they talk about RDRAM, though it's correctly referred to as DRDRAM) is the kind that's used in PCs using Intel's i820 or i840 chipsets, and the new P4 i850. And in the Sony PlayStation2, for that matter. Rambus want it to be used on video adapters as well, but the fastest adapters on the market today all seem to use Double Data Rate (DDR) SDRAM, of which more in a moment.
RDRAM for PCs comes in "RIMMs". Since SIMM stands for Single Inline Memory Module and DIMM stands for Dual Inline Memory Module, you'd think RIMM stood for Rambus Inline Memory Module. Heck, I did, until I discovered that it's actually just an all-caps Rambus trademark that, officially, doesn't mean anything. Which means "Rambus RIMM memory module" is not a redundant designation. You learn something every day.
ECC?
RIMMs come in Error Checking and Correction (ECC) and non-ECC versions, like other RAM varieties. ECC detects when RAM errors have occurred and can correct many of them (the usual system can detect two-bit errors in each 64-bit chunk of data, and detect and correct one-bit errors), but to do that it needs 72 bits for every 64 stored, which works out to nine bits per stored byte, not just eight.
Most memory modules handle ECC by putting an extra chip on the module - nine chips instead of eight - but ECC RIMMs have the same number of chips, with more storage in each one. Some unscrupulous, or merely confused, hardware dealers pretend that these modules actually have more space.
So if you see "144Mb" RIMMs advertised, they're actually ECC 128Mb modules. They've got more storage, technically, but one bit out of every nine is part of the ECC pool and is no more useable by the computer than is the extra chip on ordinary RAM modules. For non-mission-critical PCs, the extra money for ECC memory's a waste.
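If you want to check the "144Mb" arithmetic for yourself, here's a minimal Python sketch. It assumes nothing more than the nine-bits-per-byte overhead described above; the function names are mine and purely illustrative.

```python
# Sketch of the "144Mb" ECC RIMM arithmetic, assuming nine stored bits
# for every eight usable bits. Function names are illustrative only.

def raw_capacity_mb(usable_mb: int) -> float:
    """Total storage on an ECC module, counting the ECC bits."""
    return usable_mb * 9 / 8


def usable_capacity_mb(raw_mb: int) -> float:
    """Storage the computer can actually use on an ECC module."""
    return raw_mb * 8 / 9


print(raw_capacity_mb(128))     # 144.0
print(usable_capacity_mb(144))  # 128.0
```

So a "144Mb" sticker and a "128Mb ECC" sticker describe the same module.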
RIMMs look pretty much like DIMMs, but they've got a metal heat spreader plate over the chips to stop hot spots from burning out the module. Rambus modules run fast, they run hot, and they don't spread read/write operations over lots of different chips like SDRAM modules do, so one chip can get very hot and something's needed to even out the thermal energy.
You install RIMMs just like a DIMM - push down into the slot until the latches on the end click in.
RIMMs come in PC600, PC700 or PC800 flavours. PC66, PC100 and PC133 SDRAM (and the newer, semi-official PC150) are all named for their maximum rated clock speed, and so is RDRAM, pretty much. RDRAM ought to always run at the speed in megahertz indicated by its PC-number, but that depends on the chipset you're using it with. PC600 RDRAM on 133MHz Front Side Bus (FSB) Intel i820 or i840 P-III motherboards, for instance, runs at only 533MHz.
RDRAM uses DDR clock-doubling technology, so the actual RAM bus clock runs at half the RAM speed, with two transactions per clock "tick". This can be a source of confusion; it's correct to say that the bus speed for PC800 RAM is 400MHz, but marketing people of course prefer to say that it's 800MHz RAM.
You can use any flavour of RDRAM you like on Intel's i850 P4 motherboards, like the D850GB in the test machine, and you can even mix modules of different speeds. But the faster modules are considerably more expensive, and mixed-speed systems will run all of the memory at the speed of the slowest module.
Current motherboards using ordinary SDRAM, and single-channel Rambus boards, can have odd numbers of memory modules. The i840 and i850, though, use "Dual Rambus", which requires you to use pairs of identical modules, just like older SIMM-memory motherboards.
This means that if you want a 128Mb computer, you can't buy one 128Mb module. You have to buy two 64Mb ones. At current pricing, that practically doubles the price of your memory. Sheesh.
The biggest RDRAM modules on the market today are 256Mb units, while you can get 512Mb SDRAM modules. Neither's very popular, though, because modules with half the capacity are rather less than half the price. Unless you need more than 512Mb of RAM on your RDRAM board or 1024Mb on your SDRAM one, there's no need to bother with the very expensive maximum-size modules.
Intel's doing package deals for OEMs that include a P4 CPU, an i850 motherboard and a pair of 64Mb PC800 memory modules. OEMs that take the deal get Intel's "RDRAM Credit Program" that refunds $US70 per P4 bought, or $US60 per P4 after the end of this year. Well, whoopee.
All of this double-modules stuff is happening because dual-channel memory allows the computer to run two separate RAM access jobs at once. This gives twice the theoretical RAM bandwidth and an average reduction in latency (of which more in a moment), whenever data's needed from different physical memory chips simultaneously.
RDRAM's competitor in the high-bandwidth stakes is Double Data Rate (DDR) memory, which is essentially SDRAM technology but twice as fast, clocking the RAM on the rising and the falling side of the clock signal for twice as much data throughput. DDR RAM has PC-numbers to describe its rating, but just to be confusing, the numbers are much bigger than those for any other RAM flavour.
266MHz DDR RAM, for instance, which has a doubled 133MHz clock speed, is called "PC2100". The big number indicates the theoretical raw bandwidth of the memory in megabytes per second.
Plain PC100 SDRAM has a theoretical bandwidth of 800 megabytes per second; PC133 can shift 1067Mb/s. And so 266MHz memory using the same bus width scores 2100-odd raw megabytes per second.
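Those figures fall straight out of clock speed times bus width. Here's a rough Python sketch of the sum, assuming the 64-bit (8-byte) data bus that the 800 and 1067 numbers imply; the bus width isn't stated above, so treat it, and the function name, as my assumptions.

```python
# Theoretical peak bandwidth for SDRAM-style memory.
# Assumption: a 64-bit (8-byte) data bus, which is what the quoted
# 800 and 1067 figures imply but the text does not state outright.
BUS_WIDTH_BYTES = 8


def peak_mb_per_s(clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Clock speed x transfers per clock tick x bytes per transfer."""
    return clock_mhz * transfers_per_clock * BUS_WIDTH_BYTES


for name, clock, per_clock in [
    ("PC100 SDRAM", 100.0, 1),
    ("PC133 SDRAM", 400 / 3, 1),  # nominal "133MHz" is really 133.33...
    ("PC2100 DDR", 400 / 3, 2),   # same clock, two transfers per tick
]:
    print(f"{name}: {peak_mb_per_s(clock, per_clock):.0f}Mb/s")
```

The DDR line comes out at a bit over 2100, which is where the "PC2100" label comes from.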
You don't get that much real available RAM bandwidth on any PC; for actual real-world tasks there's a ton of overhead, so PCs generally have a difficult time managing half of the data transfer rate their specs suggest. But the comparative speeds work out pretty much the way you'd expect.
Not so with RDRAM, though. Its fundamental design is quite different from that of SDRAM systems, and you can't compare its theoretical bandwidth directly with SDRAM numbers. PC800 RDRAM's got a theoretical bandwidth of 1600 megabytes per second; dual-channel PC800's peak bandwidth is 3200Mb/s. But these numbers are only useful for RDRAM-to-RDRAM comparisons.
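For the record, here's roughly how the RDRAM peak numbers are put together, as a small Python sketch. It assumes a 16-bit (2-byte) data path per Rambus channel; that width is my assumption rather than something quoted above, and the function names are made up for the example.

```python
# Peak RDRAM bandwidth, assuming a 16-bit (2-byte) data path per
# channel. The width is an assumption; the text only gives the totals.
BYTES_PER_TRANSFER = 2


def bus_clock_mhz(pc_number: int) -> float:
    """Actual bus clock: half the PC-number, two transfers per tick."""
    return pc_number / 2


def rdram_peak_mb_per_s(pc_number: int, channels: int = 1) -> int:
    """PC-number (million transfers/s) x bytes per transfer x channels."""
    return pc_number * BYTES_PER_TRANSFER * channels


print(bus_clock_mhz(800))           # 400.0
print(rdram_peak_mb_per_s(800))     # 1600
print(rdram_peak_mb_per_s(800, 2))  # 3200
```

These are peak figures only; they say nothing about how long the memory takes to start answering.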
Why? Glad you asked.
Bandwidth versus latency
Bandwidth - megabytes per second - is only part of the RAM speed story. The other part is latency.
Latency is the amount of time that elapses between asking for a read or write operation to occur, and actually having it happen, so that you can do something else with the storage device.
For a hard drive, latency is composed of seek time - the amount of time the read/write head assembly takes to move to the right position for the operation - and rotational latency, as the stack of platters rotates until the right starting point is under the heads. All this commonly takes something under ten milliseconds (ms, thousandths of a second), on average, for modern drives.
RAM has no moving parts, and its latency is much smaller than a hard drive's. But it also has to deal with many more operations per second, and high latency can significantly reduce the number of operations per second a memory system can handle. This reduces the benefit you can get from data moving really fast once the operation starts.
You can think of a DRDRAM system as a daisy-chain of chips. Data and instructions can be usefully described as going in to one end of the first module, through all of the chips on that module one after the other, going out at the end of the module, going in to one end of the next module, and so on. At one end of the chain there's the RAM controller, at the other end there are terminating resistors.
These dummy RAM modules sit in the unused RAM slots of any RDRAM-equipped machine that doesn't have a real RIMM in every slot. They're more correctly referred to as Continuity RIMM Modules or C-RIMMs, and they're there to provide electrical continuity through all of the RAM slots. Remove them and the motherboard won't work, because the daisy chain is broken.
The problem with this design is that at the very high speeds we're talking about here, signal propagation time is significant. Every chip in the chain adds some delay, and the memory chips further away from the controller take longer to respond.
To avoid nasty data ordering problems - where an operation on a nearby chip happens so much faster than one on a distant chip that it finishes first, even if it were ordered second - DRDRAM systems calibrate themselves on startup and each memory module deliberately, artificially, delays its response to read requests as necessary to keep everything in the right order. Distant modules have no delay at all, close modules have the most, and the whole system responds slower and slower the more modules there are.
As clock speeds rise, the delay becomes more and more significant compared with the rate at which the CPU's capable of telling the RAM to do things.
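A toy model makes the delay-levelling idea easier to see. This Python sketch is not how any real memory controller is programmed; the propagation figures are invented for illustration, and the function name is hypothetical.

```python
# Toy model of the delay levelling described above. The per-module
# propagation figures are invented for illustration, not measured.

def levelled_delays(propagation_ns: list[float]) -> list[float]:
    """Extra delay each module adds so that all respond equally late."""
    slowest = max(propagation_ns)
    return [slowest - p for p in propagation_ns]


# Modules listed nearest-to-controller first (hypothetical numbers).
chain = [1.0, 2.0, 3.0]
print(levelled_delays(chain))  # [2.0, 1.0, 0.0]
print(max(chain))              # every read now takes as long as the farthest
```

Add a fourth, more distant module and every entry in the list goes up, which is the "slower and slower the more modules there are" effect.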
As explained in some other excellent articles, like this one at Ars Technica, RDRAM has some other... features... which slow its response to requests. The upshot, though, is that compared with SDRAM, RDRAM latency is lousy. And dual channel RDRAM doesn't solve the problem - it just gives the computer the chance to have two high-latency operations going at once, most of the time.
A lot of people get hung up on bandwidth and latency issues, and miss a more significant fact - most desktop computer tasks are relatively insensitive to RAM speed. Bandwidth and latency both.
If you play a lot of 3D games, it may help a bit if you get a lot more RAM speed. But for the tasks most people do with most PCs - even with cutting-edge really expensive PCs - RAM speed makes surprisingly little difference.
This isn't a new revelation. Intel marketroids made proud noises about the system performance increase that'd spring from the step from the 66MHz Front Side Bus (and RAM) speed to 100MHz, and from 100 to 133, but the benchmarks didn't show any big improvement for desktop computer tasks. By testing P-II and P-III processors with similar core speeds but different FSBs, it rapidly became clear that the RAM bandwidth changes didn't make much of a difference.
Now, the step from PC133 to dual channel PC800 is bigger than the step from PC66 to PC133, but it's not actually a heck of a lot bigger. PC133's raw bandwidth is twice PC66's; dual channel PC800's is three times PC133's. All things being equal, therefore, you could only expect 50% more RAM-related performance improvement.
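Here's that comparison as a few lines of Python, using the theoretical figures quoted earlier. The PC66 number is simply PC133's halved, which is my own fill-in rather than a figure from the text.

```python
# Raw bandwidth steps, using theoretical peak figures in Mb/s.
# PC66's 533 is PC133's figure halved (my fill-in, not from the text).
PC66, PC133, DUAL_PC800 = 533, 1067, 3200

old_step = PC133 / PC66        # the PC66 -> PC133 jump
new_step = DUAL_PC800 / PC133  # the PC133 -> dual channel PC800 jump

print(f"old step {old_step:.1f}x, new step {new_step:.1f}x")
print(f"new step is {new_step / old_step - 1:.0%} bigger")
```

A tripling is only half as big again as a doubling, and the doubling didn't set the desktop benchmarks on fire.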
With the whole new P4 architecture, though, it's difficult to winnow RAM speed's contribution out from all of the other variables.
Why bother?
So why in heaven's name does the P4 use Rambus memory, when RAM speed isn't all that important, and Rambus memory's not as fast as it seems anyway?
Because Intel own part of Rambus, and they're under contract to keep making RDRAM-supporting hardware until 2003, that's why.
Intel are acutely aware of RDRAM's lack of appeal to the average consumer, and the nasty smell that Rambus, the company, has, thanks to its aggressive enforcement of widely questioned intellectual property, like patents not only on RDRAM but on SDRAM technology as well.
This isn't scuttlebutt and innuendo; Intel's CEO has gone on record as saying that Intel's "bet" on Rambus "did not work out".
Rambus' latest scheme, reported here among other places, is an attempt to collect license fees from anyone who makes anything that interfaces with SDR or DDR SDRAM, and RDRAM as well. Which means every motherboard chipset maker, just for starters.
Rambus makes no physical products at all; all it makes are patents, and it wants its income stream to come from license fees for RDRAM, and for DDR and plain SDRAM as well, technologies of which Rambus also claims ownership.
So far, the Rambus plan's working, despite widespread opprobrium. But Rambus is rapidly achieving the kind of anti-fan-club among hardware aficionados that Microsoft has among software people.
Intel don't even have a non-RDRAM motherboard chipset planned until late next year, but they're looking into licensing the P4 architecture to other chipset makers (like Via Technologies, for instance), so that cheaper plain and DDR SDRAM boards can be made earlier.
Whether we're likely to see any number of such boards before Intel are making their own, though, remains to be seen. For the time being, RDRAM's where it's at for the P4.
Making numbers
I didn't have the time to give the P4 test machine a thorough working over, but fortunately Intel did part of the job for me, and shamelessly provided quite a lot of unexciting statistics. You can read them in my AustralianIT P4 review here.
Essentially, even Intel's specs didn't show the P4 as beating a 1GHz P-III machine by a large margin overall, and my own tests against my now-quite-humble 800MHz Athlon box did the new Intel chip no favours either.
A few benchmarks and remarks that I didn't mention in the other review:
Ziff-Davis' CPUMark 99 gave the P4 92.1, versus 74.5 for my 800MHz Athlon. A 24% win, but only two thirds as fast as the Athlon, clock for clock, since the P4's clock speed is 87.5% higher than the Athlon's.
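The clock-for-clock sum is just score divided by megahertz. A minimal Python sketch, with a helper name of my own invention:

```python
# Clock-for-clock comparison of the CPUMark 99 scores quoted above.

def per_mhz(score: float, clock_mhz: float) -> float:
    """Benchmark score per megahertz of core clock."""
    return score / clock_mhz


p4_score, p4_mhz = 92.1, 1500
athlon_score, athlon_mhz = 74.5, 800

raw_win = p4_score / athlon_score - 1
clock_advantage = p4_mhz / athlon_mhz - 1
relative = per_mhz(p4_score, p4_mhz) / per_mhz(athlon_score, athlon_mhz)

print(f"raw win: {raw_win:.0%}")                      # 24%
print(f"clock advantage: {clock_advantage:.1%}")      # 87.5%
print(f"P4 per clock vs Athlon: {relative:.2f}")      # 0.66
```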
The P4's SPECfp2000 result was 79% higher than the 1GHz P-III's, but SPECfp2000 leans heavily on RAM speed. The SPECfp benchmark's made up of a bunch of big scientific floating point computation loops - exactly the sort of thing that benefits from gigantic RAM bandwidth. These sorts of tasks are best done by traditionally designed supercomputers, which have enormous pipes between their components. If you've got super data throughput, you do well in SPECfp.
SPECfp is, however, not at all representative of the kinds of algorithms that desktop computers normally have to deal with, even if you do professional 3D rendering for a living.
For desktop PCs, the SPECint benchmark is much more informative, and for that test even Intel's stats say the 1.5GHz P4's only 23% faster than a 1GHz P-III.
Or, on a clock-for-clock basis, 18% slower.
The SPEC benchmarks are "compilable" - they come as source code, which has to be compiled to work on the platform on which you want to run the benchmark. You'd better believe that Intel used their latest and shiniest P4 optimised compiler to make their SPEC executables, but they still only managed 23% more speed for SPECint. This, by the way, suggests that they didn't cheat and run the same executable on the P-III system as well, but instead optimised it properly for each platform.
Clock-for-clock, the P4's results (79% better for SPECfp2000, 41 to 44% better for Quake III, 32% better for 3D WinBench 2000's Processor Test, and 16% better for 3DMark2000's CPU Speed Test) versus the P-III made it 19% faster, 6 to 4% slower, 12% slower and 23% slower, respectively, than the much cheaper, older system.
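If you'd like to check those clock-for-clock figures, the conversion is a one-liner. This Python sketch assumes the 1.5GHz-versus-1GHz clock ratio of 1.5 from Intel's comparison; the function name is just for illustration.

```python
# Convert a raw benchmark lead into a clock-for-clock figure,
# assuming a 1.5GHz P4 against a 1GHz P-III (clock ratio 1.5).

def per_clock_change(raw_gain_pct: float, clock_ratio: float = 1.5) -> float:
    """Percentage faster (+) or slower (-) once clock speed is divided out."""
    return ((1 + raw_gain_pct / 100) / clock_ratio - 1) * 100


results = {
    "SPECint": 23,
    "SPECfp2000": 79,
    "Quake III (low)": 41,
    "Quake III (high)": 44,
    "3D WinBench 2000": 32,
    "3DMark2000 CPU": 16,
}

for name, gain in results.items():
    print(f"{name}: {gain}% raw, {per_clock_change(gain):+.0f}% per clock")
```

Anything with a raw lead of less than 50% comes out negative, because that's the P4's clock speed advantage.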
I considered giving the test machine a whirl with SiSoft Sandra as well, but since Sandra has been widely accused of using advanced Making Up Numbers technology, I decided not to.
Sandra's great on Win95/98 systems, for seeing reams and reams of detailed data about your system configuration. But its benchmark results - which are all it's good for under Windows 2000, which the test machine was running - are, sometimes, just weird. Sandra can make quite different numbers from run to run on Via Technologies chipsets, and its cross-platform results may or may not be reliable. I believe moon phase may be involved.
There should be a new version of Sandra out more or less as I write this, which may be better (and which includes P4 benchmarks as one of its standard comparison systems, so you can see how your computer stacks up against the new kid), but it's come too late for me to try it out on the P4 I had to play with.
If you want Sandra benchmarks, allow me to recommend the HardOCP P4 review, here.
In some other tests, the P4 looked really bad. But it had some excuses.
The P4's new core doesn't match up well with current CPU-optimised code, which is tuned to run fast on P-IIIs or Athlons (or, commonly, both) but doesn't know what to do with P4s. This applies, for instance, to graphics drivers, which have optimisations for the extended instruction sets of the earlier CPUs but not for the new Willamette Streaming SIMD Extensions 2 (SSE2) instructions.
But that's not the whole story. The P4 is a new CPU, made to scale up to much higher speeds than it's running at now. Until it hits those speeds, its ordinary per-clock performance hurts it, and no amount of RAM bandwidth can save it.
The distributed.net client software, which I talk about in more detail in my AustralianIT piece, is a perfect example of badly non-P4-optimised software. The P4 was smashed by the much slower clocked Athlon in the distributed.net RC5 benchmark, and lost to it by a smaller but still humiliating margin for OGR cracking.
If a new version of the distributed.net client that works better with the P4 doesn't come out soon, I'll be surprised. But it's possible that the P4 will never perform especially well for this hard-core integer-math test, since its huge memory bandwidth and super FSB speed are pretty much perfectly irrelevant.
Similarly, a quick burn with the downloadable version of WinTune gave my 800MHz Athlon a score of 2459 MIPS (million instructions per second, a not very informative integer math statistic) and 1002 MFLOPS (million floating point operations per second, a similarly synthetic floating point number), versus 2661 MIPS and 830 MFLOPS for the P4.
These numbers, again, just reflect the unsuitability of the tests to the new processor. If a 1.5GHz P4 really has 83% of the floating point performance of an 800MHz Athlon then a whole Intel engineering division should commit ritual suicide. Fix the benchmark and you'll get more realistic numbers.
But if you're running something that's as badly un-fixed as WinTune or distributed.net, your new P4 system may give you miserable results.
I was mainly running WinTune to see some whimsically created but repeatable RAM bandwidth scores. Here, the numbers worked out the way they should have, with exactly the level of nuttiness I expect from WinTune.
The Athlon scored an imposing 3155 megabyte per second read speed, 2082Mb/s write speed and 1144Mb/s copy speed from its 100MHz RAM. The big numbers are all physically impossible, of course; WinTune isn't exactly a real-world sustained-transfer tester, it's just something that can do five benchmark runs while you go and get a glass of water. It shunts around blocks of memory from four to 2048 kilobytes in size, and it gives you some numbers, and it's repeatable and consistent across different PC platforms.
The P4 clocked in at 3472Mb/s read speed, 3937Mb/s write, 2542Mb/s copy. Forget the actual numbers; look at the comparative magnitudes. This gives you some idea of the real difference, and the difference for writes and copies is as big as you'd expect it to be from the P4's impressive SPECfp2000 results.
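To see the comparative magnitudes rather than the silly absolute numbers, divide one set by the other. A quick Python sketch using the WinTune scores quoted above:

```python
# Ratio of the P4's WinTune memory scores to the Athlon's (Mb/s).
athlon = {"read": 3155, "write": 2082, "copy": 1144}
p4 = {"read": 3472, "write": 3937, "copy": 2542}

ratios = {test: p4[test] / athlon[test] for test in athlon}

for test, ratio in ratios.items():
    print(f"{test}: P4 is {ratio:.2f}x the Athlon")
```

Reads come out nearly level, at about 1.1 times; writes and copies are roughly double.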
The cheap competition
The thing that Intel take pains not to mention in their PR bumf, of course, is the very existence of the AMD Athlon.
The 1GHz Athlon is, quite uniformly, faster than the 1GHz P-III. Name your real world benchmark, and an Athlon will probably be a little faster than a similarly clocked P-III. But about half the price.
Athlon motherboards cost a bit more than P-III ones, but you still save plenty on the hardware by buying an AMD-based system instead of an Intel-based one, and at a given clock speed you end up with a faster computer.
The 1GHz Athlon that slightly beats Intel's comparison-system 1GHz P-III isn't the top of the line, either. There are 1.1 and 1.2GHz Athlons out at the moment, as well. And AMD reps delight in saying that the company could release a rather faster model any time they liked, but don't see any reason to drive down the price of the easier-to-make Athlons when Intel has nothing substantial for them to beat!
Against the 1.2GHz Athlon, even one using plain PC100 SDRAM rather than not-quite-available DDR memory, the 1.5GHz P4 is not a strong contender. In benchmarks that don't take advantage of monstrous amounts of RAM bandwidth, the Athlon does just as well as, or better than, the Intel flagship.
Bits and pieces
Here are some giblets from the P4 test system.
This is the extra connector from the new ATX12V Power Supply Unit (PSU). It's there to handle the higher power demand of the processor section of the motherboard, and the computer won't work with a plain ATX PSU. The ordinary ATX power connector's been retained, as well.
The big and beefy P4 cooler, weighing in at about a pound - far heavier than the stock coolers for any previous Intel processor.
The P4 heat sink's weight comes mainly from its solid copper base, in which is inset a bunch of bent-metal fins. The whole thing looks a bit agricultural - bear in mind that this is a pre-release processor, and retail gear probably won't be quite as, ah, unfinished - but it's got a lot of surface area and the massive copper base...
...does the heat transfer job well.
The CPU, freshly anointed with a new layer of thermal transfer compound and ready for the cooler to be clipped back onto it. Note the solid black plastic cooler mounts on either side of the socket, retained with four big shiny screws. The screws go through the motherboard and thread into the side of the case, and the cooler's so heavy that the screws absolutely have to be there - you could knock together a gimcrack through-the-board mounting arrangement with cable ties and washers and so on, but there'd be a lot of strain on the board if it wasn't horizontal.
This means it's not possible to install a P4 motherboard in an ordinary ATX case. Well, not unless you drill holes for the mounting screws yourself. I dare say one-size-fits-all cases in which you can install any kind of board will be along very shortly, but at the moment just switching your power supply for one with the extra 12V connector is not going to be enough.
These are the clips that hold the P4 heat sink onto the plastic mounts. The retention mechanism is actually quite easy to deal with - you hook the middle of the clip into the matching groove in the end of the heat sink and onto a matching latch on the mount, and then you push down the ends of the clip until they click into place. You need a flathead screwdriver to pop the clips off again, but it's really quite simple, and holds the processor down very securely.
The i850 main chip lurks under this generously proportioned heat sink, retained with a nifty spring-clip mechanism.
Multiprocessing
Want a dual or quad-CPU P4 machine? Well, you can keep wanting one. The Willamette P4 is single-processor only, full stop. The 0.13 micron P4-to-come will work in dual processor configurations, and the "Foster" server variant (the P4 Xeon, essentially) will work in up to eight-way configurations. But the Willamette will never be any good for multiprocessing.
Now, the Athlon's single-CPU only, as well, but Athlon SMP (Symmetric Multiprocessing) boards ought to be available before the middle of 2001 (AMD say their SMP chipset will be out in the first quarter), whereas P4 SMP systems are currently projected to be available in the second half of 2001.
This is a bit of a blow for Intel's marketroids, who've been selling dual-CPU Pentium Pro, P-II, P-III and Xeon systems to the PC workstation crowd for years, but now have to turn around and deny all of their "if it's got one CPU, it's not a workstation" palaver.
You can buy dual CPU P-III systems right now, of course, and a pair of current model P-IIIs give you a respectable amount of workstation CPU grunt for your money, provided you're running programs that benefit from multiprocessing.
Dual CPUs don't help at all for serial processing tasks, where you have a sequence of calculations that each depend on the results of the previous one. In a situation like this, all the second CPU lets you do is use the machine for some other task while one processor steams away doing your serial computing job.
For parallel tasks, though - where there are separate calculations that don't depend on each others' outcome - multiple CPUs can give you something approaching the raw mathematical improvement you'd expect, with jobs taking half as long if you've got twice as many chips. It helps to have lots of speed in the other subsystems, notably disk storage and RAM, if your tasks shift a lot of data around; otherwise, the two CPUs won't be fed as fast as they'd like. But the extra processor's very much worth the money.
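The usual way to put numbers on the serial-versus-parallel distinction is Amdahl's law, which I'm borrowing here; it isn't something from the benchmarks in this review. The Python sketch below ignores disk and RAM bottlenecks entirely, and the 90% figure is a made-up example.

```python
# Amdahl's law: best-case speedup from extra CPUs when only part of
# a job can run in parallel. Ignores RAM and disk bottlenecks.

def speedup(parallel_fraction: float, cpus: int) -> float:
    """Ideal speedup for a job with the given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


print(speedup(0.0, 2))            # purely serial job: 1.0, no gain
print(speedup(1.0, 2))            # purely parallel job: 2.0
print(round(speedup(0.9, 2), 2))  # hypothetical 90% parallel job
```

The second CPU only pays for itself to the extent that your software can actually split the work.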
A twin-P4 box with dual channel PC800 RDRAM could actually be quite good value for money for people looking for what Apple keep calling a "desktop supercomputer". It'd certainly be better than a dual-CPU Power Mac G4, which is saddled with perfectly ordinary SDRAM and does not deserve its marketing-driven label at all.
But most people, even most dual-CPU PC buyers, don't actually need that kind of RAM bandwidth, and would be happy to save money by buying a PC with slower RAM. Or the Mac, for that matter.
Overall
P4 performance may be significantly more impressive once it's been out for a while, and more software that's optimised to the P4's new architecture is out. Don't hold your breath for SSE2 optimisation to conquer the software world, though; as Intel have learned with their previous enhanced instruction sets right back to the Pentium MMX, many software developers don't bother to spend the time and money to support the new instructions.
At the moment, the P4's combination of high CPU prices, very high RAM prices, and less than stellar performance is not an attractive one.
If you're a corporate buyer interested in performance per dollar, get your hopeful vendor to knock together test machines using whatever processors you're considering and test your software on them. If it goes like the clappers on P4, and you don't mind the price, then go ahead and buy the new machines.
If you've got a mattress stuffed with money and want a red hot 3D game machine, a 1.5GHz P4 with a GeForce2 Ultra will give you one. But it's only likely to clock a roughly 20% better frame rate, assuming you're not limited by the speed of the video card, than will a far cheaper 1.2GHz Athlon.
If you don't run anything unusual, the P4 is really not for you yet. It's financially irresponsible to buy the darn things, if you ask me, when they don't yet beat commodity P-IIIs by much, or top-end Athlons by anything, for desktop computer tasks.
Intel are reportedly pushing as hard as they can to get the 2GHz P4 out the door as soon as possible, and that chip's 33% better clock speed will help pull the benchmarks out of the doldrums. If AMD don't have another Athlon out to match it, it'll be the king of the desktop CPUs for a little while at least. But it'll still be very expensive indeed, especially if you have to buy RDRAM for the darn thing, and if AMD really can pull a 1.5GHz Athlon out of a hat whenever they choose, it'll probably pretty much level-peg with the 2GHz P4 - especially if the Athlon box has DDR RAM.
Intel will, of course, sell plenty of P4s, no matter how they perform. Lots of corporates wouldn't dream of buying a PC powered by any CPU not made by Intel, and a fair subsection of that market just buy the newest and shiniest as a matter of course in their regular upgrade cycles.
Getting the newest and shiniest CPU always costs pots of money compared with slightly slower, slightly older models. Such is the nature of things. And the first model of a new line of CPUs is always a not terribly impressive chip compared with the last model of the line it's replacing.
But the P4 sets a new benchmark. When you factor in the RDRAM, a P4 system is ruinously expensive compared with almost-as-fast P-III or Athlon machines.
If you want to run hard-core scientific supercomputer jobs on a single-CPU box, a P4 is very much the business. But otherwise, buy anything but. If you have an Intel-or-nothing policy, buy P-IIIs or Celerons. If you can countenance AMD, buy Athlons or Durons.
One day, it'll be good. It ain't now.