In the spring of 2000, AMD's top-end Thunderbirds stepped up to a 133MHz bus speed, which provided a massive performance boost. The Pentium-III, worthy contender though it was, was now hopelessly outclassed. Intel was in panic mode by this time. Never before had Intel's best chips been so easily out-performed, and the company responded by releasing the all-new Pentium 4 before the design was really finished.
This was Intel's first all-new X86 design since the Pentium Pro. Advance reviews varied incredibly, ranging from "damp squib" at one extreme to "greatest advance since the 386" at the other. To our considerable surprise, the "damp squib" predictions were correct — in its first-release form the P4 was incredibly expensive but much, much slower than the Thunderbirds — indeed it couldn't outperform the cheap little Duron or Intel's own P-III. Actually, this was traditional, if you stop to think about it. None of the all-new Intel designs have performed well in their first releases: the 386-16, the 486-20, and the abysmal Pentium-60 — all were inferior to their more primitive but better developed forbears, and yet all went on to become resounding success stories as they matured. In later years the P4 would become a serious performer, but in 2000 it was a non-event.
All in all, 2000 was a turbulent year with more drama and excitement than any three normal years. The limelight went to the Athlon 1000 Classic and the Coppermine 1000, to the Thunderbirds, the extraordinarily quick Athlon "C", and the first release of the Pentium 4. But in reality the most significant CPU that year was the humble Duron. The Duron was faster than a P4, equal to a P III or an Athlon Classic, not all that far behind a Thunderbird, and annihilated the hapless Celeron, yet it was cheaper than any of them. Simple, practical, reliable, affordable, and remarkably powerful, the Duron outsold everything else that year and the next. It actually made a massive difference to the typical computer user.
The second-generation Athlon Thunderbird did for the Athlon what the Coppermine had done for the Pentium-III. Like the Coppermine, it had 256k of full-speed on-chip cache instead of 512k of cache at one-third or one half speed located off the chip. Like the Coppermine, with most applications it got a substantial performance boost compared with its predecessor, and the higher the clockspeed the more significant this became.
But the parallel was not exact. First, the Coppermine had a set of subtle but powerful cache design optimizations which the Thunderbird did not — notably a wider data path between cache and CPU proper. The Thunderbird had a 64-bit path to its 4-way cache, the Coppermine a 256-bit path to an 8-way cache.
Secondly, and on the other hand, the Thunderbird's cache, like the Duron's, was exclusive: the primary and secondary caches never hold copies of the same data, so none of the secondary cache's capacity is spent duplicating the primary. Thus the effective cache size for the Thunderbird was 128k primary plus 256k secondary = 384k, where the Coppermine's effective size remained at 256k. Thirdly, the Coppermine core, for all its wonderful cache optimizations, remained a 1995 design at heart, where the Thunderbird core was carried over from the much newer and more powerful Athlon design. The net result was that the Thunderbird was, clock for clock, the most powerful modern CPU on the market.
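The cache arithmetic above can be written out as a short sketch. It is a simplification rather than a model of either chip: it assumes a strictly exclusive design adds the two capacities together, and that in the non-exclusive case everything in the primary cache is also held in the secondary. The function name `effective_cache_kb` is made up for illustration, and the 32k primary cache figure for the Coppermine is taken from the P III specification table further down.

```python
def effective_cache_kb(l1_kb: int, l2_kb: int, exclusive: bool) -> int:
    """Approximate unique data capacity of a two-level cache, in KB.

    Exclusive: L1 and L2 never hold the same data, so the capacities add.
    Non-exclusive (simplified assumption): L1 contents are duplicated in L2,
    so the larger of the two levels sets the limit.
    """
    return l1_kb + l2_kb if exclusive else max(l1_kb, l2_kb)


# Cache sizes as quoted in the text and spec tables.
thunderbird = effective_cache_kb(128, 256, exclusive=True)
coppermine = effective_cache_kb(32, 256, exclusive=False)

print(f"Thunderbird: {thunderbird}k effective, Coppermine: {coppermine}k effective")
```

This reproduces the 384k and 256k figures quoted above, and says nothing about speed: path width and associativity are separate matters.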
The T-bird 700 was rare: the 800 and 900 were the major sellers early on. Like all the original Thunderbirds, it was available in both socket and slot form. AMD provided the clumsy old-style slot packaging as well as a new and vastly superior Socket A version so that main board manufacturers could use up existing stocks, but there were timing issues with the Slot A T-birds caused by the longer trace lengths, which meant that the slotted parts could only be used with the less common AMD 750 chipset, not the popular VIA KX-133.
Oddly enough nearly all the Thunderbird 700s we sold here at Red Hill were the odd-ball slot type — but not, as you'd expect, during the early days of T-Birds during the changeover from the Athlon Classic, but much later, when the Duron 800 was considered entry-level and the best-selling Thunderbirds were 1GHz and over. A supplier offered us their remaindered stock of Slot A 700s at a price that was too good to resist, and we used them to do low-budget but effective upgrades on old Socket 7 systems.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot A or Socket A | AMD | AMD | June 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
700 MHz | 200 MHz | 128k at 700 MHz | 256k at 700 MHz | 37 million |
In the entry on the Athlon Thunderbird 700 just above, we said that it was, "clock for clock, the most powerful modern CPU on the market". Note that key word "modern". We didn't say it was "the most powerful", because, on a clock-for-clock basis, it wasn't. In fact, it wasn't even close. Nor was the P III: and the Pentium 4, clock-for-clock, was slower still.
We had to add that word "modern", because there are one or two very old, relatively low clock-speed designs which had astonishing power on a clock-for-clock basis: the K6-III and — surprisingly — the moribund old Cyrix 6x86MX. A Thunderbird 1000 was of course much faster than a K6-III/500, let alone a 6x86MX-200, but it was nowhere near twice as fast as a K6-III, and not within miles of being five times faster than a 6x86.
Why then did the manufacturers, now that they had learned how to make 1000MHz CPUs, not simply make 1000MHz versions of the 6x86 and the K6-III? There were several reasons. First, the older parts, especially the 6x86, were not designed with an eye to reaching higher clockspeeds. Even with state-of-the-art manufacturing facilities, the 6x86 core couldn't be clocked at 1000MHz. Only the designers would know for sure, but it's probably reasonable to guess that a 0.13 micron 6x86MX would top out at perhaps 500 or 600MHz, a 0.13 micron K6-III at 700 or 800. They just won't clock faster than that without a comprehensive redesign.
Second, there are support and connectivity issues: both were based on the Socket 7 layout, with external, motherboard mounted cache — and Super 7 boards were limited to 100MHz. This limits them to 100MHz SDRAM and 100MHz cache RAM. So increasing the clockspeed of a 6x86 to 1000, even if it were possible, would achieve little unless there was a whole new motherboard design to go with it: a "Super Super 7" if you like — say, a 266MHz layout with DDR RAM, and 1MB of expensive 200 or 300MHz cache RAM. Possible? Certainly. Practical, given the difficulty of introducing a whole new mainboard chipset and achieving market acceptance of it? No. Remember how slow and difficult AMD found the introduction of Slot A until the Athlon started sweeping all before it. It is one thing to do this for a mainstream platform, another altogether to try it for a niche market product.
Third, there was (and still is) a pervasive point of view within the industry that business applications just don't need any more performance than they already have. "You can run Office XP on a Pentium-233" goes the logic, "so what is the point of having more?" It is entertainment applications that are driving CPU performance now, according to these people, and for games and sound and graphics processing, you need floating point performance — which is where the P-4 and the Athlons shine. What point then in building faster K6-IIIs and 6x86MXs, with their outstanding integer performance but relatively weak floating point units?
Against this commonly-held point of view, however, business performance does matter. Performance always matters, and the bigger and more inefficient software becomes with the passing years, the more important it is to compensate with faster hardware. While the web is full of hardware sites only interested in gaming frame-rates, gamers remain firmly in the minority. Ordinary home and office productivity tasks still are, and probably always will be, the most common use of computers, and the thing for which CPUs and main boards ought to be designed. The "floating-point theory" people suggest that the slow-down in worldwide CPU sales since about 1999 has happened because, even though AMD and Intel have worked very hard to improve FPU performance, they aren't offering a sufficiently compelling improvement in floating-point application usability. On the other hand, it is quite reasonable to suggest that the main reason people are not upgrading as often as they once did is because the top-line CPUs like the Athlon XP and the Pentium 4 don't provide us with an obvious overall performance boost in the way that the top-line chips of 1999 or 1995 did. In part, of course, this is because of main board, RAM and hard drive issues as much as strictly CPU-related ones. Nevertheless, while the floating-point performance of an Athlon or the SSE performance of a P-4 is phenomenal, their mainstream integer performance is nothing special, and for many applications a 500MHz K6-III with its massive caches was equal or superior to a 1000MHz class part.
Finally, there is the effect of scale. CPUs have scaled up faster than hard drives or RAM in recent years. As you make any given component faster, the effect of this component on the performance of the system as a whole diminishes. If you were to instantly double the performance of your main board and CPU, the net result would be small, because you would then be limited by your RAM and hard disc drive. Where upgrading from, say, a K6-200 to a K6/2-300 provided an immediate and obvious improvement, upgrading from a Duron 700 to a Thunderbird 1000 provided much less. (Several years have gone by since we wrote this section, but it remains as true as ever: for business users, switching from an Athlon XP 2500 to an Athlon 64 3500 achieves little.)
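The diminishing return described here is the pattern usually written down as Amdahl's law. As a rough sketch only: the function below assumes a task's run time splits cleanly into a CPU-bound share and a share spent waiting on RAM and disc, and the 40% CPU-bound figure is an invented illustration, not a measurement of any real system.

```python
def system_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the CPU-bound share of run time gets faster.

    cpu_fraction: share of the original run time that is CPU-bound (0 to 1).
    cpu_speedup:  how many times faster the CPU-bound part becomes.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Hypothetical workload: 40% of the time is CPU-bound, the rest is spent
# waiting on RAM and the hard drive. The 40% figure is illustrative only.
for factor in (2, 4, 1000):
    print(f"CPU {factor}x faster -> whole system {system_speedup(0.4, factor):.2f}x faster")
```

Under that assumption, doubling CPU speed makes the whole task only a quarter faster, and even an absurdly fast CPU cannot make it twice as fast, because the RAM and disc share of the time is untouched.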
There was an incredible variety of announced P III parts. So much so that we did not list them all here (far from it!) or take time to check the details on each one. In fact, we wondered if there was any point to doing that, rather than waiting to see which of them actually ended up appearing on the market in any sort of volume.
In retrospect, the only one of these three to sell well and deserve a mention was the 866 — which was reasonable, as with its 133MHz bus speed it was easily the best of them.
Form | Design & Manufacture | Announced | Available from | Status |
---|---|---|---|---|
Slot 1 | Intel | December 1999 | August 2000 | Legacy |
FC-PGA | Intel | March 2000 | August 2000 | Legacy |
FC-PGA | Intel | March 2000 | October 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
800 MHz | 100 MHz | 32k at 800 MHz | 256k at 800 MHz | 28.1 million |
850 MHz | 100 MHz | 32k at 850 MHz | 256k at 850 MHz | 28.1 million |
866 MHz | 133 MHz | 32k at 866 MHz | 256k at 866 MHz | 28.1 million |
Not much to say about these, as we don't remember ever seeing one. When they first came out they were the fastest X86 money could buy, and you needed a great deal of it to get one. But few people did: the race to 1GHz was on in earnest during the first months of 2000, and within a few weeks of the 850 there were even faster-clocked parts. By the time the price of the 850 dropped to something more reasonable, a month or two after the 1000 was announced, it was worth waiting just a little longer for the Thunderbird, which was just around the corner.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot A | AMD | AMD | February 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
850 MHz | 200 MHz | 128k at 850 MHz | 512k at 340 MHz | 22 million |
The smallest "real" Thunderbird, in the sense that the 700 was made all but invisible by the faster T-birds on the one hand, and the Durons on the other. The 800 was in the price-performance sweet spot when the Thunderbird first arrived and accounted for the majority of sales early on, but it was a fairly short-lived part: the follow-on products rapidly raised the ante, and the popular choice Athlon became the 900, the 1000, the 1100, and then the 1200C.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot A or Socket A | AMD | AMD | June 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
800 MHz | 200 MHz | 128k at 800 MHz | 256k at 800 MHz | 37 million |
Intel's first all-new X86 design since the Pentium Pro debuted to very mixed reaction in November 2000. Unlike the Pentium II, the Pentium-III, and the various Celerons, it owed nothing to the Pentium Pro design, and was new from the ground up. It arrived none too soon, as it was a full five years between new Intel X86 processors by then. Intel's frequent cosmetic name changes could not obscure the fact that the Pentium II/III, Celeron and Xeon were not actually new chips, merely variations of the P6 (Pentium Pro) design. In the same period, AMD produced three all-new designs (K5, K6 family, and Athlon), and even lowly Cyrix managed two (6x86 family and M3).
Willamette (the code name for the first Pentium 4 core) had been delayed so long because Intel had to divert engineering resources to so many other tasks: the IA-64 project in particular, but also a huge number of variations on the P5 (Pentium) and P6 cores.
Astonishingly, the Pentium 4 did not improve on the old P6 design in either of the normal two key performance measures: integer processing speed or floating-point performance. Like most of the rest of the world, we were dumbfounded by this. At 1.5GHz, the Pentium-4 was not only inferior to the Athlons, it couldn't beat the Pentium-III!
On the face of it, it looked like complete and utter madness. Why would a long established and extraordinarily competent company spend six years and countless millions of dollars developing a new product that was not only no faster than their six-year-old design, but also bigger, more expensive, and demonstrably slower? Madness indeed, but madness with a method.
The Pentium 4 design sacrificed orthodox performance in order to gain two things: clockspeed, and SSE performance. While it did quite a lot less per clock-tick than an Athlon or a Pentium-III, it ticked over faster — 1.5GHz on introduction, 2GHz inside the year, and 3 to 4GHz within the next year or so after that. A fast-ticking Pentium 4 may or may not have been successful in out-performing a slower-ticking, harder-working Athlon, but this was not really the point: Intel's strategy was founded on the theory that to Joe Average, who knows little about the ins and outs of computer performance, a 1.5GHz Pentium 4 sounds faster than a 1.0GHz Athlon. Bigger number means faster, doesn't it?
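The trade-off between clockspeed and work per tick can be put in numbers. This is a back-of-envelope sketch, assuming throughput is simply clockspeed multiplied by work done per tick and ignoring RAM, disc and everything else. The function name is invented for illustration.

```python
def break_even_work_per_clock(fast_clock_mhz: float, slow_clock_mhz: float) -> float:
    """Fraction of the slower-clocked chip's per-tick work that the
    faster-clocked chip must manage for the two to finish a task together.

    Assumes throughput = clockspeed x work per tick, nothing else.
    """
    if fast_clock_mhz <= 0 or slow_clock_mhz <= 0:
        raise ValueError("clock speeds must be positive")
    return slow_clock_mhz / fast_clock_mhz


# The 1.5GHz versus 1.0GHz comparison used in the text.
ratio = break_even_work_per_clock(1500, 1000)
print(f"A 1.5GHz chip ties a 1.0GHz chip at {ratio:.0%} of its work per tick")
```

On that simple model, the 1.5GHz part only needs to do about two-thirds as much per tick to draw level. Anything less and the slower-ticking chip wins, whatever the number on the box says.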
The second reason for the P-4's extraordinary design, however, had real technical merit. It is SIMD performance. The Pentium 4's SSE unit was blindingly fast — easily faster than the equivalent SIMD units (be they MMX or SSE or 3DNow) in any of the AMD chips or in the Pentium IIs and IIIs.
You can read a lot more about the original Pentium 4 design fiasco on this very long but well-informed page.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Socket 423 | Intel | Intel | November 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
1400 MHz | 400 MHz | 8k at 1400 MHz | 256k at 1400 MHz | 42 million |
Right from the start the Athlon was designed to be run at high clockspeeds. Alas, as the first generation Athlon Classic ramped up, the higher speeds provided less of a performance boost than might be imagined. With the first generation product, the primary performance limitation was the external cache chips which were, like those on the Pentium II and Katmai Pentium-III, mounted on a board alongside the CPU chip itself, with the whole assembly hidden inside a cartridge. The more recent on-chip cache was much faster. There are two reasons for this. First, it was very difficult and extremely expensive to source cache RAM chips that could run much over 400MHz. Second, the latency of off-chip cache was much higher — particularly so with very high clock multipliers.
The answer, of course, was to move the entire L2 cache onto the chip itself. This is what Intel did with the Coppermine, and AMD were soon to do with the Thunderbird and Spitfire (the "baby Thunderbird" which was to become better known as the Duron). In the meantime, we didn't sell or recommend any Athlon Classic over about 800MHz — the price premium for the 1000MHz one in particular was huge, and the performance gain very small. If you really wanted 1000MHz, you were wise to wait for one of the on-chip cache parts to become available: Thunderbird 1000 or P III 1000.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot A | AMD | AMD | March 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
900 MHz | 200 MHz | 128k at 900 MHz | 512k at 300 MHz | 22 million |
950 MHz | 200 MHz | 128k at 950 MHz | 512k at 317 MHz | 22 million |
1000 MHz | 200 MHz | 128k at 1000 MHz | 512k at 333 MHz | 22 million |
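The L2 cache speeds in the table follow from running the off-chip cache at a fixed fraction of the core clock. The sketch below reproduces the figures. Note that the one-third divider (and the two-fifths divider for the 850MHz Athlon Classic listed earlier) is inferred from the table numbers themselves, not taken from AMD documentation, and the function name is invented for illustration.

```python
from fractions import Fraction


def l2_clock_mhz(core_mhz: int, divider: Fraction) -> int:
    """Off-chip L2 cache clock in MHz, rounded to the nearest whole MHz."""
    return round(core_mhz * divider)


# Dividers are inferred from the spec tables (an assumption, see above).
one_third = Fraction(1, 3)
for core in (900, 950, 1000):
    print(f"Athlon Classic {core}: L2 cache at {l2_clock_mhz(core, one_third)} MHz")

# The 850MHz part's 340 MHz cache figure corresponds to a two-fifths divider.
print(f"Athlon Classic 850: L2 cache at {l2_clock_mhz(850, Fraction(2, 5))} MHz")
```

The point the numbers make is the one in the text: across the 900 to 1000MHz range the core clock rises by 100MHz while the cache gains only about 33MHz, so each speed grade buys less than the headline figure suggests.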
First of the 'Morgan' core Durons, baby brother to the Athlon XP rather than the Thunderbird. Like the Athlon XP, which was to follow a couple of months later, it was based on the Athlon MP core. Overall it was an incremental rather than a radical improvement, but the sum of several minor changes added up to a worthwhile total. The Morgan core Durons (1000MHz and up) had hardware pre-fetch, a thermal diode for overheat protection, SSE as well as 3DNow instructions, and an extra 180,000 transistors.
The Duron 1000 was one of the few Durons to stand out from the crowd a little: it was the biggest selling member of the Morgan family, and was also more or less the fastest chip that you could feel confident of plugging into an older Socket A mainboard without 266MHz bus speed and without the ability to supply enough 3.3V power for the very demanding Thunderbirds in the over 1000MHz class. For people looking to upgrade a twelve or eighteen month-old Duron 600 or 650 without buying a new motherboard, a second-hand Thunderbird 900 or 1000 was ideal but hard to find, and the Duron 1000 was often the only choice.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Socket A | AMD | AMD | August 2001 | end of life |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
1000 MHz | 200 MHz | 128k at 1000 MHz | 64k at 1000 MHz | 25.2 million |
We don't remember ever seeing a Thunderbird 950, though it and the 900 were both readily available, and the 900 was a strong seller in its day. For quite some time the 1000MHz part commanded a stiff price premium because of the psychological appeal of the four figure number, and the 950 was only marginally cheaper. As always here at Red Hill we kept our eye on the relative values: this made the 800 the one to have on first release, but it wasn't too long before the 900 dropped to not much more than that. It provided most of the performance of the 1000 at a significantly lower price and took its turn as the most popular of the Athlons.
The actual hands-on difference between the Athlon speed grades was small (as it is with nearly all modern CPUs), so upgrading from an 800 to a 900 or a 900 to a 1000 made a barely noticeable difference. This is less a matter of diseconomies of scale than it is of marketing: AMD and Intel have both taken to releasing new parts in ever-smaller increments. Blind Freddie could tell the difference between a 386SX-25 and an SX-33, or even between a Pentium 133 and a Pentium 166, but the average user struggles to tell which is the faster chip now unless they are three or four speed grades apart. It was not to be until the release of the Athlon "C" that the differences became noticeable again.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Socket A | AMD | AMD | June 2000 | Legacy |
Internal clock | External clock | L1 cache | L2 cache | Transistor count |
900 MHz | 200 MHz | 128k at 900 MHz | 256k at 900 MHz | 37 million |
950 MHz | 200 MHz | 128k at 950 MHz | 256k at 950 MHz | 37 million |