The biggest limitation on the 200 and 300MHz systems of 1997 and early 1998 was bus speed. The industry-standard 66MHz bus had been around since the Pentium 66 of 1993, and had been common since about the end of 1995. Several attempts had been made to increase bus speeds in an ad hoc sort of way, with varying success: the 75MHz 6x86-200 was a great success, while the 83MHz 6x86MX-266 and various home-brew overclocks tended to be problematic. Primarily, this was an issue for the motherboard makers to sort out. In particular, they needed to find a way to decouple the PCI bus so as to get faster RAM access without cooking the video card.
April 1998 brought two good solutions. Intel produced a proprietary system, and a coalition of mainboard and CPU makers led by AMD and VIA developed the open Super 7 standard. Both systems worked well right from the start, and the result was a golden age for performance enthusiasts as ever-faster chips from AMD and Intel pushed the boundaries out.
Incredibly expensive when it first came out, but this was without doubt the fastest X86 chip money could buy.
As the new and high-end CPUs always had been, it was introduced at a price point that could not be justified by any rational analysis. Rationality and CPU buying habits, however, rarely used to go together. There were always a few people who really did not care about the cost — especially those that were buying with other people's money, of course — and a great many more who could be bamboozled by fast-talking salesmen into spending a small fortune on a chip that, in reality, was only marginally faster.
As late as January 1998, by which time the P-II 300 was already about to relinquish its flagship status to the 333, it offered a mere two percent system performance boost over the 266, or six percent better than the K6-233. But the cost was crazy: twenty-five percent higher than the P-II 266 and more than three times higher than the top K6.
In a strictly rational market that would be a recipe for disaster. But it actually made a great deal of sense. To be sure, even in those less well-educated days, only a very small percentage of people would exchange 200% more dollars for 6% more performance, but those few sales were almost pure profit. And with subsidiary products at more reasonable prices — at this time it was the Pentium MMX that mainly filled that role — it was possible to have both benefits: high volume with the entry-level parts, and a carefully crafted series of steps upward in performance (each one larger than the one before) to extract the maximum possible number of dollars from the market. This was the other great significant point about the Intel flagship chips: simply by having the fastest X86 on the market, the reflected glory was sufficient to allow Intel to charge a premium for the lesser chips as well.
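The trade-off described above can be put into rough numbers. The sketch below is only illustrative arithmetic: `relative_value` is a made-up helper, and the ratios are the approximate ones quoted in the text (with "more than three times" taken as exactly 3.0, so the real figure was somewhat worse).

```python
def relative_value(perf_ratio, price_ratio):
    """Performance per dollar relative to the baseline chip (baseline = 1.0)."""
    return perf_ratio / price_ratio

# P-II 300 against the K6-233: about 6% faster, at least three times the price.
vs_k6_233 = relative_value(perf_ratio=1.06, price_ratio=3.0)

# P-II 300 against the P-II 266: about 2% faster, 25% dearer.
vs_pii_266 = relative_value(perf_ratio=1.02, price_ratio=1.25)

print(f"Value relative to the K6-233:   {vs_k6_233:.2f}")
print(f"Value relative to the P-II 266: {vs_pii_266:.2f}")
```

On these figures the flagship delivered roughly a third of the performance per dollar of the top K6, which is the sense in which the price could not be justified by rational analysis.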
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot 1 | Intel | Intel | 1997 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
300 MHz | 66 MHz | 32k at 300 MHz | 512k at 150 MHz | 7.5 million |
A vastly better performer than the woeful original. Overnight, the Celeron went from being easily the slowest chip on the market to being a genuine contender. The transformation of the woeful Slugeron into the excellent Celeron-A was the most dramatic we have ever seen.
The difference between the two parts was L2 cache: that aside, they were identical. Indeed, the Celeron 300, the Celeron-A 300, and the Pentium II 300 were all identical apart from cache. The Pentium II had 512k in external chips at 150MHz, the Slugeron none at all, and the Celeron-A had just 128k but was the first CPU to integrate the L2 cache on the CPU itself. This required no fewer than 19 million transistors, making it easily the largest CPU made thus far, but allowed the L2 cache to run at full clock speed (300MHz). In caching, bigger is better, but faster is better yet.
The performance effect of different cache arrangements varies according to the workload. In the case of a very small program loop, L2 cache makes no difference as the entire loop can fit within the primary cache. With very large loops, it still makes no difference, as even the biggest cache cannot contain the data. The most common case, however, is that of medium-sized blocks of code or data. Here, performance varies greatly between different caches. Imagine a task that loops its way through a 100k data block. A Celeron-A 300 will out-perform a Pentium II 300 because, once the 100k of data is in its cache, it can access the data twice as fast (at 300MHz instead of the 150MHz of the Pentium II). On the other hand, for a larger task of 200k, the Pentium II's slower but larger cache is better — the Celeron-A's faster cache confers no benefit as it is not big enough to contain the needed data.
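The working-set argument above can be sketched as a toy model. This is not a benchmark: it only asks whether a block of data fits in a given cache and, if it does, how fast that cache runs. The class and function names are invented for illustration, the sizes and clocks are the ones quoted in the text, and treating the L1 cache as a single 32k pool is a simplifying assumption.

```python
from dataclasses import dataclass

L1_KB = 32  # assumption: L1 treated as one 32k pool, the same on both chips


@dataclass(frozen=True)
class L2Cache:
    name: str
    size_kb: int    # capacity in kilobytes
    clock_mhz: int  # speed the L2 cache runs at


# Figures taken from the text above.
CELERON_A_300 = L2Cache("Celeron-A 300", size_kb=128, clock_mhz=300)
PENTIUM_II_300 = L2Cache("Pentium II 300", size_kb=512, clock_mhz=150)


def effective_l2_clock(cache, working_set_kb):
    """Return the L2 clock if the block fits in the cache, else 0 (no help)."""
    return cache.clock_mhz if working_set_kb <= cache.size_kb else 0


def better_cache(working_set_kb, a=CELERON_A_300, b=PENTIUM_II_300):
    """Name the chip whose L2 serves the block faster, or 'no difference'."""
    if working_set_kb <= L1_KB:
        return "no difference"  # the whole loop lives in L1 on either chip
    clock_a = effective_l2_clock(a, working_set_kb)
    clock_b = effective_l2_clock(b, working_set_kb)
    if clock_a == clock_b:
        return "no difference"  # too big for both caches
    return a.name if clock_a > clock_b else b.name


for size_kb in (16, 100, 200, 1000):
    print(f"{size_kb}k working set: {better_cache(size_kb)}")
```

Real workloads are messier than this, since partial hits, associativity and memory latency all blur the boundaries, but the model reproduces the four cases described: tiny loops and huge ones show no difference, the 100k block favours the Celeron-A, and the 200k block favours the Pentium II.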
The Celeron-A was an outstanding CPU and sold quite well, but never in the quantity that it deserved to, at least outside the USA, which has always been an Intel stronghold. Three things held it back.
- The reputation of the previous Celeron was hard to overcome.
- Although not expensive, it was no bargain either. In addition to a traditional sizeable Intel margin, the clumsy Slot 1 form made it dearer to manufacture than a standard socket CPU, and Slot 1 motherboards were quite expensive.
- It had the misfortune to be on the market at the same time as the superb AMD K6-2/300. These two great chips stole sales from each other.
The Celeron-A became legendary in the overclocking community. It was almost routine to put one on a BX chipset main board and run the bus at 100MHz for a 50% overclock and 450MHz. A great many Celeron 300As were rock-solid at that speed, and though BX boards were expensive, for a very well-performed 450MHz machine the total cost was more than reasonable.
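The arithmetic behind that overclock is simply bus speed times clock multiplier. The sketch below assumes a fixed 4.5x multiplier, which is inferred here from 300MHz divided by the nominal 66.67MHz bus rather than stated in the text; the function names are illustrative only.

```python
MULTIPLIER = 4.5  # assumption: inferred from 300 MHz / 66.67 MHz


def core_clock_mhz(bus_mhz, multiplier=MULTIPLIER):
    """Core clock is the bus clock times the CPU's clock multiplier."""
    return bus_mhz * multiplier


def overclock_percent(stock_mhz, new_mhz):
    """Percentage gain of the new clock over the stock clock."""
    return (new_mhz / stock_mhz - 1) * 100


stock = core_clock_mhz(200 / 3)  # the "66MHz" bus is nominally 66.67 MHz
overclocked = core_clock_mhz(100)

print(f"Stock:       {stock:.0f} MHz")
print(f"Overclocked: {overclocked:.0f} MHz")
print(f"Gain:        {overclock_percent(stock, overclocked):.0f}%")
```

Because the multiplier stays put, the whole gain comes from the bus, which is why a board able to run a stable 100MHz bus mattered so much.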
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot 1 | Intel | Intel | August 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
300 MHz (more if desired) | 66 MHz or more | 32k at 300 MHz | 128k at 300 MHz | 7.5 million (plus 11.5m cache) |
Far and away the outstanding CPU of 1998; a part that dominated the market in a way that only three or four chips ever had. For 1998, this was the hot-selling CPU, and it rocketed AMD into market leadership almost overnight. The K6-2 family chips soon became the best-selling parts on the retail market, and brought a new world of affordable performance with them.
The K6-2 was essentially a standard K6 with a 100MHz bus interface, a significantly faster MMX unit, and 3DNow! graphics extensions. Of these three changes, the bus speed was by far the most important. The 100MHz bus improved communication between CPU and memory — the main bottleneck in those days as now — by a massive 50% over the standard 66MHz bus, and did it for all programs, not just specially written multi-media games. 100MHz bus systems would become near-universal over the next year or two, with only the lack-lustre later-model Celeron and the dated M-II family still running 66 or 75MHz main boards.
Bus speed was particularly significant in Socket 7 systems (which had motherboard-mounted cache RAM). Slot systems either had no cache at all (Celeron) or CPU-mounted cache (Pentium II/III, Athlon, Celeron-A), which meant that bus speed mattered much less to them.
Oddly enough, where it had been AMD that pioneered 40MHz systems and Intel that took the great leap forward to 66MHz, it was Cyrix (in its glory days) which was the first to design for 75MHz and 83MHz bus speeds (with the 6x86-200 and 6x86MX-266). But by '98 Cyrix had lost the plot and begun to leave the pioneering to AMD and Intel.
With the K6-2, AMD also added 3D graphics functions into the CPU instead of in the more traditional place, a 3D graphics accelerator card. 3DNow added 21 SIMD (Single Instruction, Multiple Data) instructions to the X86 instruction set; it was a sort of "super MMX". Like MMX, the 3DNow extensions were mainly of interest to games players and heavy graphics users. 3DNow-enabled software rapidly became quite common (notably through Microsoft's DirectX version 6 and higher) and we started to see 3DNow-enhanced parts from Cyrix and Centaur gradually appearing too. Intel did something very similar with SSE a year or so later.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Super 7 | AMD | AMD | April 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
300 MHz | 100 MHz | 64k at 300 MHz | *1MB at 100 MHz | 9.3 million |
New product from Cyrix was long overdue by the time these were announced — or, more to the point, not announced!
It seems that Cyrix were so conscious of their failure to produce competitive mid-range product over the summer of 1998/99 that they just snuck the 366 onto their web site without so much as a press release — perhaps in the hope that they could pretend that the part had already been out for ages.
We had always thought that, outside of manufacturing samples, this was a non-part and that none ever shipped. Certainly they never appeared on Australian wholesale price lists and we have never yet seen one in the flesh. But apparently there were a good many of them sold in the USA, where they were regarded as difficult and hot-running.
We are left to assume that the clockings listed below are correct. If they are, the first one would have been as tricky as all the 83MHz bus parts, but the second should have performed very well for an early 1999 part, about equal to a K6-2/350, which was right in the retail sweet spot at that time.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Super 7 | Cyrix | National Semiconductor | April 1999 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
290 MHz | 83 MHz | 64k at 290 MHz | *512k at 83 MHz | 6.6 million |
250 MHz | 100 MHz | 64k at 250 MHz | *512k at 100 MHz | 6.6 million |
Officially, just the Celeron 333, but you wouldn't want to confuse this excellent performer with the snail-like originals. In fact, we were very surprised that Intel didn't change the name: these fast Celerons with 128k internal cache would have generated a lot more buyer interest had it not been for the guilt-by-association factor. Despite their inherited bad name and their huge and clumsy form, they were serious performers. We liked them, and sold a slow but steady stream of them to the games fraternity. If it hadn't been for the K6-2/300 and 350, we would have sold them by the bucketful.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot 1 | Intel | Intel | August 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
333 MHz | 66 MHz | 32k at 333 MHz | 128k at 333 MHz | 7.5 million (plus 11.5m cache) |
Seriously fast in its day, but it never became a popular choice because it required a 95MHz bus and very few motherboards provided this early on. In reality, the K6-2/333 was largely a marketing tool: the K6-2/300 and 350 were the ones that sold in volume. Unlike the Cyrix "333MHz" part (which should really be regarded as a 300), the 333MHz K6-2 was very competitive with a Pentium II or Celeron-A at the same clock speed.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Super 7 | AMD | AMD | April 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
333 MHz | 95 MHz | 64k at 333 MHz | *1MB at 95 MHz | 9.3 million |
Another day, another 33 Megahertz. In its heyday, Intel's production engineering team was incredible.
We had major doubts about the Pentium II's huge and clumsy design (it just had to be the child of a committee) and the silly bunny-suit marketing blitz too, but the ability of Intel's production team was simply unmatched. Even IBM lagged well behind. On introduction this one-time fastest X86 chip in the world was a full 100MHz in advance of any competitor's product. (Not counting the Alpha.)
While we have always stressed bus speed as a major contributor to performance, and would rather have gone for a 100MHz bus P II-350 than the 66MHz P II-333, bear in mind that the Slot 1 chips were less sensitive to bus speed than Socket 7 parts. This is because the most speed-critical component of all, the cache RAM, was on the internal bus in Pentium IIs, not on the main system bus.
In practice, the 333 remained fairly rare: in the early part of its life span it was very, very expensive; in mid-life it was overshadowed by the 350, and while it wasn't officially dropped until early in 1999, it was in very short supply for the last six months of its market life.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Slot 1 | Intel | Intel | January 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
333 MHz | 66 MHz | 32k at 333 MHz | 512k at 166 MHz | 7.5 million |
The round-numbered K6-2s were the ones to have (300, 350, 400, 450 and 500). The 366, although announced as available, was really a once-off product to allow some major OEMs to use up mouldy old stock of motherboards and RAM which couldn't cope with 100MHz bus speed. On a 66MHz bus it was inferior to the 350, the 333 and probably even the 300. The 380, like the 333 and the 475, seems to have been more a marketing gimmick than a real high-volume product. Its role was to hold a place in line for the 400.
Form | Design | Manufacture | Introduction | Status |
---|---|---|---|---|
Socket 7 or Super 7 | AMD | AMD | April 1998 | Legacy |

Internal clock | External clock | L1 cache | L2 cache | Transistor count |
---|---|---|---|---|
366 MHz | 66 MHz | 64k at 366 MHz | *512k at 66 MHz | 9.3 million |
380 MHz | 95 MHz | 64k at 380 MHz | *1MB at 95 MHz | 9.3 million |