Conditions in the Intel 8087 floating-point chip's microcode

WalterBright · 2026-01-20T01:04:55 1768871095

I've always thought the 8087 was a marvelous bit of engineering. I never understood why it didn't get much respect in the software business.

For example, when Microsoft was making Win64, I caught wind that they were not going to save the x87 state during a context switch, which would have made using the x87 impractical under Win64. I got upset about that, contacted Microsoft, and convinced them to support it.

But the deprecation of the x87 continued, as Microsoft C did not provide an 80-bit real type.

Back in the late '80s, Zortech C/C++ was the first compiler to fully implement NaN in the C math library.

kstrauser · 2026-01-20T02:20:56 1768875656

I’d agree that the engineering was brilliant (but 68882 gang represent!). Its ISA was so un-x86-like, though, as it was basically an RPN calculator. x86 had devs manipulating registers. x87 had them pushing operands and running ops that implicitly popped them and pushed the result back onto the stack.
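
To make the "RPN calculator" point concrete, here is a toy Python model of evaluating (a + b) * c on an 8-slot register stack. The method names mirror the x87 mnemonics FLD, FADDP and FMULP, but this is a simplified sketch, not an emulator: it only models the "combine ST(1) with ST(0) and pop" forms and ignores tags, rounding control and exceptions.

```python
class X87Stack:
    """Toy model of an x87-style register stack (8 slots, ST(0) is the top)."""

    def __init__(self):
        self.slots = []  # slots[-1] plays the role of ST(0)

    def fld(self, value):
        # Push an operand; the real x87 flags a stack fault past 8 entries.
        if len(self.slots) == 8:
            raise OverflowError("x87 stack overflow")
        self.slots.append(float(value))

    def _pop_binary(self, op):
        # "...P" forms: ST(1) = ST(1) op ST(0), then pop, so the result is the new ST(0).
        st0 = self.slots.pop()
        st1 = self.slots.pop()
        self.slots.append(op(st1, st0))

    def faddp(self):
        self._pop_binary(lambda st1, st0: st1 + st0)

    def fmulp(self):
        self._pop_binary(lambda st1, st0: st1 * st0)

    def st0(self):
        return self.slots[-1]


def eval_rpn(a, b, c):
    """Evaluate (a + b) * c the way an x87 instruction sequence would."""
    fpu = X87Stack()
    fpu.fld(a)    # FLD a    ST(0)=a
    fpu.fld(b)    # FLD b    ST(0)=b, ST(1)=a
    fpu.faddp()   # FADDP    ST(0)=a+b
    fpu.fld(c)    # FLD c    ST(0)=c, ST(1)=a+b
    fpu.fmulp()   # FMULP    ST(0)=(a+b)*c
    return fpu.st0()


print(eval_rpn(2, 3, 4))  # 20.0
```

Note that no instruction names a destination register: the operand order is implied entirely by what was pushed last, which is exactly what makes it feel like a calculator rather than x86.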

That’s not better or worse, just different. However, I can imagine devs of the day saying hey, uh, Intel, can we do math the same way we do everything else? (Which TBH is how you’d end up with an opcode for a hardware-accelerated bubble sort or something, because Intel sure does love them some baroque ISAs.)

jamesfinlayson · 2026-01-20T03:35:17 1768880117

> Its ISA was so un-x86-like, though, as it was basically an RPN calculator

Yeah I remember when I first came across floating point stuff when trying to reverse engineer some assembly - I wasn't expecting something stack-based.

WalterBright · 2026-01-20T03:04:16 1768878256

Eh, as far as compiler backends go, the RPN stack was worse.

I thought the x86-64 instruction set was a giant kludge-fest, so I was looking forward to implementing the AArch64 code generator. Turns out it is just as kludgy, but at right angles. For example, all the wacky ways of simply loading a constant into a register!
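
As one illustration of the constant-loading problem: AArch64 instructions are a fixed 32 bits wide, so a 64-bit constant generally has to be built 16 bits at a time, with MOVZ (write 16 bits, zero the rest) followed by MOVK (write 16 bits, keep the rest). The Python sketch below emits that naive sequence. It is a simplifying assumption about strategy, not any real compiler's algorithm: a real code generator would also try MOVN and the logical (bitmask) immediates of ORR, or a literal-pool load.

```python
def movz_movk_sequence(value, reg="x0"):
    """Return a naive MOVZ/MOVK sequence that loads a 64-bit constant into reg.

    Simplification: only MOVZ/MOVK are used; MOVN, ORR with a bitmask
    immediate, and literal-pool loads are not considered.
    """
    value &= (1 << 64) - 1
    # Split into four 16-bit halfwords at shifts 0, 16, 32, 48.
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    nonzero = [(16 * i, chunk) for i, chunk in enumerate(chunks) if chunk != 0]
    if not nonzero:
        return [f"movz {reg}, #0x0"]
    seq = []
    for shift, chunk in nonzero:
        # The first instruction zeroes everything else; later ones patch in halfwords.
        op = "movk" if seq else "movz"
        seq.append(f"{op} {reg}, #0x{chunk:x}, lsl #{shift}")
    return seq


for line in movz_movk_sequence(0x1234000056780000):
    print(line)
```

Skipping zero halfwords is the only optimization here; a constant like -1 would still take four instructions, which is precisely the kind of case MOVN exists for.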

kstrauser · 2026-01-20T20:29:00 1768940940

That's fair. Are there any instruction sets that strike you as pretty?

WalterBright · 2026-01-20T22:37:34 1768948654

The PDP-11 instruction set. It fits neatly onto a single sheet of paper. It's all very regular and orthogonal. Its simplicity is a work of genius.

NetMageSCW · 2026-01-21T00:55:20 1768956920

How do you feel about the 6502?

WalterBright · 2026-01-21T03:15:34 1768965334

I never programmed one, and haven't studied its instruction set.

jdsully · 2026-01-20T04:21:00 1768882860

Excel needed the x87 as well, since they cared about maintaining 80-bit precision in some places to get exactly the same recalc results. So they would most likely have fixed it eventually.

mschaef · 2026-01-20T03:29:06 1768879746

What do you mean by respect? Here's a layperson's perspective, at least.

Up until the 486 (with its built-in x87), the x87 was always a niche product. You had to know about it, need it, buy it, and install it, over and above buying a PC in the first place. So by definition, it was relegated to the periphery of the industry. Most people didn't even know an x87 was a possibility. (I distinctly remember a PC World article having to explain why there was an empty socket next to the 8088 in the IBM PC.)

However, in the periphery where it mattered, it gained acceptance within a matter of a few years of being available. Lotus 1-2-3, AutoCAD, and many compilers (including yours, IIRC) had support for x87 early on. I would argue that this is one of the better examples of marginal hardware being appropriately supported.

The other argument I'd make is that (thanks to William Kahan), the 8087 was the first real attempt at IEEE-754 support in hardware. Given that IEEE-754 is still the standard, I'd suggest that x87's place in history is secure. While we may not be executing x87 opcodes, our floating-point data is still in a format first used in the x87. (Not the 80-bit type, but do we really care? If the 80-bit type were truly important, I'd have thought that in the intervening 45 years there'd be a material attempt to bring it back. Instead, what we have is a push toward narrower floating-point types used in GPGPU, etc.: fp8 and fp16, sure; fp80, not so much.)

WalterBright · 2026-01-20T04:17:16 1768882636

> What do you mean by respect?

The disinterest programmers have in using 80 bit arithmetic.

A bit of background - I wrote my own numerical analysis programs when I worked at Boeing. The biggest issue I had was accumulation of rounding errors. More bits would put off the cliff where the results turned into gibberish.

I know there are techniques to minimize this problem. But they aren't simple or obvious. It's easier to go to higher precision. After all, you have the chip in your computer.
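
One of those techniques is compensated summation (Kahan summation, named after the same William Kahan), which carries the rounding error of each addition forward in a separate variable. A minimal Python sketch comparing it with a naive loop and with the standard library's correctly rounded `math.fsum`:

```python
import math


def naive_sum(values):
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation: c tracks the low-order bits lost by each add."""
    total = 0.0
    c = 0.0
    for v in values:
        y = v - c            # correct the incoming term by the previous error
        t = total + y        # low-order bits of y may be lost here
        c = (t - total) - y  # recover (negated) what was just lost
        total = t
    return total


values = [0.1] * 100_000
print(naive_sum(values))   # accumulates rounding error
print(kahan_sum(values))
print(math.fsum(values))   # 10000.0, the correctly rounded sum
```

It is cheap (a few extra adds per term), but as noted above it is not obvious, and it only fixes summation, not every place rounding error creeps in.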

adrian_b · 2026-01-20T13:15:44 1768914944

Yes, the argument of Kahan in favor of the 80-bit precision has always been that it will allow ordinary programmers, like the expected users of the IBM PC, who do not have the knowledge and experience of a numerical analyst, to write programs that perform floating-point computations without subtle bugs caused by unexpected behavior due to rounding errors.

mschaef · 2026-01-20T12:32:20 1768912340

> The disinterest programmers have in using 80 bit arithmetic.

I don't know, other than to say there's often a tendency in this industry to overlook the better in the name of the standard. 80-bit probably didn't offer enough marginal value to enough people to be worth the investment and complexity. I also wonder how much of an impact comes from the fact that 80-bit quantities can't stay aligned on 64-bit boundaries (consecutive 10-byte values in an array drift off alignment unless padded). Not to mention that memory bandwidth costs are 25% higher when dealing with 80-bit quantities instead of 64-bit ones, and floating-point work is very often bandwidth constrained. There's more precision in 80-bit, but it's not free, and as you point out, there are techniques for managing the lack of precision.
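
The storage arithmetic behind those two points, as a quick Python sketch: a 10-byte extended value moves 25% more bytes than an 8-byte double, and consecutive 10-byte elements in a packed array fall off 8-byte alignment. (For comparison, the System V x86-64 ABI stores `long double` in 16 bytes with 16-byte alignment, which keeps it aligned at the cost of doubling the footprint; the packed layout here is only illustrative.)

```python
EXTENDED_BYTES = 10  # x87 80-bit extended precision
DOUBLE_BYTES = 8     # IEEE-754 binary64


def size_overhead(new_bytes, old_bytes):
    """Fractional increase in bytes moved per element."""
    return new_bytes / old_bytes - 1


def misaligned_offsets(count, elem_bytes, alignment=8):
    """Byte offsets of packed array elements that are not alignment-aligned."""
    return [i * elem_bytes for i in range(count) if (i * elem_bytes) % alignment]


print(size_overhead(EXTENDED_BYTES, DOUBLE_BYTES))  # 0.25
print(misaligned_offsets(5, EXTENDED_BYTES))        # [10, 20, 30]
print(size_overhead(16, DOUBLE_BYTES))              # 1.0 if padded to 16 bytes
```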

> A bit of background - I wrote my own numerical analysis programs when I worked at Boeing. The biggest issue I had was accumulation of rounding errors.

This sort of thing shows up in even the most prosaic places, of course:

https://blog.codinghorror.com/if-you-dont-change-the-ui-nobo...

In any event, while we're chatting, thank you for your longstanding work in the field.

adrian_b · 2026-01-20T13:28:01 1768915681

The 80-bit format was included in the IEEE standard since the beginning.

The IEEE standard included almost all of what the Intel 8087 had implemented, the main exception being the projective extension of the real number line. Because the standard left this out, the Intel 80387 also dropped the feature.

Where you are right is that most other implementers of the standard have chosen not to provide this extended precision format, due to the higher cost in die area, power consumption and memory usage, the latter being exacerbated by the alignment issue. The same was true for Intel when defining SSE, SSE2 and later ISA extensions. The main cost issue is the superlinear growth of the multiplier size with precision: a multiplier for a 64-bit significand is not a little bigger than one for a 53-bit significand, but much bigger.
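
A rough worked estimate of that growth, assuming a simple array multiplier whose area scales with the number of partial-product bits (n × n for an n-bit significand). Real designs (Booth recoding, Wallace trees) differ in detail, so treat this as an order-of-magnitude illustration only:

```latex
\frac{A_{64}}{A_{53}} \approx \left(\frac{64}{53}\right)^{2} = \frac{4096}{2809} \approx 1.46
```

In other words, about 21% more significand width costs roughly 46% more multiplier array under this model.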

Nowadays, the FP arithmetic standard also includes 128-bit floating-point numbers, which are preferable to 80-bit numbers and do not have alignment problems. However, few processors implement this format in hardware, and on processors where it must be implemented in a software library, double-double numbers give higher performance than quadruple-precision numbers (unless intermediate results risk overflowing or underflowing the double-precision exponent range).

In general, on the most popular CPUs, e.g. x86-64 or AArch64 based ones, one should use a double-double precision library for all the arithmetic computations where the traditional 80-bit Intel 8087 format would have been appropriate.
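
For readers who haven't met double-double: a value is stored as an unevaluated sum hi + lo of two doubles, giving roughly 106 bits of significand. The core building block is Knuth's error-free TwoSum. A minimal Python sketch of addition, using the simple ("sloppy") variant; production libraries use more careful versions, and multiplication would additionally need an exact product (e.g. via FMA), which is not shown:

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Faster variant of TwoSum, valid when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double-doubles given as (hi, lo) pairs (simple variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


total = dd_add((1.0, 0.0), (1e-20, 0.0))
print(total)         # (1.0, 1e-20): the tiny part survives in lo
print(1.0 + 1e-20)   # 1.0: plain double addition loses it
```

Everything is ordinary double arithmetic, which is why this runs at hardware speed on CPUs without any extended or quad format.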

WalterBright · 2026-01-20T21:48:08 1768945688

Haha the calculator app misses one critical feature - a history of the numbers you typed in, so you can double check the column of numbers you added.

> thank you for your longstanding work in the field.

I sure appreciate that, especially since I give away all my work for free these days!

Cold_Miserable · 2026-01-20T02:20:51 1768875651

x87 should have been killed off. It would have forced lazy game developers to use SSE around the 2005 era.

WalterBright · 2026-01-20T04:18:05 1768882685

Game floating point precision doesn't matter much - speed does. But if you're doing numerical analysis, it does matter.

dboreham · 2026-01-19T19:17:50 1768850270

Until I read this I did not know that 1970s microprocessors had register renaming. Feel a little cheated, thinking for all those years that they were actually moving the bits.

dapperdrake · 2026-01-19T23:47:48 1768866468

If you work through a math problem with pen and paper or nand2tetris or nandgame.com, then it becomes obvious that changing indexes into a register file (a.k.a. pointers) is way faster and easier than wiring things up to move the bits around.
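
That idea in miniature, as a toy Python model (a sketch of the architectural behavior, not the 8087's actual circuitry): an x87-style stack is an 8-entry register file plus a 3-bit TOP pointer. Pushing and popping only change the pointer, and ST(i) is just physical register (TOP + i) mod 8, so no operand bits move between registers.

```python
class RegisterStack:
    """8 physical registers addressed through a 3-bit TOP pointer."""

    def __init__(self):
        self.physical = [0.0] * 8
        self.top = 0  # 3-bit pointer, like the TOP field of the x87 status word

    def phys_index(self, i):
        # ST(i) names physical register (TOP + i) mod 8.
        return (self.top + i) % 8

    def push(self, value):
        # A push decrements TOP and writes the new ST(0); nothing else moves.
        self.top = (self.top - 1) % 8
        self.physical[self.top] = value

    def pop(self):
        value = self.physical[self.top]
        self.top = (self.top + 1) % 8
        return value

    def st(self, i):
        return self.physical[self.phys_index(i)]


regs = RegisterStack()
regs.push(1.0)
regs.push(2.0)
print(regs.st(0), regs.st(1), regs.top)  # 2.0 1.0 6
```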

peterfirefly · 2026-01-19T20:47:34 1768855654

How do you think the EXX and EX AF,AF' instructions work on the Z80?

avadodin · 2026-01-19T21:48:10 1768859290

And EX DE, HL
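
A toy Python model of that renaming approach, assuming (as a simplification of the real Z80 register file) that each exchange flips a select bit instead of copying data: EXX toggles which bank is active, and EX DE,HL toggles a per-bank flag that swaps which physical slot the names DE and HL refer to. EX AF,AF' would work the same way with its own bit (not shown). The class and slot names are hypothetical.

```python
class Z80PairNames:
    """Toy model: EX DE,HL and EXX flip select bits; no register data is copied."""

    def __init__(self):
        # Each bank holds three physical 16-bit registers: BC plus two DE/HL slots.
        self.banks = [{"BC": 0, "slot0": 0, "slot1": 0} for _ in range(2)]
        self.bank = 0                     # toggled by EXX
        self.de_hl_swap = [False, False]  # one flag per bank, toggled by EX DE,HL

    def _physical(self, name):
        regs = self.banks[self.bank]
        if name == "BC":
            return regs, "BC"
        swapped = self.de_hl_swap[self.bank]
        if name == "DE":
            return regs, "slot1" if swapped else "slot0"
        if name == "HL":
            return regs, "slot0" if swapped else "slot1"
        raise KeyError(name)

    def read(self, name):
        regs, slot = self._physical(name)
        return regs[slot]

    def write(self, name, value):
        regs, slot = self._physical(name)
        regs[slot] = value & 0xFFFF

    def ex_de_hl(self):
        self.de_hl_swap[self.bank] = not self.de_hl_swap[self.bank]

    def exx(self):
        self.bank ^= 1


z = Z80PairNames()
z.write("DE", 0x1111)
z.write("HL", 0x2222)
z.ex_de_hl()
print(hex(z.read("DE")), hex(z.read("HL")))  # 0x2222 0x1111
```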

WalterBright · 2026-01-20T00:16:46 1768868206

E to the u, du dx, E to the x, dx!

avadodin · 2026-01-20T08:36:00 1768898160

I must admit I do not know what you are referencing here, sir, but it is always a pleasure to run into your comments on HN.

So much positive compiler-dad energy.

WalterBright · 2026-01-20T22:33:30 1768948410

It's the MIT song! At least it used to be, it was a long time ago.

> always a pleasure to run into your comments on HN.

Wow what a nice compliment! Makes my day!

kens · 2026-01-19T22:59:31 1768863571

If you feel cheated now, wait until you find out that the ALU in the 8-bit Z80 was just 4 bits. :-)
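
The trick is that an 8-bit operation can be done as two passes through a 4-bit ALU, low nibble first, with the carry handed from one pass to the next. A Python sketch of that idea for addition only (it does not model the Z80's actual timing or its full flag logic):

```python
def add4(a, b, carry_in):
    """4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8_with_4bit_alu(a, b, carry_in=0):
    """8-bit add as two 4-bit ALU passes: low nibbles, then high nibbles."""
    low, carry = add4(a & 0xF, b & 0xF, carry_in)
    # For addition, this inter-nibble carry is what the Z80 reports as H (half carry).
    half_carry = carry
    high, carry_out = add4(a >> 4, b >> 4, carry)
    return (high << 4) | low, carry_out, half_carry


print(add8_with_4bit_alu(0x3A, 0xC7))  # (1, 1, 1): 0x3A + 0xC7 = 0x101
```

A side benefit of this design is that the half-carry flag, needed for BCD adjustment, falls out naturally.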

mschaef · 2026-01-20T03:32:11 1768879931

Does this have any similarities at all to the fact that the Pentium 4 used a 16-bit ALU?

kens · 2026-01-19T18:53:22 1768848802

Author here if anyone has questions...

hnthrowaway0315 · 2026-01-19T22:38:36 1768862316

Hi kens, thanks for the knowledge sharing all these years. Can you please confirm this one? Wikipedia says that the 8087 uses the CORDIC algorithm. Does that mean it's the same (but at a different speed) as how I'd implement the functions in software, except in microcode (which has finer granularity than ordinary assembly code)?

I found it a bit surprising that for a 45-year-old chip, there is no public information on its microcode. I guess hardware is indeed much more secret than software.

kens · 2026-01-19T22:58:13 1768863493

Yes, the 8087 uses CORDIC. I extracted the constants from the 8087's internal constant ROM and they are arctangent and log values for the CORDIC algorithm. You can implement the same functions in software, which is what floating-point emulation libraries did back then.
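
To illustrate, here is a minimal Python sketch of CORDIC in vectoring mode, which computes an arctangent from a table of arctan(2^-i) constants using only additions, subtractions, and multiplications by powers of two (shifts, in fixed-point hardware). It uses floats for readability and is a generic textbook version, not a reconstruction of the 8087's microcode; the logarithm side that uses the log constants is not shown.

```python
import math

# Table of arctan(2**-i); a hardware implementation would keep these in a ROM.
ITERATIONS = 40
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]


def cordic_atan(y, x):
    """Return atan(y / x) for x > 0 using CORDIC vectoring mode.

    Each step rotates (x, y) by +/- arctan(2**-i) to drive y toward 0 and
    accumulates the total rotation. The vector also grows by a constant gain,
    but the angle result does not need that gain corrected.
    """
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    angle = 0.0
    for i in range(ITERATIONS):
        factor = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        if y > 0:
            # Rotate clockwise by arctan(2**-i).
            x, y = x + y * factor, y - x * factor
            angle += ATAN_TABLE[i]
        else:
            # Rotate counterclockwise by arctan(2**-i).
            x, y = x - y * factor, y + x * factor
            angle -= ATAN_TABLE[i]
    return angle


print(cordic_atan(1.0, 1.0), math.pi / 4)
```

Each iteration adds roughly one bit of accuracy, which is why iteration count (and hence time) scales with the precision wanted.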

There's almost no public information on the 8087 microcode, but I'm working on that :-)

hnthrowaway0315 · 2026-01-19T23:03:31 1768863811

Thanks Ken, appreciate the work!

farseer · 2026-01-19T19:23:49 1768850629

Is there 8087 IP available in verilog etc?

kens · 2026-01-19T19:36:54 1768851414

As far as I know, you can't get the 8087 itself as an IP block. You can get generic IEEE-754 floating-point as an IP block, e.g. from AMD: https://www.amd.com/en/products/adaptive-socs-and-fpgas/inte...

pwdisswordfishy · 2026-01-19T20:16:43 1768853803

IPv6 has seriously gone too far.

dapperdrake · 2026-01-19T23:44:59 1768866299

Thank you for the deep dive.

mschaef · 2026-01-20T03:30:18 1768879818

Thank you. As always.

0xsn3k · 2026-01-19T19:20:10 1768850410

super cool! i wonder how difficult it would be to recreate the entire chip at logic gate level in, say, VHDL or Verilog

kens · 2026-01-19T19:32:11 1768851131

It would be difficult, but not impossible. The main problem is tracing out all the circuitry, which is very time-consuming and error-prone. Trust me on this :-)

The second problem is that converting the circuitry to Verilog is straightforward, but converting it to usable Verilog is considerably more difficult. If you model the circuit at the transistor level in Verilog, you won't be able to do much with the model. You want a higher-level model, which requires converting the transistors into gates, registers, and so forth. Most of this is easy, but some conversions require a lot of thought.

The next issue is that you would probably want to use the Verilog in an FPGA. A lot of the 8087's circuitry isn't a good match for an FPGA. The 8087 uses a lot of dynamic logic and pass transistors. Things happen on both clock edges, so it will take some work to map it onto edge-triggered flip-flops. Moreover, a key part of the 8087 is the 64-bit shifter, built from bidirectional pass transistors, which would need to be redesigned, probably with a bunch of logic gates.

The result is that you'd end up more-or-less reimplementing the 8087 rather than simply translating it to Verilog.

0xsn3k · 2026-01-19T20:03:02 1768852982

ah, i see, thanks for the insight! do you have any advice on how one might get started with IC reverse-engineering? i think it would be interesting to reimplement these chips in a way that's at least inspired by the original design

kens · 2026-01-19T20:13:36 1768853616

How to get started reverse engineering? That's a big topic for a HN comment, but in brief... Either get a metallurgical microscope and start opening up chips, or look at chip photos from a site like Zeptobars. Then start tracing out simple chips and see how transistors are constructed, and then learn how larger circuits are built up. This works well for chips from the 1970s, but due to Moore's Law, it gets exponentially more difficult for newer chips.

I also have a video from Hackaday Supercon on reverse engineering chips: https://www.youtube.com/watch?v=TKi1xX7KKOI

monocasa · 2026-01-19T20:30:52 1768854652

Do you have any good tips on what to look out for when buying a used metallurgical microscope for looking at decapped chips? Even if not a complete set of constraints, I'd appreciate some off-the-cuff thoughts if you have the time.

kens · 2026-01-19T20:58:37 1768856317

I use a basic metallurgical microscope (AmScope ME300TZB). An X-Y stage is very useful for taking photos of chips and stitching them together. A camera is also important; my scope has a 10MP camera. I'm not into optics, so I don't know what lens characteristics to look for.

dapperdrake · 2026-01-19T23:49:38 1768866578

Noob here,

does VHDL have options for encoding logic that works on both clock edges?

kens · 2026-01-20T00:09:31 1768867771

There's a difference between what Verilog will allow and what is "synthesizable". In other words, there is a lot of stuff that you can express in Verilog, but when you try to turn it into an FPGA bitstream, the software will say, "Sorry, I don't know how to do that." Coming from a software background, this seems bizarre, as if C++ compilers rejected valid programs unless they stuck to easy constructs with obvious assembly implementations.

Using both edges of a clock is something that you can express in Verilog, but can't be directly mapped onto an FPGA, so the synthesis software will reject it. You'd probably want to double the clock rate and use alternating clock pulses instead of alternating edges. See: https://electronics.stackexchange.com/questions/39709/using-...
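
The "double the clock" workaround can be illustrated with a behavioral Python simulation (not synthesizable HDL; the function names are hypothetical). A design that conceptually does one action on each rising edge of clk and another on each falling edge is re-expressed as logic that acts only on rising edges of a 2x clock, with a phase flip-flop recording which edge of the original clock each fast edge stands in for:

```python
def dual_edge_trace(cycles, on_rise, on_fall, state):
    """Reference behavior: act on both edges of clk."""
    trace = []
    for _ in range(cycles):
        state = on_rise(state)   # rising edge of clk
        trace.append(state)
        state = on_fall(state)   # falling edge of clk
        trace.append(state)
    return trace


def doubled_clock_trace(cycles, on_rise, on_fall, state):
    """Same behavior using only rising edges of a 2x clock plus a phase bit."""
    trace = []
    phase = 0  # 0: this fast edge stands for clk rising, 1: for clk falling
    for _ in range(2 * cycles):  # two clk2x rising edges per clk period
        state = on_rise(state) if phase == 0 else on_fall(state)
        phase ^= 1               # the phase flip-flop toggles on every clk2x edge
        trace.append(state)
    return trace


inc = lambda s: s + 1
dbl = lambda s: s * 2
print(dual_edge_trace(2, inc, dbl, 0))      # [1, 2, 3, 6]
print(doubled_clock_trace(2, inc, dbl, 0))  # [1, 2, 3, 6]
```

The cost is a clock twice as fast, which is why this is a workaround rather than a free translation.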

dlcarrier · 2026-01-21T03:44:17 1768967057

Coming from an electronics design background, I'm even more amazed that Verilog can't gracefully handle multi-phase clocks, let alone two phases of a single clock. That's a big part of getting the most out of your power and timing budget. Also, it seems half the discussion around clocking in FPGAs is about the metastability of communicating between logic on separate single-phase clocks. If even one clock used two phases, you'd have entirely stateful conditions.

I've found that the FPGAs themselves can handle multi-phase clocks in combinatorial logic. If you want to use the built-in clock routing and latches, I would recommend running the output of the PLL to a LUT input, then outputting that input as well as its inverse from the LUT, routing each to a global clock input. That will keep the phase right at 180°, let you drive directly off global clock fanout, and let you run the clock at the highest frequency that the fabric supports.

derefr · 2026-01-20T01:27:17 1768872437

> Coming from a software background, this seems bizarre, as if C++ compilers rejected valid programs unless they stuck to easy constructs with obvious assembly implementations.

To my understanding, isn’t it more like there being a perfectly good IR instruction coding for a feature, but with no extant ISA codegen targets that recognize that instruction? I.e. you get stuck at the step where you’re lowering the code for a specific FPGA impl.

And, as with compilers, one could get around this by defining a new abstract codegen target implemented only in the form of a software simulator, and adding support for the feature to that. Though it would be mightily unsatisfying to ultimately be constrained to run your FPGA bitstream on a CPU :)

dlcarrier · 2026-01-21T04:11:41 1768968701

The non-synthesizable features of Verilog not only work in current simulators, they were expressly developed for that purpose. Verilog has those features to describe conditions that might exist in a semiconductor as manufactured, but aren't part of any design, so that they can be more accurately simulated. For example, a pin state can be unknown, or two pins can be connected with a delay line. These allow a real-life semiconductor to be characterized well enough to insert into a simulation of the electronics circuit as a whole.
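
To give a flavor of the "unknown" pin state: Verilog simulation uses four logic values, 0, 1, x (unknown) and z (high impedance). A minimal Python sketch of the resulting AND truth table, following the standard rule that a z at a gate input behaves like x:

```python
def and4(a, b):
    """Four-state AND as in Verilog simulation; values are '0', '1', 'x', 'z'."""
    for v in (a, b):
        if v not in ("0", "1", "x", "z"):
            raise ValueError(f"not a four-state value: {v!r}")
    if a == "0" or b == "0":
        return "0"  # a known 0 forces the output low whatever the other input is
    if a == "1" and b == "1":
        return "1"
    return "x"      # any unknown or floating input (and no 0) gives unknown


print(and4("0", "x"), and4("1", "z"), and4("1", "1"))  # 0 x 1
```

This is useful in simulation (an uninitialized register shows up as x propagating through the logic), but a real gate always settles to 0 or 1, which is one reason such constructs have no synthesis mapping.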

It's more akin to directives than instructions. Debug instructions can also serve a similar purpose, although they actually run on the hardware, whereas compiler directives and non-synthesizable Verilog constructs are never executed.