Doesn't radiation mostly cause random one-off errors rather than permanent defec...

ejolto · on Feb 28, 2021

In addition to SEUs (single event upsets), which are bit flips, there are also the following Single Event Effects (SEE) that are destructive:

- Single Event Burnout, SEB

- Single Event Gate Rupture, SEGR

- Single Event Latch-up, SEL (these can be recoverable)

In addition, there are also Total Ionizing Dose (TID) effects: https://radhome.gsfc.nasa.gov/radhome/tid.htm

jessriedel · on March 1, 2021

Sure, no doubt they occur. But at what rate?

londons_explore · on Feb 28, 2021

Many types of bit error are not recoverable without a full system reset. It isn't a matter of a simple "this bit in RAM got corrupted", but more "this floating point unit has got into a state where it will not produce a result, and will therefore hang the entire processor".

Therefore boot time becomes critical - if you end up rebooting due to bit errors multiple times per second, you can't afford to wait for Linux to start up each time...

ajuc · on Feb 28, 2021

Run 9 systems in parallel and reset the ones that give less common results or no results at all.

You still have 10% of the surface area, power usage and weight, and 10 times the speed, of the radiation-hardened ones.
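
A rough sketch of the voting step, to make the idea concrete. Everything here is an assumption for illustration: the function name `vote`, the use of `None` to mean "this replica hung or returned nothing", and the rule that a strict majority of all replicas is required. Real flight software would do this in hardware or a much smaller runtime.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(results: Sequence[Optional[Hashable]]) -> Tuple[Optional[Hashable], List[int]]:
    """Pick the majority result and list the replicas that should be reset.

    `results[i]` is the output of replica i, or None if it produced nothing
    (hypothetical convention for a hung or crashed replica).
    """
    everyone = list(range(len(results)))
    answered = [r for r in results if r is not None]
    if not answered:
        # Nobody answered: no trusted value, reset all replicas.
        return None, everyone

    winner, count = Counter(answered).most_common(1)[0]
    if count * 2 <= len(results):
        # No strict majority among all replicas: trust nothing, reset all.
        return None, everyone

    # Reset every replica that disagreed with the majority or stayed silent.
    to_reset = [i for i, r in enumerate(results) if r != winner]
    return winner, to_reset


if __name__ == "__main__":
    # Nine replicas: one bit-flipped answer (index 3), one hang (index 5).
    outputs = [42, 42, 42, 41, 42, None, 42, 42, 42]
    print(vote(outputs))  # (42, [3, 5])
```

Requiring a strict majority of all nine, rather than just the most common answer among those that responded, is a conservative choice: if most replicas hang at once, the survivors are not trusted either.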

dawnerd · on Feb 28, 2021

And that’s why it’s wise to have multiple systems running at the same time: if one errors, you still hopefully have another online. There’s a reason airplanes and now cars are designed this way. I’m sure they’re working towards this too.

spockz · on Feb 28, 2021

Well I suppose they do not have to load all the kernels and drivers that Linux provides today.

I wonder how one could use micro kernels to further improve startup time and have a mini distributed OS/kernel for each component.

jessriedel · on March 1, 2021

This is a problem with floating point operations happening at a lower level than the error correction you're imagining. In principle, that's not at all necessary. Are you arguing that it's infeasibly expensive to design a chip with operations that are error correctable?

londons_explore · on March 1, 2021

It's possible - but you'll end up reinventing nearly every step of the IC design process, which will cost a lot.