
One of the things I find really cool about Ingenuity is how it is largely based on consumer hardware. The main processor is a Snapdragon 801 running Linux, which communicates with Perseverance using the Zigbee protocol [1]. Perseverance, on the other hand, uses a RAD750 from 2001! If successful, I hope this can lead to more modern hardware for these kinds of missions in general.

[1] https://en.wikipedia.org/wiki/Ingenuity_(helicopter)



As the name RAD750 indicates, that processor is designed to be radiation hardened, which matters for longer missions. I doubt the Snapdragon 801 will survive as long or have as few errors, but that also doesn't matter, since Ingenuity isn't intended for long-term use.


It’s less about longevity and more about reducing errors. Interestingly, Ingenuity has some sort of watchdog that can reset the Snapdragon in-flight fast enough to recover if there is an error.
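For a flavor of what that pattern looks like in software terms: this is not necessarily how Ingenuity does it (downthread it's mentioned the supervisor is an external FPGA), but the generic Linux watchdog pattern is roughly:

    /* Generic Linux watchdog pattern, NOT Ingenuity's actual mechanism
     * (their supervisor is reportedly an external FPGA): the main loop keeps
     * "petting" the watchdog device; if the loop ever wedges, the hardware
     * resets the processor on its own. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int wd = open("/dev/watchdog", O_WRONLY);   /* opening arms the countdown */
        if (wd < 0) { perror("open /dev/watchdog"); return 1; }

        for (;;) {
            /* ... one iteration of control-loop work would go here ... */
            if (write(wd, "\0", 1) != 1)            /* pet the dog */
                perror("watchdog pet failed");
            usleep(100 * 1000);                     /* must stay under the timeout */
        }
    }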


A consumer chip is a lot more likely to be permanently damaged by a "silver bullet" cosmic ray, though. Rad-hard chips don't just have shielding, they can also have redundant circuits and modifications to the foundry process. That said, I'm sure Ingenuity's processor is fit for purpose.


It's a bit of a shame that radiation hardened chips are stuck so far in the past.

I believe the volume is so low it does not warrant the investment to make radiation hardened versions more often.


Actually being stuck in the past might be a feature. Denser circuitry is likely to be more vulnerable to interference by ionising radiation and more vulnerable to physical damage from high energy particles.


Part of hardening involves enlarging physical features and increasing the operating voltage.


> A consumer chip is a lot more likely to be permanently damaged by a "silver bullet" cosmic ray, though.

How much more likely? Are the odds all that high over the mission duration?


I don't have any numbers for you, but here's a starting source: https://hps.org/publicinformation/ate/q11162.html

Most of my knowledge of the subject is from a friend who did his PhD dissertation on it; specifically, triply-redundant adder circuits for single-bit operations, allowing for some rad-hard chip designs using regular foundry processes.
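For a flavor of the idea (purely a software illustration of majority voting, not the actual gate-level circuits from the dissertation):

    /* Triple modular redundancy in miniature: compute the same thing three
     * times and take a bitwise majority, so a single upset in any one copy
     * is masked. Real rad-tolerant designs do this at the gate level. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t majority3(uint32_t a, uint32_t b, uint32_t c)
    {
        return (a & b) | (b & c) | (a & c);   /* each bit follows the 2-of-3 vote */
    }

    static uint32_t tmr_add(uint32_t x, uint32_t y)
    {
        uint32_t r0 = x + y, r1 = x + y, r2 = x + y;  /* three redundant copies */
        r1 ^= 1u << 7;            /* pretend a single-event upset hit one copy */
        return majority3(r0, r1, r2);
    }

    int main(void)
    {
        printf("%u\n", tmr_add(40, 2));   /* still prints 42 */
        return 0;
    }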


Reminds me of SpaceX or Erlang: build a process that handles errors rather than an error-free process.


I mean, it's space. Any computer expected to be in space and do anything non-trivial has tons of mechanisms in place to survive computer issues.


Reminds me of Apollo 11, where the computer kept resetting on the way down to the lunar surface but still got Neil Armstrong and Buzz Aldrin to the moon with the whole world watching.


It wasn't resetting, it was ditching lower priority tasks that it didn't have (real) time to accomplish. Notably, the computer was still doing important work (ahem, flying the LM :-) ), and the tasks it didn't have time to do were not critical.

The team had recently seen similar failures in simulation, and was able to quickly decide it was okay to proceed.


> Notably, the computer was still doing important work (ahem, flying the LM :-) ), and the tasks it didn't have time to do were not critical.

I believe this is incorrect. From an earlier HN discussion:

"The 1202s were also a lot less benign than is often reported. They occurred because of the fixed two-second guidance cycle in the landing software. That is, once every two seconds, a job called the SERVICER would start. SERVICER had many tasks during the landing. In order: navigation, guidance, commanding throttle, commanding attitude, and updating displays. With an excessive load as caused by the CDU, new SERVICERs were starting before old ones could finish. Eventually there would be two many old SERVICERs hanging around, and when the time came to start a new one, there would be no slots for new jobs available. When this happened, the EXECUTIVE (job scheduler) would issue a 1201 or 1202 alarm and cause a soft restart of the computer. Every job and task was flushed, and the computer started up fresh, resuming from its last checkpoint. It was essentially a full-on crash and restart, rather than a graceful cancellation of a few jobs. And unlike is often said, the computer wasn't dropping low-priority things; it was failing to complete the most critical job of the landing, the SERVICER.

Luckily, the load was light enough that of the SERVICER's duties, the old SERVICER was usually in the final display updating code when it got preempted by a new SERVICER. This caused times in the descent when the display stopped updating entirely, but the flight proceeded mostly as usual. However, with slightly more load, it was fully possible that the SERVICER could have been preempted in the attitude control portion of the code, or worse yet, the throttle control portion. Since each SERVICER shared the same memory location as the last one (since there was only ever supposed to be one running at a time), this could lead to violent attitude or throttle excursions, which would have certainly called for an abort. Luckily, this didn't happen -- and the flight controllers didn't abort the mission not because 1202s were always safe, but because they didn't understand just how bad it could be, were the load just a tiny bit higher."

[1] https://news.ycombinator.com/item?id=20791307


Yes, there's a flight-qualified Microsemi FPGA (a ProASIC, I think) that acts as a supervisor for the vehicle.


Presumably. That is all part of this test.


What I imagine for long-lived and high-cost missions is some sort of co-processor setup, with a radiation-hardened processor alongside a faster, more modern one. These rovers run a lot of computer vision algorithms, and I believe more powerful hardware would be quite useful. They may already do something like this; however, my understanding is that there is a lot of skepticism about integrating these less fault-tolerant processors. Ingenuity could help remove some of that skepticism and lead to more systems like this in the future.


Do keep in mind that radiation causes permanent damage at a predictable pace... I don't understand why online discussions like this fail to address permanent damage. You can't reset a burnt chip.


You’re not far off. The Snapdragon is monitored by some more radiation tolerant hardware (FPGA if I recall correctly).

Great talk about it from the FSW Workshop in 2019.

https://m.youtube.com/watch?v=mQu9m4MG5Gc


I wonder if it wouldn't have been possible to do this on Perseverance too. After Ingenuity's mission is done, they could have lugged it around for extra processing power for as long as it survives. I doubt the extra weight causes that much extra energy consumption.

The software stack is probably not ready for that, since the plan is to abandon it after its mission. And the Zigbee protocol is rather limited.


One thing that's not clear to me is what the radiation is like on Mars. Mars has an atmosphere, but no magnetic field.

Is it like the situation high in Earth's atmosphere (like using your iPad on a commercial flight), or would it be more like on the Moon with no protection?


Between the two. 30 µSv per hour on the surface of Mars on average, compared to 60 µSv per hour on the Moon and 5 µSv per hour on a jet. Not friendly, but not too awful.


Or 0.5 µSv per hour on Earth at sea level. That's only about 2 orders of magnitude difference. Given that we don't expect radiation damage to be a major source of failures in Earthbound consumer or commercial electronics, it's a bit surprising it's such a big deal in space.


The composition of that radiation is important - Earth's atmosphere preferentially filters out a lot of the higher-energy particles that are likely to permanently damage electronics. The sievert as a unit is weighted based on damage to biological systems, not electronics.


Well 2 orders of magnitude can turn a problem that occurs once a decade into a problem that occurs once a month so it's not too surprising that the problem is a bigger deal in space.
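Back-of-envelope, assuming (crudely) that upset rate scales linearly with dose rate:

    /* Crude scaling: an event that happens once a decade at Earth's ~0.5 µSv/h
     * becomes roughly monthly at the ~60 µSv/h quoted above for the Moon. */
    #include <stdio.h>

    int main(void)
    {
        double years_between_events = 10.0;   /* once a decade on Earth (assumed) */
        double dose_ratio = 60.0 / 0.5;       /* ~120x, i.e. about 2 orders of magnitude */
        printf("~%.0f days between events\n",
               years_between_events * 365.25 / dose_ratio);   /* ~30 days */
        return 0;
    }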


There are actually a lot of bit flips in Earthbound electronics due to lack of parity error checking. But those bit flips don't necessarily cause failures, or when they do it's impossible to determine the root cause.


Yes, but these chips have to function and survive on the way to the planet too, right? And they do all sorts of software updates etc. during this time. Maybe we could shield them and that would be fine? Or would that add too much extra weight?


> communicates with Perseverance using the Zigbee protocol

I hope they installed the Home Assistant Core docker container on Perseverance. Gotta get those sweet dashboards.


In general the older nodes work better for radiation and other cosmic forces.

https://en.m.wikipedia.org/wiki/Radiation_hardening

That's why they don't use the latest and greatest in space.


They work better primarily because they had previously invested in the tooling to rad-harden them and it’s really expensive to do that again one time for each mission; cheaper to rely on already-rad-hardened designs.


Not necessarily; physical size matters a lot.

Just an example: consider you were to add some extra electrical charge to the gate of a transistor (from an electron or ion beam, I don't know).

A larger transistor has a higher gate capacitance, so it is quite immune to a few extra charges. On smaller transistors, though, the same charge could dramatically increase the voltage, leading to a bit error or destroying the transistor. Capacitance in this case is proportional to the area.

Higher density also has some inherent drawbacks against particles, since a damaged part is proportionately more damaged if it is smaller.

Older processes also run at higher voltage and higher current, so they can handle a lot more noise on these signals.
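To put illustrative (made-up, order-of-magnitude) numbers on the capacitance point:

    /* dV = dQ / C: the same deposited charge that is a small blip on a big,
     * old-node gate is a huge spike on a tiny modern one. All values here are
     * illustrative, not from any datasheet. */
    #include <stdio.h>

    int main(void)
    {
        double q_strike = 0.5e-15;   /* ~0.5 fC deposited by a particle strike (assumed) */
        double c_old    = 2e-15;     /* ~2 fF gate on an older, larger node (assumed)    */
        double c_new    = 0.05e-15;  /* ~50 aF gate on a modern node (assumed)           */

        printf("old node: dV = %.2f V\n", q_strike / c_old);  /* ~0.25 V: absorbed          */
        printf("new node: dV = %.1f V\n", q_strike / c_new);  /* ~10 V: bit flip or damage  */
        return 0;
    }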


This raises the question, why not simply continue to make larger circuits today for this purpose?

Surely we could still make a chip today with the same transistor size as one from 2001, but better in other ways.


I believe the reason is that volume is too low. The radiation hardened chips for Curiosity and Perseverance are variants of Power chips made by BAE. I am surprised though that they have not released any newer chip in time for Percy. There is a newer generation, the 5500, but I believe it was not ready for Percy. Percy uses the same chip as Curio.


That's one way of doing it. But making bigger integrated circuits (ICs) is hard. And it wouldn't automatically give you increased performance.

In microelectronics, costs are directly proportional to the die area. Actually, they might rise faster due to yield issues.

Making masks [1] is expensive, and the bigger the mask, the more expensive it gets. A mask set for a modern CPU can easily run into the millions, I think.

Then you have yield. The bigger the chip, the more likely it is to have some defects (due to dust or other issues during fabrication). Processes also vary across a wafer (temperature is higher in the center, etc.), and that can affect performance.

Due to yield, bigger chips have to be scrapped more often and are generally less performant. Binning (selecting the fastest, slowest, or most efficient chips, or chips with specific intact features, across a wafer) is less effective. You might have to add redundancy or mechanisms to cut power to damaged areas to avoid short circuits.
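To put rough numbers on the yield point (a simple Poisson yield model with a made-up defect density):

    /* Toy yield model: yield = exp(-area * defect_density). Doubling die area
     * more than doubles the fraction of dies you throw away.
     * Defect density here is invented for illustration. Compile with -lm. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double d0 = 0.1;                          /* defects per cm^2 (made-up) */
        double areas[] = { 1.0, 2.0, 4.0, 8.0 };  /* die area in cm^2 */

        for (int i = 0; i < 4; i++)
            printf("area %.0f cm^2 -> yield %.0f%%\n",
                   areas[i], 100.0 * exp(-areas[i] * d0));
        /* 1 cm^2: ~90%, 2: ~82%, 4: ~67%, 8: ~45% */
        return 0;
    }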

That's why we don't generally make bigger integrated circuits. We could make bigger chips with today's latest clean rooms and equipment to try to raise yields; I don't know if that's being done already, but it would likely raise costs. On the other hand, progress is being made on bigger chips as well [2].

Another more promising direction (IMO) is to use chiplets like AMD does. You could use more of them for a bigger virtual chip.

Now, like I wrote, a lot of the performance improvements actually come from physically scaling down the transistors: if the gate is smaller, the transistor needs fewer electrons to charge up. That means faster transistors and less energy. Also, transistors are closer together, so signals reach the next one faster [3].

If you want bigger chips at a previous technological node, you are going to need a huge heatsink, or disable part of the chip ("dark silicon") [4].

The real answer might come from completely different architectures, based on light or spin, or from more power-efficient circuit/computer architectures like adiabatic computing [5] (or non-von-Neumann designs, closer to what I do).

Power efficiency is key, since that's the limiting factor for performance nowadays (ask any overclocker: you don't want to melt your CPU. Also, rovers have a small energy budget). With better efficiency, you have room to grow performance again.

Paradoxically, software seems headed in the other direction, generally speaking.

[1] https://en.wikipedia.org/wiki/Photomask

[2] https://news.ycombinator.com/item?id=20739408

[3] https://en.wikipedia.org/wiki/Dennard_scaling

[4] https://en.wikipedia.org/wiki/Dark_silicon

[5] https://en.wikipedia.org/wiki/Adiabatic_circuit


Doesn't radiation mostly cause random one-off errors rather than permanent defects? If so, then if the rad-hardened stuff is 100x slower (which I think is approximately right?), it is almost certainly better to use error correction on non-rad-hardened hardware.
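As a toy illustration of the kind of error correction meant here (a Hamming(7,4) single-error-correcting code; real systems would use ECC memory and stronger codes, this is just the principle):

    /* Encode 4 data bits into a 7-bit codeword that survives any single bit flip. */
    #include <stdint.h>
    #include <stdio.h>

    /* Bit layout (1-indexed positions): 1=p1, 2=p2, 3=d0, 4=p3, 5=d1, 6=d2, 7=d3. */
    static uint8_t hamming74_encode(uint8_t data)
    {
        uint8_t d0 = (data >> 0) & 1, d1 = (data >> 1) & 1;
        uint8_t d2 = (data >> 2) & 1, d3 = (data >> 3) & 1;
        uint8_t p1 = d0 ^ d1 ^ d3;   /* parity over positions 1,3,5,7 */
        uint8_t p2 = d0 ^ d2 ^ d3;   /* parity over positions 2,3,6,7 */
        uint8_t p3 = d1 ^ d2 ^ d3;   /* parity over positions 4,5,6,7 */
        return (uint8_t)(p1 << 0 | p2 << 1 | d0 << 2 | p3 << 3 |
                         d1 << 4 | d2 << 5 | d3 << 6);
    }

    /* Correct a single flipped bit (if any) and return the 4 data bits. */
    static uint8_t hamming74_decode(uint8_t cw)
    {
        uint8_t s1 = ((cw >> 0) ^ (cw >> 2) ^ (cw >> 4) ^ (cw >> 6)) & 1;
        uint8_t s2 = ((cw >> 1) ^ (cw >> 2) ^ (cw >> 5) ^ (cw >> 6)) & 1;
        uint8_t s3 = ((cw >> 3) ^ (cw >> 4) ^ (cw >> 5) ^ (cw >> 6)) & 1;
        uint8_t syndrome = (uint8_t)(s1 | s2 << 1 | s3 << 2);  /* 1-indexed bad position */
        if (syndrome)
            cw ^= (uint8_t)(1u << (syndrome - 1));             /* flip it back */
        return (uint8_t)(((cw >> 2) & 1) | ((cw >> 4) & 1) << 1 |
                         ((cw >> 5) & 1) << 2 | ((cw >> 6) & 1) << 3);
    }

    int main(void)
    {
        uint8_t cw = hamming74_encode(0xB);                  /* data = 1011 */
        cw ^= 1u << 5;                                       /* simulate one upset */
        printf("recovered: 0x%X\n", hamming74_decode(cw));   /* prints 0xB */
        return 0;
    }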


In addition to SEUs (single event upsets), which are bit flips, there are also the following single event effects (SEEs) that are destructive:

- Single Event Burnout, SEB

- Single Event Gate Rupture, SEGR

- Single Event Latch-up, SEL (these can be recoverable)

In addition, there are also Total Ionizing Dose (TID) effects: https://radhome.gsfc.nasa.gov/radhome/tid.htm


Sure, no doubt they occur. But at what rate?


Many types of bit error are not recoverable without a full system reset. It isn't a matter of a simple "this bit in RAM got corrupted", but more "this floating point unit has got into a state where it will not produce a result, and will therefore hang the entire processor".

Therefore boot time becomes critical - if you end up rebooting due to bit errors multiple times per second, you can't afford to wait for Linux to start up each time...


Run 9 systems in parallel and reset the ones that give less common results or no results at all.

You'd still have 10% of the surface area, power usage, and weight, and 10 times the speed of the radiation-hardened ones.
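Sketched out, the scheme described here would look something like this (nobody is claiming any flown system does exactly this):

    /* Vote-and-reset-the-outliers: run N copies, take the most common answer,
     * and mark any copy that disagreed for a reset. */
    #include <stdint.h>
    #include <stdio.h>

    #define N 9

    static uint32_t vote(const uint32_t results[N], int needs_reset[N])
    {
        uint32_t winner = results[0];
        int best = 0;
        for (int i = 0; i < N; i++) {          /* O(N^2) tally, fine for N = 9 */
            int count = 0;
            for (int j = 0; j < N; j++)
                if (results[j] == results[i])
                    count++;
            if (count > best) { best = count; winner = results[i]; }
        }
        for (int i = 0; i < N; i++)            /* anyone who disagreed gets reset */
            needs_reset[i] = (results[i] != winner);
        return winner;
    }

    int main(void)
    {
        uint32_t results[N] = { 42, 42, 7, 42, 42, 42, 42, 99, 42 };  /* two upsets */
        int reset[N];
        printf("voted result: %u\n", vote(results, reset));  /* 42; copies 2 and 7 flagged */
        return 0;
    }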


And that's why it's wise to have multiple systems running at the same time: if one errors out, you hopefully still have another online. There's a reason airplanes and now cars are designed this way. I'm sure they're working towards this too.


Well, I suppose they don't have to load all the kernel modules and drivers that Linux provides today.

I wonder how one could use microkernels to further improve startup time and have a mini distributed OS/kernel for each component.


This is a problem with floating point operations happening at a lower level than the error correction you're imagining. In principle, that's not at all necessary. Are you arguing that it's infeasibly expensive to design a chip with operations that are error correctable?


It's possible - but you'll end up reinventing nearly every step of the IC design process, which will cost a lot.


These projects also have long development periods, and they want to be sure everything works, so they can't just sub things out at the last minute. So you end up with a system using 10-year-old tech.

It will be interesting when missions are launched utilising the tech of today, like rad-hardened neuromorphic chips: https://www.businesswire.com/news/home/20200902005406/en/Bra...



Can't all of the radiation issues be mitigated with proper shielding of the electronics?


Yes, but the problem is that radiation shielding is heavy because it's typically made of lead.


https://media.ccc.de/v/36c3-10575-how_to_design_highly_relia... is a really interesting talk by some guys from CERN who have to design chips for use in particle accelerators and other physics experiments where they must survive unusually hostile radiation environments.


It's trickier than it seems, because some forms of radiation can induce different kinds of radiation when they hit the dense shielding, so you're kind of back to square one: you still need a radiation-resistant processor.


If you were to build the exact same helicopter here on Earth from those parts, what would it cost versus what the device on Mars cost? Also, I know plenty of solid engineers who could build one based on that, obviously not taking the atmosphere and g-forces of getting there into account.

Also Linus finally got his progeny to Mars. That’s a pretty cool accomplishment:

“What have you built?”

“Well, I invented one of the most prolific operating systems the world has ever seen, but not just this world - there is a helicopter on Mars that is flying due to the seeds which I planted that day...”

What have you built?


It's not just atmosphere and g-forces. The cold temperatures mean the helicopter spends 2/3 of its battery power keeping the batteries and electronics from freezing.

Good rundown of how it was built: https://www.youtube.com/watch?v=GhsZUZmJvaM


One of the fascinating things about space silicon is that NASA spends many years hardening specific processors to withstand the types of shocks and electromagnetic interference from space travel. These intensive processes mean that the equipment they can use is always 10-20 years behind the modern equivalents.


Aren't smaller transistor sizes also more susceptible to radiation issues which means you can't really use newer processors without ever more effort in radiation hardening?


Smaller transistors also have a smaller cross-section so for the same number of transistors this somewhat cancels out.


But then you have to build for redundancy rather than just performance (say, sacrifice some floor-plan to error correction, recovery, circuit duplication, etc...)


Still a game of probability either way.


And yet they are fine with using a Snapdragon ARM processor on the helicopter?


> Ingenuity runs Linux (for the first time on Mars) and uses the open-source F' software framework on a 2.26 GHz quad-core Snapdragon 801 processor. Radiation hardened processors aren’t fast enough for the real-time vision requirements of the experiment—but as an unprotected COTS processor, it will fail periodically due to radiation-induced bit flips, possibly as much as every few minutes. NASA’s solution is to use a radiation-tolerant FPGA ProASIC3 to keep an eye on the CPU (paper) and software that attempts to double-check operations as much as possible. “[I]f any difference is detected they simply reboot. Ingenuity will start to fall out of the sky, but it can go through a full reboot and come back online in a few hundred milliseconds to continue flying.”

Source: https://orbitalindex.com/archive/2021-02-24-Issue-105/#ingen...
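For a sense of scale on that "few hundred milliseconds": ignoring rotor inertia and aerodynamics (which actually help, so this overstates the drop), free fall under Mars gravity over that interval is small:

    /* How far does it drop while the computer is rebooting? d = 0.5 * g * t^2. */
    #include <stdio.h>

    int main(void)
    {
        double g_mars = 3.71;   /* m/s^2 */
        for (double t = 0.1; t <= 0.51; t += 0.1)
            printf("%.1f s blackout -> falls ~%.2f m\n", t, 0.5 * g_mars * t * t);
        /* 0.3 s -> ~0.17 m; 0.5 s -> ~0.46 m */
        return 0;
    }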


I would be interested in seeing what they did to get Linux booting and their userspace daemon running in ~300ms or less. Depending on their non-volatile storage read rate, using an uncompressed kernel might not actually save boot time. I'm guessing they aren't running traditional init or systemd.

I've been told LinuxBIOS is able to get you a text console login prompt faster than HDD platters can spin up, but it takes Ubuntu tens of seconds on my SSD laptop to get me a login prompt.
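Pure speculation on my part about their stack, but the usual trick for sub-second userspace bring-up is to skip init systems entirely with a tiny static PID 1, something like the sketch below (the flight-software path is hypothetical; the real flight software is an F' application):

    /* Minimal PID 1: mount the basics and exec the one daemon you care about. */
    #include <sys/mount.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Minimal plumbing; a flight system might not even need this much. */
        mount("proc", "/proc", "proc", 0, NULL);
        mount("devtmpfs", "/dev", "devtmpfs", 0, NULL);

        /* Hypothetical path, purely for illustration. */
        execl("/opt/flight/fsw", "fsw", (char *)NULL);

        perror("exec flight software failed");   /* only reached on failure */
        for (;;) pause();                        /* PID 1 must never exit */
    }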

I'm surprised they don't have an FPGA MEMS autopilot with a simple degree-2 or -3 polynomial model of the flight dynamics, with the CPU and Linux only being involved in making adjustments to the autopilot. Or maybe that's what they're doing, and by "falling out of the sky" what they really mean is the autopilot drifting.


The helicopter is both only rated for a certain mission and potentially liable to smash straight into the ground.

On top of that, I'm sure JPL would love to move with the times, which is why the helicopter is using the more modern processor.


The CPU will probably be destroyed by radiation before long. I'd guess the key factors here were weight, power draw, size and perhaps performance. A radiation-hardened CPU probably didn't fit the bill. It's also super expensive.


Any individual part is peanuts compared to overall mission cost. Anyway, it's a great PR stunt for QCOM. It's not that big a secret that cubesats successfully use automotive-grade off-the-shelf parts.


> perhaps performance

Absolutely performance. Yes to the other three for sure, but the engineers reported that there was no way they were running flight control using image tracking on a 200 MHz CPU.


Not in Python, maybe. Smart bombs have been doing the necessary image processing with much less processing power, on much less capable sensors, for a long time.


Dunno about guided bombs, but cruise missiles have been using some pretty fascinating techniques to navigate before GPS was a thing. It's probably hard to find CPU specs, because they're defense technology.


It is surprisingly easy to find, as that stuff is pretty thoroughly covered in academic/industry journals. The only hindrance to access is a credit card number for the paywall.


The helicopter project is said somewhere to have cost $80 million.

It would be interesting to know the cost allocation for that Sony/Samsung chip. The project manager and scientists/engineers could have listed design challenges for an industry player like TSMC/Apple to go beat with an offering: a short run of 20 chips specifically for this helicopter.


What? I don't understand how you can make this claim. Guided bombs are NOT using CV with optical cameras. They use lasers, GPS, and other non-"fancy" techniques.

I just don't get in what world you think military munitions are using CV for targeting bombs.


So I guess you've never heard of the AGM-62 Walleye, or anything else that came out of China Lake. Before you try backpedaling with some silly nonsense about how gating isn't real CV, maybe do a quick search through the journals that cover this stuff: AIAA would be a good start. Another path would be in relation to counter-countermeasures and ground-noise rejection for air-to-ground radar-guided munitions. That stuff was deployed regularly all the way back to Vietnam.


The other poster mentions analog techniques used in contrast-tracking TV-guided munitions like the Walleye, but digital "CV-like" image/contour matching methods were used on the original Tomahawk cruise missile and the Pershing 2 missile to provide terrain-matching navigation and target guidance. GPS was neither sufficiently complete nor accurate for strategic weapons in the late 1970s/early 1980s.

In more modern weapons, imaging IR sensors are well-established for terminal guidance on missiles like LRASM, JASSM, or NSM to distinguish targets from clutter and identify specific target features (specific parts of a ship, for example). Of course "traditional" "IR-homing" SAMs and AAMs now use imaging sensors (often with multiple modes like IR+UV) to distinguish between the target and decoys/jammers. Even your basic shoulder-fired anti-tank missile like Javelin requires some amount of CV to identify and track a moving target.


> analog techniques used in contrast-tracking

aka edge detection :) I don't remember if it was the Sidewinder or Walleye that eventually dropped in a CCD (or both), but I know that the Maverick (which is technically older than Walleye) got along without a CCD until the GWOT - when it finally upgraded. The Javelin actually beat Maverick in that regard, having a 64x64 sensor 10 years earlier - able to handle scaling and perspective change for the 2-d designated target pattern.

https://apps.dtic.mil/dtic/tr/fulltext/u2/a454087.pdf


It seems other parts are more modern. Yes, the main processor is a RAD750, but the peripherals can use modern components, and there's some USB and Ethernet here and there (like the cable between the sky crane and the rover).

Reliability is very important and space is harsh. I assume on the surface the radiation levels are low enough for Earth systems to work (maybe playing a bit with voltage/clock frequency helps, not sure how much shielding they can add, probably not too much).


It's nice that NASA isn't falling victim to not-invented-here. If a commodity part or protocol can do the job, no sense reimplementing it.


The problem is that NASA is designed around long-term and expensive projects. At some point, a fast-iterating company like SpaceX will surpass their achievements.


Why would SpaceX bother doing science, though? If you imagine a project like Voyager, you might think they just dump the data and "go home" (not quite, obviously) but to analyse the data and to know what to look for they had to hire geologists and meteorologists (for example) along with the planetary scientists and co.

NASA should be about long-term and expensive projects; SpaceX is just a tool to achieve that goal, which is to do new science, regardless of whether it is in the air or in space.


SpaceX will do any science that’s profitable in the near/mid term


Musk will do everything to achieve his dream of colonising Mars.



