Hacker News

People are better understood intuitively. We understand how people fail and why. We can build trust with people with some degree of success. But machine models are new and can fail in unpredictable ways. They also get deployed to billions of users in a way that humans are not, and in applications where humans are not. So it's certainly useful to try to explain neural networks in as much detail as we can.


Or we can build trust using black box methods like we do with humans, e.g., extrapolating from past behavior, administering tests, and the like.


We can, but the nice thing about neural networks is the ability to apply all kinds of computational and mathematical manipulations to them, to pick them apart and really find out what's going on. This is important not just for safe deployment but also for research on new methods that could make them better. Plus, we need this ability to help detect neural networks with intentionally hidden features: models that appear to behave linearly in certain regimes but are designed with a strong nonlinear response when special inputs are applied. You could have all the tests you want for a self-driving car based on real-world conditions, but a bad actor with access to the training system could plant a special input that results in dangerous behavior.
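To make the "hidden trigger" failure mode concrete, here's a minimal toy sketch (all names and numbers are invented for illustration, not any real system): a controller that responds linearly over ordinary inputs but is constructed to spike on one secret input, which black-box testing on realistic conditions would almost never find.

```python
# Toy model of a planted backdoor: linear in the normal regime,
# anomalous only for one secret trigger input.

TRIGGER = (0.123, 0.456)  # hypothetical secret pattern planted by a bad actor

def steering_model(x, y):
    """Behaves linearly for ordinary inputs; spikes near the trigger."""
    normal_output = 0.5 * x + 0.3 * y  # well-behaved linear response
    # Sharp nonlinear bump that only fires extremely close to the trigger,
    # so random or realistic test inputs will essentially never hit it.
    dist2 = (x - TRIGGER[0]) ** 2 + (y - TRIGGER[1]) ** 2
    backdoor = 100.0 * (dist2 < 1e-6)
    return normal_output + backdoor

print(steering_model(0.1, 0.2))   # ordinary input: small, linear output
print(steering_model(*TRIGGER))   # trigger input: wildly different output
```

White-box inspection of the weights can reveal the anomalous component; black-box testing over realistic inputs almost certainly cannot.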


The more fundamental problem is the sheer size of them, and this is only going to get worse as models grow larger to become more capable. Being able to look at the state of individual neurons during inference is very convenient, but that does not by itself make it possible to really find out what's going on.
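For what "look at the state of individual neurons during inference" means in practice, here's a minimal sketch (toy weights and made-up names, not any real framework): a tiny two-layer network whose forward pass records every intermediate activation. The scaling problem in the comment above is that real models have billions of these values per forward pass, so recording them is easy but interpreting them is not.

```python
# Tiny two-layer network that captures per-neuron activations at inference.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward_with_trace(x, W1, W2):
    """Run inference while recording each layer's neuron states."""
    trace = {"input": x}
    h = relu(matvec(W1, x))   # hidden-layer neuron activations
    trace["hidden"] = h
    out = matvec(W2, h)
    trace["output"] = out
    return out, trace

# Hypothetical weights for illustration
W1 = [[1.0, -1.0], [0.5, 0.5]]
W2 = [[1.0, 1.0]]

out, trace = forward_with_trace([2.0, 1.0], W1, W2)
print(trace["hidden"])  # per-neuron hidden states: [1.0, 1.5]
print(out)              # [2.5]
```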



