This doesn't seem like a good criterion for judging the system's ability to understand/model. It seems more like a reflection of OpenAI's (lack of) ability to inspect/interpret/manipulate the system's state. I highly doubt there is a long list of "counter-prompts" stuffed into the input (e.g. "but don't say this, this, this..."). More likely they have a side mechanism that detects "dangerous" prompts and short-circuits the response, or a second-layer "metacognition" system that gates access to the full model and can presumably be trained fast/independently. The point is, failures to gate access to the full model do not imply failures of the full model to understand.
>There’s many other examples of nondeterministic arithmetic, naive word associations, and general “WTF moments”
Its arithmetic failures are well explained by BPE encoding. Judging it by its other failures is also a mistake: humans make odd mistakes all the time. It's simply that the failure modes of LLMs are different from those of humans, so we don't recognize understanding behind LLM failures while we readily do in spite of human ones.
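To make the BPE point concrete, here's a toy sketch (the merge vocabulary is invented for illustration; it is not GPT's actual tokenizer): BPE-style tokenizers greedily merge frequent character runs, so a number's surface form, not its place value, determines how it is split. The same digit can land in different tokens depending on what surrounds it, which makes digit-aligned arithmetic hard to learn.

```python
def toy_bpe_digits(s, merges={"12", "123", "45", "456"}):
    # Greedy longest-match segmentation over a fixed merge vocabulary,
    # mimicking how learned BPE merges chunk digit strings.
    tokens, i = [], 0
    while i < len(s):
        for length in (3, 2, 1):
            piece = s[i:i + length]
            if length == 1 or piece in merges:
                tokens.append(piece)
                i += length
                break
    return tokens

print(toy_bpe_digits("1234"))   # ['123', '4']
print(toy_bpe_digits("12345"))  # ['123', '45']
```

Note how the trailing digit "4" sits alone in one tokenization but is fused into "45" in the other, so the ones/tens columns of two operands need not line up token-for-token, which is exactly the kind of representational quirk that produces flaky arithmetic without implying the model lacks any world model.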
This is exactly the kind of internally inconsistent reply I’d expect from ChatGPT.
You can’t claim that it’s a duck because it quacks and walks like a duck as long as you ignore all the times it barks like a dog. If the model isn’t accessible, how do you know it exists? If it has an actual model, how does it break on something as silly as BPE encodings? It’s like a rocket scientist tripping up over 1 + 1. “Don’t talk about subject X” is literally something a five year old can generalize.
It’s no accident that most of its failure modes remind me of Charlie from It’s Always Sunny in Philadelphia or Ricky from Trailer Park Boys where the writers have dumb characters play with word association for comedic effect. Two turnips in heat.
>You can’t claim that it‘s a duck because it quacks and walks like a duck as long as you ignore all the times it barks like a dog.
People make mistakes, but people understand. Therefore, the existence of mistakes does not unilaterally discount understanding. You have to do actual argumentative work to demonstrate that a system showing occasions of model-based reasoning is not in fact doing model-based reasoning. All else being equal, a single (convincing) example of model-based reasoning is enough to demonstrate the existence claim. You have to provide an actual argument to undermine the value of the example, not just examples of unrelated failures. All I'm asking for is an actual argument rather than sophistry. For some reason these discussions never progress beyond this stage.
> People make mistakes, but people understand. Therefore, the existence of mistakes does not unilaterally discount understanding.
I'm not claiming people don't make mistakes or that an algorithm has to be perfect. I'm claiming the algorithm makes the same mistakes people do when they clearly don't understand something but try to cover it up with basic word association.
> You have to do actual argumentative work to demonstrate that a system showing occasions of model-based reasoning is not in fact doing model-based reasoning.
The burden of proof is on you. I can prove beyond a shadow of a doubt that a neural network is a gigantic algebraic function. You have to provide convincing evidence that it actually "understands" or has anything resembling what humans would call a "model of the world."
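For what it's worth, the "gigantic algebraic function" point can be made concrete with a toy sketch (my own illustration, with hand-picked weights, not any particular model): a feed-forward network is nothing but composed matrix multiplies, additions, and elementwise nonlinearities.

```python
# A tiny 2-layer network in pure Python: f(x) = W2 * relu(W1 * x + b1) + b2.
# Every step is plain algebra -- there is no other machinery inside.
def matvec(W, x):
    # Matrix-vector product: one dot product per row of W
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # Elementwise nonlinearity: max(0, a) per component
    return [max(0.0, a) for a in v]

def forward(x, W1, b1, W2, b2):
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, h), b2)]

# Hand-picked weights for the illustration
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 2.0]]
b2 = [0.1]
print(forward([2.0, 1.0], W1, b1, W2, b2))  # [4.1]
```

Of course, showing that the substrate is algebra settles the mechanism question, not the understanding question; the disagreement upthread is precisely over whether "understanding" can be realized by such a function.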
> All else being equal, a single (convincing) example of model-based reasoning is enough to demonstrate the existence claim.
Might as well have a quadrillion monkeys banging on typewriters. As long as one of them manages to output Shakespeare's Othello, they must be as smart as humans, right? That's just cherry-picking to support your preexisting conclusion.
> You have to provide an actual argument to undermine the value of the example, not just examples of unrelated failures.
See above. The burden of proof is on you.
For what it's worth, my mind has been blown by the progress we've seen in the last few months and I'm very bullish on AI. I've been regularly using Replicate and OpenAI for both business and pleasure, and my conclusion isn't that computers are becoming "smart" but that Brooks, Moravec, and Minsky were right: https://en.wikipedia.org/wiki/Moravec%27s_paradox