Since we don't know for certain what is happening inside a neural network, we can say we don't believe they are thinking, though we would still need to define the word "thinking." Once LLMs can self-modify, the word "thinking" will be more accurate than it is today.
And when Hinton says at MIT, "I find it very hard to believe that they don't have semantics when they solve problems like, you know, how do I get all the rooms in my house to be painted white in two years' time," I believe he's commenting on the ability of LLMs to think on some level.
1. Show GPT-4 a text produced by GPT-2, with the activation level of a specific GPT-2 neuron highlighted at each point in the text where it fired. Then ask GPT-4 for an explanation of what the neuron is doing.
GPT-4 produces: "words and phrases related to performing actions correctly or properly".
2. Based on that explanation, have GPT-4 guess how strongly the neuron activates on a new text.
"Assuming that the neuron activates on words and phrases related to performing actions correctly or properly. GPT-4 guesses how strongly the neuron responds at each token: '...Boot. When done _correctly_, "Secure...'"
3. Compare those guessed activations to the neuron's actual activations on the text to generate a score (a rough sketch of this loop is shown below).
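For concreteness, here is a minimal Python sketch of that explain-simulate-score loop. The prompts, the `call_gpt4` wrapper, and the function names are hypothetical stand-ins rather than OpenAI's actual code, and the scoring shown is a plain correlation between guessed and real activations, which is the kind of comparison the paper describes.

```python
# Minimal sketch of the explain / simulate / score loop (Python 3.10+).
# call_gpt4, the prompt texts, and the function names are hypothetical;
# a real implementation would call an actual GPT-4 API client here.

from statistics import correlation


def call_gpt4(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 completion call."""
    raise NotImplementedError("wire up a real API client here")


def explain_neuron(tokens: list[str], activations: list[float]) -> str:
    """Step 1: show GPT-4 the text with per-token activations of one
    GPT-2 neuron and ask for a short natural-language explanation."""
    highlighted = ", ".join(f"{t}:{a:.2f}" for t, a in zip(tokens, activations))
    return call_gpt4(f"Explain what this neuron fires on.\n{highlighted}")


def simulate_activations(explanation: str, new_tokens: list[str]) -> list[float]:
    """Step 2: given only the explanation, have GPT-4 guess how strongly
    the neuron would activate on each token of an unseen text."""
    reply = call_gpt4(
        f"Assuming the neuron activates on {explanation}, "
        f"rate each of these tokens from 0 to 10:\n" + " ".join(new_tokens)
    )
    return [float(x) for x in reply.split()]


def score_explanation(simulated: list[float], actual: list[float]) -> float:
    """Step 3: compare guessed to actual activations; a higher correlation
    means the explanation predicts the neuron's behavior better."""
    return correlation(simulated, actual)
```

The per-explanation scores quoted below (e.g. the 0.8 threshold) come out of a comparison step like `score_explanation`.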
So there is no introspection going on: GPT-4 is not looking inside its own network, only pattern-matching on activation data it is shown.
They report: "We applied our method to all MLP neurons in GPT-2 XL [the 1.5B figure usually cited for GPT-2 XL is its parameter count; the number of MLP neurons is far smaller]. We found over 1,000 neurons with explanations that scored at least 0.8, meaning that according to GPT-4 they account for most of the neuron's top-activating behavior." But they also caution: "However, we found that both GPT-4-based and human contractor explanations still score poorly in absolute terms. When looking at neurons, we also found the typical neuron appeared quite polysemantic."