One thing that’s always struck me as strange about modern NNs is they tend to be two-dimensional and one-directional, right?
Whereas our brains are three-dimensional and have neural feedback loops. It seems clear to me that those feedback loops are an essential part of “thought”.
Any idea why more architectures don’t have neural loops built into them? Is it because they would cost too much to train? Lack of control?
Good point, and yes, feedback is critical in CNS networks at several levels.
But much of early neural-network analysis started with the neural retina (retina is a fun word, derived from reticulum = net). And the retina is mainly a feed-forward system, or at least was perceived as such from the 1950s through the 1970s.
Even the cortex was and still is crudely modeled using mainly feed-forward connections.
But the more you know about CNS structure and function, the more you see and appreciate feedback. Horizontal cells in the retina provide a strong feedback integration/bias signal back to rod photoreceptors. Reciprocal dendro-dendritic synapses are key in many systems. Presynaptic terminals also respond to the very transmitters they release and to the ion fluxes they induce.
At the level of connectomes the recursion is deep and hierarchical; see the lovely work by David van Essen and colleagues on the laughably complex connections among the dozens of cortical regions that respond to visual input from thalamus and midbrain (and yes, this is true even in mouse).
The ultimate feedback is, of course, generated by our own behavior: a blink, a saccade, or our own movements.
Plunk an LLM in a Roomba, but with more input modes and more recursive self-prompting. That's a non-threatening start at what could also become SkyNet.
Our brains are highly concurrent/distributed systems where each neuron exists & acts independently of any specific task you give the network. Inputs (both from external sources/senses as well as other neurons) might come in at different times, and neurons continuously compute their excitation levels (and then, with a bit of delay, might or might not go on to excite their neighbors). Put differently, there is no such thing as an inference "run", where you provide an input to the NN, follow its topology and do all your matrix computations, and then get an output. It's a highly cyclic machine that continuously reads inputs and continuously produces outputs. Heck, it seems that parts of the brain are active even when there's very little input and no visible output (like when you're dreaming). The feedback loops you mentioned certainly play a huge role here – excitations can take on a life of their own.
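The "continuously ticking" picture above can be sketched in a few lines. This is a toy simulation (all parameters are made up for illustration): leaky neurons with recurrent connections are stepped through time, reading inputs and emitting spikes on every tick, rather than being run once from input to output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                        # number of neurons
W = rng.normal(0, 0.3, (n, n)) / np.sqrt(n)   # recurrent weights (feedback loops)
v = np.zeros(n)                               # membrane potentials
leak = 0.1                                    # fraction of potential lost per tick

def step(v, external):
    """One time step: fire, leak, add recurrent + external input, reset."""
    spikes = (v > 1.0).astype(float)          # neurons above threshold fire
    v = v * (1 - leak) + W @ spikes + external
    v[spikes > 0] = 0.0                       # reset the neurons that fired
    return v, spikes

# Drive the network for 20 ticks, then let it run on its own inputs-free.
activity = []
for t in range(100):
    ext = rng.uniform(0, 0.5, n) if t < 20 else np.zeros(n)
    v, s = step(v, ext)
    activity.append(s.sum())
```

There is no single "run" here: output (spikes) is produced on every tick, and the recurrent term `W @ spikes` means past activity keeps influencing future state even after the external input stops.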
> Is it because they would cost too much to train?
I think it's both: Even a single run of, say, ChatGPT is already quite expensive in terms of compute, and now you want to keep the network "alive" at all times – that would surely cost orders of magnitude more. But on top of that, by introducing a time axis and continuously feeding inputs to the network and reading its outputs, you also lose the clear input/output relationship that training data exhibits these days. While it's certainly possible to introduce a time dimension to training data, too, that seems like a whole different ballgame to me.
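The lost input/output relationship is easy to demonstrate. In this toy sketch (the class and its weights are hypothetical), a network with persistent internal state gives different outputs for the identical input depending on what it saw before, so there is no longer a single "correct label" to train against for a given input.

```python
import numpy as np

class StatefulNet:
    """A minimal always-on network: internal state h survives between calls."""
    def __init__(self):
        self.h = np.zeros(3)             # internal state persists across ticks
        self.W_in = np.eye(3)            # input weights (toy values)
        self.W_rec = 0.5 * np.eye(3)     # feedback from the previous state

    def tick(self, x):
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        return self.h.copy()

net = StatefulNet()
x = np.ones(3)
out1 = net.tick(x)   # first presentation of x
out2 = net.tick(x)   # same x, but the state has changed, so the output differs
```

Because `out1 != out2`, a training pair (x, target) is no longer well-defined on its own; the target depends on the entire history, which is exactly what sequence-aware training schemes have to deal with.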
I was just thinking lately that brains rewire themselves all the time. That's how they learn. Neural networks, though, never do. There's a separate training stage that (from my understanding) runs the thing in reverse and modifies the weights to move the output closer to what is desired. After that, the weights are never updated.
Also, yes, information always flows through a neural network in one direction, from inputs to outputs, then leaves it. Whether the inference result was correct or wrong, it doesn't have any effect on the weights.
So I'm thinking that there is a need for some new kind of neural network architecture that can 1) update its own weights all the time, not just during training, and 2) propagate information in the reverse direction, from outputs to inputs, or have "backwards" connections between layers. Maybe have a set of outputs that tell the system whether and how the NN wants its own weights updated?
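Both ideas can be sketched together. This is not an established architecture, just a rough illustration with made-up names and numbers: the forward weights are nudged by a simple Hebbian rule on every pass (always-on learning, no separate training stage), and a backward weight matrix feeds the previous output back into the input layer.

```python
import numpy as np

rng = np.random.default_rng(1)
W_fwd = rng.normal(0, 0.1, (4, 8))    # input -> output weights
W_back = rng.normal(0, 0.1, (8, 4))   # output -> input ("backwards") connections
W_init = W_fwd.copy()                 # snapshot to show drift later
eta = 0.01                            # always-on learning rate
y_prev = np.zeros(4)

def forward(x, y_prev, W_fwd, W_back):
    # Backward connection: the last output biases the current input.
    x_eff = np.tanh(x + W_back @ y_prev)
    y = np.tanh(W_fwd @ x_eff)
    # Hebbian update on every pass: strengthen co-active input/output pairs.
    W_new = W_fwd + eta * np.outer(y, x_eff)
    return y, W_new

x = rng.uniform(-1, 1, 8)
for _ in range(10):
    y_prev, W_fwd = forward(x, y_prev, W_fwd, W_back)
# W_fwd has drifted from W_init just by being used -- inference and
# weight updates are no longer separate phases.
```

A real design would need some signal deciding *when* to learn (the "outputs that tell the system how it wants its weights updated" idea above); here learning simply runs unconditionally, which in practice tends to be unstable without normalization.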
You automatically get exponential, recursive and hyperbolic behavior when your predictions are recycled. On paper there is very little middle ground between rote repetition and a spiritual peak experience. This is probably why consciousness emerges.
The dimensionality isn't 2 or 3. Value systems may have such dual or triune symmetry, but the dimensionality is IIRC said to be nominally 1000. I would wager the average is the same as the number of muscles in the human body.
More than that, our brains have multiple input and output devices that simultaneously interact with the brain. Proprioception, sight, hearing, touch inputs. Movement, cognition, hormone regulation as outputs. There's a biological basis for why walking stimulates thought, the same networks are involved.
It's not the same thing as training the same NxM deep neural network on multiple modalities and then sending one signal or the other into the input end.
I feel this is mostly a good display of the confusion around what the heck is even doing what at this point. You can count me in this camp. And I am terribly interested to know.