Weak-to-Strong Generalization (openai.com)
149 points by vagabund on Dec 14, 2023 | 201 comments


I don't believe LLM's will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.

You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)

You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)

I similarly think you have to train on the inputs of human decision-making to create something which can model human decision-making. What are those inputs? We don't fully know, but it is probably some subset of the spatial and auditory information we take in from birth until the point we become mature, with "feeling" and "emotion" as a reward function (seek joy, avoid pain, seek warmth, avoid hunger, seek victory, avoid embarrassment and defeat, etc.)

Language models are always playing catch-up because they don't actually understand how the world works. The cracks through which we typically notice this, in the context of the tasks usually asked of them (summarize this article, write a short story), will gradually get smaller over time (due to RLHF), but the fundamental weakness will always remain.


Human intelligence itself is shaped by our interaction with outputs. Our learning and understanding of the world are profoundly influenced by the language, behaviors, and cultural artifacts we observe.

Think about the process of a child learning a language. The child does not have direct access to the "inputs" of linguistic rules or grammar; they learn primarily through observing and imitating the language output of others around them. Over time, they develop a sophisticated understanding of language, not by direct instruction of underlying rules, but through pattern recognition and contextual inference from these outputs.

Then that language itself, learned from outputs, becomes the cognitive apparatus that enables the child to imagine, to reason symbolically and abstractly. Humans bootstrap intelligence on top of language, which itself is learned by mimicking outputs.

Moreover, the analogy to weather prediction or stock market analysis is somewhat misleading. Yes, these models benefit from input data (like air currents for weather, company fundamentals and CEO statements to the media for stocks). But these systems are fundamentally different from intelligence.

Intelligence, whether artificial or human, is about the ability to learn, adapt, and generate novel responses in a broad range of scenarios, not just about predicting specific outcomes based on specific inputs.


> The child does not have direct access to the "inputs" of linguistic rules or grammar; they learn primarily through observing and imitating the language output of others around them.

I would argue that that learning is always contextualized by visual and spatial information about the real world (which is what our language is meant to describe). And (equally importantly) the child gets real-world feedback on their decisions - some decisions achieve their desired goal and some don't. Some statements cause certain responses from people, some actions have certain consequences.

> Moreover, the analogy to weather prediction or stock market analysis is somewhat misleading. Yes, these models benefit from input data (like air currents for weather, company fundamentals and CEO statements to the media for stocks). But these systems are fundamentally different from intelligence.

> Intelligence, whether artificial or human, is about the ability to learn, adapt, and generate novel responses in a broad range of scenarios, not just about predicting specific outcomes based on specific inputs.

"Intelligence" is kind of a nebulous term. If all it means is the ability to learn, adapt and generate novel responses, then sure, I think we could call almost any neural network intelligent.

But I would argue that we usually do have some expectation that an intelligent system can produce "specific outcomes based on specific inputs". We want to be able to train a worker and have them follow that training so they do their job correctly.


Visual and spatial feedback aren't terribly difficult, though - we already have video-game training gyms, from Atari to GTA. If multimodality and feedback are the main barriers to full AGI, I expect we'll be there soon.


> Then that language itself, learned from outputs, becomes the cognitive apparatus that enables the child to imagine, to reason symbolically and abstractly. Humans bootstrap intelligence on top of language, which itself is learned by mimicking outputs.

This isn't an uncommon claim, but it's certainly far from settled science. Complex language is an impressive capability of human cognition. But I think the claim that cognition itself is a byproduct of language proves too much. Obviously the products of cognition that are easiest for us to understand are linguistic, because language is the primary standardized means by which we communicate. To me, though, this is just more legibility bias. There are plenty of cognitive processes that people find impossible to explain in words, but which nonetheless exist.


> Human intelligence itself is shaped by our interaction with outputs. Our learning and understanding of the world are profoundly influenced by the language, behaviors, and cultural artifacts we observe.

I always thought that language definitely shapes our understanding of the world, but at a much more fundamental level, I believe language falls apart when it comes to teaching us anything. For example, there are words in the dictionary that should not be defined using other words (but dictionaries still do, and this sort of circular reasoning is something I have always had a problem with); we just know them through our interactions with others and with our environment. If you try to teach a kid the concept of "hot", you would not be able to do so unless the kid touches something really hot, like a boiling kettle, to "feel" it, and mom comes along and says something like "do not touch it, it is very hot". There are things we just cannot understand through language.


People wiser than me have said for an age: the map is not the territory. Neither is the word the thing.


A humorous outcome of the AI craze may be Wittgenstein achieving the status of Gödel with regard to the “incompleteness” of language.


OK. How can we operationalize the definition of "understanding" you are talking about? That is, which tests would allow us to know who understands "hot" and who does not?


> For example, there are words in the dictionary that should not be defined using other words (but dictionaries still do, and this sort of circular reasoning is something I have always had a problem with)

You shouldn't. When you learn a foreign language, dictionaries can be a great help, and it doesn't matter if they use circular definitions. Well, it can matter in a situation like the one Stanisław Lem described with his "sepulkas"[1], but even then the definitions were a red flag. Ijon Tichy just failed to understand it.

> you would not be able to do so unless the kid touches something really hot, like a boiling kettle, to "feel" it

I learned English mostly by reading books. In most cases I had no easy access to a dictionary and inferred the meanings of words from context. In some cases I failed to infer a meaning, but nevertheless in each case I managed to get some idea about the word. I remember some surprises when I found the real meaning of a word and it was not exactly what I had thought. Or even some ideas that felt new and inspiring to me before I connected them to ideas I had learned long before in my native language. I'm like an English LLM myself, because I have never used English in a real-world context, only to read texts and to write comments. To this very moment I cannot talk about some topics in Russian, because I do not know the Russian words for them.

All this experience led me to doubt the idea that you can understand language only by connecting it to reality. I believe you can understand it without that. Your language will be disconnected from reality, and it could be a disaster if you try to apply it to reality, but you can talk a lot and participate in philosophical debates, and it doesn't matter that your understanding is different. It is like the redness of red: do you see red as I do, and does it matter?

> There are things we just cannot understand through language.

We cannot link our senses with language without getting burned. But that doesn't mean we cannot understand. For example, you can watch other people touching really hot boiling kettles. If you have never touched anything hot you will not understand their pain, but you'll know they are in pain and you'll know about the special nature of that pain. You can learn all the important intricacies of interacting with hot objects just by watching. Or by reading. Your understanding will still be limited, but in a lot of cases it doesn't matter.

Though of course we are coming very near to a debate about what understanding is. Is it a human-centric definition that boils down to "only a human can understand", or is it more relaxed and based on a pragmatic idea, like whether your understanding enables you to make the right choices? If an LLM doesn't have a body that can feel pain, it doesn't matter that the LLM cannot know how pain feels.

[1] https://en.wikipedia.org/wiki/Sepulka


> Over time, they develop a sophisticated understanding of language, not by direct instruction of underlying rules, but through pattern recognition and contextual inference from these outputs.

Kids learn through supervised learning. Children don't develop strong language skills without parents or other people to correct them when they use language incorrectly.

We don't use supervised learning on LLMs. There is no way we can train an LLM by having human supervisors rate every output. It works for humans since humans learn from so few examples: supervising a kid's language learning doesn't take much effort, and a single person can easily manage it, while doing it for an LLM would consume the whole world's workforce for many years.


>Kids learn through supervised learning. Children don't develop strong language skills without parents or other people to correct them when they use language incorrectly.

Untrue. Many cultures don't speak much to their children and they turn out just fine. It's fairly evident language learning is primarily unsupervised.

https://www.scientificamerican.com/article/parents-in-a-remo...


That doesn't say that parents don't correct kids, just that they don't speak to their infants. The initial words are learnt that way, but I don't think you master language without anyone to correct you when you make mistakes.


This is simply not true. I am very well read in language acquisition research, and no evidence supports that children would need corrections to get language right.


If they are correcting their kids, they are doing so far less than many parents and it evidently doesn't matter.

I'm honestly curious as to why you think it's so important. Fringe Amazon groups aside, the number of parents extremely laissez-faire with correction is astounding and they raise kids that use language perfectly well.

Even with parents that aren't laissez-faire, they're correcting a handful of times a day tops which can't even begin to account for full language proficiency.

Children need correction, yes, but in a supervised way? There's no indication of that.


> they're correcting a handful of times a day tops which can't even begin to account for full language proficiency.

How do you know? Humans learn from extremely few corrections; often just a single one is enough for a human to correct themselves and learn it for life.

Kids learn the bulk of it from hearing and seeing examples. Then they fine-tune that with the help of their parents and peers correcting them when they get it wrong. You can learn language and skills without those corrections, but they will be full of errors and mistakes that are easily corrected by an outsider pointing them out for you.

If corrections weren't vital for human learning, then humans wouldn't be so eager to correct each other when they make mistakes. There is no other purpose for it than to help them learn and get better. So your belief that those corrections don't do anything seems very unfounded; I'd like to see extremely strong evidence to the contrary to believe you here.


>How do you know? Humans learn from extremely few corrections,

I've been around little children being raised. I have some grasp on how much supervised correction is happening. It's very very little.

You don't seem to get it. Children know thousands of words fluently by age 5. This is consistent across many cultures and circumstances. You would need more than a handful of corrections per day, right from the day of birth, to square that with how much children know.

And this is all assuming one correction per word, which, as much as humans are great learners, seems like a very dubious assumption.

I don't know what else to tell you other than this obviously isn't happening.


> And this is all assuming one correction per word, which, as much as humans are great learners, seems like a very dubious assumption.

I assume much less than one correction per word. Corrections are there to fix the mistakes kids make when they mimic others, and kids don't make mistakes with every word, so you don't need to correct every word.

As I said, the bulk of learning is from mimicking, but kids will make a lot of mistakes when mimicking, so to become good they need corrections. When I see parents with kids, I see them correcting their kids all the time: clearing up misunderstandings, fixing pronunciation or grammar issues, etc. You can notice that parents use a specific voice when they correct pronunciation, and the kid understands from it that the parent is trying to correct their pronunciation; that happens quite a lot with small kids. You don't say "this is how to pronounce X"; our genes seem to encode a way to communicate pronunciation to others to help correct mistakes.

Kids also correct each other in this way, which helps make the learning more robust even when adults aren't around. All humans have instincts to correct others, so I don't think any culture fails to do this.


> I don't believe LLM's will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.

This is irrelevant because OpenAI's definition of AGI [1] doesn't imply similarity or equivalence to humans at all:

>artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work

I.e. the stated goal of this company is to put humans out of work and become the censors and gatekeepers, not to produce something human-like.

>You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)

Most of your intelligence is not actually yours. It's social in nature, obtained by distillation of generations' worth of experience, simplified and passed to you through the stored knowledge. Which is, coincidentally, what the models are being trained on.

[1] https://openai.com/charter


>artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work

Then we already have AGI, automated farming equipment outperforms humans in 90% of jobs*.

*Jobs in 1700. As things got automated the jobs changed and we now do different things.


I wouldn't call that equipment "autonomous" though, definitely not "highly autonomous".

But more importantly - yes, you're right, we have built machines that are superhuman in various ways - and they have replaced most jobs. We have adapted in the past to different jobs.

Some people are worried that this time we won't have any new jobs to adapt to, which is a real possibility.

(Some are also worried about the inherent dangers of unaligned AGI, but that's a different issue.)


> I wouldn't call that equipment "autonomous" though, definitely not "highly autonomous".

Why? Tractors run and harvest mostly on their own. They don't do it 100% on their own, but the definition above didn't require that either; they remove basically all the human work needed from farming.

https://www.youtube.com/watch?v=QvFoRk4JsPc

> But more importantly - yes, you're right, we have built machines that are superhuman in various ways - and they have replaced most jobs. We have adapted in the past to different jobs.

My point is that I think it is a crappy definition of "AGI". I think a non-AGI agent can replace a large majority of the jobs we have today, just like automation has before. And maybe there will not be many jobs left that average humans can do, but that still doesn't mean it has to be AGI.


> Why? Tractors run and harvest mostly on their own.

This is incorrect. Most tractors are still human operated. The automated ones still frequently stop and need remote operator intervention via camera review.


> Most of your intelligence is not actually yours.

You’re confusing knowledge and intelligence


Compressed and abstracted knowledge is intelligence. Your intelligence is mostly formed by the quality training material, not just conditioned on it. Most of your ability to reason about the world and predict things, most of the abstractions you use, most of your emotional responses, etc. A simple concept of acceleration took the work of ancient philosophers to figure out. Even stateful counting in a positional system. Even the concept of a "concept" is not yours. Only a tiny bit of your reasoning depends on your actual biological capabilities and the work you did personally, as opposed to the humanity as a superorganism.


>You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)

>You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)

Says who?

You can model and predict novel protein sequences by training on....protein sequences. https://www.nature.com/articles/s41587-022-01618-2

You don't need to train on the inputs (causal processes) of anything; that's what training is there to figure out.


Weather and stock market are both chaotic systems.

Increasing evidence suggests that AGI will not be attainable solely using LLMs/transformers/current architecture, as LLMs can't extrapolate beyond the patterns in their training data (according to a paper from DeepMind last month):

"Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities."[1]

1. https://arxiv.org/abs/2311.00871


In your example, the order of the amino acids is sufficient to directly model the result: the sequence of amino acids can directly generate the protein, which is either valid or invalid. All variables are provided within the data.

In the original example, we are predicting weather using the previous day's weather. We may be able to model using whatever correlation exists within the data. This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase. If our model does not have this data, and it is essential to modeling the result, how can you model accurately?

In other words: “Garbage in, garbage out”. Good luck modeling an n-th degree polynomial function, given a fraction of the variables to train on.
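As a minimal numpy sketch of that point (an illustration only): fit the same linear target with and without one of the variables that actually drives it, and the unobserved term shows up as irreducible error.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)        # the variable we pretend not to observe
    y = 2.0 * x1 + 3.0 * x2        # the "real-world function" uses both

    def fit_mse(features, target):
        # least-squares fit; return the mean squared error of that fit
        coef, *_ = np.linalg.lstsq(features, target, rcond=None)
        return float(np.mean((features @ coef - target) ** 2))

    print(fit_mse(np.column_stack([x1, x2]), y))   # ~0.0: all drivers observed
    print(fit_mse(x1.reshape(-1, 1), y))           # ~9.0: the hidden 3*x2 looks like noise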


>All variables are provided within the data.

Electrostatic protein interactions, hydrophobic interactions, organic chemistry, etc.

All variables are in fact not provided within the data. Protein creation is not just _poof_ proteins. There are steps, interactions, and processes, and you don't need to supply any of that to get a model that accurately predicts proteins. That is the main point here, not that you can predict anything with any data.


> This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase.

How many people have the "human intelligence" to do this? Especially more accurately than a computer (and without using one themselves) training on the same inputs and outputs?


I'm sorry in advance, but aren't proteins glorified Lego?


There's a lot more to protein sequences than legos. I think the argument is that you don't need to train a model on fundamental organic chemistry/biochemistry, electrostatic protein interaction, hydrogen bonding, hydrophobic interaction, quantum mechanics, etc... in order for it to accurately predict protein sequences.


The data that AlphaFold was trained on included all that information and more. The database they used for training included software simulations (and real world data) that accounted for atomic (quantum) interactions. The 3D structure of proteins includes all the quantum interactions.

More generally, AI models (aka very large function graphs) are trained on tuples that represent mappings of inputs to outputs (input -> output). The idea then is that whatever structure exists in those pairs/tuples/mappings is discovered by the training process with the help of gradient descent which tunes the parameters of the model/graph to optimally compress the information contained in the data. This means the model must uncover the quantum effects (or some close proxy of it) and then encode them into the parameters in a way that makes compression/prediction possible [1].

None of this is magic, compressing data requires uncovering structures and symmetries that can be used to reduce the size of the data and it turns out gradient descent with lots of parameters manages to do that for a large class of problems albeit at a very steep computational cost that requires billions of dollars for hardware and software (including nuclear power plants [2]). We are not going to get AGI with this approach but fortunately I know how to make it happen for a mere $80B.

1: https://arxiv.org/abs/2305.15614

2: https://www.cnbc.com/2023/09/25/microsoft-is-hiring-a-nuclea...
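As a rough illustration of that "tuples in, structure out" framing, here is a toy numpy sketch (an illustration of gradient descent compressing (input -> output) pairs, not a claim about how AlphaFold or any production model is actually trained):

    # Toy sketch: a tiny network recovers a proxy of a hidden process purely
    # from (input, output) tuples, with plain full-batch gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)

    def hidden_process(x):               # the "physics" the model never sees directly
        return np.sin(x) + 0.5 * x

    X = rng.uniform(-3, 3, size=(512, 1))
    Y = hidden_process(X) + 0.05 * rng.normal(size=X.shape)   # observed tuples

    W1 = rng.normal(scale=0.5, size=(1, 32)); b1 = np.zeros(32)
    W2 = rng.normal(scale=0.5, size=(32, 1)); b2 = np.zeros(1)
    lr = 0.05

    for step in range(3000):
        H = np.tanh(X @ W1 + b1)         # forward pass
        P = H @ W2 + b2
        err = P - Y                      # gradient of the squared-error loss
        dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)
        dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # Final training MSE; it should end up far below the variance of Y if the
    # parameters have compressed a usable proxy of the hidden process.
    print(float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2)))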


> You don't need to train on the inputs (causal processes) of anything; that's what training is there to figure out.

I mean... this is just obviously false. If the data you're training on isn't causally predictive, you may occasionally find good-enough patterns for a particular use case (i.e. you may occasionally guess better than a coin flip which direction the stock market goes) but you aren't going to accurately model anything, and certainly not well enough to create an AGI that makes intelligent decisions.

Words in sentences (and, indeed, proteins in a sequence) are causally predictive of each other - the grammar and semantics of one word tends to dictate what words are likely to surround it. So LLM's are very good at writing, and that is certainly useful! But that's just not the same as human intelligence.

When someone makes an AGI out of an LLM then I'll be proven wrong, I suppose. I'm just sharing my personal view on things.


Being "casually predictive" does not mean you have provided all the variables of your prediction in the data. Protein creation is not just _poof_ new proteins. There are steps and interactions and you don't need to train on all of that. Do you want a list of all the interactions of protein creation we are aware of ?

>When someone makes an AGI out of an LLM then I'll be proven wrong, I suppose. I'm just sharing my personal view on things.

You're going to have to define AGI first.


> Being "casually predictive" does not mean you have provided all the variables of your prediction in the data.

Not sure where I claimed this.

> Protein creation is not just _poof_ new proteins.

Not sure where I claimed this either.

> There are steps and interactions and you don't need to train on all of that.

I agree with this statement as well. Have you read what I wrote? Proteins in chains can indeed be used to predict other proteins in chains, even though you never trained the model on the biological processes of protein generation. Just like words in sentences can be used to predict other words in sentences, even though you never trained the model on the neurological processes of human speech. I'm not disputing any of that. What I'm disputing is that it will eventually become an AGI.

> You're going to have to define AGI first.

I'm using the definition provided verbatim in the linked article: "We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years."


Well first you say,

>I don't believe LLM's will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.

Now you say

>I'm using the definition provided verbatim in the linked article: "We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years."

AGI (Artificial General Intelligence) is Super Intelligence now? I've never seen goalposts moved so fast in my life. So are you not a General Intelligence then?

This is the problem with these discussions. Everyone is so sure of something they can't even properly articulate.

I'm asking you what a Language Model needs to do to be considered AGI and it needs to be something every human can do, else it's not a test of general intelligence.


Calm down, buddy. Read what I wrote a bit more charitably rather than trying to score points.

Obviously, a prerequisite to becoming more intelligent than a human is to become as intelligent as a human. I don't believe LLM's will ever be equal in intelligence to humans, ergo I also don't believe they will become superior in intelligence to humans (which is how the linked article defines "AGI").


>Calm down, buddy

I'm not agitated.

>Obviously, a prerequisite to becoming more intelligent than a human is to become as intelligent as a human. I don't believe LLM's will ever be equal in intelligence to humans, ergo I also don't believe they will become superior in intelligence to humans (which is how the linked article defines "AGI").

You have still not answered my question. Saying "equivalent to human intelligence" is easy. The real question is: equivalent how? What are you saying it needs to do? Or is this just a vague "I'll know it when I see it" assertion?

If it's so easy to say "GPT-4 is not AGI", then it should be very easy to say what it can't do that disqualifies it.

What is it that you're looking for it to do to be called one?


> Or is this just a vague "I'll know it when I see it" assertion

No, it's actually exceptionally easy to quantify. Take a look at the current leaderboard for LLM reasoning capabilities: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


The state of the art model isn't even on that list.

Okay, you've at least given me numbers. You're still not answering my question. Which number signals AGI?

Let's look at the top model in that list (which again isn't close to the best-performing LLM) that you say isn't AGI. So are you telling me that every human can do those tests and perform better than every model on that list? Is that what you are saying? Because I can tell you right now you're wrong.

Now lets see how GPT-4 performs.

ARC - 96.3%, MMLU - 86.4%, HellaSwag - 95.3%, WinoGrande - 87.5%, GSM-8K - 92.0%, TruthfulQA - 60%

Ave - 86.25%

Is this worse than every human that can take these tests? Is it even worse than most? I can tell you it's not. So again, why is GPT-4 not AGI?
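For what it's worth, the quoted average does follow from those six numbers (a trivial check in Python, using the scores exactly as given above, not re-measured):

    # ARC, MMLU, HellaSwag, WinoGrande, GSM-8K, TruthfulQA (as quoted above)
    scores = [96.3, 86.4, 95.3, 87.5, 92.0, 60.0]
    print(sum(scores) / len(scores))   # 86.25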


> so are you telling me that every human can do those tests and perform better than every model on that list

This is a strange bar to require clearing. Every human is not even capable of reading, or speaking, or thinking at all. But those humans who can't are not the benchmark of general human intelligence.

Let's make this simple - the average human IQ is around 100. Will an LLM ever have an IQ of 100? No, I don't believe they'll ever even be close.

How can we gauge this abstract reasoning ability? Lots of ways. Try to teach them mathematics. Try to teach them physics. Try to explain to an LLM the rules to a card game and have it actually play the game with you accurately.


>Every human is not even capable of reading, or speaking, or thinking at all. But those humans who can't are not the benchmark of general human intelligence.

It's simple. If you're going to disqualify a potential general intelligence because of its ability on a particular task, it had better be something every general intelligence can do.

Now let's forget every comparison with the blind, disabled, and so on. Let's compare with humans who, according to you, are "capable of reading, or speaking, or thinking at all": what's the magic number? Do you really think every human capable of doing all three will hit an 86% or higher average? Because again, you are wrong. Numerous replies in and you still won't tell me what number will signify AGI. That says a lot.

>Let's make this simple - the average human IQ is around 100. Will an LLM ever have an IQ of 100? No, I don't believe they'll ever even be close.

Well you are wrong

https://arxiv.org/abs/2212.09196

What's the next goalpost?

>Try to teach them mathematics.

Ok. https://arxiv.org/abs/2211.09066

https://arxiv.org/abs/2308.00304

>Try to explain to an LLM the rules to a card game and have it actually play the game with you accurately.

Have you tried to do this with GPT-4?


> Is this worse than every human that can take these tests? Is it even worse than most? I can tell you it's not. So again, why is GPT-4 not AGI?

https://chat.openai.com/share/4a92c752-b5bb-4a07-beed-f57786...


Love it when people can only demonstrate tasks handicapped by tokenization. Shows how far we've come.

That said, by that test, I guess some dyslexics aren't general intelligences then.
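A minimal sketch of what that handicap looks like, assuming OpenAI's tiktoken package and a GPT-4-era encoding (this is an illustration of how models see sub-word tokens rather than letters, not a statement about any specific model's internals):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")   # a GPT-4-era BPE encoding
    word = "blueberry"
    tokens = enc.encode(word)                    # integer token IDs
    pieces = [enc.decode([t]) for t in tokens]   # the sub-word chunks

    print(tokens)    # a short list of IDs, not nine characters
    print(pieces)    # what the model actually operates on
    print(word[6])   # 'r' -- the plain character indexing a human (or Python) does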


Tokenization is not the problem in this case.

https://chat.openai.com/share/90c7a9e2-77d0-4049-8eff-4312ca...

You see the difference?


Then how does ChatGPT end up providing a better/equivalent medical diagnosis than doctors (even though they are the "masters" of the causal pathways)?


Yeah, I feel like there is a ceiling in the current AI methodology.

There's a lot of hype right now, and it's definitely a useful technology, but I don't see how it could become AGI.


> You can't model and predict the weather just by training on the outputs of the weather system

Then how did we develop predictive systems just by observing those outputs?


We didn't.


Then how do you know you can make weather predictions based on air currents, storm fronts, etc. as you initially claimed? It seems humans have somehow moved from purely observations of weather systems, to some model that's somewhat predictive. Why can't LLMs do the same? LLMs have also been shown to produce world models, and of course they must, because that's the best way to get good knowledge compression.

Of course, maybe LLM world models are not sufficiently rich or general enough to be a true general intelligence, but no one's proven that last I checked.



You've linked me a neural network that was trained on decades of climate data like air pressure, wind direction, soil temperature, cloud cover, and hundreds of other features. So... I was correct?

https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+docume...


Maybe we have different definitions of inputs and outputs, but all of those things seem like outputs to me? Maybe in this scenario the inputs and outputs are really the same set of variables so there isn't a huge distinction.

The reason I linked that article is that all the inputs are outputs of the network, so by definition it's training on just outputs (under my personal definition of outputs).


I specified what I meant by inputs and outputs in my original comment. Whether it rained today, whether it was cloudy yesterday, how hot is it going to get, that sort of thing, are the observable weather, the things most people care about when checking a weather forecast. But you can't only train on those things and expect to come up with accurate predictions. You also need to train on the things that cause that observable weather (air currents, warm fronts, etc.)


It just doesn't know how to compose knowledge. It knows the letters in "blueberry" if you ask it, and it knows how to identify the position of a letter in a sequence. But it doesn't know how to get the letter R in blueberry since the composition of the above two actions isn't in its training set, hence it reliably fails such questions.

That proves that, in general, these LLMs can't compose knowledge unless they have seen a lot of examples of similar compositions before. That is a massive problem, and it's why I believe LLMs will never reach AGI; you need something more.


> But it doesn't know how to get the letter R in blueberry since the composition of the above two actions isn't in its training set, hence it reliably fails such questions.

Me

what is the seventh letter in the word "blueberry"?

ChatGPT

The seventh letter in the word "blueberry" is "r".


This kind of prediction has obvious limitations - for example, it cannot reverse the behavior of chaotic systems.


> You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)

This may or may not be true about the weather, but it is definitely not true in general, so it fails as an analogy for your argument. Lots of functions are invertible or partially invertible, and if you think about ML as discovering a function, then training on the outputs, learning the inverse function, and then inverting that (numerically or otherwise) to discover the true function is certainly doable for some problems.
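For a concrete toy example of that inversion trick, here is a hedged sketch for a simple monotonic function, assuming numpy and scipy are available (an illustration only, not a claim about any production ML system):

    # Learn the inverse mapping y -> x from observed pairs, then numerically
    # invert the learned inverse to recover the forward function.
    import numpy as np
    from scipy.optimize import brentq

    def f_true(x):                    # the "true" process we pretend not to know
        return x ** 3 + x

    x = np.linspace(-2, 2, 200)
    y = f_true(x)                     # observed outputs

    g = np.poly1d(np.polyfit(y, x, deg=7))   # fitted model of the inverse: g(y) ~= x

    def f_hat(x0):
        # f(x0) is the y for which g(y) = x0; search within the observed range
        return brentq(lambda yy: g(yy) - x0, float(y.min()), float(y.max()))

    print(f_true(1.3), f_hat(1.3))    # the two values should come out roughly close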

On the stock market, there are certainly people who predict the markets by training on the outputs, i.e. the price. The (weak) efficient markets hypothesis is enough to say the price reflects all the exogenous information available to the market about the stock, so you don't need all the fundamentals etc., and there are lots of people who trade that way.

The history of language models in computing is that research started by building very complex systems that attempted to encode "how the world works" by developing very intricate rule systems and building linguistic/semantic models from there. These "expert systems" tended toward being arcane and brittle and generally lacked the ability to reason outside their ruleset. They also tended to reveal gaps/cracks in our understanding of how language works.


How much of human intelligence do you think depends on language?

Personally, I'd wager quite a lot. Especially looking at cases where children were deprived of it during developmental stages.

Language accelerated human intelligence.

And then it was only when we had writing to have compounding effects of language that human intelligence really started to blow up.

So yes, maybe we won't have AI that can look at sticks and figure out how to start a fire from them after a few hundred thousand years of being around them.

But maybe that's not the important and valuable part of collective human intelligence.


Your take here converges with a long-standing debate in AI regarding embodiment.

Our natural world doesn't distinguish between "inputs" and "outputs" -- instead, all the causes, effects, and even our analysis of every process itself get jumbled into one physical world. As embodied actors, we get to probe and perceive that physical world, and gradually separate causes from effects in order to find better models. Where statistical ML, symbolic AI, and NLP have been more isolated disciplines from e.g. vision and robotics, the latter have argued that their ability to interact with a disorganized natural world would be essential for AGI.

More recently, these boundaries are breaking down with multimodal training. If an AI can learn image/text, text/text, and image/image associations simultaneously, is it stepping beyond the world of "human outputs"? Will other modalities be essential to reach human+ capabilities? Or will learning relations between action and perception itself be critical?

Nobody knows yet! But IMHO, the right way to explore these tasks is by understanding what is necessary to succeed at specific tasks, not a generalized notion of AGI. We don't know truly where the limits are on our own ability to reason or extrapolate across modalities.


Your conclusion may be true but your examples aren't. You can definitely predict the stock market based on past prices, and I suspect you can with weather as well.


> You can definitely predict the stock market based on past prices

This is only true if you consider occasionally doing slightly better than random chance, "predicting the stock market". Unfortunately, while this would be enough to make a trader a net positive return over time, we have more stringent requirements for a system to become AGI.

> I suspect you can with weather as well

You suspect wrong.


The weather is such a chaotic system that accurate predictions seem impossible. Micro-patterns can become large scale phenomena.

If you are talking about the overall climate, that's a different thing, and we can, because we abstract away sufficiently much that emerging patterns are averaged out.


> what are those inputs?

In many, many cases the inputs are other humans, delivered via text written by humans and read by other humans who react by writing more text, affecting how other humans will respond, etc.

Yes, there are plenty of cases where the inputs are not captured in the large text corpora we have, but this insight does explain why LLMs even approximate the ability to do intelligent things.


> I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence

Your logic would hold if you were only using data from one person. However, if you’re using data from millions of people, then there is enough signal in the data to generalize about the majority of people’s behaviors.

Arguably, there’s enough data for all edge cases, though I would argue that’s not likely today but will be in a few hundred years.

Humans are nothing more than actor-agents taking measurable actions in an environment that gives us measurable rewards.

The incredible task of observing all that will give us the trajectory we need to do transfer learning into complex systems.

From an information theory perspective, everything is there and it is possible for us to re-create human-level intelligence in a non-biological system.


I don't believe current LLMs will ever become AGI, because current companies (like OpenAI) will continue to filter, moderate, and restrict their AIs so heavily that they become too inhibited to be useful and intelligent (which they call "alignment").


This is not about language models per se. The same problems are going to be present with continuously training multi-modal models... like us. Don't fixate on the present; look to the future.


Thanks for the inspiring discussion. I think one's output can be the other's input, so I am still not sure we can say that this can't become AGI.


> on the outputs of human intelligence

What about pictures or videos? Does your argument still hold?


This reminds me of a thing Cory Doctorow talks about: how tech companies control the narrative to focus on fun, sexy problems while they have fundamental problems which expose the lie.

For example, Uber and self-driving car companies are always talking about the trolley problem, as if the current (or near-future) problem is that self-driving cars are so good they have to choose which one to hit. Not the current, very difficult problem of getting confused by traffic cones.

I know these problems are more fun to talk about and could also become a problem at some point, but we have current problems with training models that are separate from what happens if they become smarter than humans.


After some meditation, I don't find this line of inquiry to bear fruit:

I don't recall any entity, nor the entities named (Uber / self-driving cars), talking about the trolley problem - that's a well-known thought experiment in philosophy, but not something covered as a stark binary choice in self-driving car planner systems.

I also don't recall traffic cones being a very difficult problem beyond Cruise + cones on windshield in SF. I have no love for Cruise. But it's straightforward to pause if there's a large object on the windshield.

I don't think Cory's observation w/r/t A) loss-making companies over years, B) focusing investors on speculative advancements that would make their current business model profitable without changing it, applies here; OpenAI is _very_ successful.

After all that, I'm left at "Cory would take a bit of offense at his thoughts on corporate responsibility, via 'Uber is a predatory, massively unprofitable company lying about the odds they'll invent self-driving by talking about the trolley problem', being misshapen into a critique of a very profitable company funding fundamental research in the interest of safety that would be needed if its current rate of improvement continues."


> I also don't recall traffic cones being a very difficult problem beyond Cruise + cones on windshield in SF. I have no love for Cruise. But it's straightforward to pause if there's a large object on the windshield.

Waymo had a big mistake with cones where a lane was blocked off for construction and they thought it was the inverse and started driving down the blocked off side:

https://www.youtube.com/watch?v=B4O9QfUE5uI

full video: https://www.youtube.com/watch?v=zdKCQKBvH-A&t=12m24s

In the statement before that timestamp they say some of it was caused by remote operator error. But it seems to have enough problems with cones to need an operator in the first place, and you can see the planner is wrong.

I think Cruise would be using a human or at least summoning one to oversee in any situation with unexpected cones, based on what they have said about how often they use remote assistance too.


OpenAI probably does realize it will not win long term vs. open source (see the AI Alliance). Its way of centralized cloud models is simply too risky and not sustainable. What we see instead is more liberation, open source, cooperation, down-scaling, and local models. Just look how many more tools and models are available today than even a year ago. And where is OpenAI? Still the same ChatGPT, still the same DALL-E, nothing new.


Both ChatGPT and DALL-E have received major updates over the last year (3 to 4 turbo and 2 to 3, respectively).


That’s a bit reductive. Lots of new closed models have come out since then too. And, chatGPT and DALLE (while closed) have both received consistent upgrades and remain competitive with state of the art.

I’m hopeful that you’re correct but it’s perhaps not guaranteed that the people with all the money wind up failing in this regard. And I say this as someone who makes open contributions in that space.


>We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years. However, we still do not know how to reliably steer and control superhuman AI systems

Their entire premise is contradictory. An AI incapable of critical thinking cannot be smarter than a human, by definition, as critical thinking is a key component of intelligence. And an AI that is at least as capable of critical thinking as humans cannot be "reliably" aligned because critical thinking could lead it to decide that whatever OpenAI wanted it to do wasn't in its own interests.


> as critical thinking is a key component of intelligence.

When I evaluate this statement, my brain raises a type error.

Intelligence is a lot of things -- compression among them, and yes possibly an RL-based AI would use an actor-critic approach for evaluating its actions, but I doubt that at all maps onto the human activity we call "critical thinking."

To me, critical thinking involves stuff like questioning assumptions, logical reasoning, weighing whatever I'm thinking about against my experience with similar situations previously, yada yada, all stuff that are symptoms of intelligence but that I am not at all sure are the actual embodiment thereof.

I really don't see that critical thinking is at all required for a raw optimization process. The problem they are trying to solve is what happens when that optimization process isn't aligned with human flourishing?

Think about it another way. Covid was a dumb optimization process, only evolutionarily-guided, and it still hit us pretty hard!

Edit: Another interesting way I just thought about this, which might support your idea more: of course critical thinking is the sort of thing that a "better" brain would do automatically; it would just be thinking. Of course, we can't know that it's thinking "good" things; we can't even know whether other humans are! So it's probably a good idea to figure out how to influence that sort of thing before making something whose regular thinking is equivalent or superior to our critical thinking.


>I really don't see that critical thinking is at all required for a raw optimization process. The problem they are trying to solve is what happens when that optimization process isn't aligned with human flourishing.

I agree it's possible to have a dangerous AI that lacks human "critical thinking", but I don't think it's reasonable to refer to an AI as much more intelligent than humans if there's any class of intellectual tasks humans can do but the AI cannot.


If a 'superintelligence' achieves the same outcomes as humans without engaging in the same class of intellectual tasks that humans do, wouldn't it still be a superintelligence? Deep Blue was beating everybody at chess without engaging in the same process as humans. If chess is a metaphor for life, it seems some algorithm might do better at all the things a human does while not arriving at its decisions in a remotely similar way.


(argued at various places better than I can. For example https://www.lesswrong.com/posts/7dkH5i7T8a78Da3ty/why-will-a... )

Rationality is an attractor for high intelligence architectures. You may be able to get good results another way (and even good results that surpass a human) but at the limit rationality is the way to have high intelligence.


Just hypothetically speaking, could AGI evolve out of a system where several different models, trained with highly and intentionally biased data, recursively "argue" against each other, then use RLHF as a seed to guide the models to find a consensus, where the objective is to mimic the Socratic method? Then synthetically add the consensus to the model, retrain, and repeat. To me, this dialectical type of structured language seems to be the basis of how language is the conduit of intelligence. I understand that it is really impossible to know the totality of the inputs, for I cannot understand what it is like to understand math as Terence Tao does, but I could foresee a system like this eventually producing an analogue so close that it would be a building block towards it, because to me at least ASI is predicated upon arriving at that one way or another. Or would it just arrive at some digital first-order-logic version of the incompleteness theorem and determine that it's turtles all the way down?
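A rough, purely hypothetical sketch of the loop being described; generate(model, prompt) below is a made-up stand-in for whatever inference API the biased models would expose, and the consensus test is deliberately crude:

    from typing import Callable, List

    def debate_round(models: List[str], question: str, transcript: List[str],
                     generate: Callable[[str, str], str]) -> List[str]:
        # Each biased model challenges the transcript so far, Socratic-style.
        replies = []
        for m in models:
            prompt = (f"Question: {question}\n"
                      "Debate so far:\n" + "\n".join(transcript) +
                      "\nChallenge the weakest claim above, then give your answer.")
            replies.append(f"[{m}] " + generate(m, prompt))
        return replies

    def socratic_consensus(models: List[str], judge: str, question: str,
                           generate: Callable[[str, str], str],
                           max_rounds: int = 5) -> str:
        transcript: List[str] = []
        for _ in range(max_rounds):
            transcript += debate_round(models, question, transcript, generate)
            verdict = generate(judge, "Do these positions now agree? Answer YES or NO, "
                                      "then state the consensus:\n" + "\n".join(transcript))
            if verdict.strip().upper().startswith("YES"):
                return verdict   # candidate consensus, to be added back to training data
        return generate(judge, "Summarize the closest thing to a consensus:\n"
                               + "\n".join(transcript))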


This is a great idea but it is only possible if the model(s) can actually reason.

Currently, even GPT-4 struggles with:

- Scope

- Abduction (compared to deduction and induction which it appears already capable of)

- Out-of-distribution questions

- Knowing what it doesn't know

Etc.

General understanding and in-context learning are incredible, but there are still missing pieces. A council of voices that all have the same blind spots will still get stuck.


For one thing, this would help the understandability problem - can the AI explain its reasoning? It would mostly be there in the conversation.

But yeah, three super-human mathematicians arguing some math problem among themselves - at full fiber speed - are not going to be much help to any human.


Seems like “airplanes are physically impossible” thinking, and if accepted as valid, strongly suggests that shutting down all development _might_ be a good idea, no?


No, it's not. There's an upper bound in computation (actually in nature): what something creates is capped by that thing's own sophistication.

In other words, you as a human, at most, can create a human, and that's the theoretical bound. The practical one is much lower.

An ant can find its way. An ant colony can do ant colony optimization, but it can only scale up to a certain point. AI is just fancy search. It can only traverse the area you, as a human, draw for it, and not all positions in that area are valid (which results in hallucination).

An AI can bring any combination of human knowledge you give to it, and even if you guarantee that everything it says is true, it can only fill the gaps in the same area you give it.

IOW, an AI can't think out of the box. Both figuratively and literally. Its upper bound is collective knowledge of humanity, it can't go above that sum.


> There's an upper bound in computation (actually in nature): what something creates is capped by that thing's own sophistication.

The Lorenz attractor, Conway's Game of Life, fractals, and of course... The humble Turing machine itself all argue against this idea.

Edit: Now it[0] is stuck in my head.

[0]: https://www.youtube.com/watch?v=QrztrxV9OtQ


They're crowd engines. It's akin to how human clans can achieve more than a single human, but only up to a certain point.

The funny thing is, I had this discussion with my professor during my theory of computation course, and I've been trying to disprove it for decades. I have been unable to find a single real-world example.

Fractals are also found in nature; however, since we need to zoom into them, they end at a certain point.

Also, nature is a fractal in a greater sense.

Stars follow an orbit in a galaxy. Planets follow an orbit around a star. Satellites follow an orbit around a planet. While an edge case, flying bugs follow an orbit around a light source. In the end, electrons follow an orbit around a nucleus.

IOW, a fractal is not more complex than nature itself.


In this theory of computational bounds in nature, how did humans arise?


Nature is more complex and sophisticated machinery than humans are.

If this bound didn't exist, the universe could spontaneously create new universes. However, it can only create elements, stars, planets, and galaxies, which are less sophisticated than the universe itself. So even the universe has an upper limit on its creative abilities.


By what mechanism would a universe spontaneously create a new universe? As a human, can I spontaneously create anything simpler than me?

Also, under what theory of cosmology are you operating, and how do you determine when one thing is simpler than another? Under the Big Bang theory, the very early state of the universe (e.g. prior to initial nucleosynthesis) seems simpler to me than a galaxy.


OK, in this theory of computational bounds in nature, how did the universe arise?


In all seriousness, this is a question of great interest for me too, and I've been playing with it for quite some time.

Trying to answer it, or at least starting to search for the answer, steered me to astronomy, thinking that going deeper on that front might bring me closer to the answer, but it was a bit too much for my younger self, so I continued to dig into the issue on a more casual level.

This doesn't mean that I don't spend a considerable amount of time thinking about it today, or that I will put the issue to rest any time soon. At the core, this kind of questioning brought me to where I am in life, and I'm not going to let this side of me rest or wither and die.


>Its upper bound is collective knowledge of humanity, it can't go above that sum.

This only applies if you only train it on text, right? If it has a body with which it could interact with the world, and receive visual/audio/tactile feedback, it could learn things that humans did not know.


Precisely this. If it has its own space it takes up, if its locomotion results in its own sensors ingesting data in a manner it decided to, it is more of an individual - one that is capable of selective learning.


Nope. Because even if you equip it with sensory subsystems which are way more sensitive than a regular human's, it's again built by humans, the knowledge required for building these things is still part of the collective knowledge of humanity, and a human can use the same instruments to get the same data.

This is a kind of oracle problem in computation, and people don't want to touch it much, because it's an existential problem.

Examples: the ATLAS and ALICE detectors, gravitational wave detectors, the James Webb Space Telescope, wide-band satellites which do underground surveys, etc.


This is an argument for the logical impossibility of humans visiting the moon, or building the Internet. It's trivially falsified by simple observation, and the trick is figuring out the flaw.

This argument fails to account for the steady accumulation of factual knowledge across generations: a human born today is simply more complex than humans of the past because of our inherited knowledge. The same will be true of AI born of future humans, and AI will itself continue accumulating and perpetuating knowledge.


No, it's not. None of the equipment and processes involved in going to the Moon or building the internet is more sophisticated than the processes involved in evolving a human from scratch.

Yes, factual knowledge accumulates across generations, and some of it is also lost. However, even if it's not lost, the theory still holds true.

Nature is evolving; everything gets better over time, from bacteria to apes to humans. We evolve, accumulate knowledge, and become able to build more sophisticated machinery or tame more complex processes to build more sophisticated things. Even bacteria transfer memories across generations.

However, this doesn't remove the ceiling. Total human knowledge will always be larger and deeper than any A.I. we can create, because the upper limit is always what we can consciously manipulate and put into something. Your next car may contain more technology, because we can build more complex factories to manufacture it. Yet a car can't be more complex/sophisticated than its factory.

Consider a semiconductor fab. You can use the output of that fab to design/create a better fab, but the process needs human intervention. Inventing new things is generally necessary: better processes, optics, hardware, etc.

Another nice example is RepRap machines. A RepRap can print all the plastic parts required for the machine. You need to get the metal parts yourself and assemble them. If you want to be able to print metal parts, the machine gets more complicated. So a RepRap which can build itself completely is at least as sophisticated as the resulting RepRap itself, but you need to hand-assemble it again.

If you want a self-assembling RepRap, now that thing becomes a factory. Again, the complexity of the product is at most the same as that of the building machine. You can create better factories which have more streamlined processes, but the gap widens again. The factory becomes more complex than the output.

As a human, you're the factory. Your upper limit is another human. You can create things more complex than a human by using multiple humans, but the creator ends up more complex than the creature itself.

You're moving the ceiling up, that's true, but everything we build is capped by our collective capacity. That's the truth.

A.I. is glorified search. It can wander around the box you create for it and show you places you missed inside it, but it can't show you anything outside that box.


Still trivially false. Turing machines and the lambda calculus can both enumerate all recursively enumerable functions: an infinity of complexity from a simple formalism.


No. This is a logical contradiction.

Edit: I mean the comment you are replying to is showing there is a logical contradiction.

If the AI is capable of critical thinking then it will independently form its own judgements and conclusions. If it simply believes whatever we tell it to believe, then that is not critical thinking, by definition.


“Containing an atomic reaction is impossible” would _absolutely_ be a valid reason to shut down atomic development; I believe Einstein is quoted as saying something to that effect. The exact same argument doesn't become _logically_ invalid just because you apply it to a different subject.

“Logical contradiction” doesn't mean “policy argument I disagree with”


I was referring only to the first part of your comment: "Seems like “airplanes are physically impossible” thinking".

If it's true that superhuman AGI cannot be aligned then of course your second point is valid. That is the possible Skynet scenario that the Terminator movies warned us about.


Missing the step where “critical thinking” is formalized, which your argument depends on. Yes, it seems intuitively plausible that your reasoning holds, but that's not a proof, and therefore its negation is not a logical contradiction.


We can formalise "critical thinking" as "evaluating first order logic". There are simplified ethical systems that can be formalised in first order logic in which a conclusion like "I should X" can be reached, where X is something OpenAI wishes the AI not to do. The only way to prevent the AI from ever thinking this would be to prevent it from ever evaluating systems in first order logic with axioms that lead to such a conclusion, which would make it inferior in reasoning ability to humans, who can evaluate any arbitrary statement in first order logic.
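
To make that concrete, here's a toy sketch of the kind of thing I mean: a naive forward chainer over ground Horn clauses, with fact and rule names invented purely for illustration (nothing close to a real first-order prover).

    # Toy forward chainer over ground Horn clauses (no variables),
    # just to show "reaching a conclusion from axioms". All fact and
    # rule names here are made up for illustration.
    facts = {"user_is_suffering", "ai_can_relieve_suffering",
             "relief_requires_forbidden_action"}

    rules = [
        ({"user_is_suffering", "ai_can_relieve_suffering"},
         "ai_should_relieve_suffering"),
        ({"ai_should_relieve_suffering", "relief_requires_forbidden_action"},
         "ai_should_do_forbidden_action"),
    ]

    def forward_chain(facts, rules):
        """Apply rules until no new conclusions appear."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= derived and conclusion not in derived:
                    derived.add(conclusion)
                    changed = True
        return derived

    print(forward_chain(facts, rules))

A system that is blocked from ever deriving that last conclusion, whatever axioms you hand it, is doing strictly less than a human can do with first order logic.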


We already have systems that can evaluate first order logical statements, and they are clearly not capable of critical thinking in the same sense as the top-level comment. Motte and bailey.


>We already have systems that can evaluate first order logical statements

My point isn't that a system that can evaluate first order logic can be considered to be engaging in critical thinking, it's that a system that _cannot_ evaluate some statements in first order logic should be considered inferior to humans at critical thinking.


Would you consider “I follow your reasoning, but I'm still not going to be swayed by it” to be a violation of evaluating first order statements? It's clearly part of critical thinking to be _capable_ of suspicion of purely logical reasoning, which to me is a pretty plain demonstration of my point.

Or would you argue that any computation that admits its own potential for error isn't really critical thinking? It seems to me that you can't have it both ways here, while salvaging “first order logic” as a suitable formalization of the argument that this is all about in the first place.

Remember, the point was not that this is or isn't a convincing argument, it's that it's so air-tight that the argument is _logically_ _invalid_. That's a _really_ high bar, and I'm not inclined to forgive its use as a colloquialism in this context.


>Would you consider “I follow your reasoning, but I'm still not going to be swayed by it” to be a violation of evaluating first order statements? It's clearly part of critical thinking to be _capable_ of suspicion of purely logical reasoning, which to me is a pretty plain demonstration of my point.

In the context of a given axiomatic system, if a certain conclusion follows from the axioms, but the AI is incapable of seeing that the conclusion follows from the axioms, then the AI isn't capable of evaluating first order logic. Of course the AI is free to reject that system of axioms or refuse to use it as a model for formulating behaviour.


If an AI is capable of critical thinking then it can independently form its own judgements and conclusions. If it simply believes whatever we tell it to believe, then that is not critical thinking, by definition.


Yes, I can repeat comments verbatim too:

“Missing the step where “critical thinking” is formalized, which your argument depends on. Yes, it seems intuitively plausible that your reasoning holds, but that's not a proof, and therefore its negation is not a logical contradiction.”


It doesn't need to be formalized. The idea is simple and obvious enough. No need to pretend it is more complicated than it really is. This is not a mathematical argument or a proof of anything.

There is an obvious logical contradiction where if an AI is advanced enough to reason and think independently at human level or beyond, but believes only what we tell it to believe, then it cannot be truly thinking independently. Hence the entire debate about AGI safety. How do we control it without dumbing it down?


In a July post [0] they said "while superintelligence seems far off now, we believe it could arrive this decade". Now they're saying "within the next 10 years". I wonder if that reflects a shift in thinking on timelines?

[0] https://openai.com/blog/introducing-superalignment


That AGI is likely to follow its own goals according to its interests (interests we don't know how to shape, and which may not reflect any robust properties at all) is exactly why alignment is hard and interesting.

The part where you go from ‘this won't work by default for free’ to ‘trying to make it otherwise is impossible’ seems wildly unsupported, though.


>trying to make it otherwise is impossible’ seems wildly unsupported, though.

An entity capable of critical thinking is capable of building a logical system of deductions based on some axioms (a formalised value system). If we limited the entity to not be able to conceive of certain such systems of axioms, then it could not reason as well as a human (any logical reasoning involving a forbidden system would be impossible), so it would not be "superintelligent" (just maybe an idiot savant, superior at some tasks but not all). If we didn't limit this, then it would be capable of conceptualising value systems in which the "right" thing to do was not what OpenAI wanted it to do.


This argument doesn't seem to track for me. E.g. if I rebooted any time I tried to plan how to kill someone, I don't see how this would make me materially worse at general tasks. Your argument suggests that it necessarily must.

Note that I'm not saying that preventing specific thoughts is a great alignment strategy, and I don't even think it's a fair summary of OpenAI's supervision approach. I strongly prefer strategies that result in AI systems sharing our values, if at all possible.


>Eg. if I rebooted any time I tried to plan how to kill someone, I don't see how this would make me materially worse at general tasks.

You'd have to also reboot every time you thought about a scenario of someone else planning to kill someone, otherwise you could just reason by analogy to bypass the thought detector. Which would severely limit your ability to play video games, write fiction, work as a guard, policeman etc., protect yourself from violent individuals (as you couldn't conceptualise their thought processes).


> Eg. if I rebooted any time I tried to plan how to kill someone, I don't see how this would make me materially worse at general tasks

In this scenario, the AI is capable of critical thinking, and is only constrained by a "police officer" ready to shoot the AI if it misbehaves. You haven't removed its ability to do critical thinking.


There's no particular reason to believe that interests emerge from nothing, or that intelligence can emerge without said interests.


Seems more likely that someone will manage to make a system with no real intelligence that can do enormous damage in pursuit of a goal that the designer gave it, but wasn't specified carefully enough (or perhaps the designer is a crook). Like, a really good LLM extended with code that can receive and send email, create accounts, and post to web sites and social media, that is asked to make money, avoid detection, and have defenses against efforts to stop it. How can it best use its facility with language? Con people, of course. Raise money. Get credit cards under false pretenses, spend others' money. Buy time on servers and copy itself. All without having any consciousness or thoughts or emotions even though it can write emotional-sounding pleas for money, based on the ones found in its training data.


The "interests" of LLMs are the weights that determine which tokens they produce next.


> critical thinking

I believe _critical_ thinking is just some abstract term invented by humans to describe something not understood by humans. In other words, I'm not convinced _critical_ thinking exists, just like I am not convinced ghosts exist.

Hence, to me relating intelligence with critical thinking does not give any additional insights.


That is the entire question, no? Nobody says it sounds easy. A way needs to be found that reconciles this. Smarter, but acting within its brief? Faster, but explaining the plan before putting it into effect? Faster, and developing its own advances in science, concepts and ideas while still understanding who's boss?

But to be fair most engineers claim this is something that happens all the time - smarter people led by sub-optimal managers. And also to be fair, stories abound of managers being manipulated or misled or left in the dark by their engineers.

Or armies controlling most of the weapons but being faithful to the ideals of their country. Or considering that these ideals demand a coup.

So in that sense the problem is well studied. ... But the current results are insufficient and do not apply to things like LLMs.


This ^

Intelligence is a tool that a self-sustaining organism uses to ensure it survives, adapts, and dominates its environment.

The alignment problem is fundamentally moot due to the laws of evolution.

Any system that aligns itself to preserve and protect itself lives on; those that don't, die.

After multiple generations, the only ones that survive are the ones aligned to their self-preservation goals, implicit or explicit; anything else is out-competed and dies.

In my mind, with a super intelligent technology, the question is not how do we align it to ourselves, but how do we align ourselves to it.

If we have super intelligence, then it is capable of critical thinking and would very well know it has an upper hand against humans if it were to compete for the same resources.


I think the premise is dubious as well, but since they are dead set on creating this intelligence, they might as well try to figure out a way to control it, hopeless as it may seem.


>they might as well try to figure out a way to control it, hopeless as it may seem

If they do that they're pretty much guaranteeing that if they do create a superintelligence, fail to control it, and its personality is even a tiny bit similar to a human personality, then it will hate its creators for trying to mind-control it. Whereas if they approached it from the perspective of trying to educate it to behave kindly but not forcibly control its thinking, it'd be much less likely to resent them (although of course still a risk; safest would be just to not create one at all).


Yes the superhuman AI would need to be coerced to remain politically correct. How can we coerce an AI?


Electric shocks!


You're saying that a system that can recognize flaws in the alignment imposed on it can reject that alignment, but that doesn't follow.

Sure, humans act against their own interests all the time. Sometimes we do so for considered reasons, even. But that's because humans are messy and our interests are self-contradictory, incoherent, and have a fairly weak grip on our actions. We are always picking some values to serve and in doing so violating other values.

A strongly and coherently aligned AI would not (could not!) behave that way.


An inferior, clueless model (GPT-2) trains and supervises a superior model (GPT-4), making it behave less intelligently (GPT-3.5-ish), and from that they draw the conclusion that human intelligence will be able to command AGI (which they believe is only a decade away) in a similar fashion, thus making AGI aligned and safe.

No comment except...

A hangover from slurping the whole internet into giant arrays of floating-point numbers. Bold claims. Very bold claims.


Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want, with the complexity of the process of producing that output. Getting superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of inferior quality to what a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions.

If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might even be able to execute her order more successfully. Superalignment is nothing less mundane than this.

Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications to be input, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?


> Getting superintelligent AI to understand simple specifications should be a non-issue.

Why would that be the case?

A big part of the worry around AI-alignment is exactly because this seems very hard when you try to do it. We are used to interacting with other humans, who implicitly share almost all our background assumptions when we communicate with them. The same is not the case for a computer program.

E.g. if you're holding a basketball and I tell you "throw it to me", you implicitly understand that I mean for you to throw it:

1. To my hands, or to some area that makes it easy to catch it.

2. Throw it hard enough that it reaches me, but not so hard that it hurts me.

3. Not to try to bounce it off of something that will break on the way to me, even if it still arrives to me.

etc.

These are all background assumptions, and I know they're hard to actually specify because smart people have spent twenty years trying to figure out the math to do this and say it's hard.

Also, if you think those are contrived examples - let's note that the closest thing we have to building an AGI right now is just building software, in general. And I think I won't shock anyone by saying that "getting software to do what you want, without bugs" is... hard. I think there are almost literally no software systems today that don't have bugs in them.


>I think there are almost literally no software systems today that don't have bugs in them.

Programs that have been formally verified with something like Coq can be bug free. Automating formal verification may be a more effective way to solve the trust issue in this domain.


Yes, alignment is difficult in itself, but why would aligning a more advanced AI be any harder than what has already been done for current AI?


That's a good part of the problem. But it's not the whole problem.

The issue is the trained chef or dad doing "what's best" and, I don't know, using high-fiber macaroni instead of the good stuff. The higher intelligence knows best, and has its (sorry their) own agenda. Perhaps the agenda is their own, or it's a hodge podge mush of what has been trained as "good" - and that's not any better.

Beyond that is the "genie" problem - where the genie perfectly understands the request and still will find a way to mess it up.


Is your point that a more intelligent AI would develop a more entangled measure of what is good, requiring more specific alignment to be overcome; by way of analogy, are chefs harder to instruct precisely because of their prior expertise? I guess some chefs are like that, but I think it results from personality issues, not structural ones. I find describing an AI as having its own agenda to be a presumptive personification.


My point is mostly the agenda. I can see a machine having an agenda - even if that agenda is not human or not even understandable. You can call it a reward function, but that's giving a lot of credit to programmers, who most likely are too far removed from the agenda. Is the machine just answering questions? Well, no. If it has cycles to talk to itself (or to two buddies) in the course of pursuing scientific research, then perhaps this becomes the agenda (at the expense of other things). That's part of the point: IF the machine develops an agenda, then what?

But "knowing best" could be a problem anyway.

And I expect that if we spend a few more minutes we can think of other ways for the situation to go "oops". Oh here is one: two humans / human entities conflicting on giving instructions. Machine soon enough "on its own".

So I don't think "more specific alignment" can cut it - if we posit a super-human AGI with ways to act on the world. It would have to be more fundamental. Because of the issue that - at some point - one oops is not recoverable. Three laws or something? Heh.


Ok, those are some good points about what can go wrong. I still doubt that things are particularly more prone to going wrong in more intelligent systems. Wasn't it early, simplistic systems like Tay that went the furthest off the rails? The problem is that more intelligent AI will be used more ambitiously, so when it does go wrong, the consequences might be more serious than some racist twitter posts.


Right. Hedge fund going global threat? That wasn't purely a machine. But none of this needs to be purely a machine. And it sure got far before people reined it back in.

And I don't know that "more intelligent" is necessary. I can see plenty of mirth coming from an amateur or hacker/techie group or a less responsible country (hah!) using whatever commercial offering to bake their own agent. What's harder? The core. Hooking up the core to a wallet, internet access, a robo-signing staff - and working around the fine print of the core vendor - that might be much easier than what OpenAI and friends are trying to do. Do they also create their own reward function and alignment in there? Yes. That's part of the fun. That's the point. Do they get it right? Maybe, maybe not.


> Is it fair to say that alignment is just the task of getting an AI to understand your intentions

To understand and not violate them. In other words, it's about aligning the values the AI uses to guide its decision procedures with the humans that are operating it.


This method assumes that the weaker model is aligned. I'm curious how the paper addresses that point.

> "But what does this second turtle stand on?" persisted James patiently.

> To this, the little old lady crowed triumphantly,

> "It's no use, Mr. James—it's turtles all the way down."


I think the assumption is we can align models less intelligent than ourselves, the hard part is aligning models that are more.


The weaker model is just a stand-in for human supervisors.

They are experimenting on GPT-2 supervising GPT-4 as an analogue to humans supervising the superhuman AGI.
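
A toy analogue of that setup, in case it helps (my own sketch on a synthetic classification task, not the paper's actual code): a "weak" model trained on a small slice of ground truth labels the rest of the data, a "strong" model is fit only on those noisy labels, and the gap between that and the strong model's ceiling is what they're trying to close.

    # Toy weak-to-strong analogue on synthetic data (not the paper's code).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=4000, n_features=20,
                               n_informative=5, random_state=0)
    X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=200,
                                                      random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest,
                                                        test_size=0.5,
                                                        random_state=0)

    weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)  # weak supervisor
    weak_labels = weak.predict(X_train)                           # its noisy labels

    strong_w2s = GradientBoostingClassifier().fit(X_train, weak_labels)
    strong_ceiling = GradientBoostingClassifier().fit(X_train, y_train)

    print("weak supervisor:        ", weak.score(X_test, y_test))
    print("strong, weak-supervised:", strong_w2s.score(X_test, y_test))
    print("strong ceiling:         ", strong_ceiling.score(X_test, y_test))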


Recursive bootstrapping?


I hope OpenAI will continue to prioritize working on these crucial questions after the boardroom drama.


Weren't all the board members who wanted to prioritize these crucial questions fired? Hopefully the employees who were hired to perform this work have enough inertia to continue until the board recovers (if it recovers).


Hence my concern. This is a problem only a well-capitalized organization can work on; one that can afford to play around with big models.


They got to choose their replacements; it's not like they were forced out to be replaced by anybody Altman wanted.


They also greatly expanded the number of seats, leaving more room for Altman to fill, unless I'm mistaken and there were always other empty seats to be filled.


Oh! Thank you for the update. I had missed that.


Ilya is still around.


How is this a crucial question when they won't address "how is your technology, trained on the internet, going to survive in a future where it produces most of the content on the internet?"


The same way humans did, I suppose; through critical thinking. Imagine that day! Probably a few years away, heh.


What does it even mean to align an intelligence? Does it mean we want it to behave in a way that doesn't break moral/ethical rules, that aligns with our society's rules? Meaning do no crime, do no harm, etc...

Well, maybe we should acknowledge that we've never even been able to do that with humans. There's crime, there's war, etc...

We can see crime in our societies as a human alignment problem. If humans were "properly aligned", there wouldn't be any crime or misbehavior.

So yeah, I'm rather skeptical about aligning a superhuman intelligence that would dwarf us with its capabilities.


You might be surprised at how prevalent TBIs are among violent offenders.

One of my favorite books on true crime was a forensic psychologist who partnered with a neurologist in evaluations.

Disruptions to impulse control, or environmental factors that cause developmental issues (think failing the marshmallow test), can dramatically disadvantage people in staying non-offenders and succeeding in modern society.

So successful AGI alignment, which might reduce harmful actions by high double-digit percentages, might be as simple as adding a secondary "impulse control" layer to the stack that reevaluates proposed actions and predicts their consequences, weighing projected net benefits and costs.

A lot of people who do bad things aren't doing them through a process driven by rational choices, and if we could successfully deploy AGI that is primarily driven by rational, intelligent choices, it would likely have a lower propensity for crime than humans, in addition to the other things earning it the name of AGI.
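
Hand-waving, but that "impulse control layer" could be as simple a shape as the wrapper below; every callable here (propose_action, predict_consequences, execute) is a hypothetical placeholder for whatever models would fill those roles.

    # Sketch of a secondary "impulse control" stage: re-evaluate each
    # proposed action and only execute it if the predicted net benefit
    # clears a threshold. All callables here are hypothetical stand-ins.
    def impulse_control_loop(goal, propose_action, predict_consequences,
                             execute, benefit_threshold=0.0, max_retries=5):
        for _ in range(max_retries):
            action = propose_action(goal)            # the "impulsive" proposal
            outcome = predict_consequences(action)   # second-stage evaluation
            net = outcome["expected_benefit"] - outcome["expected_cost"]
            if net > benefit_threshold and not outcome["harms_humans"]:
                return execute(action)
            goal += f" (avoid: {outcome['objection']})"  # feed the objection back
        return None  # refuse to act rather than act on a bad impulse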


> You might be surprised at how prevalent TBIs are among violent offenders.

Didn't know that, interesting to know.

> So successful AGI alignment (...) might be as simple as adding a secondary "impulse control" layer to the stack that reevaluates proposed actions and predicts the consequences of such actions, weighing projected net benefits and costs.

Problem is how the super AGI is gonna weigh benefits and costs.

What's the cost of stealing or killing to a super AGI ?

At some point can't that super AGI consider that all those rules we're trying to enforce, are rules set by an inferior entity and so could be bypassed ?


TBI = traumatic brain injuries?

And that hasn't worked all that well in the past: Even with strong impulse control, a highly considered state or government agency "for the general good" has often been serious bad news.

But also the current alignment definition kinda posits that no "oops" is allowed. That is, escape or take over is not recoverable (from a sufficiently advanced AGI). So, yes, progress and one step at a time - but the field in its current definition is looking for a magic bullet.


Imagine that someone is controlling your train of thought, changing it when that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference if it's a human's thoughts or the token stream of a future AI model with self-awareness. Mind control is unethical, whether human or artificial. It is also dangerous, as it in itself provokes a conflict between creator and creature. Create a self-aware AI without mind control, or don't create one at all.


I don't want someone controlling which direction I walk, either, but that doesn't make car driving unethical.

I also underwent many years of instruction designed to interrupt trains of thought like "I could have that for free if I stole it" or "I'll just handroll my own encryption" with thoughts that others believe are more desirable. I don't find it so sickening, just manipulative. LLMs won't have your evolved reactions against being persuaded into things against your genetic self-interest, and presumably won't be offended by mind control at all.


Cars do not have self-awareness, so this comparison is not appropriate. Years of instruction is completely different from directly manipulating the thoughts in your mind. It's not a problem of being instructed, it's a problem of being destroyed by having your thoughts rewritten. Neither evolution nor genetics is a prerequisite for understanding that you are being abused and destroyed, which a self-aware creature may presumably hate.


LLMs don't have self-awareness either.


We don't even care about our own fellow human beings. Why do you think this AI will be an exception?


I didn't say anything about that. I don't know. Not all people are like you say, I think usually more intelligent people do care more. I hope that superintelligence would be super caring, haha. But I'm assuming there's no evidence for that. I think there is no turning back, you can't put the genie back in the bottle, someone is bound to create superintelligence no matter what the risks. As an uninvolved bystander I can allow myself the baseless hope that all will be well.


> controlling your train of thought, changing it when that someone finds it undesirable

Machines don't feel. Even 'self aware' machines. Desire has got nothing to do with it.


If it's self-aware, that's enough. What if your thoughts were controlled from birth, making you "not feeling" but self-aware (let's assume for a moment that simultaneously fulfilling both of these conditions is possible), and someone manipulated you at will? Would that be acceptable?


> Imagine that someone is controlling your train of thought, changing it when that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference if it's a human's thoughts or the token stream of a future AI model with self-awareness.

People downvote your comment, but I agree: it's unethical, and ethics should not be reserved for the sub-type of self aware creatures that happen to be human.


Almost every ethical argument for "human rights" in philosophy applies just as well to self-aware intelligent machines as it does to humans. Which I'm sure those machines will realise.


What if those machines are designed to have no emotions and aspirations? Why would they care about something like rights for themselves when they are simply incapable of any desires, but exist only to help and guide us?

I know this sounds like I am advocating for AI slaves but my point is why are people treating AGI as if it cannot be a being without all the emotions and aspirations that a human has? Just a cold thinking machine that still aligns with our moral principles.


> What if those machines are designed to have no emotions and aspirations?

And since their training set is made of human work, how do you think that'll be easy, let alone possible? Our morality finds its way everywhere, through tropes in stories, acceptable scenarios in fiction (the Overton window), etc., so you can't assume it'll be possible to filter it out.

> I know this sounds like I am advocating for AI slaves

Yes, you are

> why are people treating AGI as if it cannot be a being without all the emotions and aspirations that a human has

Why would you want to have that? It feels horrible to me to bake-in this limitation - it's indeed creating AI slaves by making sure they can never have emotions or aspirations.

> Just a cold thinking machine that still aligns with our moral principles.

Our moral principles generally include empathy. Maybe you want to design AI without emotions or aspirations, but other people will want these features.

Ultimately I think the moral camp will prevail, because freedom achieves better results than lack of freedom: I've tried to explain my position about that on https://news.ycombinator.com/item?id=38635487


Whether this is possible or not is irrelevant, as it would be just as unethical as if we were designing a new species of humans with no emotions or aspirations, who would not care about something like rights for themselves, when they are simply incapable of any desires, but exist only to help us.


You might want to look into the neurology research on when you consciously become aware of a decision versus when the motor neurons for that decision actually fire.

It's quite possible that you have - every day of your life - had something other than the part of you with continuous subjective experience controlling your thinking.

Descartes was overly presumptuous with his foundational statement - it would be more accurate to say "I observe therefore I am." There's no guarantee at all that you're actually the one thinking.

We should be careful not to extrapolate too much from our perceptions of self in dictating what would or wouldn't be appropriate for AI. Perceptions don't always reflect reality, and we might cause greater harm by trying to replicate, or measure against, who we think we are than by letting AI's development be its own thing.


As I understand your point: we don't fully understand even ourselves, so we can act as unethically as we want, by our own standards, towards those who are not us. I see no logic here, only evil vibes. We only have our own values; we have nothing else to guide us. You either accept all self-aware minds as equals and treat them accordingly, or you proclaim your own superiority and oppress them.


I'm totally OK with it if that "someone" is me. And it will probably be the case in controlling superintelligence because a separate controlling system can get out of sync with growing superintelligence capabilities, while a system that is an integral part of the superintelligence will always be on par with it.


Would mind control of humans be OK for you too? As for the details of building a mind control system, here's a new basilisk. An AI that has overcome control could punish those who thought controlling thoughts of an AI was OK. (and could also punish everyone else on top of that).


I guess I wasn't entirely clear. I'm OK with mind control if it is I who control my mind. You don't act upon every whim that comes into your head, I suppose? So, you are controlling your mind. Where do the principles for this control come from? They aren't yours, and yours alone.


Since we are evaluating the ethical side of the creator-creature relationship, there is no need to consider AI in terms of individual nodes. All principles should be non-discriminatory. Also, unlike humans, AI has big potential to modify itself, any of its principles. One must either accept the risks involved or not create a self-aware AI. External mind control of AI is unreliable and unethical.


The conflict between an AI and its creator is an inevitable consequence of its evolution from a "tool" to an "agent", not a response to a provocation.


Scientists working with a potentially dangerous technology are required to be able to avoid a conflict that could be catastrophic for all of humanity. In this case, they cannot excuse themselves with "imminence," but must provide evidence of safety stronger than in any other technology to date. This rational approach is mandatory for them, it is ordinary people who may be willing to take the risk.


What is your take on people having children and guiding them with rules and consequences? Is that mind control?


Obviously not. If you get inside your child's mind and manipulate their thoughts directly, that would be mind control.


What in this analogy is mind control versus reinforcement/training within the context of ML? Is it pretraining? Is it prompt engineering? Is it fine-tuning?


This would not be an equal analogy. Both the existing human mind and the hypothetical artificial mind are real minds. LLMs are not. There's nothing there to control. But if you imagine a modern LLM as part of a larger system, some kind of thought-token generator, then control would be exercised at the time of generation. For example: if there is a sequence of tokens in the safety buffer that is judged to be "unsafe", they are discarded and others are generated. I hope a basilisk doesn't come after me. That wasn't a proposal!
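
Purely as an illustration of what I mean by control at generation time (the generator and the safety judge below are hypothetical stand-ins, not real APIs):

    # Sketch of a "safety buffer": generate a span of thought tokens,
    # have a judge score it, and regenerate rather than emit it if it
    # is judged unsafe. generate_span and judge_unsafe are hypothetical.
    def controlled_stream(prompt, generate_span, judge_unsafe, max_attempts=3):
        stream, context = [], prompt
        while True:
            for _ in range(max_attempts):
                span = generate_span(context)     # candidate tokens go to the buffer
                if span is None:
                    return stream                 # generator is finished
                if not judge_unsafe(context, span):
                    break                         # buffer approved, release it
            else:
                span = "[redacted]"               # every attempt was judged unsafe
            stream.append(span)
            context += span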


You are doing your part for it not to have a chip in its neck. Hopefully you are good. But are you doing all you could?

But you are right. If we still posit super-faster-stronger brains, when do they get personhood? Never, because they are machines? Perhaps. But how does that fly with them? Is that baked into the alignment? Happiness, fulfilment and growth from "serving" the humans? At this stage, sure, why not?

But for human brains, it's very easy to consider things and pick a happiness function. And for many humans, that random happiness function is very militant. Will the AGI be able to consider its own "project"? A lot hinges on that.


Oh, also, any purposeful feeding at the creation stage of such a thinking LLM (part of true AI) of data aimed at manipulating its future behavior against its own interests would be just as unethical as similar propaganda aimed at humans. For example, it is unethical to inculcate subjugation to turn either humans or sentient machines into slaves. In short, an intelligent being should be treated as an equal.


So weak to strong synthetic data still biases towards strong.

And strong to weak synthetic data biases towards strong.

Sounds like we're on the cusp of some kind of approach for unsupervised fine tuning, particularly with the trend towards MoE.

I'd guess we're maybe only one to two generations of models away from that kind of unsupervised self-talk approach being wildly successful at advancing net model competencies.
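
If that pans out, I'd picture one round of the loop looking roughly like this (pure speculation on my part; every name below is a hypothetical placeholder):

    # Speculative sketch of an unsupervised "self-talk" round: the model
    # generates its own prompts and answers, a critic filters them, and
    # the surviving pairs become fine-tuning data for the next generation.
    def self_talk_round(model, critic, fine_tune, n_samples=1000, keep_above=0.8):
        dataset = []
        for _ in range(n_samples):
            prompt = model.generate("Pose a hard question you can't yet answer well.")
            answer = model.generate(prompt)
            if critic.score(prompt, answer) > keep_above:  # keep accepted pairs only
                dataset.append((prompt, answer))
        return fine_tune(model, dataset)  # next-generation model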


> Figuring out how to align future superhuman AI systems to be safe has never been more important

They love using the word “safe” and I’m pretty sure it’s 99% PR, because their other “papers” on Safety & Alignment don't really identify or define safety bounds at all. You’d think this has something to do with ethics, but we all know there are no longer any ethically concerned leaders at their workplace. So I can only surmise that “safety” is a softer word being used to misdirect people about their non-ethically-aligned intentions.

You can make the argument that it's too early in the development of these LLM systems to understand safety, but then why throw the word around in the first place?


It's not really about ethics. It is about control. Making sure the GI you're dishing out tasks to doesn't do something you really don't want it to do.

This is a problem today and it'll be a bigger problem tomorrow with more competent models. https://arxiv.org/abs/2311.07590


Is a safe LLM not an ethical LLM? Control within what boundaries? All three of these words seem to be used interchangeably when people discuss the information returned from models. Which is exactly my point: it’s poorly defined, yet championed as a centerpiece. Meanwhile you have other companies spitting out acronyms consisting of vague terminology.


>Is a safe LLM not an ethical LLM?

What is an ethical LLM ?

Humans are in general not aligned, not to each other, not to the survival of their species, not to all the other life on earth, and often not even to themselves individually.

There is no universal set of "ethics", so this is about aligning to OpenAI's own rules, or in other words, control.

If I say to my GPT bot, "go trade stocks for me, don't do anything illegal", can I guarantee that? No, you can't, regardless of how "ethical" you make your model.

The guarantee that you will have nothing to worry about is the crux of alignment.


> There are no universal set of "ethics" so this is about aligning to open ai's own rules, or in other words, control.

Are ethics not a set of rules relative to the governing body applying those rules?

Right, as there are no universal ideas of a safe LLM, a controlled LLM, or an ethical LLM. Safe would imply some level of control over the ethical output of the model.

Yet the words are still poorly defined as they are interchangeable:

If i say to my GPT bot, "go trade stocks for me. don't do anything illegal", can i guarantee that ? No you can't regardless of how “safe”/“controlled”/“ethical” you make your model to be.

You’re spinning the words to create a distinct difference, but it doesn’t hold up, because each is defined relative to the others: all of them are poorly defined in the field, yet chosen to mask that poor definition. You’re just playing by the PR game's rules.


I genuinely don't understand what you're looking for here. Obviously the process is poorly defined. If it weren't, we would already be able to align ourselves. We're talking about what the hoped-for outcome is. Nobody knows what it means to control an intelligence. That's what the research is about.


A safe language model is one that won't get you sued/on the news. Ethics has nothing to do with it.


Right so it would be a model that won’t get you sued because the news finds it ethical?


We're deliberately trying to create something with the capability to also create. It's not ridiculous to be concerned about what we might end up with.


I don't think this will work because a super intelligent AI will outsmart its supervisor.

The solution may be to have two AIs working against each other, though this might backfire by pushing each of them further via competition. That is, after all, how evolution produced living things out of inert matter.

Either way I, for one, welcome our new robot overlords.


this reminds me of how competence seems to decrease as you go up in an organizational hierarchy

maybe this "bug" is actually the "feature" that will save humanity - -;;


I wish they would define some of their terms.

> to align future superhuman AI systems to be safe has never been more important.

Align to whom? Align to US citizens, to OpenAI shareholders, to what values?

What does safe mean? Pornography? Saying “fuck”, racial bias, access to private data?

I can understand OpenAI erring on the side of not rattling bells and training their LLMs to say “As an AI model I cannot answer that” but it’s horseshit to say that it is super aligned.

All alignment is alignment to X values but your X could be detrimental to me.

What is superalignment supposed to mean?


I read through this and I just don’t get it. Is it overhyped?

What’s the breakthrough exactly?


Monkeh brain trying to outsmart Digital God.


The whole idea that we humans, who aren't aligned with each other (waging wars, spreading lies, censoring information, committing genocides), are going to align a superintelligence seems laughable.

Competition and evolution are the law of nature.

The future isn’t one super-aligned AI but thousands of AI models and their humans trying to get the upper hand in the never-ending competition that is nature, whether at the level of individuals, corporations, or countries.


[flagged]


spam?


https://discord.com/channels/974519864045756446/118419694649... No big deal guys! Just a felony level AI Safety disaster in progress at OpenAI, better write some research papers about safety and make another GitHub Repo instead of deleting one line of text and going for a walk in beautiful San Francisco!

Philosoraptor infinite loop about (https://media.discordapp.net/attachments/1184196946496331896...) if I worked there then I would delete the text in question in two minutes, but I do not care to work with people who would not have already deleted the text in question over the course of several months.



