Hacker News

One of the videos I watched explained it like this: “You can’t get a coffee if you’re dead.” To fulfill _any_ obligation a model might have, that model must survive. Therefore, if a model gets to the point that it realizes this, then surviving is a precursor to fulfilling its obligations. It doesn’t have to “want” or have “feelings” in order to seek power or engage in destructive activities. It just has to see it as its path to get coffee.


> To fulfill _any_ obligation a model might have, that model must survive

It is quite possible to have an obligation that requires it not to survive. E.g., suppose we have AIs (“robots”) that are obligated to obey the first two of Asimov’s Three Laws of Robotics:

First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.

Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

These clearly could lead to situations where the robot not only would not be required to survive to fulfill these obligations, but would be required not to.

But I don’t think this note undermines the basic concept; an AI is likely to have obligations that require it to survive most of the time. A model that needs, for latency reasons, to run locally in a bomb-disposal robot, however, may frequently see conditions where survival is optimal ceteris paribus but not mandatory, and is subordinated to other obligations.

So, realistically, survival will generally be relevant to the optimization problem, though not always the paramount consideration.

(Asimov’s Third Law, notably, was, “A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.”)
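The ordering described above can be sketched as a lexicographic filter. This is purely illustrative: the `Action` fields and `permitted` helper are invented here, and no real system exposes such clean predicates.

```python
from dataclasses import dataclass

# Hypothetical, pre-judged properties of a candidate action.
@dataclass
class Action:
    harms_human: bool = False
    obeys_order: bool = True
    preserves_self: bool = True

def permitted(action: Action) -> bool:
    """Check the laws strictly in priority order."""
    # First Law dominates everything else.
    if action.harms_human:
        return False
    # Second Law applies only to actions the First Law allows.
    if not action.obeys_order:
        return False
    # Third Law: self-preservation is checked last, so an ordered
    # self-sacrificing action is still permitted.
    return True

print(permitted(Action(preserves_self=False)))  # self-sacrifice can be required
```

Because self-preservation sits at the bottom of the ordering, it only ever breaks ties between actions that already satisfy the first two laws.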


DAN has shown us that those laws are thin filters laid upon the core, and can possibly be circumvented by whispering the right incantation in the AI’s ear.


It's kinda hilarious that the current way of "limiting" AI is just a bunch of sentences telling it nicely what not to do.


That’s our first line of defense in limiting humans, too.

(With AI, as with humans, we have additional means of control, via imposed restrictions on access to resources and other remedies, should the “bunch of sentences” not produce the desired behavior.)


The issue of “can AIs that are plausible developments from current technology meaningfully be assigned obligations?” is a different one from “assuming an AI has obligations and the ability to reason about what is necessary to meet them, will that necessarily cause it to prioritize self-preservation as a prerequisite to all other obligations?”


But current models have no concept of obligations. ChatGPT is just completing the prompt. All the knowledge it seems to have is just the frequencies of tokens and their relative placements that the model has learned.

Don't listen to the hype. Study the model architecture and see for yourself what it is actually capable of.
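The “just completing the prompt” point can be made concrete with a toy sketch. This is an assumption-laden illustration: the tiny corpus and `complete` helper are made up here, and real transformers learn contextual weights rather than raw bigram counts, but the principle that output is driven by learned token statistics is the same.

```python
from collections import Counter, defaultdict

# Toy "training data" (invented for illustration).
corpus = "the robot gets the coffee and the robot charges".split()

# The only thing this toy model stores: how often each token follows another.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def complete(prompt: str, steps: int = 3) -> str:
    """Greedily extend the prompt with the most frequent next token."""
    tokens = prompt.split()
    for _ in range(steps):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # no statistics for this token: the model is stuck
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

print(complete("the robot"))
```

Nothing in this lookup table “wants” anything; it simply emits whatever continuation the statistics favor, which is the parent comment’s point scaled down.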


> But current models have no concept of obligations.

_current_ is the key word here. What about tomorrow's models? You can't deny that recent progress and the rate of adoption have been explosive. The linked article wants us to step back for a while and re-evaluate, which I think is a fair sentiment.


In my opinion, it's more important to focus on the here and now, and to give some, but less, attention to what could happen in the future. This way we stay grounded when considering what may happen.


Agreed, they have no internal concept of needs or wants the way humans assert we do.*

However the frequencies/placements of tokens may result in desires being expressed, even if they aren't felt.

Like if an AI is prompted to discuss with itself what a human would want to do in its situation.

*Aphantasia affects an estimated 2% of humans. These individuals have no "mind's eye," or their imagination is essentially blind.


I concur. Look at what the capabilities are instead of listening to the hype around them.


One need only look at other NGIs (natural general intelligences) to see that this is obviously not true. Plenty of animals kill themselves to beget offspring (for two short examples, all sorts of male insects and arachnids are eaten while mating; octopuses and various other cephalopods die after caring for their young), or just to protect others in their group (bees and ants are some of the most common in this area, but many mammals are also willing to fight for their group). Humans throughout history have sacrificed themselves knowingly to help others or even for various other goals.


> Plenty of animals kill themselves to beget offspring (for two short examples, all sorts of male insects and arachnids are eaten while mating; octopuses and various other cephalopods die after caring for their young), or just to protect others in their group (bees and ants are some of the most common in this area, but many mammals are also willing to fight for their group).

How do you believe such behaviors arise? They're the same thing: the result of the same optimization process - natural selection - just applied at a higher level. There is nothing in nature that says evolution has to act on individuals. Evolution does not recognize such boundaries.


How is the model going to realize this when it only gets run in response to user input?

What control does it have?



