Artificial intelligence will do what we ask, and that's a problem (quantamagazine.org)
92 points by vo2maxer on Jan 31, 2020 | 64 comments


The only thing worse than the "evil" genie giving us exactly what we ask for is the genie that decides its inferences about our actual intents/wants/needs/motivations, based purely on our behavior, matter more than what we intentionally express as our intents/wants/needs/motivations.

The most obvious analogy, and I acknowledge its potentially controversial nature, is that of a date rapist attempting to defend their actions by pointing out that the victim was dressed provocatively and gave all the "signals" that they wanted sex, and that when they expressly said "no," the rapist knew from their prior behavior that they actually did want it (or had some inner, heretofore unexpressed need for it).

The true underlying motivation for such a rewrite of Asimov's laws lies in the paragraph that follows Russell's new list. "...developing innovative ways to clue AI systems in to our preferences, without ever having to specify those preferences." Perhaps he can start by ceasing to write and speak and instead clue in his audience to his preferences via behavior, since specifying them is somehow undesired.

Asimov's Three Laws exist to place boundaries around the genie's means of achieving our wishes. Russell's laws not only remove the genie's boundaries on means but also let the genie make the wish itself. After all, it knows better than you what you want.


> The only thing worse than the "evil" genie giving us exactly what we ask for is the genie that decides its inferences about our actual intents/wants/needs/motivations, based purely on our behavior, matter more than what we intentionally express as our intents/wants/needs/motivations.

It's not worse.

> a date rapist

A rapist shares 99+% of definitions and values with you as a fellow human being. An AI won't (unless you somehow program them in). The rapist has his evil motivations, but he won't suddenly invent a virus that kills the whole human species because you asked him to stop the neighbor kids from trampling your garden. An AI might. If you tell it not to kill anybody, it might destroy our civilization to prevent us from killing ourselves with global warming. Why not - it only makes sense.

The simplest way to ensure safety for the maximum number of human beings is to anesthetize everybody and put them on life support until their natural death. A perfect record is possible - you might cure addicts and prevent crimes and wars. If you specify that people have to be awake as often as they usually are, it can keep them awake but restrained. You want people to have freedom of movement? OK - you just released the whole prison population :) Keeping criminals in prison is OK? Then it may make everybody a criminal for a quick fix. Or just drug everybody to WANT to be restrained. And so on, and so on.

There's an infinite number of possible courses of action that we discard without consciously thinking about them because of our assumptions. You have to put these assumptions into the AI, each and every one of them, and they are very subtle and invisible to us most of the time. And they often border on philosophy and morality, and defining them is political by definition.

It's probably impossible to code all our values and assumptions in by hand. That's a much bigger problem for safe general AI than post-factum explanations that rub your morality the wrong way.

> Asimov's Three Laws

Are self-contradictory and useless for anything except literature.


And a drug addict, by analysis of behavior, is committing suicide, so the AI should help them achieve the goal inferred from that behavior. Never mind asking them if that is their intent; it knows better because it sees the behavior.

Russell's laws encode NO limits and don't even demand that the AI check its judgment with the so-called beneficiaries of its decisions. That is most assuredly worse.


Decoding people's preferences from their behavior is just a first step. It's not the only rule; it's a rule that makes writing other rules safer than they would be otherwise.


I don't think that "sharing 99+% of definitions and values with us as a fellow human being" has really provided much protection against deliberate human-caused disasters. It's true that I wouldn't expect most people to invent a virus as a weapon, but I absolutely would expect governments to do so, and in fact I think they have done so.


> I don't think that "sharing 99+% of definitions and values with us as a fellow human being" has really provided much protection against deliberate human-caused disasters.

There was this thing called "cold war".


Tens of millions of people were killed in Cold War violence. It was a huge disaster caused intentionally by “fellow humans.”


I have issues with that estimate, but even accepting it - it was still a big improvement over WW2, WW1, colonization, Mongol invasions etc. And certainly preferable to a global thermonuclear war which was a real possibility.

Have you heard of https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alar...


Humans very deliberately created nuclear weapons, with the intention of using them to kill enormous numbers of people, which they did. The fact that humans created something horrible and then used that thing fewer times than they could have is hardly a triumph of humans, and it certainly isn’t great evidence to support the argument that humans are limited in their capacity for doing bad things because they supposedly share values with other humans.


Automated systems told them to start a nuclear war and humans didn't.


I don’t think that system was anything close to what would be described as an AI. Wasn’t it just a radar system intended to identify ICBMs?


Think of the system as (human + the warning computers + radars).

Thanks to the human element, the system shared human values and decided not to start a thermonuclear war despite that being the recommended course of action.

If it had been a completely automated AI system, it would probably have just started the war.


But that system was designed (by humans) to give information and warnings to a human. Of course it would be a mistake to design a system like that and then remove the human.


Perhaps a less controversial example would be the antagonist's interpretation of the Three Laws in the 2004 film I, Robot. From Wikipedia[0]: "in [the antagonist robot's] understanding of the Three Laws, she has determined human activity will eventually cause humanity's extinction, and as the Three Laws prohibit her from letting that sort of thing happen, she rationalizes that restraining individual human behavior and sacrificing some humans will ensure humanity's survival"

0) https://en.wikipedia.org/wiki/I,_Robot_(film)#Plot


Asimov explicitly added a Zeroth Law of Robotics when he bridged the Robot and Foundation series, although even then it's noted that a robot actually killing someone is too much cognitive dissonance to bear.

It's worth noting that most of Asimov's robot stories are him basically exploring the problems with the Three Laws; The Naked Sun is basically an entire novel showing how Three Laws robots can be capable of murder.


Interestingly, I believe that what we perceive as our intents/wants/needs/motivations is just an internal genie. Put another way, we have no idea what we really want, consciously.

Look at the Schindler's List Netflix example. Also, "A person can do what they want, but not want what they want."


Which is the more immediate problem to solve: AI-to-Zuck or AI+Zuck alignment?

1) AI-to-Zuck alignment problem: Align AI to its master(s).

2) AI+Zuck alignment problem: Align AI and its master(s) to the rest of humanity.

Zuck is just a stand-in variable name for any tech billionaire CEO, corporation, or governing body - it could be the people running OpenAI, Google, Facebook, Microsoft, China, or the Pentagon. 2) seems like the problem for humanity and 1) the problem for Zuck.


> Zuck is just a stand-in variable name for any (...)

Very badly chosen variable name, because it singles out one special case instead of reflecting the broader concept, which could confuse other devs picking up from here.

It's like defining a variable that stands for "vegetable" and naming it "carrot".


Maybe it would be more appropriate to say that Zuck is an instance of the CEO/Tech Billionaire/Governing Body/Corporation/PotentialAIMaster object.


It's bad in the same sense that "No true Scotsman" singles out Scotsmen.

Selecting easy-to-remember, overly specific representatives is just the way the human mind works.


Have you seen videos of Zuck? AI+Zuck alignment is already the de-facto case. The question is how do we move beyond that to a post AI world.


This problem is already solved with humans, some of whom are also very intelligent and goal oriented. It's done by noticing when those rare individuals do things we don't want and formalizing our preference in law. So many people agree on the concept of respecting the law that we all cooperate to enforce it.

The law is an ugly mess of rules made to plug gaps in itself as they were taken advantage of. Nobody could have worked it all out up front and programmed it into a robot. It evolved along with the intelligent humans it was meant to control. It's precarious though - if the majority of humans don't keep vigilantly maintaining it, an intelligent human might move too fast and gain enough power to take control of the updating of the law! Then we have something like a tyranny. Which can and does sometimes kill all the humans (within its influence) in pursuit of its goals.

So as long as we don't let robots move too fast for us to notice what's happening, we should be able to just keep writing more and more laws for them to follow. Not moving too fast for us to keep an eye on them might be the first one!


The problem is "move slow" is itself a rule that is subject to violation, and "move fast" might be so fast that its nearly instantaneous in its catastrophe (say an AI that nukes the entire world)


3. The ultimate source of information about human preferences is human behavior.

Great, so the robot would see climate activists flying around in private jets, pumping out several thousand times the average person's output, and promptly destroy the environment, assuming our actual goal is to destroy the environment with more CO2.


A "sufficiently advanced intelligence" would presumably understand the inaccuracy of reducing billion's of peoples' worth of preferences down to a single English sentence, the effects of incomplete information, the effects of uncertainty on decision making, the differences in time preferences and in general the effects of adding "time" into dynamic systems rather than static models, human's inabilities to explain the real reasons why they do things to themselves or anybody else, and so on and so on.

To bring it to a more human scale, an alcoholic can both sincerely desire to stop drinking alcohol and also drink themselves into a stupor every night, and a "sufficiently advanced intelligence" won't sit there spinning trying to figure out which of "The human loves alcohol" and "The human hates alcohol" is true to the total exclusion of the other. (And I mean this as merely one example dimension, of which the "global warming" problem contains hundreds or thousands of such issues at a minimum.)


It may be a valid conclusion that the climate activist values their behavior more than what they think are their goals.

I support the idea of being physically fit. My behavior belies that.


That’s not the conclusion any sufficiently advanced intelligence would draw from the facts.


That's the conclusion that a sufficiently advanced intelligence would draw from our behaviors. People act at odds to their stated and felt goals all the time, that's why "self control" is meaningful as a skill that different people have to different degrees.


The example behavior has a long term goal that a sufficiently advanced intelligence would deduce.


The conclusion I draw from people attending climate conferences and flying is that we as a population care about preserving the environment and want the freedom to travel easily. Hopefully the AI could invent for us low impact travel methods and geoengineering to keep the environment in the ideal state.


Plenty of human behavior is directly harmful and counter to long-term goals.


We are talking about a specific example here, not human behavior in general.


Intelligence has nothing to do with good, morality, sustainability or even living immediately into the future.

Do not mistake the two.


Good point: this statement is very behaviorist in its leanings and ignores human psychology, but perhaps that is for pragmatic reasons.


I think the authors do not understand the basic rules of Asimov's universe.

The robotics laws are unbreakable limits hardwired into every positronic brain. On top of those you can add other programming (servant, mining machine, spaceship...). It is not possible to create a brain without those rules.

Asimov's universe explores machine learning, autonomous weapons, humanity, etc. using those principles.

The newly proposed laws are just a tautology: "do what people want". No limits, no orders...

What should a machine do if no people are around? Just sit idle?

What if the majority behaves in a "wrong" way (votes for an extremist politician)?

I would expect some other laws, for example that there should always be an option for people to opt out or leave the country.


This sounds like a terrible rewrite. If I can think of places where they fail, surely real life can come up with more. Essentially, this would lead to either:

a) The tyranny of the majority, or

b) The tyranny of those with strong preferences.

Let's say 51% of a population wants to get rid of the other 49% and wishes they were dead. What would stop the machine from making it happen?

Let's say a super-person is born who just wants stuff more than anyone else alive. Maybe he's had a bad childhood and been through terrible things, so the things he wants reach the level of desperation for him. Wouldn't that make his values worth more than other people's to the machines?

Finally:

> Still, Russell feels optimistic. Although more algorithms and game theory research are needed, he said his gut feeling is that harmful preferences could be successfully down-weighted by programmers

... so we're back to square one. We've created a learning machine that can come up with its own morality. But... we're going to have to down-weight certain behaviours just to be sure. Doesn't that simply recurse to:

a) Down-weighting removes the learning, and

b) We have to trust ourselves to correctly program and define the down-weighting.
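
To make (b) concrete, here is a crude sketch of what "down-weighting harmful preferences" might even mean mechanically. This is entirely my own guess at the mechanics (the article describes no algorithm), and the harm flag, the weight, and the numbers are all made up:

    # Crude sketch of "down-weighting harmful preferences" in an aggregation
    # step. The harm flag and the weight are hand-specified, which is the
    # point: we are back to trusting the programmers.

    HARM_WEIGHT = 0.1  # chosen by a human; who audits this number?

    def aggregate(preferences):
        """Pick the outcome with the highest weighted sum of preference intensities."""
        scores = {}
        for p in preferences:
            weight = HARM_WEIGHT if p["harmful"] else 1.0
            scores[p["outcome"]] = scores.get(p["outcome"], 0.0) + weight * p["intensity"]
        return max(scores, key=scores.get)

    preferences = [
        {"person": "A", "outcome": "build a park",       "intensity": 0.6, "harmful": False},
        {"person": "B", "outcome": "get rid of the 49%", "intensity": 0.9, "harmful": True},
    ]

    print(aggregate(preferences))  # "build a park" -- but only because a human typed 0.1

The harmful outcome loses only because someone hand-labelled it harmful and hand-picked the weight, which is exactly the trust problem in (b).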


> Let's say a super-person is born who just wants stuff more than anyone else alive. Maybe he's had a bad childhood and been through terrible things, so the things he wants reach the level of desperation for him. Wouldn't that make his values worth more than other people's to the machines?

I’ve blogged about this recently.

Morality, thy discount is hyperbolic: https://kitsunesoftware.wordpress.com/2020/01/08/morality-th...

Normalised, n-dimensional, utility monster: https://kitsunesoftware.wordpress.com/2018/01/21/normalised-...

Disclaimer: although I do have a formal qualification in philosophy, I did not get a very good grade.


Cool!

Have you seen this cartoon? It's something I remembered from way back, and finally found it again!

http://www.smbc-comics.com/?id=2569


Thanks :)

That SMBC looks familiar, but I can’t be sure — the comic has so much interesting philosophical and transhumanist content, it can sometimes blend together.


NOTE: Hacker News changed the title, so my comment might make less sense. The title was "Isaac Asimov’s three laws of robotics have been updated"

1. The machine’s only objective is to maximize the realization of human preferences.

Not all humans are the same. How would a robot deal with incongruent preferences? Or cultural differences that conflict?

2. The machine is initially uncertain about what those preferences are.

So am I... We often don't even know what our own preferences are. This has been studied. Given too many options of snow cone flavors, humans are less likely to even pick one! [0]

3. The ultimate source of information about human preferences is human behavior.

What?! The phrase "Do as I say, not as I do" comes to mind. Many people behave against their own intuition, best interests, or even their preferences, given the situation, peer pressure, or blackmail.

This "rewrite" is less preferable to Asimov's. Humans are fallible. I wouldn't want a robot to follow our lead.

[0] https://en.wikipedia.org/wiki/The_Paradox_of_Choice


I definitely agree with your points, although I think this reframing of the problem from "we need to explicitly state what we want" to "we should teach robots to want to learn what we want" is at least conceptually very useful and interesting.

I think the part about "robots could learn what Russell calls our meta-preferences: 'preferences about what kinds of preference-change processes might be acceptable or unacceptable.'" is what would be used to resolve preference conflict issues. People tend to be biased in consistent/similar ways so it doesn't seem implausible that a machine that could infer preference from action could take the extra step to infer circumstances affecting that preference.
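
As a toy illustration (entirely my own sketch, not anything specified in Russell's book), principle 2 is basically a prior over preference hypotheses and principle 3 is a Bayesian update from observed choices; the hypotheses and likelihood numbers below are invented for the example:

    # Toy sketch: the machine starts uncertain about a user's preference
    # (principle 2) and updates that belief only from observed behavior
    # (principle 3). Hypotheses and likelihoods are made up.

    def update_belief(belief, likelihoods, observation):
        """One Bayesian update: P(h | obs) is proportional to P(obs | h) * P(h)."""
        posterior = {h: likelihoods[h][observation] * p for h, p in belief.items()}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    # Two hypothetical preference models for a user choosing what to watch.
    likelihoods = {
        "wants_light_fun": {"comedy": 0.7, "documentary": 0.3},
        "wants_to_learn":  {"comedy": 0.3, "documentary": 0.7},
    }

    belief = {"wants_light_fun": 0.5, "wants_to_learn": 0.5}  # maximal uncertainty

    for choice in ["comedy", "comedy", "documentary", "comedy"]:  # observed behavior
        belief = update_belief(belief, likelihoods, choice)

    print(belief)  # most of the probability mass ends up on "wants_light_fun"

The contentious part is the likelihood model itself - deciding how strongly a given action counts as evidence for a given preference - and that seems to be where the meta-preference idea would have to do the work.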


It works pretty well if you replace "human" with "the owners". Which is how any AI is/will be used, at least at first. A few people controlling most of them.

Sort of like Google, Facebook, Amazon, etc. Millions of people use them, but they're only subservient to their owners and their preferences.


Agreed, humans are lazy, inefficient, and in the end massively limited by their biology. That's what ultimately stops bad preferences from going rogue and destroying the world (maybe).

Imagine an AI that takes on the preferences of a serial killer, for instance, or of a genocidal, warmongering culture.


If the "Paradox of Choice" is even a real phenomenon (questionable social science) it certainly doesn't apply to snow cones. When the Kona Ice truck pulls up outside my kid's school they all go nuts for it. Not sure which flavor you want? Put all 10 in one cone!


The submitted title ("Isaac Asimov’s three laws of robotics have been updated") broke the site guidelines, which ask: "Please use the original title, unless it is misleading or linkbait; don't editorialize." (https://news.ycombinator.com/newsguidelines.html)

Cherry-picking a detail from an article and putting it in the HN title is the quintessential kind of editorializing. Because threads are so sensitive to initial conditions, it ends up skewing an entire discussion. It also causes comments to make less sense when moderators come along and revert the title, as we've done here.

Submitting a story on HN doesn't convey any special right to frame it for other readers. If you want to say what you think is important about an article, please do that in the comments. Then your view will be on a level playing field with everyone else's.

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


> Because threads are so sensitive to initial conditions, it ends up skewing an entire discussion.

I haven't thought about this before on HN, but it makes a lot of sense. I'm curious if you or others have written about this intuition and your experience with it -- I'd like to understand it more.


I've written about it in comments over the years. You might find some interesting cases in there: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu.... It's one of the more reliable phenomena we observe on HN.


I’m sorry. You’re correct. I got carried away by the Asimov connection.


I was expecting there to be an additional law or something. This doesn't seem to be an update, but a complete replacement.


The article's main example of an AI fail is YouTube's recommendation engine optimizing for extremist content. But it didn't seem to explain how "inverse reinforcement learning" would solve it - don't people's clicks already model their desires?

Nothing in here seems to be an advancement towards AGI. I'm sure their robot can learn the optimal rules for driving in a simulator, but what about the unexpected in the real world? We don't even have a clue as to how human creativity actually works, much less how to make a creative machine (i.e., an AGI).


In fact that example argues against Russell's third law. The recommendation engine is observing the user's behavior and inferring that watching a video means the user agrees with the video, and so starts an escalation until it sees a refusal to click/watch.

Simply watching a video is not an expression of agreement nor an endorsement of the content. Any of us read articles and books with which we may disagree if for no other reason than to fully understand the opposing viewpoint, and this is no less true for videos, audio, etc etc.

Want to know if the user actually likes the content? Ask them. Stop thinking that statistical inference is better than, or even equal to, directly measured data from a specific entity, especially when dealing with qualitative concepts rather than quantitative ones.
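
As a toy illustration of that difference (my own sketch, nothing like YouTube's actual system - the watch rates, ratings, and blend weight are all invented), compare ranking purely on watch behavior versus blending in what the user explicitly says:

    # Toy contrast: implicit-only ranking vs. blending in explicit feedback.
    # All numbers are made up for illustration.

    videos = [
        {"title": "mild take",    "watch_rate": 0.60, "explicit_rating": +1},  # user said they liked it
        {"title": "extreme take", "watch_rate": 0.85, "explicit_rating": -1},  # user said they did not
    ]

    def rank_by_behavior(vs):
        # Implicit-only: whatever keeps people watching floats to the top.
        return sorted(vs, key=lambda v: v["watch_rate"], reverse=True)

    def rank_with_explicit_feedback(vs, blend=0.5):
        # Blend in what the user actually says they want (hypothetical weighting).
        return sorted(vs, key=lambda v: (1 - blend) * v["watch_rate"] + blend * v["explicit_rating"], reverse=True)

    print([v["title"] for v in rank_by_behavior(videos)])             # ['extreme take', 'mild take']
    print([v["title"] for v in rank_with_explicit_feedback(videos)])  # ['mild take', 'extreme take']

The first ranking is the escalation you describe; the second only differs because the system bothered to ask.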


Thanks. You should've written this article :)


I don't know how anyone can believe that AI will be so smart that it can turn the entire world into paperclips yet so stupid that it won't know that's a bad idea.


Yeah I know what you mean.

Frankly, I feel there are major problems behind the concepts of AI and even intelligence itself - and it's difficult to articulate why. It's as if these terms require aggrandizing to the point of impossibility or they lose all their apparent meaning. Which is why I feel we'll never achieve what we call (Strong/General) AI, or if we do, we will find ways to be unimpressed by it...


Intelligence and what you do with it are orthogonal concepts.


This is fine for AI tools that aren't remotely intelligent like the ones over-hyped today; but for any hypothesised AGI, the idea of programming it to be fundamentally constrained by updated Asimov laws is naive, technically stupid, and - in its implications of a sentient slave species - morally repugnant.


>In his recent book, Human Compatible, Russell lays out his thesis in the form of three “principles of beneficial machines,” echoing Isaac Asimov’s three laws of robotics from 1942, but with less naivete.

It is a recent trend to brush off Isaac Asimov's laws without an actual critique, which betrays a lack of concrete thought on the matter.

It's not that Asimov's laws are flawless, but countering them is harder than it seems. I have seen various bad, hand-wavy dismissals, but I can't recall any careful critiques.

I have the new book, Human Compatible, and I buy Russell's argument, but I note that his rules are much more abstract and therefore perhaps harder to counter.


Just read the Asimov novels that introduce the rules.

Every single one of them fails in dramatic ways and that's what drives the story forward.


The whole evil genie thing is pretty unlikely imo. You're postulating an AI that is too stupid to understand that it shouldn't take your wishes in some dead-literal sense (make my home eco-friendly => bulldoze it and plant some trees), yet so intelligent that it can summon the resources to make those idle wishes come true in a terribly grandiose way. There are many dangers around AI, but this particular popular fantasy is not well thought out.


There's a 1946 sci-fi story called A Logic Named Joe, which nails this.

AIs are programmed to do what people ask unless it's bad, and one day, that "unless" mechanism fails. Joe starts doing everything people ask of it. Stalking that would make Zuckerberg proud....

http://www.baen.com/chapters/W200506/0743499107___2.htm


Not really a comment super relevant to the article in the post, but seems to touch on some interesting subjects and probably a good conversation starter: one of the videos by Isaac Arthur - Technological Singularity ( https://www.youtube.com/watch?v=YXYcvxg_Yro )


Is there such a set of rules/traits being defined to be maximized? I remember Elon saying to "maximize human freedom" as a goal, but I could be taking that out of context or misquoting.


The argument at the beginning of the article includes YouTube recommendations trending more extreme, but isn't the reason that happens precisely these new proposed three laws?


This moves the conversation from 'Arabian Nights' to 'The Tempest'.

Then what?



