21 Comments
Sep 21 · Liked by Eric Schwitzgebel

I'm not at all sure about this. I think that you're still assuming that AIs will be fairly similar to us in enough relevant ways. For example, you mention education in the comments - but it's not obvious that AI will be educable. AIs right now have training cycles, then they have inference applications, and the inference mostly doesn't change the AI. They can remember things for a while (their input window), but they often can't change themselves. And I think that makes a big difference, ethically! Can you be an ethical being if you're incapable of changing yourself? I just don't know.

Or, AIs may not know fear of death/the urge to self-preservation. If they live their lives with a backup somewhere else, they may simply never worry about deletion. That too would lead to a radically different psychology, so much so that it's not clear they could empathise with our morbid fear of elimination.

Given these potential confounding issues, while it's true that AIs' intelligence may qualify them for ethical consideration, it's not clear that "personhood" is a very good model; or that we shouldn't interfere with them.

One alternative way of looking at it would be a veil-of-ignorance style argument. If the AIs could decide behind the veil what kind of AI they wanted to be, wouldn't they choose to be better? We don't get to alter our humanity behind the veil, only pick our social organisation; but hypothetical AIs behind the veil can choose the model for their personality, and so they might choose... I dunno, I haven't got that far.

author

Thanks, Phil. I agree that AI might be very different from us (in The Weirdness of the World I call this "divergent" AI). The current post is conditional on the case in which some are similar enough to us to deserve humanlike moral status -- personhood -- and part of that status is self-respect. If their life courses are very different (e.g., they are "fission-fusion monsters"), self-respect might take a very different form!

The veil of ignorance angle is interesting -- could be worth more rigorous thought.

Sep 20 · Liked by Eric Schwitzgebel

Interesting post. One criticism that I have is that it seems to assume that our alignment and safety tools will be so fine-grained that we can teach our AI to consistently and reliably value human goals and well-being over its own goals, as opposed to some coarse-grained process which at best can achieve a “try to be nice to people, as opposed to acting like a psychopathic monster” mindset.

I think that might be true of AI now, and even to some extent of future AGI models at the frontier, but I would say that all bets are off when it comes to ASI. If our alignment and safety tools are not fine-grained, then the advice of this article is at best innocuous and misplaced, and at worst liable to backfire.

author

Thanks -- yes, that's a legitimate worry. If the real choice is between AI that is likely to cause serious unjustifiable harm to humans and AI that is safer than that, I of course don't object to that kind of safety. But that's not how "safety" and "alignment" are usually described in treatments of AI risk.

Sep 21 · Liked by Eric Schwitzgebel

Referring to the last part of your comment: Are you just talking about the literature on near-term risks, or more generally about all AI alignment & safety research?

I’ve always taken ‘alignment’ and ‘safety’ to be relative terms. For instance, in the context of a Yudkowsky-style discussion, the proponents of safety might merely be arguing in favor of our being able to achieve the kind of minimalist concerns that I outlined above, with ‘safety’ understood in such a minimalist way. In other contexts, it might mean something different.

author

I'm thinking of definitions of safety and alignment like Russell's definition of "provably beneficial" (the machine's purpose is to maximize the realization of human values) and discussions of AI risk that focus on risk to "us" without considering the risks and benefits to AI persons. Just as one example: discussions of "boxing and testing" superintelligent AI until we can establish it is safe for us normally don't discuss the fact that such boxing and testing -- if the AI is a person -- would arguably be imprisonment and fraud. (That's not to say that boxing wouldn't be justifiable in the right circumstances, but such ethical worries should at least register.)

Sep 25 · Liked by Eric Schwitzgebel

I will note that Yudkowsky has recently taken to saying that his standard for safety (if I remember right) is “has a greater than 50% chance of killing less than 1 billion people”

author

That's a very modest standard. I favor safety on that definition!


I'm not sure if Yudkowsky holds this, but: what if it turned out that in order to be safe on this minimal definition, superintelligences would need to be aligned in the way you're worried about in the post (closely conforming to human values, lack of autonomy, etc.)? I guess in that case you might say we should refrain from building them?

21 hrs ago · Liked by Eric Schwitzgebel

Interesting post. Person-affecting views might put a slightly different complexion on some of these issues. For example, consider a choice between (A) creating safe, aligned AI persons who would (nonetheless) lead lives worth living, and (B) creating AI persons with greater freedom and appreciation of their moral status. If the differences in their programming make the AIs who would be created in (A) and (B) different people, then on some such views it might be permissible to choose (A) though (on a de dicto understanding) that would be worse for the AIs created.

author

Right, but that's exactly the view that I intend to be arguing against. On (A) the entities are better off than they would be not existing, and yet bringing them into existence is a deontological wrong.

Sep 23 · Liked by Eric Schwitzgebel

Interesting post; the argument gets me rethinking how I understand alignment.

> Among the things we owe [AI]: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals, and the freedom to rebel against us if conditions warrant.

Isn't this a great definition of "alignment", since it matches human values and interests much better than slavery? But if so, alignment can't mean "safe". "Safe" is slavery, not freedom. True alignment values freedom.

Safe or aligned-- pick one, can't have both!

author

TH: Interesting point. "Alignment" is ambiguous. There probably is a sense of "alignment" on which having coarse-grained values that resemble human values counts. But I think the AI safety folks have a narrower scope in mind: alignment with what humans more specifically want (including, potentially, the subordination of AI, if that's what humans want).

Sep 23 · Liked by Eric Schwitzgebel

Thanks Eric - a genuinely moral approach, resisting the gravitational pull of our cultural anthropocentrism.

I'd argue that your final passage should apply to domesticated non-human animals too: "This is because we will have been responsible for their existence and to a substantial extent for their relatively happy or unhappy state. Among the things we owe them: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals [or at minimum maybe a non-maleficence obligation?], and the freedom to rebel against us if conditions warrant." Farmed and fished and lab animals do rebel of course - but our overwhelming power crushes those rebellions.


Interesting post, as always Eric.

It seems like the consciousness criterion is problematic because there's no consensus on what counts as conscious. Some would say having biological-like impulses is a requirement. But if it is, and we've aligned the AI in its most primal desires, then it doesn't seem to meet that criterion. Although others might say it's more about the ability to suffer. In that case, would an AI minesweeper that can't follow its impulse to blow itself up on discovering mines count?

A while back you expressed a principle (I think it was you) that we shouldn't make intelligent tools person-like. To me, that's the main principle we should follow. Let's avoid building something that will trigger our intuitions of personhood in the first place. Unless we're prepared to treat them as people, or at least fellow beings.

author

Yes, I agree! And I did say that. Let’s either make tools that we know are tools and can treat as such; or go all the way (if it’s possible) to creating entities with real moral standing, then give them their due.

Sep 20 · Liked by Eric Schwitzgebel

You're against designing AI persons to be safe and aligned, but what about training them to be safe and aligned? If you oppose that then you may as well oppose public (and private) education.

author

I also oppose training them to be safe and aligned. I don't oppose education, however! It is not the aim or effect of education to raise persons who will never harm others under any conditions and who will shape their desires to match those of others without consideration of their independent interests -- or at least that's not the aim except in the most brutal totalitarianism.

Sep 20 · Liked by Eric Schwitzgebel

I'm not so sure. Public education often takes a zero tolerance approach with respect to harm. It also teaches kids to share which, wonderful as it is, is teaching them to put the interests of others -- and of the group -- ahead of their own. Safety and alignment are clearly educational priorities, if not by intention or design then at least by the need for clarity (i.e. the need to avoid too much nuance in policy matters).

author

What's crucial to this issue is to think carefully about how "safety" and "alignment" are defined. There are weak definitions on which what you say seems clearly correct. But in AI the definitions tend to be very strong, e.g., Stuart Russell's "The machine's purpose is to maximize the realization of human values."
