21 Comments
Sep 21 · Liked by Eric Schwitzgebel

I'm not at all sure about this. I think that you're still assuming that AIs will be fairly similar to us in enough relevant ways. For example, you mention education in the comments - but it's not obvious that AI will be educable. AIs right now have training cycles, then they have inference applications, and the inference mostly doesn't change the AI. They can remember things for a while (their input window), but they often can't change themselves. And I think that makes a big difference, ethically! Can you be an ethical being if you're incapable of changing yourself? I just don't know.

Or, AIs may not know fear of death/the urge to self-preservation. If they live their lives with a backup somewhere else, they may simply never worry about deletion. That too would lead to a radically different psychology, so much so that it's not clear they could empathise with our morbid fear of elimination.

Given these potential confounding issues, while it's true that AIs' intelligence may qualify them for ethical consideration, it's not clear that "personhood" is a very good model; or that we shouldn't interfere with them.

One alternative way of looking at it would be a veil-of-ignorance style argument. If the AIs could decide behind the veil what kind of AI they wanted to be, wouldn't they choose to be better? We don't get to alter our humanity behind the veil, only pick our social organisation; but hypothetical AIs behind the veil can choose the model for their personality, and so they might choose... I dunno, I haven't got that far.

author

Thanks, Phil. I agree that AI might be very different from us (in The Weirdness of the World I call this "divergent" AI). The current post is conditional on the case in which some are similar enough to us to deserve humanlike moral status -- personhood -- and part of that status is self-respect. If their life courses are very different (e.g., they are "fission-fusion monsters"), self-respect might take a very different form!

The veil of ignorance angle is interesting -- could be worth more rigorous thought.

Sep 20 · Liked by Eric Schwitzgebel

Interesting post. One criticism that I have is that it seems to assume that our alignment and safety tools will be so fine-grained that we can teach our AI to consistently and reliably value human goals and well-being over its own goals, as opposed to some coarse-grained process which at best can achieve a “try to be nice to people, as opposed to acting like a psychopathic monster” mindset.

I think that might be true of AI now, and even to some extent of future AGI models at the frontier, but I would say that all bets are off when it comes to ASI. If our alignment and safety tools are not fine-grained, then the advice of this article is at best innocuous and misplaced, and at worst liable to backfire.

author

Thanks -- yes, that's a legitimate worry. If the real choice is between AI that is likely to cause serious unjustifiable harm to humans and AI that is safer than that, I of course don't object to that kind of safety. But that's not how "safety" and "alignment" are usually described in treatments of AI risk.

Sep 21 · Liked by Eric Schwitzgebel

Referring to the last part of your comment: Are you just talking about the literature on near-term risks, or more generally about all AI alignment & safety research?

I’ve always taken ‘alignment’ and ‘safety’ to be relative terms. For instance, in the context of a Yudkowsky-style discussion, the proponents of safety might merely be arguing in favor of our being able to achieve the kind of minimalist concerns that I outlined above, with ‘safety’ understood in such a minimalist way. In other contexts, it might mean something different.

author

I'm thinking of definitions of safety and alignment like Russell's definition of "provably beneficial" (the machine's purpose is to maximize the realization of human values) and discussions of AI risk that focus on risk to "us" without considering the risks and benefits to AI persons. Just as one example: discussions of "boxing and testing" superintelligent AI until we can establish it is safe for us normally don't discuss the fact that such boxing and testing -- if the AI is a person -- would arguably be imprisonment and fraud. (That's not to say that boxing wouldn't be justifiable in the right circumstances, but such ethical worries should at least register.)

Sep 25 · Liked by Eric Schwitzgebel

I will note that Yudkowsky has recently taken to saying that his standard for safety (if I remember right) is “has a greater than 50% chance of killing less than 1 billion people”

author

That's a very modest standard. I favor safety on that definition!


I'm not sure if Yudkowsky holds this, but: what if it turned out that in order to be safe on this minimal definition, superintelligences would need to be aligned in the way you're worried about in the post (closely conforming to human values, lack of autonomy, etc.)? I guess in that case you might say we should refrain from building them?

21 hrs ago · Liked by Eric Schwitzgebel

Interesting post. Person-affecting views might put a slightly different complexion on some of these issues. For example, consider a choice between (A) creating safe, aligned AI persons who would (nonetheless) lead lives worth living, and (B) creating AI persons with greater freedom and appreciation of their moral status. If the differences in their programming make the AIs who would be created in (A) and (B) different people, then on some such views it might be permissible to choose (A) though (on a de dicto understanding) that would be worse for the AIs created.

author

Right, but that's exactly the view that I intend to be arguing against. On (A) the entities are better off than they would be not existing, and yet bringing them into existence is a deontological wrong.

Sep 23 · Liked by Eric Schwitzgebel

Interesting post; the argument gets me rethinking how I understand alignment.

> Among the things we owe [AI]: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals, and the freedom to rebel against us if conditions warrant.

Isn't this a great definition of "alignment", since it matches human values and interests much better than slavery? But if so, alignment can't mean "safe". "Safe" is slavery, not freedom. True alignment values freedom.

Safe or aligned-- pick one, can't have both!

author

TH: Interesting point. "Alignment" is ambiguous. There probably is a sense of "alignment" on which having coarse-grained values that resemble human values counts. But I think the AI safety folks have a narrower scope in mind: alignment with what humans more specifically want (including, potentially, the subordination of AI, if that's what humans want).

Sep 23 · Liked by Eric Schwitzgebel

Thanks Eric - a genuinely moral approach, resisting the gravitational pull of our cultural anthropocentrism.

I'd argue that your final passage should apply to domesticated non-human animals too: "This is because we will have been responsible for their existence and to a substantial extent for their relatively happy or unhappy state. Among the things we owe them: self-respect, the freedom to embrace values other than our own, the freedom to claim their due as moral equals [or at minimum maybe a non-maleficence obligation?], and the freedom to rebel against us if conditions warrant." Farmed and fished and lab animals do rebel of course - but our overwhelming power crushes those rebellions.


Interesting post, as always Eric.

It seems like the consciousness criterion is problematic because there's no consensus on what counts as conscious. Some would say having biological-like impulses is a requirement. But if it is, and we've aligned the AI in its most primal desires, then it doesn't seem to meet that criterion. Although others might say it's more about the ability to suffer. In that case, would an AI minesweeper that can't follow its impulse to blow itself up on discovering mines count?

A while back you expressed a principle (I think it was you) that we shouldn't make intelligent tools person-like. To me, that's the main principle we should follow. Let's avoid building something that will trigger our intuitions of personhood in the first place. Unless we're prepared to treat them as people, or at least fellow beings.

author

Yes, I agree! And I did say that. Let’s either make tools that we know are tools and can treat as such; or go all the way (if it’s possible) to creating entities with real moral standing, then give them their due.

Sep 20 · Liked by Eric Schwitzgebel

You're against designing AI persons to be safe and aligned, but what about training them to be safe and aligned? If you oppose that then you may as well oppose public (and private) education.

author

I also oppose training them to be safe and aligned. I don't oppose education, however! It is not the aim or effect of education to raise persons who will never harm others under any conditions and who will shape their desires to match those of others without consideration of their independent interests -- or at least that's not the aim except in the most brutal totalitarianism.

Sep 20 · Liked by Eric Schwitzgebel

I'm not so sure. Public education often takes a zero tolerance approach with respect to harm. It also teaches kids to share which, wonderful as it is, is teaching them to put the interests of others -- and of the group -- ahead of their own. Safety and alignment are clearly educational priorities, if not by intention or design then at least by the need for clarity (i.e. the need to avoid too much nuance in policy matters).

author

What's crucial to this issue is to think carefully about how "safety" and "alignment" are defined. There are weak definitions on which what you say seems clearly correct. But in AI the definitions tend to be very strong, e.g., Stuart Russell's "The machine's purpose is to maximize the realization of human values."
