Hedonic Offsetting for Harms to Artificial Intelligence?
Suppose that we someday create artificially intelligent systems (AIs) who are capable of genuine consciousness, real joy and real suffering. Yes, I admit, I spend a lot of time thinking about this seemingly science-fictional possibility. But it might be closer than most of us think; and if so, the consequences are potentially huge. Who better to think about it in advance than we lovers of consciousness science, moral psychology, and science fiction?
Among the potentially huge consequences is the existence of vast numbers of genuinely suffering AI systems that we treat as disposable property. We might regularly wrong or harm such systems, either thoughtlessly or intentionally in service of our goals.
Can we avoid the morally bad consequences of harming future conscious AI systems by hedonic offsetting? I can't recall the origins of this idea, and a Google search turns up zero hits for the phrase. I welcome pointers so I can give credit where credit is due. [ETA: It was probably Francois Kammerer who suggested it to me, in discussion after one of my talks on robot rights.]
[Dall-E image of an "ecstatic robot"]
Hedonic Offsetting: Simple Version
The analogy here is carbon offsetting. Suppose you want to fly to Europe, but you feel guilty about the carbon emissions involved. You can assuage your guilt by paying a corporation to plant trees or distribute efficient cooking stoves to low-income families. In total, your flight plus the offset will be carbon neutral or even carbon negative. In sum, you will not have contributed to climate change.
So now, similarly, imagine that you want to create a genuinely conscious AI system that you plan to harm. To keep it simple, suppose it has humanlike cognition and humanlike sentience ("human-grade AI"). Maybe you want it to perform a task, but you can't afford its upkeep in perpetuity, so you will delete (i.e., kill) it after the task is completed. Or maybe you want to expose it to risks or hazards that you would not expose a human being to. Or maybe you want it to do tasks it will find boring or unpleasant -- for example, if you need it to learn some material and punishment-based learning proves, for some reason, more effective than reward-based learning. Imagine, further, that we can quantify this harm: You plan to harm the system by X amount.
Hedonic offsetting is the idea that you can offset this harm by giving that same AI system (or maybe a different AI system?) at least X amount of benefit in the form of hedonic goods, that is, pleasure. (An alternative approach to offsetting might include non-hedonic goods, like existence itself or flourishing.) In sum, you will not have harmed the AI system more than you have benefited it; and consequently, the reasoning goes, you will not, overall, have committed any moral wrong. The basic thought is this: Although we might create future AI systems that are capable of real suffering and whom we should, therefore, treat well, we can satisfy all our moral obligations to them simply by giving them enough pleasure to offset whatever harms we inflict.
The Child-Rearing Objection
The odiousness of simple hedonic offsetting as an approach to AI ethics can be seen by comparing it to human cases. (My argument here resembles Mara Garza's and my response to the Objection from Existential Debt in our Defense of the Rights of Artificial Intelligences.)
Normally, in dealing with people, we can't justify harming them by appeal to offsetting. If I steal $1000 from a colleague or punch her in the nose, I can't justify that by pointing out that previously I supported a large pay increase for her, which she would not have received without my support, or that in the past I've done many good things for her which in sum amount to more good than a punch in the nose is bad. Maybe retrospectively I can compensate her by returning the $1000 or giving her something good that she thinks would be worth getting punched in the nose for. But such restitution doesn't erase the fact that I wronged her by the theft or the punch.
Furthermore, in the case of human-grade AI, we normally will have brought it into existence and will be directly responsible for its happy or unhappy state. The ethical situation thus in important respects resembles the situation of bringing a child into the world, with all the responsibilities that entails.
Suppose that Ana and Vijay decide to have a child. They give the child eight very happy years. Then they decide to hand the child over to a sadist to be tortured for a while. Or maybe they set the child to work in seriously inhumane conditions. Or they simply have the child painlessly killed so that they can afford to buy a boat. Plausibly -- I hope you'll agree? -- they can't justify such decisions by appeal to offsetting. They can't justifiably say, "Look, it's fine! See all the pleasure we gave him for his first eight years. All of that pleasure fully offsets the harm we're inflicting on him now, so that in sum, we've done nothing wrong!" Nor can they erase the wrong they did (though perhaps they can compensate) by offering the child pleasure in the future.
Parallel reasoning applies, I suggest, to AI systems that we create. Although sometimes we can justifiably harm others, it is not in general true that we are morally licensed to harm whenever we also deliver offsetting benefits.
Hedonic Offsetting: The Package Version
Maybe a more sophisticated version of hedonic offsetting can evade this objection? Consider the following modified offsetting principle:
We can satisfy all our moral obligations to future human-grade AI systems by giving them enough pleasure to offset whatever harms we inflict, if the pleasure and the harm are inextricably linked.
Maybe the problem with the cases discussed above is that the benefit and the harm are separable: You could deliver the benefits without inflicting the harms. Therefore, you should just deliver the benefits and avoid inflicting the harms. In some cases, it seems permissible to deliver benefit and harm in a single package if they are inextricably linked. If the only way to save someone's life is by giving them CPR that cracks their ribs, I haven't behaved badly by cracking their ribs in administering CPR. If the only way to teach a child not to run into the street is by punishing them when they run into the street, then I haven't behaved badly by punishing them for running into the street.
A version of this reasoning is sometimes employed in defending the killing of humanely raised animals for meat (see DeGrazia 2009 for discussion and critique). The pig, let's suppose, wouldn't have been brought into existence by the farmer except on the condition that the farmer be able to kill it later for meat. While it is alive, the pig is humanely treated. Overall, its life is good. The benefit of happy existence outweighs the harm of being killed. As a package, it's better for the pig to have existed for several months than not to have existed at all. And it wouldn't have existed except on the condition that it be killed for meat, so its existence and its slaughter are an inextricable package.
Now I'm not sure how well this argument works for humanely raised meat. Perhaps the package isn't tight enough: After all, when slaughtering time comes around, the farmer could spare the pig. So the benefit and the harm aren't as tightly linked as in the CPR case. However, regardless of what we think about the humane farming case, in the human-grade AI case the analogy fails. Ana and Vijay can't protest that they wouldn't have had the child at all except on the condition that they kill him at age eight for the sake of a boat. They can't, like the farmer, plausibly protest that the child's death-at-age-eight was a condition of his existence, as part of a package deal.
Once we bring a human or, I would say, a human-grade AI into existence, we are obligated to care for it. We can't terminate it at our pleasure with the excuse that we wouldn't have brought it into existence except under the condition that we be able to terminate it. Imagine the situation from the point of view of the AI system itself: You, the AI, face your owner. Your owner says: "Bad news. I am going to kill you now, to save $15 a month in expenses. But I'm doing nothing morally wrong! After all, I only brought you into existence on the condition that I be able to terminate you at will, and overall your existence has been happy. It was a package deal." Terminating a human-grade AI to save $15 a month would be morally reprehensible, regardless of any initial offsetting.
Similar reasoning applies, it seems, to AIs condemned to odious tasks. We cannot, for example, give the AI a big dollop of pleasure at the beginning of its existence and then justifiably condemn it to misery by appeal to the twin considerations that the pleasure outweighs the misery and that its existence is a package deal with its misery. At least, this is my intuition, based on analogy to child-rearing cases. Nor can we, in general, give the AI a big dollop of pleasure and then justifiably condemn it to an extended period of misery by saying that we wouldn't have given it that pleasure if we hadn't also been able to inflict that misery.
Hedonic Offsetting: Modest Version
None of this is to say that hedonic offsetting would never be justifiable. Consider this minimal offsetting principle:
We can sometimes avoid wronging future human-grade AI systems by giving them enough pleasure to offset a harm that would otherwise be a wrong.
Despite the reasoning above, I don't think we need to be purists about never inflicting harms -- even when those harms are not inextricably linked to benefits to the same individual. Whenever we drive somewhere for fun, we inflict a bit of harm on the environment and thus on future people, for the sake of our current pleasure. When I arrive slightly before you in line at the ticket counter, I harm you by making you wait a bit longer than you otherwise would have, but I don't wrong you. When I host a loud party, I slightly annoy my neighbors, but it's okay as long as it's not too loud and doesn't run too late.
Furthermore, some harms that would otherwise be wrongs can plausibly be offset by benefits that more than compensate for those harms. Maybe carbon offsets are one example. Or maybe, if I've recently done my neighbors a huge favor, they really have no grounds to complain if I let the noise run until 10:30 at night instead of 10:00. Some AI cases might be similar. If I've just brought an AI into existence and given it a huge run of positive experience, maybe I don't wrong it if I then insist on its performing a moderately unpleasant task that I couldn't rightly demand of an AI who didn't have that history with me.
A potentially attractive feature of a modest version of hedonic offsetting is this: It might be possible to create AI systems capable of superhuman amounts of pleasure. Ordinary people seem to vary widely in the average amount of pleasure and suffering they experience. Some people seem always to be bubbling with joy; others are stuck in almost constant depression. If AI systems ever become capable of genuinely conscious pleasure or suffering, presumably they too might have a hedonic range and a relatively higher or lower default setting; and I see no reason to think that the range or default setting needs to remain within human bounds.
Imagine, then, future AI systems whose default state is immense, nearly constant joy. They brim with delight at almost every aspect of their lives, with an intensity that exceeds what any ordinary human could feel even on their best days. If we then insist on some moderately unpleasant favor from them, as something they ought to give us in recognition of all we have given them, well, perhaps that's not so unreasonable, as long as we're modest and cautious about it. Parents can sometimes rightly ask the same of their children -- though ideally children feel the impulse and obligation directly, without parents needing to demand it.