Discussion about this post

hn.cbp

What strikes me is that the proposal here isn’t really about making superintelligence safer by adding virtues, but about quietly retreating from a traditional model of agency itself.

Humility, caution, and multilaterality all function as ways of withholding authorship, rather than perfecting it. That suggests the core risk isn’t intelligence per se, but the assumption that any sufficiently capable system should be allowed to “decide for the world” at all.

I recently explored a related tension: the idea that behavioral sophistication doesn’t automatically warrant full agency attribution, and that many alignment failures stem from collapsing the two.

Kenny Easwaran

I like these points, and I like to think a sophisticated decision-theoretic consequentialism should be able to incorporate them! I don’t think decision theory requires anyone to do the act that they calculate as having the highest expected value; it requires that they prefer the act that actually has the highest expected value, regardless of what they calculate. If there are serious worries that one might be calculating incorrectly (in whatever sense “incorrect” can be understood here), then there may be other policies that are much better than calculating or reasoning explicitly. And if one takes the kind of “functional decision theory” point of view that Soares and Yudkowsky do, then one should recognize that choosing a policy isn’t just choosing one’s own policy, but choosing a policy for all agents relevantly like you, many of whom will be making what you regard as mistakes. So it should be able to justify the relevant sorts of humility and caution.
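A toy numerical illustration of that middle claim (mine, not from the comment; all distributions and numbers below are made up for the example): suppose act A’s true value varies case by case around a mean of 1.0, the alternative is always worth 0, and the agent’s explicit expected-value calculation sees A’s value only through Gaussian noise. Compare calculating each case against a fixed rule that simply encodes the trusted long-run fact that A is better on average.

```python
import random

# Sketch: explicit per-case EV calculation vs. a fixed rule, under
# calculation error. Hypothetical numbers chosen only for illustration.
random.seed(0)
TRIALS = 200_000

def run(noise_sd):
    """Return (avg value from calculating each case, avg value from the rule)."""
    calc_total = 0.0
    rule_total = 0.0
    for _ in range(TRIALS):
        true_a = random.gauss(1.0, 1.0)                    # act A's true value this case
        estimate_a = true_a + random.gauss(0.0, noise_sd)  # the agent's noisy calculation
        # "Calculate" policy: take A iff the estimate beats the safe act (worth 0).
        calc_total += true_a if estimate_a > 0 else 0.0
        # "Rule" policy: always take A, with no per-case reasoning.
        rule_total += true_a
    return calc_total / TRIALS, rule_total / TRIALS

for sd in (0.1, 1.0, 5.0):
    calc, rule = run(sd)
    print(f"noise sd={sd:4.1f}: calculate={calc:.3f}  fixed rule={rule:.3f}")
```

With these made-up numbers, calculating wins when the noise is small (it skips A in the rare cases where A is genuinely bad), the two roughly tie at moderate noise, and the fixed rule wins clearly when the noise is large: that is the sense in which a policy other than explicit calculation can be better for an agent that doubts its own calculations.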

