Discussion about this post

hn.cbp

What strikes me is that the proposal here isn’t really about making superintelligence safer by adding virtues, but about quietly retreating from a traditional model of agency itself.

Humility, caution, and multilaterality all function as ways of withholding authorship, rather than perfecting it. That suggests the core risk isn’t intelligence per se, but the assumption that any sufficiently capable system should be allowed to “decide for the world” at all.

I recently explored a related tension: the idea that behavioral sophistication doesn’t automatically warrant full agency attribution, and that many alignment failures stem from collapsing the two.

Kenny Easwaran

I like these points, and I like to think a sophisticated decision-theoretic consequentialism should be able to incorporate them! I don’t think decision theory requires anyone to do the act that they calculate as having the highest expected value; it requires that they prefer the act that actually has the highest expected value, regardless of what they calculate. If there are serious worries that one might be calculating incorrectly (in whatever sense “incorrect” can be understood here), then there may be other policies that are much better than calculating or reasoning explicitly. And if one takes the kind of “functional decision theory” point of view that Soares and Yudkowsky do, then one should recognize that choosing a policy isn’t just choosing one’s own policy, but choosing a policy for all agents relevantly like you, many of whom will be making what you regard as mistakes. So it should be able to justify the relevant sorts of humility and caution.
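A toy numerical illustration of that middle claim (mine, not from the comment; all distributions and numbers below are made up for the example): suppose act A’s true value varies case by case around a mean of 1.0, the alternative is always worth 0, and the agent’s explicit expected-value calculation sees A’s value only through Gaussian noise. Compare calculating each case against a fixed rule that simply encodes the trusted long-run fact that A is better on average.

```python
import random

# Sketch: explicit per-case EV calculation vs. a fixed rule, under
# calculation error. Hypothetical numbers chosen only for illustration.
random.seed(0)
TRIALS = 200_000

def run(noise_sd):
    """Return (avg value from calculating each case, avg value from the rule)."""
    calc_total = 0.0
    rule_total = 0.0
    for _ in range(TRIALS):
        true_a = random.gauss(1.0, 1.0)                    # act A's true value this case
        estimate_a = true_a + random.gauss(0.0, noise_sd)  # the agent's noisy calculation
        # "Calculate" policy: take A iff the estimate beats the safe act (worth 0).
        calc_total += true_a if estimate_a > 0 else 0.0
        # "Rule" policy: always take A, with no per-case reasoning.
        rule_total += true_a
    return calc_total / TRIALS, rule_total / TRIALS

for sd in (0.1, 1.0, 5.0):
    calc, rule = run(sd)
    print(f"noise sd={sd:4.1f}: calculate={calc:.3f}  fixed rule={rule:.3f}")
```

With these made-up numbers, calculating wins when the noise is small (it skips A in the rare cases where A is genuinely bad), the two roughly tie at moderate noise, and the fixed rule wins clearly when the noise is large: that is the sense in which a policy other than explicit calculation can be better for an agent that doubts its own calculations.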

