Against Utilitarianism
Since it was first formalized in the mid-1800s, Utilitarianism has dominated discussions of human behavior in philosophical ethics and in economics. In particular, it acts as the foundation of the “rationality” that defines Homo economicus, the idealized human being in economics which forms the benchmark against which actual human behaviors are compared, because it is (purported to be) the only ethical framework which is “rational”. Specifically, Utilitarianism is “VNM-rational”, which means that it’s the best possible system that satisfies the axioms of rational behavior defined by the von Neumann-Morgenstern utility theorem.
In this article, I aim to demonstrate that Utilitarianism does not actually make sense in relation to human values and is not actually a description of rational behavior, and I formally propose an alternative consequentialist theory that escapes the VNM theorem’s erroneous axioms.
NB: This post benefits from an understanding of as many of the following fields as you can manage: philosophy (emphasis on normative ethics), economics (emphasis on Game Theory), and computer science. Unfortunately, these three fields do not come into close contact very often, which is (I believe) why the ideas I’m introducing in this post are arriving about 100 years later than they should’ve.
It takes cross-disciplinary understanding to truly make sense of the world.
Remember: the idea of sorting a dictionary in “alphabetical order” (by comparing the first letters of a pair of words, then the second letters, and so on) had to be invented, in 1604 by Robert Cawdrey, and then taught to people. It was originally a type of unique knowledge specific to a single discipline of study, not known outside of its esoteric specialty, but you can hardly imagine an adult today who doesn’t know how to sort a list of words, or how to look up a word in a dictionary, now can you?
If I’ve explained anything in a way that doesn’t click for you, please consider reading through these Reddit comments on r/Ethics, and feel free to DM or tweet me on Twitter.
This post is part 2 of an ongoing series.

Previous post: Human Morality and AI

Series: Building AGI
What is Utilitarianism?
Utilitarianism is a family of normative ethical theories that prescribe actions that maximize happiness and wellbeing for all affected individuals.
Utilitarianism developed out of the proto-economist ethical philosophers of England and Scotland during the late 18th and early-to-mid 19th centuries, most famously Bentham and Mill. It was closely related to the Enlightenment movement, which was an enormous influence on the founding ideologies of the United States of America and on that country’s current system of government, and relied strongly on the concept of eudaimonia (“good/true spirit”) from Classical Greek philosophy: the idea that there is a single thing called “goodness” and that all you have to do is choose it.
To use modern vocabulary and concepts, Utilitarianism holds that there is some quantity, called utility, which represents the sum of all moral good produced by the outcome of an action, and that these utility quantities can be ordered relative to each other, so that there is a best possible outcome.
(Or technically a set of best possible outcomes, if some of the outcomes are equal under the utility relation.)
Mathematically, this means we can assign to each outcome a member of ℝ (the set of real numbers) and create a function out of it, named U, so that we get either U(x)<U(y), U(x)=U(y), or U(x)>U(y) for each and every pair of possible outcomes “x” and “y”.
This comes with a caveat, however: since we’ve only defined U in terms of the order of the resulting numbers, U itself is subject to two isomorphisms: uniform translation (adding a constant), and uniform positive scaling (multiplying by a constant). Basically, if someone hands you a card with a formula for U(x) written on it, and another card with a formula for a*U(x)+b written on it, you can’t tell which one is which. This means that absolute values of U are undefined, and it also means that the scale of U is similarly undefined.
That’s worth reiterating: it’s total nonsense to ask if U(x)≥0, or to say that U(x)=U(y)+1. It’s like trying to ask how many kilograms of the color blue can be extracted from the abstract concept of “love”. It’s not that you can just pick an arbitrary answer to those questions and see if it works out. It’s that, if you try, you get paradoxes (you can prove that some things are both true and false at the same time) because U(x) and a*U(x)+b are exactly the same thing. If one thing is isomorphic to another, it isn’t just “hard to tell them apart”, it means there is only one thing but there are multiple, equivalent ways to write it down, and you can use whichever one you like and even change your mind in the middle of doing so.
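The equivalence of U(x) and a*U(x)+b can be checked numerically. The following is a minimal Python sketch, not a proof: the three outcomes and their utility values are made up for illustration, and the helper names `ranking` and `affine` are hypothetical. It assumes the scaling constant a is positive, as the text requires.

```python
# Hypothetical utilities for three outcomes; the numbers are invented.
utility = {"x": 0.2, "y": 1.5, "z": -3.0}


def ranking(u):
    """Return the outcome names sorted from worst to best under table u."""
    return sorted(u, key=u.get)


def affine(u, a, b):
    """Apply U -> a*U + b to every outcome; a must be positive."""
    if a <= 0:
        raise ValueError("a must be positive to preserve the ordering")
    return {name: a * value + b for name, value in u.items()}


original = ranking(utility)
rescaled = ranking(affine(utility, a=7.0, b=-100.0))

# Same ordering on both "cards", even though every rescaled value is
# now negative: the sign of U carries no information.
print(original, rescaled)
```

Both rankings come out identical, while a question such as “is U(x)≥0?” gets a different answer on each card.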
A lot of this blog post is going to be about those two isomorphisms, and why they mean that Utilitarianism is nonsense. Especially that pesky + b.
Beyond that, Utilitarianism also holds that there are no additional moral considerations for any action except those of the action’s outcome. This assumption is called “consequentialism”, and I agree with it, so I won’t delve any further into it. My proposed replacement for Utility, to be introduced shortly, is also a consequentialist theory.
In philosophical ethics, this numerical quantification of “utility” is most
often defined to be directly proportional to the total human happiness in a
given configuration of the world or summed up along a given timeline, with
higher utility indicating higher total human happiness. The most rational
policy of action, then, is to take those actions which lead to the outcomes
with the highest possible utilities: that is, to label outcomes by the actions
that lead to them, to cast utility as a true mathematical function U(action)
with a domain consisting of “the set of all possible actions” and a codomain
of ℝ, and then to maximize that function using the methods of calculus.
In economics, the numerical quantification of “utility” is traditionally
defined in a slightly different way: “utility” is held to be directly
proportional to the happiness of the individual making the decision in a
given outcome, while taking into account that the individual’s personal
happiness may depend in some way on the happiness of others. The formulation
is otherwise identical, with each individual person casting their own utility
as U_person(action) and choosing the action or sequence of actions which
leads to the outcomes with maximum U_person(action).
What are some existing critiques of Utilitarianism?
The formulation of Utilitarianism that is used in philosophical ethics has some nightmarish thought experiments that make it clear that it breaks down under certain limits. These have often been used by critics of Utilitarianism (primarily deontologists) as supposed examples of why consequentialism itself is monstrous.
Suppose, as seems to be the case, that there is an upper limit on how happy a person can be in a given moment. (We will show shortly that it gets worse for Utilitarianism if the contrary is true.)
Furthermore, suppose that each person in the world has a happiness function U_person(action), with alternative spelling U(person | action), that captures that person’s happiness over a worldline as a real number.
Then the most obvious formulation of utility follows as:
U_sum(x) = k · Σ [p ∈ all living people] U(p | x)
That is, we might then define the total happiness of all people in the worldline stemming from an action to be the arithmetic sum of the happiness of each individual person in that worldline.
Under U_sum, the optimal world is one where there are infinitely many people… even if those people are living lives with barely any happiness in them, so long as the expected value of U(person | world) is any arbitrarily small positive real value. According to U_sum, overpopulation to the edge of universal starvation is a good thing.
But wait! We said earlier that U(x) and a*U(x)+b are the same thing (“isomorphic”)! This demonstrates that U_sum has violated our requirement that utility functions have translation isomorphism, since we don’t have any meaningful way to say what the “zero point” of the utility function is; it can only be used for relative ordering.
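A small numerical sketch makes the dependence on the zero point concrete. Everything here is an illustrative assumption: the two worlds, the happiness values, and the function name `u_sum` are invented, and the shift b is applied to each person’s happiness before summing.

```python
def u_sum(happiness, k=1.0, b=0.0):
    """Summed utility of a world: k times the sum of each person's
    happiness, after shifting every individual value by b."""
    return k * sum(h + b for h in happiness)


small_happy_world = [9.0, 9.0]   # two very happy people
crowded_world = [0.1] * 1000     # a thousand barely-happy people

# With the zero point left where it is, the crowded world "wins".
crowded_wins = u_sum(crowded_world) > u_sum(small_happy_world)

# Shift everyone's happiness by the same constant and the verdict flips,
# even though a pure translation is supposed to change nothing.
crowded_wins_shifted = (
    u_sum(crowded_world, b=-0.2) > u_sum(small_happy_world, b=-0.2)
)

print(crowded_wins, crowded_wins_shifted)
```

The ranking of the two worlds depends on where zero sits because the worlds contain different numbers of people, so the same shift b is counted a different number of times in each sum.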
Well, the other obvious formulation of utility is:
U_avg(x) = k · (1 / # living people) · Σ [p ∈ all living people] U(p | x)
That is, we might instead define total happiness to be the arithmetic mean of each individual’s happiness.
Under U_avg, it follows that we should kill those people who bring the average happiness down, or otherwise remove them from consideration as part of the world such that their utility no longer contributes to the function output, so long as we can do so without any additional suffering. It also follows that if we obtain the ability to modify people to increase their happiness artificially, e.g. by giving them drugs or by performing surgery on their brains (a practice called “wireheading”), then we should do so until all people are maximally happy at every moment of their lives.
But all we’ve really done is normalize U_sum so that the utility numbers fall into the range [-H, +H], for maximum happiness-per-person +H and minimum happiness-per-person -H; we didn’t actually get rid of that pesky violation of translation isomorphism, because zero is still special.
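The “remove whoever drags the average down” consequence of U_avg is easy to reproduce. This is a minimal sketch with invented happiness values; `u_avg` is a hypothetical helper name, and k is taken to be 1.

```python
from statistics import mean


def u_avg(happiness, k=1.0):
    """Average utility of a world: k times the mean individual happiness."""
    return k * mean(happiness)


world = [8.0, 6.0, 1.0]

# Drop the least happy person from consideration.
lowest = min(world)
without_unhappiest = [h for h in world if h != lowest]

before = u_avg(world)
after = u_avg(without_unhappiest)
print(before, after)
```

Under these numbers the average rises once the least happy person is no longer counted, which is exactly the behaviour that U_avg rewards.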
The economic formulation suffers both of the above problems, and more besides, depending on how each individual defines their personal utility function U_person. Some individuals might gain happiness from seeing another person’s happiness go down, and the economic formulation provides no particular reason why we should want to prevent that. (This post doesn’t cover that scenario, but I’m planning to address it in the third post in the series.)
All three formulations also admit utility monsters, which are agents who define their U_person with a different scaling factor, such that their own happiness is objectively more important than the happiness of others. Thus we see that both isomorphisms, uniform translation and uniform scaling, are being violated by our attempts to aggregate utility between people, despite those isomorphisms being a fundamental requirement of how weak our starting assumptions were.
Some authors, motivated by the economic formulation and attempting to fix the “kill ’em all” conclusion of U_avg, have also attempted to include dead people in the calculation, but that leads to additional violations of common sense morality and other absurdities. We will explore this in greater depth momentarily.
But there’s something else that’s wrong with Utilitarianism
The deepest flaw of Utilitarianism is this: it presupposes that there is some function U_person from actions to real numbers in the first place.
A mathematical function is, at its core, a mapping from elements of an input set (the “domain”) to elements of an output set (the “codomain”). Traditionally, the input and output sets are both ℝ, the set of real numbers, but you can do calculus on any function that is continuous, and we would like to do calculus because it’s very hard to figure out how to maximize or minimize a function if calculus is off-limits.
To be “continuous”, you need the following properties for your function:

1. The function must be total: it must be well-defined for each and every possible input x, such that there is exactly one output f(x).
2. Both the input and output sets must come with an attached distance measurement, called a metric. A set with a defined metric is a “metric space”. There are some rules that the metric must obey, most importantly the triangle inequality (A+B≥C, where A, B, and C are the three pairwise distances among any three points), but the metric does not need to be Euclidean.
3. The distance between two output points f(x) and f(x+Δx) must approach 0 as Δx approaches 0. This implies that, for any two set elements, there must be an infinite number of set elements between them. We’ll call any metric space with an infinite number of elements between any two elements a “continuous space”.
With regard to U_person, we can safely assume that the set of all actions is a continuous space, because it doesn’t help Utilitarianism to assume otherwise. And we know that ℝ is a continuous space, because the real numbers are the simplest possible continuous metric space.
However, that is not enough to make U_person a continuous function! U_person is not total because there are discontinuities in it.
In particular, some outcome inputs do not define a value for U_person because the person under consideration is dead. It does not make sense to ask U_Dave, “How happy is Dave?”, in outcomes where Dave is dead. If Dave is dead, then Dave is neither happy nor unhappy because his corpse is not an agent and therefore is not capable of happiness or unhappiness. We might be happy or sad about Future Dave’s death, and Present Dave might be happy or sad about Future Dave’s death, but Future Dave himself does not care either way because corpses do not care about anything. Only agents care, but corpses have no agency.
Sometimes you can fix a discontinuous function to make a continuous function, but most such “fixes” break analytic properties like smoothness that are important for performing calculus.
(Remember: our long-term goal is to maximize U_person or U using calculus! We care about smoothness and other such analytic properties, because otherwise we will have a very hard time figuring out which actions to recommend. If we fix U in a way that breaks its analytic properties, we might have a working set of equations, but it won’t be very useful in the real world because we won’t be able to ask it questions about what to do next.)
Sometimes, the discontinuity exists at only a single point, called a “singularity”. A classic example of such a function is sinc(x)=sin(x)÷x, which is continuous and smooth everywhere except at x=0, where it is undefined. But the limit as you approach x=0 is 1, no matter whether you approach the singularity from above or from below, so simply declaring by fiat that sinc(0)=1 is sufficient to create a sinc-like function that is both continuous and smooth.
U_person is not quite like that, however. Adjacent to each outcome where Dave is dead are an infinity of other outcomes where Dave is also dead; Dave’s death is a cutoff line (or plane, or hyperplane, depending on the structure of the outcome space) beyond which U_Dave is undefined. Depending on the series of events that leads to any one particular death, Dave might assign to that cutoff a limit of positive infinity, negative infinity, some finite real value, or no value at all, because (again!) our definition of “utility” was so minimal that we agreed to never ask questions that could distinguish U(x) from a*U(x)+b (our isomorphism requirements). Our utility functions are only identified by the order in which they place outcomes, not by the numerical value for any one outcome.
The fact that we are being forced to assign a numerical value at all means that our formalism has already broken down, possibly beyond repair. There are now situations where Utilitarianism can no longer uniquely order two arbitrary actions, because there is no one unique way to remove the discontinuity from the U_Dave function and repair the formalism.
Fundamentally, what’s happening can be traced back to the very origins of Utilitarianism:
Nature has placed mankind under the governance of two sovereign masters, pain and pleasure. It is for them alone to point out what we ought to do.
— Jeremy Bentham, “An Introduction to the Principles of Morals and Legislation”
Why should we expect that a single continuous mathematical function U should be able to capture both pain and pleasure, weighing them against each other and measuring them on the same scale, and yet somehow not depending on which arbitrary scale we use to quantify either one?
And what if there are two or more types of pleasure, or two or more types of pain, such that no amount of pleasure-kind Pleasure1 is worth even a tiny amount of pleasure-kind Pleasure2, or avoiding any amount of pain-kind Pain1 is worth enduring any amount of pain-kind Pain2?
A brief detour into computer science
Computer science is a branch of mathematics that deals with what is computable. That is, computer science is about discovering which questions are exactly solvable by mathematicians using pen and paper. The concept of computability actually predates the first working computers: it was described independently by Alonzo Church and Alan Turing, using different but equivalent constructions, in the mid-1930s, roughly a decade before the construction of ENIAC, the first general-purpose electronic computer.
Computers are a practical result of computer science, not the motivation of it.
Computer science is intimately related to the mathematical philosophy known as Formalism, David Hilbert’s pet project, which was brought to its peak by Bertrand Russell and Alfred North Whitehead in Principia Mathematica. That project ultimately ended in failure when Kurt Gödel proved, in a famous pair of theorems, that there are mathematical truths that can never be proven within any given consistent formal system. A duality was later established between the proving of theorems and the computation of algorithms, firmly connecting Gödel’s work to that of Church and Turing in both directions.
Algorithms are theorems, and theorems are algorithms.
In the century-ish since its origins, computer science has taken the concept of a function into realms that were previously considered hopeless for mathematicians to understand, giving mathematicians a language for describing very sophisticated functions that are neither continuous nor smooth, yet are straightforward to study using discrete mathematics, a field that was mostly ignored prior to the 20th century except for the sub-branch known as Number Theory. We have already seen a very primitive version of such a function above, with our surgery on the sinc function: we split the input space into two subsets, and then computed the overall function as one of two subordinate functions depending on which subset contained the current input element. In the language of computer science, we might represent such a function using the following syntax:
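A minimal sketch in Python, chosen here only for illustration, of how such a piecewise function might be written. The input space is split into the single point x = 0 and everything else, and each subset gets its own subordinate rule.

```python
import math


def sinc(x: float) -> float:
    """sin(x)/x, with the removable singularity at x = 0 patched by fiat."""
    if x == 0.0:
        # Subset 1: the single point where sin(x)/x is undefined.
        return 1.0
    # Subset 2: every other real number.
    return math.sin(x) / x


print(sinc(0.0), sinc(1e-6), sinc(math.pi))
```

The `if` is the whole trick: the function is total because every input falls into exactly one branch, and it stays continuous because the patched value matches the limit from both sides.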


A different kind of function
I previously mentioned that Utilitarianism is undone by conflating pain and pleasure, all kinds of pain and all kinds of pleasure, onto the same scale: the Greek mistake of eudaimonia. In this section I will explain how to correct that.
We will presently derive a new formalism — a new objective ethical framework for comparing two outcomes and selecting the ideal, with each person responsible for placing their own subjective moral values into the calculation — that remains consequentialist but that does not suffer from the failure modes caused by reducing outcomes to a single real number.
For the moment, I will reduce the number of moral values to two, one positive (“pleasure-like”) and one negative (“pain-like”). However, the framework generalizes to any number of them, and the user of the framework is free to place them into any order they see fit, including to interleave them so that we might have Pain1 > Pleasure2 > Pain3 and/or Pleasure1 > Pain2 > Pleasure3.
Let S_person(action) represent the degree of pleasure that a given person experiences while living on a worldline leading to a given outcome, quantified as a real number such that increasing numerical value represents increasing satisfaction. (Positive is good.)
Let T_person(action) represent the degree of pain that a given person experiences while living on a worldline leading to a given outcome, again quantified as a real number such that increasing numerical value represents increasing pain. (Positive is bad.)
(In both cases, we will soon define a sensible meaning for the zero value, which will eliminate that pesky assumption of translation isomorphism. This will help us later, when we want to identify the actions that maximize our “pleasure” functions and minimize our “pain” functions.)
Now define P_person(a0, a1) to be the following function that maps from elements of the Cartesian set product Action × Action to elements of the finite set {-1, 0, 1}:
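One way such a comparison function could be written is sketched below in Python. Several things are assumptions made for the sketch rather than part of the definition above: pain (T) is given strict priority over pleasure (S), a result of +1 is taken to mean that a0 is preferred, and the helper `make_preference` and the example tables are hypothetical.

```python
from typing import Callable

Action = str


def make_preference(
    S: Callable[[Action], float],
    T: Callable[[Action], float],
) -> Callable[[Action, Action], int]:
    """Build P(a0, a1) from a pleasure function S and a pain function T."""

    def P(a0: Action, a1: Action) -> int:
        # Assumed priority: compare pain first; less pain is preferred.
        if T(a0) < T(a1):
            return 1
        if T(a0) > T(a1):
            return -1
        # Pain is tied, so fall back to pleasure; more is preferred.
        if S(a0) > S(a1):
            return 1
        if S(a0) < S(a1):
            return -1
        return 0

    return P


# Invented example values for one person.
pleasure = {"rest": 1.0, "party": 5.0, "duel": 9.0}
pain = {"rest": 0.0, "party": 0.0, "duel": 7.0}

P_person = make_preference(pleasure.__getitem__, pain.__getitem__)
print(P_person("party", "rest"), P_person("duel", "party"))
```

Note that P only ever looks at which of two values is larger. No amount of extra pleasure from "duel" can outweigh its pain, which is something a single summed utility cannot express.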


This function can now be used to represent the given person’s relative preferences between two outcomes, using capabilities that extend far beyond what a traditional algebraic function can offer.
We can then construct an algorithm that, given a list of n possible outcomes, efficiently calculates the set of outcomes which are preferred above all others but equally preferable amongst themselves.
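A minimal sketch of one such algorithm in Python, making a single pass over the list with n - 1 comparisons. It assumes the comparator is a consistent three-way comparison (returning +1, 0, or -1, with +1 meaning the first argument is preferred), as a lexicographic comparison is. The names `best_outcomes` and `prefer`, the pain-before-pleasure priority, and the example data are illustrative assumptions.

```python
def best_outcomes(outcomes, prefer):
    """Return every outcome that is tied for most preferred.

    prefer(x, y) must return +1 if x is preferred, -1 if y is preferred,
    and 0 if they are equally preferable.
    """
    best = []
    for candidate in outcomes:
        if not best:
            best = [candidate]
            continue
        verdict = prefer(candidate, best[0])
        if verdict > 0:
            best = [candidate]       # strictly better: start a new set
        elif verdict == 0:
            best.append(candidate)   # tied with the current leaders
    return best


# Hypothetical outcomes, each scored as (pain, pleasure).
scores = {"a": (2.0, 9.0), "b": (0.0, 3.0), "c": (0.0, 5.0), "d": (0.0, 5.0)}


def prefer(x, y):
    """Less pain wins; if pain is tied, more pleasure wins."""
    (pain_x, joy_x), (pain_y, joy_y) = scores[x], scores[y]
    if pain_x != pain_y:
        return 1 if pain_x < pain_y else -1
    if joy_x != joy_y:
        return 1 if joy_x > joy_y else -1
    return 0


winners = best_outcomes(list(scores), prefer)
print(winners)
```

Comparing each candidate against a single current leader is enough only because ties are transitive under this kind of comparison; a comparator without that property would need a different algorithm.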


As we mentioned above, it is entirely possible that P_person would better approximate the actual moral preferences of a real individual by using more than two algebraic functions, flipping the sign from “positive is bad” to “positive is good” each time we need to. The actual content of our best function doesn’t change at all when we do this.
Spreadsheet Sort
If you’re having trouble following along because you’re not versed well enough in computer science, there’s actually a very easy real-world explanation for this algorithm: we just reinvented multi-column spreadsheet sorting! All we’re doing is putting one outcome in each row in the spreadsheet, with columns “A” through “Z” representing our individual moral preferences for that outcome, and then sorting so that each column takes sorting priority over the columns to the right of it.
If your moral values are as simple as “I will sacrifice any amount of pleasure to avoid pain”, then “Pain” is in your personal column A, and “Pleasure” is in your personal column B, and all we do is sort by Pain ascending, then by Pleasure descending, and then pick the row at the top of the spreadsheet. Boom, that’s our action. It’s not actually that much more complicated than a utility function, but it’s impossible to write it down using algebra, so we had to get into some fancier math than you’re used to. That’s all.
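The same spreadsheet sort can be sketched in a few lines of Python, using a tuple as the sort key so that the first element takes priority over the second. The rows and their numbers are invented for illustration.

```python
# One row per outcome; column A is pain, column B is pleasure.
rows = [
    {"action": "duel", "pain": 7.0, "pleasure": 9.0},
    {"action": "rest", "pain": 0.0, "pleasure": 1.0},
    {"action": "party", "pain": 0.0, "pleasure": 5.0},
]

# Sort by pain ascending, then by pleasure descending. Negating the
# pleasure value turns "descending" into "ascending" for the key.
ranked = sorted(rows, key=lambda row: (row["pain"], -row["pleasure"]))

choice = ranked[0]["action"]
print([row["action"] for row in ranked], choice)
```

Tuples compare element by element from left to right, which is exactly the “each column takes priority over the columns to its right” rule.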
Why is this better?
First and foremost, from a mathematical analysis perspective, this resolves several of the absurdities that previously cropped up when comparing or aggregating utility quantities. In particular, we no longer have to worry about that pesky uniform translation isomorphism, because now we only ever care about comparisons between two outcomes at a time; we’ve hidden the actual use of S(person | action) and T(person | action) deep inside P(person | action, action), which only ever compares the relative values, so we are no longer tempted to ask questions that don’t have reasonable answers.
Additionally, the uniform scaling isomorphism is gone: we don’t care about the order of S_person(x) vs S_person(y) itself anymore, like we did for U_person, but instead we care about the order of the quantity (S_person(x) - S_person(y)). This lets us define the meaning of the numeric values a little more tightly than what utility functions allow, and as a consequence we can now use different uniform scaling constants for S_person vs T_person vs all our other moral preference functions. Because of this, we are no longer required to invent answers to absurd questions like “if the joy of drinking a milkshake is J, and the horror of a child being tortured to death is H, how many milkshakes need to be added to the world to cancel out the torture of the child, such that H=k*J for some positive constant k?”. Utilitarianism posits that there is a genuine, rational answer to that question, but we can now see that it’s nonsense: the scaling isomorphism now exists for individual preference functions, instead of only for the aggregate sum of them. We no longer expect there to exist any such finite constant that can convert units of one function to units of another, because each one has its own independent scale.
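The independence of the scales can be illustrated with a small Python sketch. It compares a lexicographic chooser against a single weighted-sum utility while the pain and pleasure scales are changed independently. The option names, the numbers, the pain-first priority, and both helper functions are assumptions made for the example.

```python
# Hypothetical options, each scored as (pain, pleasure).
options = {"milkshake": (0.0, 1.0), "thrill": (0.5, 4.0)}


def lexicographic_best(table, pain_scale=1.0, pleasure_scale=1.0):
    """Pick the option with the least pain, breaking ties by most pleasure."""
    return min(
        table,
        key=lambda name: (
            pain_scale * table[name][0],
            -pleasure_scale * table[name][1],
        ),
    )


def weighted_sum_best(table, pain_scale=1.0, pleasure_scale=1.0):
    """Pick the option maximizing a single number: pleasure minus pain."""
    return max(
        table,
        key=lambda name: pleasure_scale * table[name][1]
        - pain_scale * table[name][0],
    )


print(lexicographic_best(options), lexicographic_best(options, 10.0, 0.01))
print(weighted_sum_best(options), weighted_sum_best(options, pain_scale=10.0))
```

Rescaling pain or pleasure by any positive constant leaves the lexicographic choice alone, whereas the weighted sum changes its answer as soon as the exchange rate between the two scales changes.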
As a bonus, we have also eliminated the absurdity that was previously inherent in some outcomes having “negative absolute utility”. An individual moral preference function can have a negative value, but that just means that the outcome under consideration is less preferred than those outcomes for which the preference function is larger. If we want to, we can pick a value of zero that makes sense for each function in isolation, and then negative values are simply outcomes that are worse than the zero outcome. When we were trying to force everything into the utility function framework, we found ourselves forced to pick a universal ZERO™ that makes sense for all of the components of our utility function, all at the same time.
And since we can now define each function’s zero point sensibly without disturbing the actual ordering provided by P(person | action, action), we can now use those zero values to represent outcomes in which the world has never contained any agents and never will. This does not describe our own world or any future timeline of it, but it does describe worlds that could have existed as branches off from past worlds. Doing so helps us fix our analysis problem (letting us pick actions that minimize or maximize our moral preferences because, remember, there is actually an infinite space of possible actions, not just the finite lists that we’ve been dealing with so far), and we can then use these “dead worlds” as the limit of those actual future worlds where all people are dead, but died in a way that caused neither happiness nor suffering (nor any other moral objection that our preference functions might care about). After all, physics says that the universe must end in a dead world, one way or another. There’s no such thing as living forever, so we can’t just say “boo death, all deaths bad!”, or else we’ll end up with every function going to negative infinity if you take a long enough view of the timeline.
And since all of these fixes come automatically from using the correct mathematical formalism, the moral paradoxes that were produced by violating Utilitarianism’s required isomorphisms are completely gone. No more distinction between U_sum vs U_avg, no more Judgement of Solomon where we are forced to pick which half of the baby we get to keep. The answers just make sense.
Conclusions
Do humans actually follow this system?
I believe that most do. Furthermore, I believe that those who do not are objectively incorrect, in the sense that they are making a logical mistake. No matter how each person defines their personal, individual, and subjective moral preference functions, this is the mathematically correct way to combine those moral preferences in order to choose actions that satisfy them best.
(I also believe that the moral preference functions are mostly, but not entirely, objective as well, because they derive from the frequently recurring Game Theoretic scenarios that evolution places on social animals, particularly on those animals that parent and nurture their children. At the very least, I expect all evolved animals to obey them automatically, even aliens. But that argument is detailed in the previous post of this series.)
Beyond using this ethical framework to try to understand how real-world humans do their ethical calculations, I believe this framework also explains why Paperclip Maximizers show up so often in the study of economics and in the study of artificial intelligence: most artificial agents, intelligent or not, are built around maximizing some sort of utility function. I argue this: All Utilitarians Are Paperclip Maximizers, and All Paperclip Maximizers Are Utilitarians. There is no such thing as a Utilitarian agent that does not maximize paperclips.
It’s impossible for someone who follows this ethical framework to successfully negotiate with a Utilitarian AI, but only because it’s impossible for anyone to successfully negotiate with a Utilitarian AI, even if you yourself are another Utilitarian AI. There will always be scenarios where a Utilitarian finds the prospect of “cheating” to be too good to pass up. By contrast, an agent following this set of rules is capable of restraining itself voluntarily, with no extra Decision Theory stapled on after the fact. In my third post, I intend to show how parties that subscribe to this ethical framework can sometimes build a shared moral framework amongst themselves, even if their own personal moral preferences do not exactly align (so long as they aren’t directly opposed).
A quick note on rationality
Back at the start of the article I mentioned that, in economics, it’s commonly asserted that “rationality” is the same thing as “VNM-rationality”, i.e. that anything you call “rationality” must satisfy the VNM axioms, and that the provably best solution to the situations described by the VNM axioms is Utilitarianism.
I reject Axiom 3 (Continuity), which claims that, if some trio of random lotteries “L”, “M”, and “N” are preferred in order L < M < N by a rational actor, then there is necessarily some probability “p” so that pL + (1 - p)N ≈ M.
In English: if you make a meta-lottery “L+N” where sometimes you get a result from lottery “L” and sometimes you get a result from lottery “N”, and “L” is strictly worse than “M” while “M” is strictly worse than “N”, then there is some way to weight “L+N” so that “L” is rare enough and “N” is enticing enough that you’re forced to conclude that the meta-lottery is just as good as “M”. And if you don’t, it’s because you’re being irrational.
This is patently absurd. We can see this if “L” includes the possibility of, say, being tortured to death, but “M” and “N” do not. I reject the very idea that economic “rationality”, VNM-rationality, is actually in any way rational, or that it is a good model for actual human behavior (even in the rational Homo economicus limit). Importantly, my proposed ethical framework does not obey VNM Axiom 3, and therefore the VNM theorem’s conclusion does not apply to it.
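A small Python sketch of how a pain-first comparison escapes Axiom 3. It rests on assumptions that go beyond the text: each lottery is summarized by a pair (expected pain, expected pleasure), mixtures are formed component by component, pain takes strict priority, and the numbers are invented. Exact fractions are used so that very small probabilities are not lost to rounding.

```python
from fractions import Fraction

# Each lottery summarized as (expected pain, expected pleasure).
L = (Fraction(100), Fraction(0))   # carries a risk of torture
M = (Fraction(0), Fraction(1))
N = (Fraction(0), Fraction(2))


def mix(p, first, second):
    """The meta-lottery p*first + (1 - p)*second, component by component."""
    return tuple(p * a + (1 - p) * b for a, b in zip(first, second))


def compare(x, y):
    """+1 if x is preferred, -1 if y is preferred, 0 if indifferent.
    Less expected pain wins; ties are broken by more expected pleasure."""
    key_x, key_y = (x[0], -x[1]), (y[0], -y[1])
    return (key_x < key_y) - (key_x > key_y)


probabilities = [Fraction(0), Fraction(1, 10**9), Fraction(1, 2), Fraction(1)]
verdicts = {p: compare(mix(p, L, N), M) for p in probabilities}
print(verdicts)
```

For the sampled probabilities, the meta-lottery is strictly better than “M” when p is 0 and strictly worse for every positive p, because any positive chance of “L” adds some expected pain. A finite sample is not a proof, but the same reasoning applies to every p between 0 and 1, so no value of p produces indifference.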