
Since it was first formalized in the late 1700s, Utilitarianism has dominated discussions of human behavior in philosophical ethics and in economics. In particular, it acts as the foundation of the “rationality” that defines Homo economicus, the idealized human being of economics that serves as the benchmark against which actual human behaviors are compared, because it is (purported to be) the only ethical framework which is “rational”. Specifically, Utilitarianism is “VNM-rational”: the Von Neumann-Morgenstern utility theorem shows that any agent satisfying its axioms of rational behavior acts as if it were maximizing a utility function, and Utilitarianism is exactly such a maximization.

In this article, I aim to demonstrate that Utilitarianism does not actually make sense in relation to human values and is not actually a description of rational behavior, and I formally propose an alternative consequentialist theory that escapes the VNM theorem’s erroneous axioms.

NB: This post benefits from an understanding of as many of the following fields as you can manage: philosophy (emphasis on normative ethics), economics (emphasis on Game Theory), and computer science. Unfortunately, these three fields do not come into close contact very often, which is (I believe) why the ideas I’m introducing in this post are arriving about 100 years later than they should’ve.

It takes cross-disciplinary understanding to truly make sense of the world.

Remember: the idea of sorting a dictionary in “alphabetical order” — by comparing the first letters of a pair of words, then the second letters, and so on — had to be invented, in 1604 by Robert Cawdrey, and then taught to people. It was originally a piece of knowledge specific to a single discipline of study, not known outside of its esoteric specialty, but you can hardly imagine an adult today who doesn’t know how to sort a list of words, or how to look up a word in a dictionary, now can you?

If I’ve explained anything in a way that doesn’t click for you, please consider reading through these Reddit comments on r/Ethics, and feel free to DM or tweet me on Twitter.


This post is part 2 of an ongoing series.


What is Utilitarianism?

Utilitarianism is a family of normative ethical theories that prescribe actions that maximize happiness and well-being for all affected individuals.

Wikipedia, “Utilitarianism”

Utilitarianism developed out of the work of the proto-economist ethical philosophers of England and Scotland during the late 18th and early-to-mid 19th centuries, most famously Bentham and Mill. It was closely related to the Enlightenment movement, which was an enormous influence on the founding ideologies of the United States of America and on that country’s current system of government, and it relied strongly on the concept of eudaimonia (“good/true spirit”) from Classical Greek philosophy: the idea that there is a single thing called “goodness” and that all you have to do is choose it.

To use modern vocabulary and concepts, Utilitarianism holds that there is some quantity, called utility, which represents the sum of all moral good produced by the outcome of an action, and that these utility quantities can be ordered relative to each other, so that there is a best possible outcome.

(Or technically a set of best possible outcomes, if some of the outcomes are equal under the utility relation.)

Mathematically, this means we can assign to each outcome a member of ℝ (the set of real numbers) and create a function out of it, named U, so that we get either U(x)<U(y), U(x)=U(y), or U(x)>U(y) for each and every pair of possible outcomes “x” and “y”.

This comes with a caveat, however: since we’ve only defined U in terms of the order of the resulting numbers, U itself is subject to two isomorphisms: uniform translation (adding a constant), and uniform positive scaling (multiplying by a positive constant). Basically, if someone hands you a card with a formula for U(x) written on it, and another card with a formula for a*U(x)+b written on it, you can’t tell which one is which. This means that absolute values of U are undefined, and it also means that the scale of U is similarly undefined.

That’s worth reiterating: it’s total nonsense to ask if U(x)≥0, or to say that U(x)=U(y)+1. It’s like trying to ask how many kilograms of the color blue can be extracted from the abstract concept of “love”. It’s not that you can just pick an arbitrary answer to those questions and see if it works out. It’s that, if you try, you get paradoxes — you can prove that some things are both true and false at the same time — because U(x) and a*U(x)+b are exactly the same thing. If one thing is isomorphic to another, it isn’t just “hard to tell them apart”, it means there is only one thing but there are multiple, equivalent ways to write it down, and you can use whichever one you like and even change your mind in the middle of doing so.
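
To make that concrete, here is a minimal sketch in the Scheme notation used later in this post (the outcomes and the constants a=3, b=100 are hypothetical stand-ins): every ordering question gives exactly the same answer whether we ask it of U or of a*U(x)+b, so nothing we are allowed to ask can reveal which of the two cards we were handed.

;; A hypothetical utility function over three outcomes, purely for illustration.
(define (U outcome)
  (cond
    ((eq? outcome 'picnic)    7)
    ((eq? outcome 'rainy-day) 2)
    ((eq? outcome 'flat-tire) -5)
  )
)

;; The same function after uniform positive scaling (a = 3) and
;; uniform translation (b = 100).
(define (U-rescaled outcome)
  (+ (* 3 (U outcome)) 100)
)

;; Both functions order every pair of outcomes identically:
;;   (< (U 'rainy-day) (U 'picnic))                    => #t
;;   (< (U-rescaled 'rainy-day) (U-rescaled 'picnic))  => #t
;; Ordering alone can never tell the two apart.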

A lot of this blog post is going to be about those two isomorphisms, and why they mean that Utilitarianism is nonsense. Especially that pesky + b.

Beyond that, Utilitarianism also holds that there are no additional moral considerations for any action except those of the action’s outcome. This assumption is called “consequentialism”, and I agree with it, so I won’t delve any further into it. My proposed replacement for Utility, to be introduced shortly, is also a consequentialist theory.

In philosophical ethics, this numerical quantification of “utility” is most often defined to be directly proportional to the total human happiness in a given configuration of the world or summed up along a given timeline, with higher utility indicating higher total human happiness. The most rational policy of action, then, is to take those actions which lead to the outcomes with the highest possible utilities: that is, to label outcomes by the actions that lead to them, to cast utility as a true mathematical function U(action) with a domain consisting of “the set of all possible actions” and a co-domain of ℝ, and then to maximize that function using the methods of calculus.

In economics, the numerical quantification of “utility” is traditionally defined in a slightly different way: “utility” is held to be directly proportional to the happiness of the individual making the decision in a given outcome, while taking into account that the individual’s personal happiness may depend in some way on the happiness of others. The formulation is otherwise identical, with each individual person casting their own utility as U_person(action) and choosing the action or sequence of actions which leads to the outcomes with maximum U_person(action).

What are some existing critiques of Utilitarianism?

The formulation of Utilitarianism that is used in philosophical ethics has some nightmarish thought experiments that make it clear that it breaks down under certain limits. These have often been used by critics of Utilitarianism (primarily deontologists) as supposed examples of why consequentialism itself is monstrous.

Suppose, as seems to be the case, that there is an upper limit on how happy a person can be in a given moment. (We will show shortly that it gets worse for Utilitarianism if the contrary is true.)

Furthermore, suppose that each person in the world has a happiness function U_person(action) — with alternative spelling U(person|action) — that captures that person’s happiness over a world-line as a real number.

Then the most obvious formulation of utility follows as:

U_sum(x) = k Σ [p ∈ all living people] U(p|x)

That is, we might then define the total happiness of all people in the world-line stemming from an action to be the arithmetic sum of the happiness of each individual person in that world-line.

Under U_sum, the optimal world is one where there are infinitely many people… even if those people are living lives with barely any happiness in them, so long as the expected value of U(person|world) is any arbitrarily small positive real value. According to U_sum, overpopulation to the edge of universal starvation is a good thing.

But wait! We said earlier that U(x) and a*U(x)+b are the same thing (“isomorphic”)! This demonstrates that U_sum has violated our requirement that utility functions have translation isomorphism, since we don’t have any meaningful way to say what the “zero point” of the utility function is; it can only be used for relative ordering.

Well, the other obvious formulation of utility is:

U_avg(x) = k 1/(# living people) Σ [p ∈ all living people] U(p|x)

That is, we might instead define total happiness to be the arithmetic mean of each individual’s happiness.

Under U_avg, it follows that we should kill those people who bring the average happiness down, or otherwise remove them from consideration as part of the world such that their utility no longer contributes to the function output, so long as we can do so without any additional suffering. It also follows that if we obtain the ability to modify people to increase their happiness artificially, e.g. by giving them drugs or by performing surgery on their brains (a practice called “wireheading”), then we should do so until all people are maximally happy at every moment of their lives.

But all we’ve really done is normalize U_sum so that the utility numbers fall into the range [-H,+H], for maximum happiness-per-person +H and minimum happiness-per-person -H; we didn’t actually get rid of that pesky violation of translation isomorphism, because zero is still special.
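
Here is a small sketch of the two aggregation rules in the Scheme notation used later in this post (the per-person happiness numbers are hypothetical), which makes their disagreement easy to see:

;; Per-person happiness values for two hypothetical world-lines.
(define world-A '(3 3 3 3))                   ;; four reasonably happy people
(define world-B '(1 1 1 1 1 1 1 1 1 1 1 1))   ;; twelve barely-happy people

;; U_sum: add up everyone's happiness.
(define (U_sum people)
  (apply + people)
)

;; U_avg: divide that sum by the number of people.
(define (U_avg people)
  (/ (apply + people) (length people))
)

;; (U_sum world-A) => 12    (U_sum world-B) => 12
;; (U_avg world-A) => 3     (U_avg world-B) => 1
;;
;; U_sum is indifferent between these two worlds and would strictly prefer an
;; even bigger world-B with more barely-happy people; U_avg prefers world-A
;; and would happily "improve" any world by removing whoever drags the
;; average down.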

The economic formulation suffers both of the above problems, and more besides, depending on how each individual defines their personal utility function U_person. Some individuals might gain happiness from seeing another person’s happiness go down, and the economic formulation provides no particular reason why we should want to prevent that. (This post doesn’t cover that scenario, but I’m planning to address it in the third post in the series.)

All three formulations also admit utility monsters, which are agents who define their U_person with a different scaling factor, such that their own happiness is objectively more important than the happiness of others. Thus we see that both isomorphisms, uniform translation and uniform scaling, are being violated by our attempts to aggregate utility between people, despite those isomorphisms being a direct consequence of how weak our starting assumptions were.

Some authors, motivated by the economic formulation and attempting to fix the “kill ’em all” conclusion of U_avg, have also attempted to include dead people in the calculation, but that leads to additional violations of common sense morality and other absurdities. We will explore this in greater depth momentarily.

But there’s something else that’s wrong with Utilitarianism

The deepest flaw of Utilitarianism is this: it presupposes that there is some function U_person from actions to real numbers in the first place.

A mathematical function is, at its core, a mapping from elements of an input set (the “domain”) to elements of an output set (the “co-domain”). Traditionally, the input and output sets are both ℝ, the set of real numbers, but you can do calculus on any function that is continuous, and we would like to do calculus because it’s very hard to figure out how to maximize or minimize a function if calculus is off-limits.

To be “continuous”, you need the following properties for your function:

  1. The function must be total: the function must be well-defined for each and every possible input x such that there is exactly one output f(x).

  2. Both the input and output sets must come with an attached distance measurement, called a metric. A set with a defined metric is a “metric space”. There are some rules that the metric must obey, most importantly the triangle inequality (the distance from A to C can never exceed the distance from A to B plus the distance from B to C), but the metric does not need to be Euclidean.

  3. The distance between two output points f(x) and f(x+Δx) must approach 0 as Δx approaches 0. For this requirement to have any force, there must be an infinite number of set elements between any two set elements. We’ll call any metric space with an infinite number of elements between any two elements a “continuous space”.
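
(For reference, requirement 3 is an informal version of the standard ε-δ definition of continuity between metric spaces: for every input x and every ε > 0, there exists a δ > 0 such that d_in(x, y) < δ implies d_out(f(x), f(y)) < ε, where d_in and d_out are the metrics attached to the input and output spaces.)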

With regards to U_person, we can safely assume that the set of all actions is a continuous space, because it doesn’t help Utilitarianism to assume otherwise. And we know that ℝ is a continuous space, because the real numbers are the simplest possible continuous metric space.

However, that is not enough to make U_person a continuous function! U_person is not even total: there are inputs for which it has no defined value at all, and those gaps are discontinuities.

In particular, some outcome inputs do not define a value for U_person because the person under consideration is dead. It does not make sense to ask U_Dave, “How happy is Dave?”, in outcomes where Dave is dead. If Dave is dead, then Dave is neither happy nor unhappy because his corpse is not an agent and therefore is not capable of happiness or unhappiness. We might be happy or sad about Future Dave’s death, and Present Dave might be happy or sad about Future Dave’s death, but Future Dave himself does not care either way because corpses do not care about anything. Only agents care, but corpses have no agency.
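
As a sketch of what that partiality looks like in the Scheme notation used elsewhere in this post (the outcome representation and the names here are hypothetical, purely for illustration), U_Dave simply has nothing to return for some inputs:

;; A hypothetical outcome is represented as a list of the people who are
;; alive in it, each paired with their happiness on that world-line.
(define outcome-1 '((dave . 5) (erin . 2)))   ;; Dave is alive
(define outcome-2 '((erin . 3)))              ;; Dave is dead

;; U_Dave is a *partial* function: for outcomes in which Dave is not an
;; agent, there is no number it could honestly return, so here it simply
;; signals an error.
(define (U_Dave outcome)
  (let
    (
      (entry (assq 'dave outcome))
    )
    (if entry
        (cdr entry)
        (error "U_Dave is undefined: Dave is not an agent in this outcome")
    )
  )
)

;; (U_Dave outcome-1) => 5
;; (U_Dave outcome-2) => an error, not a number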

Sometimes you can fix a discontinuous function to make a continuous function, but most such “fixes” break analytic properties like smoothness that are important for performing calculus.

(Remember: our long-term goal is to maximize U_person or U using calculus! We care about smoothness and other such analytic properties, because otherwise we will have a very hard time figuring out which actions to recommend. If we fix U in a way that breaks its analytic properties, we might have a working set of equations, but it won’t be very useful in the real world because we won’t be able to ask it questions about what to do next.)

Sometimes, the discontinuity exists at only a single point, called a “singularity”. A classic example of such a function is sinc(x)=sin(x)÷x, which is continuous and smooth everywhere except at x=0, where it is undefined. But the limit as you approach x=0 is 1, no matter whether you approach the singularity from above or from below, so simply declaring by fiat that:

sinc_fixed(x) = {1 if x = 0, sinc(x) if x ≠ 0}

is sufficient to create a sinc-like function that is both continuous and smooth.

U_person is not quite like that, however. Adjacent to each outcome where Dave is dead are an infinity of other outcomes where Dave is also dead; Dave’s death is a cutoff line — or plane, or hyperplane, depending on the structure of the outcome space — beyond which U_Dave is undefined. Depending on the series of events that leads to any one particular death, Dave might assign to that cutoff a limit of positive infinity, negative infinity, some finite real value, or no value at all, because (again!) our definition of “utility” was so minimal that we agreed to never ask questions that could distinguish U(x) from a*U(x)+b (our isomorphism requirements). Our utility functions are identified only by the order in which they place outcomes, not by the numerical value they assign to any one outcome.

The fact that we are being forced to assign a numerical value at all means that our formalism has already broken down, possibly beyond repair. There are now situations where Utilitarianism can no longer uniquely order two arbitrary actions, because there is no one unique way to remove the discontinuity from the U_Dave function and repair the formalism.

Fundamentally, what’s happening can be traced back to the very origins of Utilitarianism:

Nature has placed mankind under the governance of two sovereign masters, pain and pleasure. It is for them alone to point out what we ought to do.

Jeremy Bentham, “An Introduction to the Principles of Morals and Legislation”

Why should we expect that a single continuous mathematical function U should be able to capture both pain and pleasure, weighing them against each other and measuring them on the same scale, and yet somehow not depending on which arbitrary scale we use to quantify either one?

And what if there are two or more types of pleasure, or two or more types of pain, such that no amount of pleasure-kind Pleasure-1 is worth even a tiny amount of pleasure-kind Pleasure-2, or avoiding any amount of pain-kind Pain-1 is worth enduring any amount of pain-kind Pain-2?

A brief detour into computer science

Computer science is a branch of mathematics that deals with what is computable. That is, computer science is about discovering which questions are exactly solvable by mathematicians using pen and paper. The concept of computability actually pre-dates the existence of the first working computers: it was described independently by Alonzo Church and Alan Turing, using different but equivalent constructions, in the mid-1930s, roughly a decade before the construction of ENIAC, one of the first general-purpose electronic computers.

Computers are a practical result of computer science, not the motivation of it.

It is intimately related to the mathematical philosophy known as Formalism, which was David Hilbert’s pet project brought to its peak by Bertrand Russell and Alfred North Whitehead in Principia Mathematica, a project which ultimately ended in failure when it was proven that there were certain mathematical truths that could never be proven (in a famous pair of theorems by Kurt Gödel). A duality was later established between the proving of theorems and the computation of algorithms, firmly connecting Gödel’s work to that of Church and Turing in both directions.

Algorithms are theorems, and theorems are algorithms.

In the century-ish since its origins, computer science has taken the concept of a function into realms that were previously considered hopeless for mathematicians to understand, giving mathematicians a language for describing very sophisticated functions that are neither continuous nor smooth, yet are straightforward to study using discrete mathematics, a field that was mostly ignored prior to the 20th century except for the sub-branch known as Number Theory. We have already seen a very primitive version of such a function above, with our surgery on the sinc function: we split the input space into two subsets, and then computed the overall function as one of two subordinate functions depending on which subset contained the current input element. In the language of computer science, we might represent such a function using the following syntax:

;; This syntax is called Scheme, which is a dialect of LISP.  LISP was
;; inspired by Alonzo Church's preferred formalism for algorithms, called the
;; Lambda Calculus.  However, while LISP and Lambda Calculus are equivalent,
;; LISP is easier for humans to understand, so I use it here.

;; sinc(x) = sin(x) / x
;;
;; NB: "(sin x)" is LISP for "sin(x)", and "(/ a b)" is LISP for "a ÷ b".
;;
(define (sinc x)
  (/
    (sin x)
    x
  )
)

;; sinc_fixed(x) = {1 if x = 0, sinc(x) if x ≠ 0}
(define (sinc_fixed x)
  (if (= x 0)
      1
      (sinc x)
  )
)

A different kind of function

I previously mentioned that Utilitarianism is undone by conflating pain and pleasure, all kinds of pain and all kinds of pleasure, onto the same scale: the Greek mistake of eudaimonia. In this section I will explain how to correct that.

We will presently derive a new formalism — a new objective ethical framework for comparing two outcomes and selecting the ideal, with each person responsible for placing their own subjective moral values into the calculation — that remains consequentialist but that does not suffer from the failure modes caused by reducing outcomes to a single real number.

For the moment, I will reduce the number of moral values to two, one positive (“pleasure-like”) and one negative (“pain-like”). However, the framework generalizes to any number of them, and the user of the framework is free to place them into any order they see fit, including to interleave them so that we might have Pain-1 > Pleasure-2 > Pain-3 and/or Pleasure-1 > Pain-2 > Pleasure-3.

Let S_person(action) represent the degree of pleasure that a given person experiences while living on a world-line leading to a given outcome, quantified as a real number such that increasing numerical value represents increasing satisfaction. (Positive is good.)

Let T_person(action) represent the degree of pain that a given person experiences while living on a world-line leading to a given outcome, again quantified as a real number such that increasing numerical value represents increasing pain. (Positive is bad.)

(In both cases, we will soon define a sensible meaning for the zero value, which will eliminate that pesky assumption of translation isomorphism. This will help us later, when we want to identify the actions that maximize our “pleasure” functions and minimize our “pain” functions.)

Now define P_person(a0, a1) to be the following function that maps from elements of the Cartesian set product Action × Action to elements of the finite set {-1, 0, 1}:

;; signum
;;
;; This function takes a number and returns just the sign: +1 for positive
;; numbers, -1 for negative numbers, and 0 for zero.
;;
(define (signum x)
  (if (= x 0)
      0
      (if (> x 0)
          +1
          -1
      )
  )
)

;; P_person
;;
;; This function compares two outcomes, a0 and a1, and returns some
;; person's relative preference between them: -1 if a0 > a1, +1 if
;; a0 < a1, or 0 if a0 = a1.
;;
;; Mnemonic:   because -1 (first) comes before  0 (second),
;;           therefore a0 (first) comes before a1 (second).
;;             because +1  (last) comes after   0 (second),
;;           therefore a0  (last) comes after  a1 (second).
;;
(define (P_person a0 a1)
  (let
    (
      ;; ΔS is S(person|a0) - S(person|a1)
      ;;
      ;; Note that positive ΔS means that a0 has the larger S,
      ;; and also that larger S is better.
      ;;
      (ΔS (- (S_person a0) (S_person a1)))

      ;; ΔT is T(person|a0) - T(person|a1)
      ;;
      ;; Note that positive ΔT means that a0 has the larger T,
      ;; and also that larger T is worse.
      ;;
      (ΔT (- (T_person a0) (T_person a1)))
    )
    (signum
      (if (= ΔT 0)

          ;; if ΔT is 0, then we use ΔS to compare a0 vs a1.  We negate ΔS
          ;; because larger S is better, while the overall result must be
          ;; negative when a0 is the preferred action.
          (- 0 ΔS)

          ;; if ΔT is not 0, then ΔT alone decides the comparison: pain takes
          ;; priority over pleasure, and larger T is already worse, so no
          ;; negation is needed.
          ΔT
      )
    )
  )
)

This function can now be used to represent the given person’s relative preferences between two outcomes, using capabilities that extend far beyond what a traditional algebraic function can offer.

We can then construct an algorithm that, given a list of n possible outcomes, efficiently calculates the set of outcomes which are preferred above all others but equally preferable amongst themselves.


;; best
;;
;; Given a list of actions “as” that represents the ones under consideration,
;; and a function “p” that compares the outcomes of two actions, "(best p as)"
;; returns a list of actions that are better than any other action not in the
;; list, according to “p”.
;;
;; If “as” contains N items, this algorithm runs in O(N) time,
;; and it makes exactly (N - 1) calls to “p”.  This is pretty efficient,
;; because it means we don't need to compare each of the O(N * N) pairwise
;; combinations of two worlds.
;;
(define (best p as)
  (best-impl p '() as)
)

;; best-impl
;;
;; A helper function used by “best” that takes a partially built list “rs” of
;; the best possible actions (given only those that have *already* been
;; considered), plus the partially deconstructed list “as” of actions
;; *not yet* considered.
;;
;; Again, best-impl runs in O(N) time and makes no more than N calls to “p”.
;;
;; Notes for those unfamiliar with LISP notation:
;;
;;    '()
;;      This is the nil element, which indicates an empty list.
;;
;;    (cons head tail)
;;      This is a function which, given a proposed element “head” and an
;;      existing list “tail”, creates a new list consisting of “head” as the
;;      first element and the elements of “tail” as the remaining elements.
;;
;;      It's how you push an item onto the beginning of a list.
;;
;;    (car list)
;;      This returns the first (“head”) element of a non-empty list.
;;
;;    (cdr list)
;;      This returns the list containing all elements of the given non-empty
;;      list *except* for the first element (the “tail”).
;;
(define (best-impl p rs as)
  (if
    (eq? as '())

    ;; if there are no further actions to consider, then we have finished
    ;; constructing rs.
    rs

    ;; otherwise, there is at least one more action to consider, (car as).
    (if
      (eq? rs '())

      ;; if rs is empty, then we do not need to make a comparison because
      ;; (car as) is already the best action that we have considered so far.
      ;; if a list has only one item, that item has to be the best, after all.
      ;;
      ;; we use recursion to continue the computation.
      (best-impl
        p
        (cons (car as) '())
        (cdr as)
      )

      ;; compare the two actions, the action under consideration (car as) and
      ;; one example of the best actions we've seen so far (car rs).
      (let (
          (cmp (p (car rs) (car as)))
        )
        (if
          (< cmp 0)

          ;; if (car rs) is more preferred than (car as), then we ignore
          ;; (car as) and continue the computation with rs unchanged.
          (best-impl
            p
            rs
            (cdr as)
          )

          ;; otherwise, (car rs) is either equally preferred to (car as) or
          ;; less preferred.
          (if
            (= cmp 0)

            ;; if (car rs) is equally preferred to (car as), then push
            ;; (car as) onto the beginning of rs, and then continue the
            ;; computation.
            (best-impl
              p
              (cons (car as) rs)
              (cdr as)
            )

            ;; if (car rs) is less preferred than (car as), then throw away
            ;; the entire list rs because we found something better.  Instead,
            ;; construct a new one-item list that contains only (car as),
            ;; and then continue the computation.
            (best-impl
              p
              (cons (car as) '())
              (cdr as)
            )
          )
        )
      )
    )
  )
)

As we mentioned above, it is entirely possible that P_person would better approximate the actual moral preferences of a real individual by using more than two algebraic functions, flipping the sign from “positive is bad” to “positive is good” each time we need to. The actual content of our best function doesn’t change at all when we do this.
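
Here is a sketch of that generalization (the value functions, their stub definitions, and the example actions below are hypothetical placeholders): P becomes a walk down a priority-ordered list of (function . sign) pairs, where the sign records whether larger values of that function are better (+1) or worse (-1), and best is reused exactly as written above.

;; P_general
;;
;; value-fns is a priority-ordered list of pairs (f . sign), where sign is
;; +1 if larger values of f are better, or -1 if larger values of f are
;; worse.  Earlier entries dominate; later entries only break ties.
;; As with P_person, the result is -1 if a0 is preferred, +1 if a1 is
;; preferred, and 0 if the two are equally preferred.
;;
(define (P_general value-fns a0 a1)
  (if (eq? value-fns '())
      0   ;; no function distinguishes the two actions: equally preferred
      (let*
        (
          (f    (car (car value-fns)))
          (sign (cdr (car value-fns)))
          (Δ    (- (f a0) (f a1)))
        )
        (if (= Δ 0)
            (P_general (cdr value-fns) a0 a1)   ;; tie: fall through to the next function
            (signum (* -1 sign Δ))              ;; -1 means a0 is preferred, as before
        )
      )
  )
)

;; Hypothetical stubs so the example runs: pain first, then pleasure.
(define (T_person a) (if (eq? a 'take-the-shortcut) 4 1))
(define (S_person a) (if (eq? a 'stay-home) 2 6))
(define my-values (list (cons T_person -1) (cons S_person +1)))

;; (best (lambda (a0 a1) (P_general my-values a0 a1))
;;       '(take-the-shortcut take-the-scenic-route stay-home))
;; => (take-the-scenic-route)
;;
;; The shortcut loses immediately on pain; the other two tie on pain,
;; so pleasure breaks the tie in favor of the scenic route.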

Spreadsheet Sort

If you’re having trouble following along because you’re not versed well enough in computer science, there’s actually a very easy real-world explanation for this algorithm: we just reinvented multi-column spreadsheet sorting! All we’re doing is putting one outcome in each row in the spreadsheet, with columns “A” through “Z” representing our individual moral preferences for that outcome, and then sorting so that each column takes sorting priority over the columns to the right of it.

If your moral values are as simple as “I will sacrifice any amount of pleasure to avoid pain”, then “Pain” is in your personal column A, and “Pleasure” is in your personal column B, and all we do is sort by Pain ascending, then by Pleasure descending, and then pick the row at the top of the spreadsheet. Boom, that’s our action. It’s not actually that much more complicated than a utility function, but it’s impossible to write it down using algebra, so we had to get into some fancier math than you’re used to. That’s all.

Why is this better?

First and foremost, from a mathematical analysis perspective, this resolves several of the absurdities that previously cropped up when comparing or aggregating utility quantities. In particular, we no longer have to worry about that pesky uniform translation isomorphism, because now we only ever care about comparisons between two outcomes at a time; we’ve hidden the actual use of S(person|action) and T(person|action) deep inside P(person|action,action), which only ever compares the relative values, so we are no longer tempted to ask questions that don’t have reasonable answers.

Additionally, the worry about the uniform scaling isomorphism is gone: we don’t compare S_person(x) against S_person(y) on any shared, absolute scale anymore, like we did for U_person; we only care about the sign of the difference (S_person(x)-S_person(y)). This lets us define the meaning of the numeric values a little more tightly than what utility functions allow, and as a consequence we can now use different uniform scaling constants for S_person vs T_person vs all our other moral preference functions. Because of this, we are no longer required to invent answers to absurd questions like “if the joy of drinking a milkshake is J, and the horror of a child being tortured to death is H, how many milkshakes need to be added to the world to cancel out the torture of the child, such that H=-k*J for some positive constant k?”. Utilitarianism posits that there is a genuine, rational answer to that question, but we can now see that it’s nonsense: the scaling isomorphism now applies to each individual preference function separately, instead of only to the aggregate sum of them. We no longer expect there to exist any such finite constant that can convert units of one function to units of another, because each one has its own independent scale.

As a bonus, we have also eliminated the absurdity that was previously inherent in some outcomes having “negative absolute utility”. An individual moral preference function can have a negative value, but that just means that the outcome under consideration is less preferred than those outcomes for which the preference function is larger. If we want to, we can pick a value of zero that makes sense for each function in isolation, and then negative values are simply outcomes that are worse than the zero outcome. When we were trying to force everything into the utility function framework, we found ourselves forced to pick a universal ZERO™ that makes sense for all of the components of our utility function, all at the same time.

And since we can now define each function’s zero point sensibly without disturbing the actual ordering provided by P(person|action,action), we can now use those zero values to represent outcomes in which the world has never contained any agents and never will. This does not describe our own world or any future timeline of it, but it does describe worlds that could have existed as branches off from past worlds. Doing so helps us fix our analysis problem — letting us pick actions that minimize or maximize our moral preferences because, remember, there is actually an infinite space of possible actions, not just the finite lists that we’ve been dealing with so far — and we can then use these “dead worlds” as the limit of those actual future worlds where all people are dead, but died in a way that caused neither happiness nor suffering (nor any other moral objection that our preference functions might care about). After all, physics says that the universe must end in a dead world, one way or another. There’s no such thing as living forever, so we can’t just say “boo death, all deaths bad!”, or else we’ll end up with every function going to negative infinity if we take a long enough view of the timeline.

And since all of these fixes come automatically from using the correct mathematical formalism, the moral paradoxes that were produced by violating Utilitarianism’s required isomorphisms are completely gone. No more distinction between U_sum vs U_avg, no more Judgement of Solomon where we are forced to pick which half of the baby we get to keep. The answers just make sense.

Conclusions

Do humans actually follow this system?

I believe that most do. Furthermore, I believe that those who do not are objectively incorrect, in the sense that they are making a logical mistake. No matter how each person defines their personal, individual, and subjective moral preference functions, this is the mathematically correct way to combine those moral preferences in order to choose actions that satisfy them best.

(I also believe that the moral preference functions are mostly, but not entirely, objective as well, because they derive from the frequently recurring Game Theoretic scenarios that evolution places on social animals, particularly on those animals that parent and nurture their children. At the very least, I expect all evolved animals to obey them automatically, even aliens. But that argument is detailed in the previous post of this series.)

Beyond using this ethical framework to try to understand how real-world humans do their ethical calculations, I believe this framework also explains why Paperclip Maximizers show up so often in the study of economics and in the study of artificial intelligence: most artificial agents, intelligent or not, are built around maximizing some sort of utility function. I argue this: All Utilitarians Are Paperclip Maximizers, and All Paperclip Maximizers Are Utilitarians. There is no such thing as a utilitarian agent that does not maximize paperclips.

It’s impossible for someone who follows this ethical framework to successfully negotiate with a utilitarian AI, but only because it’s impossible for anyone to successfully negotiate with a Utilitarian AI — even if you yourself are another Utilitarian AI. There will always be scenarios where a Utilitarian finds the prospect of “cheating” to be too good to pass up. By contrast, an agent following this set of rules is capable of restraining itself voluntarily, with no extra Decision Theory stapled on after-the-fact. In my third post, I intend to show how parties that subscribe to this ethical framework can sometimes build a shared moral framework amongst themselves, even if their own personal moral preferences do not exactly align (so long as they aren’t directly opposed).

A quick note on rationality

Back at the start of the article I mentioned that, in economics, it’s commonly asserted that “rationality” is the same thing as “VNM-rationality”, i.e. that anything you call “rationality” must satisfy the VNM axioms, and that the provably best solution to the situations described by the VNM axioms is Utilitarianism.

I reject Axiom 3 (Continuity), which claims that, if some trio of random lotteries “L”, “M”, and “N” are preferred in order L < M < N by a rational actor, then there is necessarily some probability “p” so that pL + (1 - p)N ≈ M.
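
Written out with explicit quantifiers (using ≺ for “strictly less preferred than” and ≈ for indifference), the axiom asserts: for all lotteries L ≺ M ≺ N, there exists some probability p, with 0 < p < 1, such that pL + (1 - p)N ≈ M.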

In English: if you make a meta-lottery “L+N” where sometimes you get a result from lottery “L” and sometimes you get a result from lottery “N”, and “L” is strictly worse than “M” while “M” is strictly worse than “N”, then there is some way to weight “L+N” so that “L” is rare enough and “N” is enticing enough that you’re forced to conclude that the meta-lottery is just as good as “M”. And if you don’t, the axiom says you’re being irrational.

This is patently absurd. We can see this if “L” includes the possibility of, say, being tortured to death, but “M” and “N” do not. I reject the very idea that economic “rationality”, VNM-rationality, is actually in any way rational, or that it is a good model for actual human behavior (even in the rational Homo economicus limit). Importantly, my proposed ethical framework does not obey VNM Axiom 3, and therefore the VNM theorem’s conclusion does not apply to it.