A Non-Nashian solution for games in normal form
For almost 14 years now, I have been researching, together with my colleagues Jean-Pierre Dupuy and Stéphane Reiche, what happens under alternate assumptions in game theory (I like to call this Non-Nashian Game Theory). Many of my friends have heard about this over dinner and know about our results for games in extensive form (trees), the most prominent example of such games being chess.
Recently, I was able to generalize our results to games in normal form, and I would like to share some of the insights in a short post, and explain what exactly differentiates this equilibrium from Nash equilibria (spoiler: this has to do with the definition of free will).
The high-level economic summary of this area of research is very simple to understand:
Rational agents who have integrity and wholesomeness, in the sense that they believe that they are an open book to each other, achieve better results when playing against each other -- namely, Pareto-optimal ones -- than those who do not.
Put your Vulcan ears on, because you will need a lot of purely logical thinking. Performing a mind meld is also very welcome and aligned with what we are doing here, but optional.
There is nothing better than a concrete example to show what is going on. Let us consider the following game. In normal form, a game is represented as a matrix. We have a row player that can pick strategy A, B or C, and a column player that can pick strategy D, E or F.
When both players pick a strategy, say for example A and D, then this jointly selects a cell (AD) and this gives us payoffs. The left number (5) is what the row player gets ($5), and the right number is what the column player gets ($1).
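For readers who want to follow along in code, here is a sketch of the payoff matrix in Python, reconstructed from the payoffs quoted throughout this post. The entries marked as assumed in the comments are not quoted anywhere in the text; they are placeholder values that I chose to be consistent with every claim made below.

```python
# Reconstructed payoff matrix. Cells are keyed by (row strategy, column
# strategy); values are (row player's payoff, column player's payoff).
payoffs = {
    ("A", "D"): (5, 1), ("A", "E"): (9, 2), ("A", "F"): (4, 4),  # AE: assumed
    ("B", "D"): (7, 8), ("B", "E"): (2, 9), ("B", "F"): (1, 7),  # BF's $7: assumed
    ("C", "D"): (6, 6), ("C", "E"): (8, 3), ("C", "F"): (3, 5),  # CE, CF's $5: assumed
}
```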
We assume that both players would like to gain as much as possible. This is known to economists as "utility maximization", in plain words: rationality. We also assume that both players know what game they are playing. We also assume a couple of fancy properties such as common knowledge of everything we have just said, impressive logical and reasoning skills, etc.
A model that tells us what players do, or should do, is called a solution concept. A mainstream solution concept for this kind of game is that of Nash equilibria. The one I would like to present here is called the Perfectly Transparent Equilibrium.
Let us start with what John Nash suggested, back in the 1950s.
A very important assumption is that players can pick their strategies independently. Intuitively, you can think of the players as being in separate rooms.
The row player can reason as follows: if the column player picks D, then what should I pick, in other words: what is my best response to D? Well, if I pick A I get $5, if I pick B $7 and if I pick C $6. So my best response to D is B. Likewise, my best response to E is A, and my best response to F is A.
The column player does the same thing and finds out that her best response to A is F (gets her $4), her best response to B is E (gets her $9), and her best response to C is D (gets her $6).
We get a Nash equilibrium if we find a cell such that the players pick "best responses to each other's choices of strategy." There is only one: AF. Indeed, you can check that the best response to A is F and the best response to F is A.
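If you would like to verify this mechanically, here is a short brute-force sketch, reusing the `payoffs` dictionary from above:

```python
ROWS, COLS = ("A", "B", "C"), ("D", "E", "F")

def best_response_row(col):
    # The row strategy maximizing the row player's payoff against a fixed column strategy.
    return max(ROWS, key=lambda r: payoffs[(r, col)][0])

def best_response_col(row):
    # The column strategy maximizing the column player's payoff against a fixed row strategy.
    return max(COLS, key=lambda c: payoffs[(row, c)][1])

# A cell is a Nash equilibrium when both strategies are best responses to each other.
nash = [(r, c) for r in ROWS for c in COLS
        if best_response_row(c) == r and best_response_col(r) == c]
print(nash)  # [('A', 'F')]
```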
So what happens here? A Nash equilibrium fulfils a stability criterion: no player has an interest in deviating from it. More precisely, no player has an interest in deviating unilaterally from it, i.e., assuming that they can keep the opponent's strategy fixed while reasoning about what they can do. This is why, in the lines above, we only reasoned across a row, or across a column. Being "stuck" on a row or column while reasoning is a typical Nashian feature.
But what disturbs us here? In fact, (4,4) is not optimal. There was an opportunity to reach (7,8) or (6,6), which would have been better for both players, but the Nash equilibrium framework fails to capture them.
As it turns out, many game theorists feel very strongly about this kind of reasoning, because what is hiding behind these unilateral deviations is a modelling of free will, namely: Players can make the decisions they desire, in a way that is fully independent from anybody else.
Fully independent means that the row player can take A, B or C independently of what the other player does. This is what entitles him to keep F as fixed (hypothetically chosen by the opponent), and to optimize his payoff across the cells AF, BF and CF without worrying about the impact of his choice on the opponent's strategy.
It is precisely this assumption that my colleagues and I are challenging. Indeed, the whole point of game theory is to model the players as rational agents in order to predict their behavior. There is an apparent conflict between us predicting what they will do and their free will as defined above: we can predict that they will do X, but they can do X, Y or Z as they see fit and possibly make the prediction wrong.
For example, Nash argues that AF is an equilibrium because, if the column player deviated unilaterally to D (thus reaching AD) or to E (reaching AE), she would get a worse payoff. But the thing is: if the column player had deviated to D, the row player would have known it, and he would not have picked A in the first place. Assuming that the column player can unilaterally deviate to D while A stays fixed, in some alternate hypothetical world, entails that either the row player is not rational (because he sticks to the now sub-optimal A), or the row player is not that good at anticipating his opponent's moves.
In other words, unilateral deviations are very opaque: why would the actual world be transparent to everybody (common knowledge of anything anybody can fancy), but alternate, hypothetical worlds would be opaque and players would act in non-optimal ways? What we want to achieve here is a level of transparency not only in the actual world, but in all possible worlds.
One way of reconciling this was introduced by Jean-Pierre Dupuy in 1992 and 2000, namely: we should assume that the prediction is correlated with what is being predicted. If the agent does X, we predicted that they would do X. Had they done Y instead, we would have predicted that they would have done Y. In all possible worlds, the prediction is correct. This is called Perfect Prediction. This leads to a slightly weaker definition of free will: the agents can still make their decisions freely, but they accept that their decision may be correlated with some prediction made in a separate room, or even in the past. This is a lot to digest, but it suffices to see that this is only a statistical relationship, not a causal one, and statistical relationships are symmetric, so the arrow of time is irrelevant.
Let us now go back to the game. The reasoning will now be performed "upside down": because we assume Perfect Prediction, the players know each other's strategies, and they also would have known each other's strategies even if they had played differently. We will now see that we can iteratively eliminate the cells of the game.
Here is how this is done.
For example, let us assume, hypothetically, that BE is the eventual outcome of the game. The row player gets $2, and the column player gets $9. But is this even possible, knowing that the players perfectly predict each other's behavior? Think of them having magical glasses that let them see the future: would they willingly let this future happen, or deviate from it? In practice, these magical glasses are simply their reasoning skills.
A quick look at the game shows that BE is not a possible outcome: indeed, should the result be BE, the row player would have known it, and he would instead have picked A rather than B, which guarantees him $4 no matter what the column player does. This is a reductio ad absurdum: BE cannot be the outcome of the game under Perfect Prediction.
We can generalize this and build a whole set of cells that cannot be possible outcomes, like so:
The row player considers each of his strategies A, B, C and asks himself: what is the worst that can happen? For A, the worst is $4 if the opponent picks F; for B, it is $1 if she picks F; and for C, it is $3 if she picks F (note: it is only a coincidence that, every time, the worst case occurs with the opponent picking F).
Then, the row player takes the most favorable of these worst cases (this is called the maximin). The row player thus concludes that strategy A guarantees him $4 no matter what the opponent does. Consequently, any cell that gives him less than $4 simply cannot happen, since he would have known and not let it happen in the first place, thanks to A.
We can do the same with the column player and see that strategy F guarantees her $4 no matter what the row player does. Consequently, any cell that gives her less than $4 can simply not happen, since she would have known and not let it happen in the first place, thanks to F.
We can thus eliminate any cell that gives either player less than $4, because these cells cannot be known in advance as the outcome of the game.
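Here is what this first round of elimination looks like in code, again on the reconstructed `payoffs` dictionary: compute each player's maximin, then keep only the cells that give both players at least their guaranteed payoff.

```python
def maximins(cells):
    # Each player's guaranteed payoff: the best, over their own strategies,
    # of the worst payoff that this strategy yields among the remaining cells.
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    g_row = max(min(p[0] for (r, c), p in cells.items() if r == s) for s in rows)
    g_col = max(min(p[1] for (r, c), p in cells.items() if c == s) for s in cols)
    return g_row, g_col

g_row, g_col = maximins(payoffs)  # (4, 4), achieved by A and F
survivors = {cell: p for cell, p in payoffs.items()
             if p[0] >= g_row and p[1] >= g_col}
print(sorted(survivors))  # [('A', 'F'), ('B', 'D'), ('C', 'D')]
```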
Game theorists actually have a name for the surviving cells: they are said to be individually rational. As you can see, the Nash equilibrium (AF) was not eliminated, so everything is consistent, and there is agreement that the grey cells cannot be Nash equilibria.
But the next step towards the Perfectly Transparent Equilibrium is to do a second round of elimination: this is where our path diverges from Nashian game theory and where the game theorists I know usually start feeling uncomfortable or strongly unconvinced.
The row player, knowing that the grey cells are not possible outcomes of the game, knows that strategy B guarantees him $7. Why? Because he knows that BE and BF cannot happen: were BE or BF the predicted outcome, the row player would have deviated to A (this is an alternate world within an alternate world: "mise en abyme"). So the laws of logic tell us that in any possible world in which the row player picks strategy B, the column player picks D.
Likewise, the column player notices that strategy D guarantees her $6.
As a consequence, any cell in which the row player gets less than $7, as well as any cell in which the column player gets less than $6, is not a possible outcome of the game. This leaves us with only one cell: BD.
The laws of logic thus tell us that, if we assume that the players are such good thinkers that they can perfectly predict each other's strategies, then the only outcome that "survives" this common knowledge is BD. This is the Perfectly Transparent Equilibrium. Even if the players had these magical glasses that allow them to see the future, upon seeing BD as the outcome of the game, they would not resist it but would willingly play towards it (think of the "FlashForward" TV series).
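In code, the whole construction amounts to iterating the maximin elimination until a fixed point is reached. A minimal sketch, reusing the `maximins` helper from above:

```python
def perfectly_transparent_equilibrium(cells):
    # Repeat the maximin elimination on the surviving cells until nothing
    # more can be removed. Under the no-ties assumption, at most one survives.
    while cells:
        g_row, g_col = maximins(cells)
        remaining = {cell: p for cell, p in cells.items()
                     if p[0] >= g_row and p[1] >= g_col}
        if remaining == cells:
            break
        cells = remaining
    return cells  # an empty dict would mean that no outcome survives

print(perfectly_transparent_equilibrium(payoffs))  # {('B', 'D'): (7, 8)}
```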
As it turns out, at most one cell is always left (under the assumption of no ties, i.e., that no two cells give identical payoffs to the same player). For small games, almost 75% of them do have exactly one cell left.
What is interesting about this result is that the one cell that survives, when there is one, is always Pareto-optimal: no other outcome would make both players better off. Here, the Nash equilibrium (4,4) is Pareto-suboptimal, because other outcomes (BD, CD) would give both players better payoffs. This never happens with the Perfectly Transparent Equilibrium.
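This claim is easy to check mechanically on our reconstructed matrix:

```python
def is_pareto_optimal(cell, cells):
    # A cell is Pareto-optimal if no other cell is at least as good for both
    # players (and different, hence strictly better for at least one).
    p = cells[cell]
    return not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                   for other, q in cells.items() if other != cell)

print(is_pareto_optimal(("B", "D"), payoffs))  # True: the PTE
print(is_pareto_optimal(("A", "F"), payoffs))  # False: BD and CD dominate (4,4)
```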
Even though I mentioned magical glasses for pedagogical purposes, we do not need any such glasses in practice. Everything happens with the laws of logic, and the laws of logic alone. It suffices that the players believe that they can perfectly predict each other in order to reason accordingly and come to the conclusion that at most one cell is a possible outcome of the game.
This seems like a perfect fit for Vulcans. However, the fact that the result fulfils an economic criterion of optimality (Pareto) indicates that it is not a bad idea to think this way if you know that your opponent does so as well. The idea of reasoning like this on normal-form games actually dates back to 1983 (for symmetric games) and is attributable to Douglas Hofstadter. It is called superrationality (Hofstadter, D. (1983). Dilemmas for Superrational Thinkers, Leading Up to a Luring Lottery. Scientific American).