# Randomized switching in the two-envelope problem

Mark D. McDonnell, Derek Abbott

## Abstract

The two-envelope problem is a conundrum in decision theory that is subject to longstanding debate. It is a counterintuitive problem of decidability between two different states, in the presence of uncertainty, where a player’s payoff must be maximized in some fashion. The problem is a significant one as it impacts on our understanding of probability theory, decision theory and optimization. It is timely to revisit this problem, as a number of related two-state switching phenomena are emerging in physics, engineering and economics literature. In this paper, we discuss this wider significance, and offer a new approach to the problem. For the first time, we analyse the problem by adopting Cover’s switching strategy—this is where we randomly switch states with a probability that is a smoothly decreasing function of the observed value of one state. Surprisingly, we show that the player’s payoff can be increased by this strategy. We also extend the problem to show that a deterministic switching strategy, based on a thresholded decision once the amount in an envelope is observed, is also workable.

## 1. Introduction

The two-envelope problem has a long history, and it is sometimes called the exchange paradox (Zabell 1988) or the two-box paradox (Agnew 2004). It is a difficult, yet, important problem in probability theory that has intrigued mathematicians for decades and has evaded consensus on how it should be treated. The problem can be stated very simply and yet has a nasty sting in its tail. A player is offered two envelopes, one containing x dollars and the other 2x dollars. An envelope is selected at random, and the question is: Should the player keep that envelope or swap it for the other envelope, in order to maximize the payoff or gain? At first glance, the question appears innocent, but it can readily be seen that different lines of reasoning rapidly lead to contradictory positions, resulting in an apparent paradox.

### (a) Historical background

The problem was first posed in 1930 by Kraitchik in a slightly different form, dubbed the necktie paradox (Kraitchik 1930). See also Kraitchik (1943). Then in 1953, the mathematician J. E. Littlewood described a similar paradox based on playing cards and attributed it to the physicist Erwin Schrödinger (Bollobás 1997). Gardner (1982) modernized Kraitchik’s version, calling it the wallet game. To the best of our knowledge, the first paper to express the problem in its now-popular two-envelope format was by Zabell (1988). A number of authors have pointed out that the two-envelope problem shares counterintuitive properties with the St. Petersburg paradox (Arntzenius & McCarthy 1997; Broome 1995; Chalmers 2002), however, it should be noted that these two games are nevertheless clearly distinct from each other.

Another important distinction is that the two-envelope problem should not be confused with the Monty Hall problem (vos Savant 1991; Flitney & Abbott 2002a). While there are superficial similarities in the way these problems are constructed, the nature of the two-envelope problem is entirely different. While the Monty Hall problem is tractable to elementary analysis, the two-envelope problem requires advanced techniques akin to those found in McDonnell et al. (2008).

There have been a number of claims of resolutions to the two-envelope problem (Christensen & Utts 1992; McGrew et al. 1997), but this is mooted (Castell & Batens 1994; Clark & Shackel 2000; Meacham & Weisberg 2003), and the problem is still considered as being an open one (Albers et al. 2005). The area, in general, has attracted wide interest and is hotly debated (Jackson et al. 1994; Brams & Kilgour 1995; Rawling 1997; Blachman & Kilgour 2001; Agnew 2004; Dietrich & List 2005). Some authors consider that there is not yet a satisfactory solution to the two-envelope problem (Albers et al. 2005). The current debate in the literature is a witness to the lack of consensus.

### (b) Motivation

The two-envelope problem is of wide interest, as it impacts on various fields such as game theory, probability theory and decision theory (Langtree 2004). Another reason why it is timely to revisit the two-envelope problem is that it is paradigmatic of recent problems in physics, engineering and economics that involve probabilistic switching between two states. In the field of stochastic control theory, it is known that random switching between two unstable states can result in a stable condition (Allison & Abbott 2001). In game theory, Parrondo’s principle shows us that random switching between two losing games, in some circumstances, can result in a winning expectation (Harmer & Abbott 1999a,b, 2002; Abbott 2009). In economics, a number of models for switching between poor performing investments in order to produce gain are well known. For example, Maslov & Zhang (1998) demonstrate a model where switching between volatile assets and non-performing cash reserves produces an increase in net gain in a fashion not too dissimilar from Luenberger’s (1997)volatility pumping method. Volatility pumping, as described by Luenberger, has a deep similarity with the two-envelope problem in that it also exploits a process doubling and halving amounts of money. There are also closely related models, such as the excess growth model of Fernholz & Shay (1982) and Tom Cover’s universal portfolio (Cover & Ordentlich 1996). Another related form of two-state random switching is a new form of Parrondo’s paradox called the Allison mixture (Allison et al. 2007; Pearce et al. 2007). There are interesting open questions yet to be explored in systemizing these effects—the optimization approach we adopt in this paper may indeed lay a foundation for this.

As we will show in this paper, it is possible to view the two-envelope problem in a similar light to this recent two-state random switching theory, prompting the surprising result that it is possible for the player to produce a net gain. While this is initially counterintuitive, one can begin to see by analogy with effects, such as Luenberger’s volatility pumping and Parrondo’s paradox, that one might indeed expect a winning strategy to exist for a two-envelope player.

This type of study is also significant from a physical viewpoint, as classical games of chance of this type can be translated into the language of quantum game theory and are expanding our understanding of quantum systems (Flitney & Abbott 2002b; Lee et al. 2002; Meyer & Blumer 2002). Both games of chance and games of strategy are currently finding increasing applications in physics (Abbott 2001; Abbott et al. 2002) and econophysics (Johnson et al. 2003).

### (c) Approach with Cover’s switching function

It should be pointed out that our paper focuses on the version of the two-envelope problem where the player observes the amount in one envelope; for discussion on the case where the first envelope is left unopened see Douven (2007). As pointed out by , once one observes an amount in an envelope, it is reasonable to expect the problem to become tractable to decision theory, and we exploit this fact. A summary of variants of the problem and prior approaches can be found in Nalebuff (1989) and Nickerson & Falk (2006).

While surprising, making a payoff in the two-envelope game has been argued as a possibility (Linzer 1994). In our paper, we achieve this goal via a new approach to the two-envelope problem that has not appeared in prior literature. Based on an idea by T. Cover (2003, personal communication), we revisit the two-envelope problem with a random switching policy. The idea is that when we observe the amount in the first selected envelope, we swap to the second envelope with a probability that is a function of the observed amount. This is what we call Cover’s switching function. The idea is we want to make it less likely to swap envelopes when the observed amount is a large sum of money. Conversely, we want to swap with greater likelihood if the initial observed amount is small. Thus, Cover’s proposal is that the probability of switching should be some monotonically decreasing function of the observed amount. In this paper, we explore a number of such functions and demonstrate Cover’s process does indeed work. While it is surprising, experience with other two-state random switching processes, such as Parrondo’s games and the Allison mixture, suggest that it is possible to set up counterintuitive behaviour, provided we choose transition probabilities that break symmetry (Abbott 2009). If we leave the envelopes unopened, the problem becomes symmetrical, but as soon as we open an envelope and swap with a Cover switching function, we break this symmetry.

The rest of the paper is organized as follows. First, we analyse the case of switching envelopes with constant probability. Then, we compare this with the case of swapping envelopes according to a Cover switching function. In order to demonstrate that the switching policy does not need to be a smooth function, we then analyse the case of switching based on a discrete threshold value. So far, we have assumed an unbounded case, then, in §3, we consider what happens when there is an upper bound for the amount that can be placed in an envelope, which is physically a more realistic case. Section 4 provides illustrative numerical results from simulations of repeated ‘plays’ of the two-envelope problem, while §5 contains discussion of open problems and further work.

## 2. Switching in the standard two-envelope problem

In this section, we consider firstly a game that is essentially the two-envelope problem in one of its well-known forms, where there is one player competing against the gambling house. The house randomly chooses an amount of money, and places it in one envelope, as well as twice that amount of money in another envelope. The player then randomly chooses one of the two envelopes. Unlike the simple version of this problem, in this case, the player opens the envelope and observes the amount of money in it. The player then deduces that the unopened envelope contains either twice the observed amount or half of it. The house then offers the player the opportunity to switch envelopes. The problem is for the player to decide whether or not to switch envelopes. By switching, the player must keep the new amount in the second envelope. If not, the player keeps the initial amount in the first envelope.

Suppose the envelopes have equal probability of being chosen by the player, and then no swap is made at all. If we denote the lesser of the two amounts as x, then the return to the player given x is R(x)=0.5x+0.5(2x)=1.5x. If the player can repeatedly choose between different envelopes with different independently chosen contents, from a distribution with expected value E[X], the expected return will be R=1.5E[X]. This fact motivates a new question: is there any strategy the player can use to obtain a better conditional return than R=1.5E[X]? We will show that this is possible when the player switches stochastically, with a probability that is conditional on the value observed in the first envelope.

However, before demonstrating this, it is necessary to discuss certain hidden assumptions built into the original problem description. The first is that there is no upper limit placed on the amount of money that can be randomly selected by the house. The second is that there is no mention of a probability distribution from which the amount is selected. In order to consider expected returns, we must explicitly specify such a distribution. If the player were informed that the smaller amount were ‘chosen randomly,’ the player might assume that it is chosen from a uniform distribution. That is, without any further information or thought, the player would therefore anticipate that the chance of opening the envelope and seeing 1 cent is equal to the chance of seeing a trillion dollars.

For a real game, a rational player would deduce that the house has a maximum amount of money to dispense. Therefore, the player could be forgiven for thinking that if 10 billion dollars were observed in the first envelope, it would be unlikely to find 20 billion dollars in the second envelope. By similar reasoning, if the house wants to minimize the chances of running out of capital, it would ensure a higher probability for choosing a small amount for the first envelope. In other words, the house would probably avoid choosing the amount uniformly between 0 and billions of dollars in a real scenario, and instead choose some other distribution. While the standard description of the two-envelope problem ignores these issues, here we must consider them in order to demonstrate that a player may gain from randomized switching.

In this section, we therefore investigate whether the answer to the problem of switching or not changes when the following precise assumptions are imposed.

1. Letting X be the random variable representing the choice of the smaller amount of money, we assume X is continuously valued and has probability density function (PDF) fX(x)>0 on , where X has finite moments. We further assume that the distribution is stationary. Later, we impose additional more realistic constraints where X has a finite upper limit.

2. The player has no knowledge of the PDF fX(x). We note that, if a sequence of games are played, a rational player could theoretically use knowledge of amounts previously offered to continually update an estimate of the distribution from which the amount is chosen, and adapt the strategy accordingly. This is beyond the scope of this paper, and as a first step, we impose the simpler condition that the player’s switching strategy must be decided in advance of any games being played.

3. In order to make a general formulation, we impose a slight generalization to the original description, and allow the envelope containing the smaller amount of money to be offered to the player with probability p, instead of assuming that it is offered with probability 0.5.

4. The player uses Cover’s randomized switching strategy that is based on the amount observed in the opened envelope.

We now demonstrate for the first time that Cover’s switching strategy based on chance can do better than simple switching. We emphasize that our equations below are based on information that may be unknown to the player. All the player can do is to select a switching strategy that depends on the observation y. However, the point we demonstrate requires analysis based on an actual distribution for x, which we assume the player cannot know.

### (a) General expression for expected winnings

Suppose on one trial of the game, an amount x is selected from the distribution fX and placed in one envelope, while 2x is placed in the other envelope. Let the amount observed by the player in the opened envelope be y. The sample space of y is {x,2x}, with P(y=x)=p and P(y=2x)=1−p. Let the probability of switching be a function of the observation, i.e. PS(y)∈[0,1]. The probability that the player ends the trial with amount x is and the probability of finishing with 2x is Hence, given X=x, the expected return is The average return over the distribution of X is then

We are interested in whether or not the average return is increased by switching. The average return when the player never switches is the benchmark, and is given by R when PS(y)=0 ∀ y. That is, the benchmark base return is Therefore, the absolute average gain or loss incurred by switching according to PS(y) is 2.1 It is rational for the player to try and choose PS to maximize G. Clearly, foreknowledge of x would allow the player to always switch when offered x and never switch when offered 2x, resulting in a maximum gain of pE[X], for a return of 2E[X]. Since G is defined in terms of the strategy PS(y), and the player does not know x, the strategy must be random, given x, rather than deterministic, given x. Clearly, G<pE[X], and the player should try to choose PS(y) to maximize G.

Using a change of variable, we can rewrite G as 2.2

We now consider several possibilities for the switching strategy PS(y).

### (b) When does the player gain from switching envelopes?

From equation (2.1), a sufficient condition for a gain from switching is that go(x):=pPS(x)−(1−p)PS(2x)≥0 ∀ x, with strict inequality for some range of x.

Note that this is a sufficient but not necessary condition. It is feasible that go(x) could be small and negative for some x, but large and positive for other x, and lead to G>0. We can rewrite the sufficient condition as 2.3

We now consider three classes of strategy that might be attempted.

#### (i) Switching with a constant probability

Let the probability of switching be a constant, PS(y)=qy. The gain is

• — If p=0.5, then Gc=0, and there is no gain from switching, regardless of q.

• — If p<0.5, then Gc<0, unless q=0, and the player would always lose by switching with a constant probability q>0. When q=0, we have Gc=0.

• — If p>0.5, then Gc>0, and the player should switch with probability q=1. For any p with q=1, the gain or loss will be Gc=(2p−1)E[X] and total return Rc=(p+1)E[X].

If the house provides to the player any hint of a bias towards offering x rather than 2x, then always switching provides an expected gain over never switching. Conversely, never switching would result in an expectation of loss. On the other hand, if the house has bias towards offering 2x rather than x, then not switching is better than switching.

The above reasoning can alternatively be obtained by consideration of the sufficient condition, inequality (2.3). In this case, the first part of the sufficient condition for a gain reduces to i.e. p≥0.5. We observe that go(x)=0 ∀ x if p=0.5, and hence p>0.5 is required for a gain. Otherwise, obtaining a gain requires an observation-dependent switching strategy.

#### (ii) Cover’s proposal: switching with smoothly decreasing probability

We now suppose we choose the switching probability to be a monotonically decreasing function of the observed amount y, where . This is the essence of Cover’s proposal, and we now begin to see this always produces an expected gain. A valid switching function that always provides expected gain is 2.4 From inequality (2.3), there will be an expected gain, provided that we satisfy This inequality holds when p≥0.5, with strict inequality for x>0. Thus, PS(y)=eay provides a gain whenever p≥0.5. It may also provide a gain when p<0.5, but we do not consider this here. The gain is There is no contradiction here, even though, in the first formula, everything is positive (for p≥0.5), while in the second, the term in square brackets could be positive or negative. We are integrating, so, for any fX where the second integrand is negative, the integral will be larger for regions where it is positive.

Further examples of suitably monotonically decreasing switching functions that guarantee expected gain include and

#### (iii) Switching according to a threshold value of y

A further possibility is to replace the idea of a smooth switching function with a simple threshold decision. Considering equation (2.2), we define g(ϕ):=pfX(ϕ)−0.25(1−p)fX(ϕ/2) and suppose, for example, that g(ϕ)>0 for ϕ∈[0,a) and g(ϕ)<0 for . We now demonstrate that, under these conditions, the player can ensure an average gain from always switching, compared to never switching. As discussed below, this appears to lead to the paradoxical situation (for p=0.5) that the player will gain by always switching.

Suppose the strategy is 2.5 where b is the threshold amount, above which switching ceases.

The sufficient condition of inequality (2.3) is satisfied for all p≥0.5. It is true with strict inequality when These regions of x are the regions that provide a positive average expected gain. Other values of x provide zero gain. From equation (2.2), the gain as a function of b is If p=0.5, then , and a gain is guaranteed, regardless of the choice of b. The maximum possible gain, when g(ϕ) has only one change of sign, is given by b=a. However, since the distribution fX is unknown, the player can only guess a value of b. We have shown that any positive choice of b leads to a gain when fX has positive support on and p=0.5.

This result may seem paradoxical, as it suggests that switching provides a gain, even if the threshold value , which corresponds to almost never switching. Likewise, if , the strategy is to almost always switch. However, in these extreme strategies, the gain will be extremely small and approach zero.

### (c) Uniform fX

In the colloquial description of the two-envelope problem, it is not specified that x is chosen from a probability distribution. However, in practice when the game is played, it must always belong to some distribution (the distribution may be singular or non-stationary), and a rational player would deduce this. Indeed, the problem that the player does not know the prior distribution from whence the envelope amount was selected has been recognized (Jackson et al. 1994). In the absence of information about the actual distribution, the player may naively assume that it is uniform and without an upper limit. If this were actually the case, we can only consider a non-physical limiting case for a uniform distribution of X, and we now show that the player could never gain from switching in such a case. Let For this case, switching according to a threshold value of y does not lead to a gain. Suppose the threshold value is such that b<d. Then, Since , , and there is no gain. To avoid this problem, we can simply consider the case where X is drawn from a non-uniform distribution that still has an infinite upper bound, such as an exponential distribution. This case is considered as an illustration in §4. First, however, we consider the case where an upper bound on the choice of X is specified.

## 3. The two-envelope problem with an upper limit on the offered amount

All analysis in the previous section assumes that fX(x)>0 for . The problem changes when there is an upper bound on x such that x∈[0,B] and . A rational player’s strategy will depend on whether an upper limit, B, is known to them. If the player does not know the upper limit, the equations in the previous section apply, with the exception that fX(x)=0 for . We reiterate at this point that our equations are based on information that is unknown to the player. However, our point is to demonstrate that it is feasible for the player to gain if a switching strategy is based on the observation, y, whereas there is no gain if the switching strategy is independent of y. In this section, we assume B is known to the player, while fX(x) is otherwise unknown.

Since the player does know the upper limit, the observed amount will be y∈[0,2B]. It can then be inferred that the player should never switch whenever the observed amount is y∈(B,2B], since the offered envelope is the one with the higher amount containing 2x, i.e. when x∈[0.5B,B], which occurs with probability 1−p. This fact means a good switching strategy may be discontinuous, and can be written as Provided the player is rational and uses such a strategy, the benchmark gain should now contain an extra term. The probability the player ends the trial with amount x is and the probability of finishing with 2x is The expected return given X=x is The average return over the distribution of X is then

## 4. Numerical simulation

In order to experimentally confirm that gain can be achieved via Cover switching, we present simulation results for an N-repeated play of the two-envelope game. Of course, we must specify a distribution from which the smaller amount is selected. As discussed, in practice, it would be likely that the house ensures a distribution such that larger amounts occur less frequently. It also requires fewer assumptions if we do not specify an upper limit to the amount. Therefore, we suppose that X is exponentially distributed, with mean value 3, i.e. where the choice of the mean value is entirely arbitrary for illustrative purposes.

Figure 1.

Empirical average gain to the player after simulations of N= 20 000 repeats of the two-envelope choice. The smaller amount placed within the two envelopes, x, is chosen by the house independently for each repeat from an exponential distribution with mean = 3. The horizontal lines show the theoretical mean gain for each of three strategies. Case 1 is switching according to equation (2.4) with a=0.5. Case 2 is switching according to equation (2.5) with . These simulations illustrate that the player can gain, in comparison to not switching, in both cases. For each case, we show five independent sample paths for the process. Thin black lines at bottom (near a mean gain of zero), no switching; thin grey lines, random switching case 1; thick black lines near top, random switching case 2.

We repeat the simulation for two switching policies: (i) case 1, the exponential switching function of equation (2.4), and (ii) case 2, the threshold switching function of equation (2.5). These results are shown in figure 1 for N=20 000 trials, along with the case for no envelope switching. In each case, the graph shows five sample paths to give an idea of the spread. We can clearly see that, for no switching, the sample paths average to zero, and for both switching policies, a non-zero gain exists. The actual values of the gains are arbitrary, as they will depend on the parameterization of the switching functions, the specific values of which are indicated in the caption of figure 1. This raises the fascinating open question in optimization theory, for future study, namely, how to select the best switching function for a given distribution. It also raises the question of which blind optimization strategies can be adopted when, for example, there is a specified family of distributions or specified moments, but the distribution is otherwise unknown to the player.

## 5. Conclusion and open questions

We have successfully demonstrated that a two-envelope player can indeed increase their expected gain by adopting randomized switching, where the probability of switching is a monotonically decreasing function of the observed envelope’s contents, as suggested by Cover. Furthermore, a deterministic switching strategy where switching is always chosen when the observed contents is smaller than some threshold value can also lead to a gain.

There are a number of specific open questions that arise from this analysis for future work. How do we find the best switching functions and how sensitive are they to the distribution fX when the player does not know this distribution and the strategy must be specified in advance of repeated plays of the game? On the other hand, if a sequence of games is played, where a distribution for x is unknown to the player, and the strategy is not specified in advance, a rational player could adopt an adaptive strategy where the observed envelope amounts are continually updated to estimate the distribution from which the amount is chosen.

Wider open questions may touch upon applications of the results to control problems, optimization problems and to volatility pumping models on the stock market. There are a number of counterintuitive phenomena based on randomized two-state switching where Parrondo’s principle comes into play. There remains the question of systemizing and drawing all these phenomena under a common mathematical framework.

## Acknowledgements

M.D.D. is funded by an Australian Research Council APD Fellowship, grant no. DP0770747. The private communication occurred over lunch between Tom Cover and D.A. in September 2003, at Stanford University—D.A. would like to thank A.E. (Tony) Siegman for hosting that visit to Stanford University.