## Abstract

The two-envelope problem (or exchange problem) is one of maximizing the payoff in choosing between two values, given an observation of only one. This paradigm is of interest in a range of fields from engineering to mathematical finance, as it is now known that the payoff can be increased by exploiting a form of information asymmetry. Here, we consider a version of the ‘two-envelope game’ where the envelopes’ contents are governed by a continuous positive random variable. While the optimal switching strategy is known and deterministic once an envelope has been opened, it is not necessarily optimal when the content's distribution is unknown. A useful alternative in this case may be to use a switching strategy that depends randomly on the observed value in the opened envelope. This approach can lead to a gain when compared with never switching. Here, we quantify the gain owing to such conditional randomized switching when the random variable has a generalized negative exponential distribution, and compare this to the optimal switching strategy. We also show that a randomized strategy may be advantageous when the distribution of the envelope's contents is unknown, since it can always lead to a gain.

## 1. Introduction

The two-envelope problem and the associated two-envelope paradox (also known as the exchange problem, and the exchange paradox) are intriguing conundrums that have captured the attention of mathematicians, economists, philosophers and engineers for over half a century—see a brief review in McDonnell & Abbott (2009). Further areas of potential impact to applications areas and open questions arising are discussed by Abbott *et al.* (2010).

There are many versions of the two-envelope problem/paradox (Nalebuff 1989; Nickerson & Falk 2006). However, it can be quickly demonstrated that any apparent paradox seen when analysing such problems is the result of incorrect mathematical reasoning that overlooks Bayes’ theorem (Linzer 1994; Brams & Kilgour 1995*a*; Blachman *et al.* 1996). An example that illustrates this is provided herein as a remark in appendix A; the source of the problem is essentially that what is actually a conditional probability is incorrectly assumed to be an unconditional probability.

We now give an overview of the two-envelope problem considered in this paper and then discuss the relevance of optimization of gain.

### (a) The two-envelope problem

The two-envelope problem can be thought of as a game where there is a ‘house’ and a ‘player.’ The objective for the player is to gain money from the house. The version of the two-envelope problem that we consider is one in which the house selects a value *x* and seals the amounts $*x* and $(2*x*) into two identical envelopes. We assume here that *x* is an outcome of a continuously valued non-negative random variable, *X*. The player chooses one envelope at random, opens it, observes the value, *y*, then decides whether to keep or switch envelopes in order to maximize the payoff.

Suppose the player repeats this two-envelope game many times with different values of *x* chosen independently from the same distribution. Then whether or not the player knows what that distribution is, it can be shown that they cannot improve their average return by employing ‘switching strategies’ based on either (i) a deterministic decision that is independent of the observed value, i.e. always keeping or always switching; or (ii) switching with a constant probability that is independent of the observed value. However, the existence of conditional randomized strategies by which the player can on average make a gain by switching, compared with not switching, has been recently demonstrated and explored in McDonnell & Abbott (2009). These conditional strategies make use of the player's observation of the value *y* in the opened envelope.

This fact that conditional randomized switching can lead to a gain has also been recognized in a brief proof by Ross (1994) in response to Christensen & Utts (1992), and in a comment in the last paragraph of the response of Blachman *et al.* (1996) to the same paper.

Such conditional randomized switching strategies are not actually optimal if the distribution of the money sealed into the envelope is known to the player. When the distribution is known, the optimal strategy is to switch envelopes with probability 1 when the observed amount, *y*, satisfies a simple condition that depends on the distribution of the money, as pointed out by Christensen & Utts (1992), Brams & Kilgour (1995*a*) and Blachman *et al.* (1996). That is, the conditional randomized strategy of Ross (1994) and McDonnell & Abbott (2009) is suboptimal. Appendix A contains three different proofs of the optimal strategy, which is presented as theorem 2.4. We include the three proofs here as they each provide different insights into the two-envelope problem and the two-envelope paradox—these are summarized immediately after the presentation of theorem 2.4.

However, since the optimal strategy is derived under the assumption that the distribution that governs the selected value, *x*, is known, a randomized strategy may still be of value in the absence of such knowledge.

In mathematical finance, there is a body of work on two-state Markov switching models that embed both *drift* and *volatility* (Elliott *et al.* 2008), and thus future research that examines how the incorporation of these effects alter the optimality of the two-envelope switching process would be of interest in the field.

In this paper, we focus on determining exactly how suboptimal the conditional randomized switching strategy of McDonnell & Abbott (2009) is, in comparison with the optimal switching strategy. We also demonstrate that the conditionally randomized strategy may be a superior choice in a scenario where the player does not know the distribution of *X* or the probability with which they receive the smaller amount, in comparison with a conditional non-randomized strategy.

### (b) Genesis of the two-envelope problem and relevance to finance

The origins of the problem trace back to Kraitchik (1930), who first described the so-called *necktie paradox*. The essential features of this problem were then recast, in 1953, in terms of two wallets containing money (Kraitchik 1953), which then became known as the *wallet game* (Gardner 1982). Also in 1953, the essence of the problem was independently attributed to the physicist Erwin Schrödinger (Bollobás 1997). In 1988, the problem was restated in its present form as the two-envelope game (Zabell 1988).

In 2003, Tom Cover suggested the remarkable idea that if the player swaps to the second envelope with a probability that is a function of the observed amount, possible conditions exist that give rise to expectation of positive payoff for multiple independent plays (McDonnell & Abbott 2009). In 2009, this was formally demonstrated and this switching policy was dubbed *Cover switching* (McDonnell & Abbott 2009; Abbott *et al.* 2010). This scheme is now known to work if the probability of switching envelopes is chosen to be a *monotonically decreasing* function of the observed amount in the first envelope. The origins of Cover's thought can be traced to a paper in 1987, where he proposed a similar strategy for improving the chances of decidability for picking the largest of two randomly selected numbers conditioned upon observing one of them (Cover 1987).

The two-envelope problem is motivated by the fact that it encapsulates the task of maximizing payoff between two possible choices embedded with uncertainty (Agnew 2004). This type of decision-theoretic scenario is of significance, appearing in a number of fields ranging from physics and engineering to economics, as it touches upon decision theory, game theory and probability theory (Langtree 2004). It has been shown that a simple threshold decision can be adopted to increase a player's payoff (McDonnell & Abbott 2009) in the two-envelope problem, and this may be of interest in economic decision making such as in the optimization of a threshold or *barrier level* for path-dependent exotics such as *barrier options* (Rich 1994).

In the two-envelope problem, it has been pointed out that the act of swapping envelopes, after observing a value in one envelope, and adopting the Cover switching policy, breaks the symmetry that existed before an envelope was opened (Abbott *et al.* 2010). Discrete-time processes that produce a growth in some payoff, owing to symmetry-breaking, are what are called discrete-time *Brownian ratchets* (Abbott 2010) and are in the class of what are called *Parrondo games* (Harmer & Abbott 1999); see also Ivanitskii (2010).

Parrondian phenomena are of increasing interest in the area of finance (Johnson *et al.* 2003), because of the exploitation of asymmetry for ratcheting up payoffs. In terms of the stockmarket, Boman *et al.* (2001) has used a Parrondian framework for studying the dynamics of insider information. A number of models, in the Parrondian class, for increasing payoff by switching between poorly performing investments are well-known. For example, Maslov & Zhang (1998) demonstrate a model where switching between volatile assets and non-performing cash reserves produces increased payoff in a fashion not too dissimilar from Luenberger's *volatility pumping* method (Luenberger 1997).

There are also closely related models such as the excess growth model of Fernholz & Shay (1982) and Cover's universal portfolio (Cover & Ordentlich 1996). Stutzer (2010) shows connections between essential features of Parrondo effects and *portfolio rebalancing*. Further investigation into the two-envelope problem is, therefore, of interest for building mathematical foundations in these areas. Future studies that build on our work to extend the results to *fat-tailed* (heavy-tailed) distributions (Jacquier *et al.* 2004), such as the t and alpha-stable families, would thus be of interest in economics and finance.

In economic theory and finance, the study of *information asymmetry* is of importance where a party gains an advantage by having better access to information (Aboody & Lev 2000). The two-envelope problem now motivates the study of a fascinating type of situation where symmetry initially exists, but one party gains an advantage by breaking the symmetry and extracting information that is not apparently of *prima facie* value.

### (c) Outline

In this paper, we explore the extent of suboptimality of conditional randomized switching. In order to proceed, in §2, we define our notation, and discuss further the optimal switching strategy for the two-envelope problem. Section 3 then provides mathematical analysis of the expected gain for two cases of switching strategies. This includes a comparison of the gain for two concrete examples of the distribution of the amount selected by the house, for each switching strategy. Simulations of repeated ‘plays’ of the two-envelope problem were presented in McDonnell & Abbott (2009). Here, §4 provides analytical support to those simulations, via a derivation of the variance of the gain. We then use this to produce confidence intervals for the gain, and compare with simulations. We conclude the paper and suggest future work in §5. Appendix A contains three proofs of the optimal switching strategy, while appendix B (the electronic supplementary material) contains all other necessary mathematical proofs.

## 2. Mathematical background

### (a) Mathematical definitions and notation

Consider the two-envelope problem where a non-negative continuous random variable *X*, with finite mean, is drawn according to density *f*_{X}(*x*), , and the player is offered
2.1where *I*∈{0,1} is Bernoulli with . Thus, the player will be presented with *X* stochastically, with probability *p*=*P*_{Y |X}(*x*|*x*), and with 2*X* with probability 1−*p*=*P*_{Y |X}(2*x*|*x*). The player observes *Y* but not *X*. The player can either keep the amount *Y* , or switch to the complementary amount.

Suppose the player randomly switches with probability 0≤*P*_{S}(*y*)≤1 upon observation of *y*. We introduce the function *g*(*x*), which was defined in McDonnell & Abbott (2009) as
2.2Then following McDonnell & Abbott (2009, equations (2.1) and (2.2)), the *average gain* with switching function *P*_{S}(*y*) is
2.3
2.4
2.5where *average gain* is defined as the difference between the average return when using a switching strategy and the average return a player will obtain when never switching, which is the *benchmark return* (McDonnell & Abbott 2009)
We denote the *mean return* as
2.6Clearly, the average gain can be both positive and negative, and, from equation (2.4), must be bounded in the interval [−(1−*p*)E[*X*],*p*E[*X*]].

We now state three lemmas that will be useful in subsequent sections. Note that appendix B (the electronic supplementary material) contains mathematical proofs for all lemmas and corollary 3.3.

### Lemma 2.1

*Suppose the player switches with a constant probability k, regardless of the observed value, y. That is, P*_{S}(*y*)=*k, with k*∈[0, 1] *a constant. Then the maximum average gain in this case, G_{k}, is*
2.7

*where the region with non-zero gain is achieved with k*=1,

*and the region with zero gain with k*=0.

In §3, we shall use this lemma to calculate a benchmark gain for comparison with other switching strategies.

### Lemma 2.2

*A positive gain is obtained for any density f*_{X}(*x*) *when p*>0.5 *and P*_{S}(*y*) *is a monotonically decreasing function of y*.

This lemma tells us that a suitable choice of randomized switching strategy will always provide a gain when *p*>0.5. It does not however tell us anything regarding the special case of *p*=0.5 (or indeed for *p*<0.5).

### Lemma 2.3

*If f*_{X}(*x*) *is a monotonically decreasing function of x, then a negative gain is obtained for p*<1/5 *unless P*_{S}(*y*)=0 ∀*y* (*i.e. the player never switches*).

This lemma allows us to exclude consideration of values *p*<1/5 in later sections of the paper.

### (b) The ‘non-blind’ two-envelope problem

Unless otherwise specified, in this paper it is assumed that the player knows *p* and *f*_{X}(*x*), as well as *y* (but not the outcome of *I* or *X*). Since the player has full information on the probability with which the player will receive the envelope containing the smaller value, and the density from which that value is drawn, we refer to this situation as the ‘non-blind’ two-envelope problem.

Under these assumptions, the optimal switching strategy based on the observed value *y* has been derived for the case of *p*=0.5 (Christensen & Utts 1992; Brams & Kilgour 1995*a*; Blachman *et al.* 1996). Note that Brams & Kilgour (1995*a*) provide a correct derivation, while Blachman *et al.* (1996), with reference to the work of Brams & Kilgour (1995*a*), demonstrate why the derivation of Christensen & Utts (1992) contains an error (one that has been repeated in Castell & Batens (1994)). Note that the form of the optimal solution is slightly different, by a factor of 2, if *X* were a discrete distribution—as considered in Linzer (1994)—rather than continuous, e.g. Brams & Kilgour (1995*a*). As pointed out by Blachman *et al.* (1996), the expression of Christensen & Utts (1992) would be correct if a discrete distribution had been assumed rather than a continuous one.

Here, we extend this optimal strategy for continuously valued *X* to arbitrary *p*, and present it as a theorem.

### Theorem 2.4

*The optimal switching function P**_{S}*(y) for the ‘non-blind’ two-envelope problem is*
2.8

As a consequence of theorem 2.4, it is clear that there will be intervals of *y* for which it is optimal to switch with probability one, and complementary intervals where it is optimal to switch with probability zero. For the special case of *p*=0.5, theorem 2.4 recovers the ‘*exchange condition for continuous distributions*’ of Brams & Kilgour (1995*a*).

Appendix A provides three proofs of this theorem. The third proof relies on Bayesian analysis, and is essentially the same as that of Brams & Kilgour (1995*a*) and Blachman *et al.* (1996) (which, however, considers only *p*=0.5). We state it here as it provides complementary insights to the first two proofs, via a transparent demonstration of how a two-envelope ‘paradox’ can arise through incorrect mathematics. Our second proof uses calculus of variations, but is actually just a more elaborate version of the first proof, which is a concise proof by contradiction. Both do not rely on Bayesian analysis.

The second proof is amenable to several interesting extensions, where there are additional convex constraints on the switching function, *P*_{S}(*y*). For example, one could consider a modified version of the problem where it costs the player *c*(*y*) to switch when the observed value is *y*. The player may have an average cost constraint (again a linear constraint on *P*_{S}(*y*)),
2.9This constraint can be easily accommodated in the formulation of the second proof. If *c*(*ϕ*) and *C* are such that then *P**_{S}(*y*) remains optimal. If *P**_{S}(*y*) is ‘too expensive’, there will be a new optimal switching function. It can be verified that this function will again take values from {0,1} and the only difference will be the location of the threshold(s), which will be modified to meet the cost constraint.

### (c) The ‘blind’ two-envelope problem

Unlike the ‘non-blind’ two-envelope problem, the situation considered in McDonnell & Abbott (2009) is that where the player knows only *y* but not *p* and *f*_{X}(*x*). Without knowing *f*_{X}(*x*) and *p*, while the player cannot use the optimal strategy of theorem 2.4, it is shown in McDonnell & Abbott (2009) that a conditionally randomized switching strategy where the probability of switching monotonically decreases with the observed value *y* always leads to a gain *G*>0 for *p*≥0.5, and it may lead to a gain for *p*<0.5.

The example of such a strategy considered in McDonnell & Abbott (2009) and Abbott *et al.* (2010) is given by
2.10for some parameter *a*>0. Here, we refer to this as *negative exponential switching*. In McDonnell & Abbott (2009) and Abbott *et al.* (2010), this switching rule is referred to as *Cover switching*.

The second strategy of McDonnell & Abbott (2009) uses a deterministic decision rule, in which the envelopes are switched if the observed value is less than some threshold value, *b*. It was shown that this strategy leads to a gain for any value of *b* for *p*=0.5. We refer to this strategy as *threshold switching*, and write the switching function as
2.11This switching strategy has a similar form to the optimal switching strategy of theorem 2.4. However, the optimal strategy may include multiple regions of *y* for which it is optimal to switch with probability one, and multiple complementary regions where it is optimal to switch with probability zero. Therefore, threshold switching must be assumed to be suboptimal, unless proved otherwise.

The results presented in McDonnell & Abbott (2009) are based on the assumption, made entirely for the purposes of illustration, that the smaller value *X* (which is not observed by the player) is drawn from a negative exponential probability density function (PDF) with mean (and standard deviation) equal to 3,
2.12One of the goals of this paper is to generalize the results of McDonnell & Abbott (2009) in several ways, including (i) examining the effect of generalizing the distribution of *X* to that of a generalized negative exponential; and (ii) finding the maximum gain that can be achieved for both negative exponential switching and threshold switching, for a given density of *X*, and the parameter in each case (*a* and *b*). Since we wish to maximize the gain with these (generally) sub-optimal strategies as a function of *f*_{X} and *p*, we are in fact addressing the ‘non-blind’ two-envelope problem in most of this paper.

For the family of distributions we consider here, the optimal value of *b* will result in the same solution for maximum gain as that obtained by using the optimal strategy from theorem 2.4. The outcome of our analysis will be to establish exactly how sub-optimal negative exponential switching is, in comparison with optimal switching.

However, before progressing to this, we firstly demonstrate using the results plotted in figure 1, that if *a* or *b* are arbitrarily chosen without knowledge of *f*_{X}, there are large ranges of choices for which negative exponential conditional randomized switching outperforms threshold switching. Therefore, if through lack of knowledge of *f*_{X} one is unable to select the optimal value of *b* for threshold switching, and is forced into ‘guessing’ a value, one might be better off guessing a value for *a* and using the negative exponential conditional randomized switching scheme instead. We do not address this in the present paper, but leave the quantification of this scenario for future work.

The plot in figure 1 also demonstrates the results derived in McDonnell & Abbott (2009), that both switching strategies always lead to a gain *G*>0 when *p*=0.5.

### (d) The generalized negative exponential distribution

In this paper, we consider a specific family of distributions for *f*_{X}(*x*), with a focus on two special cases from this family. This is the generalized negative exponential family, which has PDF
2.13where *Γ*(*α*) is the gamma function, (Spiegel & Liu 1999). The shape of the PDF is determined by the parameter . Figure 2 shows *f*_{X}(*x*) for a range of *α*-values, with a mean *μ*=1.

We chose this family for three reasons. First, the negative exponential distribution is obtained when *α*=1, and this is the example distribution considered in McDonnell & Abbott (2009). Second, the shape of the density can be varied all the way from uniform when *α*=0, through ‘half-Gaussian’ when *α*=0.5 to *heavy tailed* when *α*>1, and becoming more heavy tailed as , which is a case that is of interest in finance and economics (Jacquier *et al.* 2004). Third, the negative exponential and uniform distributions are also considered briefly by Brams & Kilgour (1995*a*), but only for *p*=0.5. Below, we examine the effect of varying *α* on the gains that the player may achieve.

It is convenient to reduce the parameter space by insisting that *X* has the same mean *μ* for all values of *α*. This implies that
2.14and that the standard deviation, *σ*, is given by
The first special case we consider is the negative exponential, for which *α*=1, *μ*=*σ* and the PDF is
2.15The second special case is the uniform distribution, for which *α*=0, and the PDF is
2.16where *χ*_{[u,v]}(*x*)=1, *x*∈[*u*,*v*] and zero otherwise.

Finally, we state a lemma that will be useful in subsequent sections.

### Lemma 2.5

Any generalized negative exponential density *f*_{X}(*x*) with *α*>0 is a monotonically decreasing function of *x*.

## 3. Maximizing the gain for the two cases of switching function

In this section, we begin by deriving conditions for optimality of the single parameter within each of the two switching functions—negative exponential and threshold—for arbitrary distributions of *X*. We then specialize to the case of the generalized negative exponential distribution, for which it is shown that optimized threshold switching is also the optimal switching strategy out of all possible strategies for the ‘non-blind’ two-envelope problem.

### (a) Arbitrary densities, *f*_{X}(*x*)

We assume *f*_{X}(*x*) is absolutely continuous and differentiable for .

#### (i) Negative exponential switching

The switching function for this case is , and the gain *G* can therefore be written as a function of *a*,
3.1Note that since is monotonically decreasing for *a*>0, by lemma 2.2 *G*_{1}(*a*)>0 ∀*p*>0.5, *a*>0. However, by lemma 2.1, there will also always be a gain from always switching when *p*>0.5, which corresponds to *a*=0. It remains for us to determine the optimal value of *a* as a function of *p*.

Differentiating equation (3.1) with respect to *a*, and setting the result to zero, yields the functional equation
3.2where *L*(*a*) is the Laplace transform of *ϕ*^{2}*f*_{X}(*ϕ*) (Spiegel & Liu 1999). Any solutions of equation (3.2) provide stationary points of *G*_{1}(*a*). Since we seek the value of *a* that maximizes *G*_{1}, for arbitrary *f*_{X}(*x*), we need to consider all stationary points of *G*_{1}(*a*) that are local maxima, as well as the boundary values, *a*=0 and . We denote the global maximum of all such points as *a*_{opt}, and this value of *a* yields the maximum gain, *G*_{1}(*a*_{opt}).

If no solution to equation (3.2) exists for then we must have either *a*_{opt}=0 or . Both values correspond to special cases of the conditions specified in lemma 2.1, i.e. *a*=0 is equivalent to *k*=1 and is equivalent to *k*=0. Thus, in this situation, the optimal gain is given by equation (2.7).

We now present a lemma that assists in determining the optimal value of *a* for all values of *p*.

### Lemma 3.1

For *p*∈[2/3,1), and negative exponential switching, the optimal value of *a* is *a*_{opt}=0, and the optimal gain is *G*_{1}(*a*_{opt})=(2*p*−1)*E*[*X*].

We now demonstrate that for a certain class of *f*_{X} that for *p*∈[1/5,2/3) there is at most one value of *a* that satisfies equation (3.2). This fact allows us to use numerical optimization techniques to find the optimal value of *a*. To this end, we make use of the following lemma.

### Lemma 3.2

*Let p*>1/5 *so that g*(0)>0. *Let r*(*x*) *be a continuous, increasing function that is positive in* . *If g*(*x*) *has a single sign-change—i.e. a sign-change occurs at x*=*A if there exists some value A such that g*(*x*)>0 *for x*∈[0,*A*) *and g*(*x*)≤0 *for* —*then the function* *has at most one real root*.

This lemma leads us to the following corollary.

### Corollary 3.3

*If g*(*x*) *has a single sign-change, then the gain for negative exponential switching, G*_{1}, *has at most a single stationary point with respect to a, and if it exists this stationary point is the maximum value of G*_{1}(*a*) *for* *provided p*∈[0,2/3].

#### (ii) Threshold switching

We begin by noting that the optimal switching function of theorem 2.4 may have multiple threshold values of *y* at which *g*(*y*)=0 that demarcate between regions of always switching and never switching. The strategy we call *threshold switching* has only one such threshold value, and must therefore be suboptimal in comparison with the strategy of theorem 2.4, *unless* the distribution of *X* is such that *g*(*y*) has exactly one solution for *y*.

Consequently, in this section, we are seeking the optimal parameter, *b*, for threshold switching, without consideration of how many switching points should be used according to the strategy of theorem 2.4. The gain for threshold switching as a function of the threshold parameter *b* can be written as
3.3where *u*(·) is the Heaviside unit step function. We seek the value of *b* that maximizes *G*_{2}(*b*), and denote this as *b*_{opt}. Differentiating equation (3.3) and expressing the derivative of the Heaviside function as the Dirac delta function, *δ*(·), we find
Setting this to zero gives a condition for any stationary points, *b*_{s}, of *G*_{2},
3.4The optimal parameter, *b*_{opt}, must be either zero, infinity or a value from the set of all solutions to equation (3.4).

We may attempt to reduce this set by considering whether any of the stationary points are maxima, by aiming to show that . We have
Assuming the existence of a solution to equation (3.4), and then substituting equation (3.4) into the above results in
Thus, for any solution *b*_{s}>0 to be a local maximum, we have a sufficient condition
3.5This condition needs to be considered for any specific choice of *f*_{X}(*x*).

### (b) Generalized negative exponential density

#### (i) Negative exponential switching

Although below we show that analytical solutions for the optimal parameter, *a*, and the corresponding gain can be found for *α*=1, for general *α*, we are unable to find an explicit solution. We, therefore, seek a numerical solution to equation (3.2). This is made simpler when it is noted that lemmas 2.3 and 2.5 together imply that if *p*≤1/5, then a negative gain will result for generalized negative exponential *X*, unless the player never switches. This is equivalent to . Also, from lemma 3.1, for *p*>2/3 we have *a*_{opt}=0. Thus, we can restrict the numerical procedure to the interval *p*∈(1/5,2/3).

By lemma 3.2, there is at most one solution to equation (3.2) provided *g*(*x*) has a single sign change. As we now show, this is the case for the generalized negative exponential distribution for *p*>1/5.

### Lemma 3.4

*If f*_{X}(*x*) *is a distribution from the family of generalized negative exponential distributions with* *and p*∈(1/5,1], *then g*(*x*) *has a single sign change*.

Consequently, since there is a single solution to equation (3.2), *G*_{1}(*a*) can have at most one maximum with respect to *a* for *p*∈(1/5,2/3), by corollary 3.3. This allows us to use standard gradient descent numerical methods to maximize equation (3.1), for any given *p* and *α*. The result of carrying out this maximization for the interval *p*∈(1/5,2/3)—combined with *a*_{opt}=0 for *p*≥2/3 and for *p*≤1/5 is shown in figure 3*a*,*b*.

#### (ii) Threshold switching

For this switching strategy, we are able to obtain analytical solutions for the optimal value of *b*, *b*_{opt} and the corresponding optimal gain, *G*_{2}(*b*_{opt}).

Substituting equation (2.13) into equation (3.4) and simplifying yields the condition for a stationary point for *α*>0,
3.6By lemmas 2.3 and 2.5, the optimal value of *b* for is *b*_{opt}=0 and the resultant gain is zero. It can be seen that no solution to equation (3.6) exists when *p*<1/5.

For *p*∈(1/5,1), we have a single stationary point, *b*_{s}. It remains to check that this is a global maximum. That this is true can be proved by substituting *b*_{s} into the sufficient condition (3.5) to get
Using equation (3.4), the above condition becomes 1≥2^{−1/α}, and the condition is true for all *α*≥0.

Consequently for *α*≥0,
3.7The corresponding maximum gain for threshold switching can be written as
3.8where *P*(·,·) is the incomplete gamma function (Spiegel & Liu 1999),
Notice that both *b*_{opt} and *G*_{2}(*b*_{opt}) scale linearly with *μ* for *p*∈(1/5,1). Whatever the value of *μ*, the player can adjust the switching threshold to obtain a maximum gain proportional to *μ*. The special case of *α*=0 is derived in the following subsection.

The optimal parameter, *b*_{opt}, and the corresponding maximum gain, *G*_{2}(*b*_{opt}), are shown in figure 3*c*,*d*, as a function of *p* and *α*.

With reference to theorem 2.4, since there is a single solution to equation (3.4), optimized threshold switching is actually the optimal solution out of all switching strategies.

It is difficult to directly compare the gains from negative exponential switching and threshold switching based on figure 3*c*,*d*. Therefore, we now consider two special cases of *α*.

### (c) Examples from the generalized negative exponential distribution: uniform (*α*=0) and negative exponential (*α*=1)

Now as illustrative examples, we derive closed-form expressions for the gain for negative exponentially distributed *X* (i.e. *α*=1) for both negative exponential switching and threshold switching, and uniformly distributed *X* (i.e. *α*=0) for threshold switching. For negative exponential switching and *α*=0, we present a brief algorithm for numerically determining the optimal gain. The resulting optimal parameters, *a*_{opt} and *b*_{opt}, the corresponding optimal gain and the corresponding optimal return are each shown in figure 4.

#### (i) Negative exponential switching and negative exponential density (*α*=*1*)

The *α*=1 case (where *μ*:=E[*X*]=*σ*) is analytically tractable, since it is straightforward to show that
Substitution into equation (3.2) leads to a single solution for *p*∈(1/5,2/3), which by corollary 3.3, and lemma 3.4 must maximize the gain. The solution can be written as
3.9where
Hence, using the comments above lemma 3.4, we can write
3.10The corresponding value of the gain for all *p* is
3.11

#### (ii) Negative exponential switching and uniform density (*α*=*0*)

For the case, we find , and , so that, putting *z*=2*μa*, we need to solve the following equation
This cannot be carried out analytically, but the following iterative contraction mapping yields accurate numerical results:
From this we find *a*_{opt}=0.5*z*_{opt}/*μ*. It can be shown that
3.12The optimal gain is then obtained by substitution of *a*_{opt} into equation (3.12), and equation (3.12) into equation (3.1) to obtain

The corresponding value of the gain for all *p* is
3.13

#### (iii) Threshold switching and negative exponential density (*α*=*1*)

For this case, since *α*>0, the optimal solution is given by substituting *α*=1 into equations (3.7) and (3.8) to obtain
3.14The corresponding maximum gain for threshold switching is
3.15which is in agreement with (McDonnell & Abbott 2009, p. 3317) when *p*=1/2.

#### (iv) Threshold switching and uniform density (*α*=*0*)

From equation (3.7),
The corresponding optimal gain for *α*=0 may be obtained from equation (3.3) or equation (3.8), and is
3.16

#### (v) Discussion for *α*=*0* and *α*=*1*

Figure 4*a* shows the optimal parameters, *a*_{opt} (negative exponential switching) and *b*_{opt} (threshold switching) as a function of *p*, for each value of *α*. We have plotted 1/*b*_{opt} rather than *b*_{opt} since this demonstrates limiting behaviour for both switching functions at *p*=1/5.

Figure 4*b* shows the optimal gain for each situation, as well as the gain that would be obtained by switching according to the conditions of lemma 2.1, *G*_{k}. In lemma 2.1, we showed that if switching is independent of the observed value, *y*, then the player should either always switch, or never switch, in preference to randomized switching. It is clearly shown in figure 4*b* that both *y*-dependent switching functions fare better than deterministic switching for a large range of values of *p*, for both *α*=0 and *α*=1.

Figure 4*c* shows the mean return to the player for each situation. From equation (2.6), this is given by *R*_{m}=*R*_{b}+*G*, where *R*_{b}=(2−*p*)*μ*. The conditions of lemma 2.1 are labelled as *constant switching*. While the gain owing to randomized switching is also clearly evident, interesting asymmetries in the mean return with respect to *p* are visible. That is, the minimum mean return is not at *p*=0.5 except for the constant switching case.

## 4. Simulation results and their confidence intervals

As discussed in §1, this paper provides analytical support for the simulation results on the two-envelope problem presented in McDonnell & Abbott (2009), as well as extensions. In this section, we derive confidence intervals for the gain in order to further strengthen the claims from that paper.

The function *g*(*x*) defined in equation (2.2) can be thought of as the expected gain as a function of *x*, while *G* can be thought of as the mean gain with respect to a density *f*_{X}(*x*). We, therefore, here change the notation to *μ*_{g}:=*G*.

The Monte Carlo simulations performed previously in McDonnell & Abbott (2009) exhibit the customary convergence to the mean value as the number of simulations, *N*, is increased. It is possible to set confidence limits on the values obtained by simulation, which requires knowledge of the standard deviation of the gain; we denote this as *σ*_{g}. Once this has been obtained, we may then use the following well-known formula for the upper and lower 100*ξ*% confidence limits—provided *N* is large enough to permit the use of the Gaussian central limit approximation for the distribution of the mean:
To find *σ*_{g}, we must start with the return rather than the gain. Following McDonnell & Abbott (2009), let *Z* denote the random variable describing the amount in the final envelope left with the player. We denote the probability that the player ends the trial with amount *Z*=*z*, given *x* and 2*x* are in the two-envelopes, as *P*_{Z}. The sample space of *Y* given *x* is the smaller amount is {*x*,2*x*}. So if *Z*=*x*, then
and if *Z*=2*x* then
These probabilities may be easily misinterpreted, so we provide some additional clarification. They are not probabilities the player can use based on an observation *y*. They are simply the probability of ending up with the smaller and larger amounts, respectively, when a switching strategy *P*_{S}(*y*), based on the observation *y* is used. Deriving these equations is based on the fact that *P*_{S}(*x*) is the conditional probability of switching, given *Y* =*x*, and *P*_{S}(2*x*) is the conditional probability of switching, given *Y* =2*x*, which is the only reason why we can substitute for *y*=*x* and *y*=2*x* into *P*_{S}(·). It is irrelevant that *x* cannot be inferred from an observation, *y*.

We now introduce the *conditional mean square return*, given *x* is the smaller amount,
The *mean square return* is then
We now introduce the notation
The mean return stated in McDonnell & Abbott (2009) can be expressed as
and the mean square return can be expressed as
Since the gain, *μ*_{g}, is defined as the mean return owing to switching, less the mean return that would occur when there is no switching, i.e.
4.1the variance of the gain is equal to the variance of the return. Consequently, the standard deviation of the gain can be computed as
Several special cases are presented below.

### (a) Negative exponential switching with negative exponential *X* (*α*=1)

For negative exponential switching, , we have
A change of variable *w*=*ϕ*(1/*μ*+*ma*) reveals that this integral is a scaled gamma-function, and we obtain
For *p*∈(1/5,2/3), using the expressions derived above we can simplify *I*(*n*,*m*) when *a* is optimally chosen for the negative exponential PDF. In this case
and
So we have
and

### (b) Threshold switching for generalized negative exponential *X*

For the threshold switching function we can derive
where and *F*=*Γ*(*α*)*Γ*(3*α*)/*Γ*^{2}(2*α*).

### (c) Simulation results

Figure 5 shows 20 simulation sample paths of how the gain varies through 2000 plays of the two-envelope problem for optimized negative exponential switching. The parameters are *p*=0.5 and *α*=1, which are the same conditions used in McDonnell & Abbott (2009). The 95% confidence intervals are superimposed on the sample paths, and clearly show excellent agreement with the simulations. For example, after about 1000 trials, only one of the sample paths is outside the 95% confidence interval, which is precisely what is expected, on average, for a 95% confidence interval.

## 5. Conclusions and suggestions for further work

In this paper, we have developed analytical proofs that support simulation results in McDonnell & Abbott (2009), in order to build the mathematical underpinnings of how conditionally randomized switching leads to a gain in the two-envelope problem. In particular, we have shown that threshold switching, when optimized for the switching threshold, *b*, is the best possible switching strategy when the random variable *X* is from an generalized negative exponential family, and therefore is always superior to negative exponential conditionally randomized switching.

One of the reasons we chose the generalized negative exponential family for the distribution of *X* is that the parameter *α* allowed us to consider the special cases of uniformly and exponentially distributed, *X*, but also to consider the heavy-tailed case, which occurs when *α*>1. Given the relevance of heavy-tailed distributions in mathematical finance, it is suggested that further work investigates the two-envelope problem for other heavy-tailed distributions of *X*, in particular those with infinite variance.

We have also highlighted that both negative exponential switching and threshold switching will always provide a gain for *p*=0.5, for any value of their parameters, *a* and *b*, respectively, when compared with never switching, or switching independently of the observed value, *y*. Moreover, for the ‘blind’ two-envelope problem where the optimal value of *a* and *b* cannot be chosen since the player does not know *f*_{X} or *p*, an arbitrary choice of these parameters can easily lead to negative exponential switching being superior to threshold switching, as shown in figure 1. Precise mathematical formulation and derivation of optimal strategies for the ‘blind’ two-envelope problem would be a fascinating extension of this paper in future work.

## Acknowledgements

Mark D. McDonnell is the recipient of an Australian Research Fellowship funded by the Australian Research Council (project number DP1093425).

## Appendix A. proofs of theorem 2.4

In this appendix, we provide three proofs of theorem 2.4, the third of which repeats the proof technique of Christensen & Utts (1992) and Blachman *et al.* (1996). As well as extending the proof to arbitrary *p*, we restate this proof here in order to highlight one of the sources of the two-envelope ‘paradox.’ The first new proof is the simplest of the three given here, while the second uses calculus of variations, and thus suggests ways to extend this form of analysis to the ‘blind’ two-envelope problem, via the introduction of constraints.

#### (a) Proof by contradiction

### Proof.

We can choose each value 0≤*P*_{S}(*y*)≤1 independently, and we assume there are no constraints—e.g. continuity or smoothness—on *P*_{S}, other than the given pointwise bounds.

We will proceed by contradiction. Suppose is optimal. Consider the contribution to *G* at any point *y*_{0} for which *P*_{S}(*y*_{0})≠*P**_{S}(*y*_{0}). There are two cases to consider. First, suppose
A 1Then *P*_{S}(*y*_{0}) may be increased (since *P*_{S}(*y*_{0})≠*P**_{S}(*y*_{0})=1), resulting in an increase in *G*, contradicting optimality of *P*_{S}. Similarly, if
A 2then *P*_{S}(*y*_{0}) can be decreased (since *P*_{S}(*y*_{0})≠*P**_{S}(*y*_{0})=0), again resulting in an increase in *G*, contradicting optimality of *P*_{S}. □

#### (b) Proof by calculus of variations

### Proof.

The functional *G*(*P*_{S}) is linear in *P*_{S}, and 0≤*P*_{S}(*ϕ*)≤1 satisfies linear inequality constraints. Hence, the theorem may also be proved by considering the Karush–Kuhn–Tucker (KKT) conditions for optimality, which are both necessary and sufficient for a linear programme (Boyd & Vandenberghe 2004, p. 224).

The Lagrange dual incorporating the constraints on *P*_{S} is
A 3and the conditions for optimality are
A 4Now *P*_{S}(*ϕ*)≠0⇒*λ*_{1}(*ϕ*)=0 and *P*_{S}(*ϕ*)≠1⇒*λ*_{2}(*ϕ*)=0. These conditions are exclusive, hence either *λ*_{1}(*ϕ*)≠0 or *λ*_{2}(*ϕ*)≠0, but not both at the same time. If 0<*P*_{S}(*ϕ*)<1, both Lagrange multipliers are zero, and equation (A4) cannot be satisfied.

So there are only two cases. If *pf*_{X}(*ϕ*)≥(1−*p*)*f*_{X}(*ϕ*/2)/4 then we require *λ*_{2}(*ϕ*)>0 to satisfy equation (A4), and hence for such values of *ϕ*, we have *λ*_{1}(*ϕ*)=0 and *P*_{S}(*ϕ*)=1. Conversely, if *pf*_{X}(*ϕ*)<(1−*p*)*f*_{X}(*ϕ*)/4, we must have *λ*_{1}>0 and consequently *λ*_{1}(*ϕ*)=1 and *P*_{S}(*ϕ*)=0. □

#### (c) Proof using an estimation perspective and Bayesian analysis

This proof is essentially the same as that of Brams & Kilgour (1995*a*) and Blachman *et al.* (1996), but we state it here—extended to arbitrary *p*—as it provides complementary insights to the first two proofs, and a transparent demonstration of erroneous reasoning that leads to the two-envelope *paradox*.

### Proof.

We introduce a binary random variable *W* such that the event that the player keeps the envelope is *W*=1, and that they switch is *W*=0. Optimizing the gain is equivalent to optimizing the conditional distribution of *W* given *Y* , which we denote as *P*_{W|Y}. We further introduce a continuous random variable *Z* to describe the amount in the final envelope. The objective for the player is to maximize the expected value of *Z*, i.e.,
A 5This is maximum if E[*Z*|*Y* =*y*] is maximized for all *Y* =*y* (sufficient condition). To proceed, we note that E[*Z*|*Y* =*y*]=E[E[*Z*|*Y* =*y*,*W*]]. Thus
A 6The second line follows because the switching function *P*_{S}(*y*) is equivalent to the conditional probability *P*_{W|Y}(0|*Y* =*y*). In the third line, E[*Z*|*Y* =*y*,*W*=1]=*y* because when the player does not switch, the final amount will always be *Z*=*y*.

The remaining term E[*Z*|*Y* =*y*,*W*=0] is more difficult, as it is dependent on the random variable *X*, i.e. the amount chosen by the ‘house’ as the lower of the two values. We capture this dependence through the introduction of another binary random variable *D*, such that *D*=0 if the player initially has the lower valued envelope, i.e. *y*=*x*, and *D*=1 if the player initially has the higher valued envelope, i.e. *y*=2*x*. If the player switches and *D*=0 then *Z*=2*y* and if the player switches and *D*=1 then *Z*=*y*/2. Thus
A 7To proceed, we need to derive the probabilities *P*_{D|Y}(0|*y*) and *P*_{D|Y}(1|*y*). When *D*=0 we have *y*=*x* and *P*_{Y |D}(*y*|0)=*P*_{X}(*y*), and when *D*=1 we have *y*=2*x*, so *P*_{Y |D}(*y*|1)=0.5*P*_{X}(*y*/2). So the unconditional distribution of *Y* is *P*_{Y}(*y*) d*y*=*pP*_{X}(*y*) d*y*+0.5(1−*p*)*P*_{X}(*y*/2) d*y*. Then, by Bayes’ rule,
The necessity to use Bayes’ rule was recognized by Christensen & Utts (1992). However, a small oversight was made in Christensen & Utts (1992), which used an expression equivalent to *P*_{X}(*y*/2) d*y* rather than 0.5*P*_{X}(*y*/2) d*y*. This problem was pointed out by Blachman *et al.* (1996) in deriving the same expressions as those given here.

Substituting the above into equation (A7) gives
Substituting this into equation (A6), the end result is
Let us write this as E[*Z*|*Y* =*y*]=*y*+*yP*_{S}(*y*)*a*(*y*), which is a linear function in *P*_{S}(*y*). Therefore, the function is maximum either for *P*_{S}(*y*)=0, when *a*(*y*)<0 or *P*_{S}(*y*)=1, when *a*(*y*)>0. We wish to choose *P*_{S}(*y*) that maximizes E[*Z*|*Y* =*y*] ∀*y*. However, since *y*≥0 and *P*_{S}(*y*)∈{0,1}, it is clear that we must choose
□

### Remark A.1

Since *P*_{Y}(*y*) d*y*=*pP*_{X}(*y*) d*y*+0.5(1−*p*)*P*_{X}(*y*/2) d*y*,
Substituting E[*Z*|*Y* =*y*] into equation (A5) gives
This is the return that would be obtained if the player switches—see equation (2.6). Thus, the gain can be expressed as
which recovers equation (2.2) from McDonnell & Abbott (2009), i.e. equation (2.3) as stated above.

### Remark A.2

Recall that before switching, the probability that the player received the smaller amount is *p*. Thus, by switching the player will get the larger amount with probability 1−*p*. If the dependence on *X* is overlooked in the calculation of *P*_{D|Y}(*d*|*y*), the following erroneous reasoning would result:
In the event *p*=0.5, this would be E[*Z*|*Y* =*y*,*W*=0]=1.25*y*, which would result in an optimal switching function *P*_{S}(*y*)=1 ∀*y*. This means switching for any observed value maximizes the gain, which is the two-envelope apparent paradox.

Thus, this proof makes it very clear where fallacious reasoning leads to the apparent paradox, i.e. by overlooking that the probability of initially receiving the lower/higher amount, conditioned on the observed amount, *y*, is not independent of the observed amount. Further discussion of a different source of incorrect mathematical reasoning is given by Brams & Kilgour (1995*b*) in response to the letter of Mixon (1995).

- Received October 20, 2010.
- Accepted April 11, 2011.

- This journal is © 2011 The Royal Society