## Abstract

In this paper, we prove that the probability kernel of a random walk on a trinomial tree converges to the density of a Brownian motion with drift at the rate *O*(*h*^{4}), where *h* is the distance between the nodes of the tree. We also show that this convergence estimate is optimal in which the density of the random walk cannot converge at a faster rate. The proof is based on an application of spectral theory to the transition density of the random walk. This yields an integral representation of the discrete probability kernel that allows us to determine the convergence rate.

## 1. Introduction

Binomial models, introduced in Cox *et al*. (1979), play a central role in the theory and practice of derivatives pricing. They are widely used to obtain algorithms for determining the numerical value of options with general pay-offs and provide a flexible framework that can accommodate a variety of models (Hull & White 1988; Madan *et al*. 1991; Rubinstein 1994; Derman *et al*. 1996). The equivalence between binomial and trinomial trees has been established in Rubinstein (2000). The present paper will focus on the convergence properties of the latter.

It is known that the prices obtained from binomial and trinomial models converge to the prices given by continuous-time and space models (e.g. He 1990; Amin & Khanna 1994). The question of the rate at which the discrete option price converges to its continuous limit has been examined in Heston & Zhou (2000). The authors show that the convergence rate of a call option is at least of the order , where *n* is the number of time-steps in the binomial model. When expressed in terms of the lattice spacing *h*, this convergence rate is equivalent to *O*(*h*) (the relationship between the number of steps *n* and the distance *h* between the state nodes is given by *n*=3*σ*^{2}*T*/*h*^{2}, equation (2.4) in §2). In addition, the authors show that for twice continuously differentiable functions on a bounded interval in the prices converge at the rate *O*(*h*^{2}). They propose to improve the convergence rate for non-differentiable pay-offs by using smoothing strategies and applying the convergence result for smooth functions. This approach is well suited to general vanilla pay-offs since they are continuous and non-differentiable only at a single point. It is less applicable to functions characterized by a higher degree of irregularity such as the pay-offs of European double digitals and butterfly spreads.

The probability density function (PDF) of a random process can be viewed, in terms of pricing theory, as a forward value of an option whose pay-off is the Dirac delta function. The singular nature of this function has hindered efforts to address the question of the convergence rate of the PDF of a discrete model to the probability density of a continuous model, which is an interesting problem both theoretically and in practice.

In this paper, we address the question of the convergence of the PDF of a random walk on a commonly used trinomial tree (see §2 for definition) to its continuous counterpart, a Brownian motion with drift. The main result, contained in theorem 3.1, states that the probability density of the random walk on a trinomial tree converges to the density of the Brownian motion with drift at the rate *O*(*h*^{4}), where the parameter *h* represents the distance between the nodes of the tree at any time-step. Furthermore, we show that the convergence of the PDFs is uniform in the state-space (i.e. the same rate applies to any pair of starting and terminal points in the state-space) and that it is of the order *O*(*h*^{4}) and no faster. This result can be compared with the rate of convergence of the probability density of a continuous-time Markov chain to a Brownian motion with drift which has been shown to be of the order *O*(*h*^{2}) (Albanese & Mijatović 2006).

Related results dealing with the convergence rate of density functions of a sum of independent identically distributed random variables to the density of the limiting variable, known as local limit theorems, have been developed in mathematical statistics (ch. XVI in Feller 1971; Kolassa 2006). The proofs are based on Edgeworth expansions which use Hermite polynomials to approximate the density of the sum. See also Ibragimov & Linnik (1971) for applications of Fourier methods to the same problem.

The paper is organized as follows. In §2, we give a definition of a widely used random walk on the trinomial tree using natural moment matching conditions and find the spectral representation of its probability kernel. Section 3 contains the statement and proof of our main result (theorem 3.1). Section 4 concludes the paper.

## 2. Probability kernel of the random walk

A general idea behind multinomial lattice models is to approximate a stochastic process on a continuous state-space by a random walk on a discrete state-space. A very natural question for such an approximation is that of the convergence rate of the probability kernel of the random walk to the PDF of the initial stochastic process. In this section, we shall first recall a well-known approximation of the Brownian motion with drift1 by a discrete-time Markov chain (i.e. a random walk) on a trinomial tree. Once we have established the probability kernel of the chain, we will find its spectral representation as defined in appendix A. This representation will allow us to prove our main result contained in theorem 3.1.

Let *Y*_{t}≔*x*+*μt*+*σW*_{t} denote a Brownian motion with a real drift *μ* and a positive volatility *σ* which started at some point *x* in . As usual *W*_{t} denotes the standard Brownian motion. Let us fix a time horizon *T*. The basic idea of a trinomial tree is to divide the time-interval [0, *T*] into *n* small time-steps Δ*t*, i.e. *n*=*T*/Δ*t*, and approximate *Y*_{T} with a sum(2.1)where the random variables *X*_{i}, for *i*=1, …, *n*, are independent identically distributed with a domain {−*h*+*μ*Δ*t*, *μ*Δ*t*, *h*+*μ*Δ*t*}. At any fixed time-step *i*∈{1, …, *n*}, the parameter *h* denotes the distance between the consecutive nodes in the tree. In other words, the variable *X*_{i} is used to approximate the evolution of the process *μt*+*σW*_{t} in the short time-interval of length Δ*t*. The local geometry of the trinomial tree is described in figure 1.

It is intuitively clear that the convergence rate of the probability kernel of to the probability density of the process *Y*_{T} improves as we satisfy the moment matching conditions(2.2)for more integers *k*. The probabilities *p* and *q* from figure 1 can be chosen in such a way that the first five moments of the random variable *X*_{i} coincide with the first five moments of the normal distribution *N*(*μ*Δ*t*, *σ*^{2}Δ*t*) of the increment *Y*≔*Y*_{t+Δt}−*Y*_{t}. Recall that the moments of the normal random variable *Y*−*μ*Δ*t* are of the form(2.3)for any *k* in . Satisfying condition (2.2) for all odd integers, *k* reduces to the equation *p*=*q* which guarantees that the variables *X*_{i} are symmetric around their mean.

The moment matching conditions in equation (2.2) for *k*=2 and 4, together with formula (2.3) yield the following system of equations: 2*ph*^{2}=*σ*^{2}Δ*t* and 2*ph*^{4}=3(*σ*^{2}Δ*t*)^{2}. This system uniquely determines the local geometry of our trinomial tree as it implies that the relationship between the time-step Δ*t* and the spacing *h* is given by the following key formula:(2.4)

We also find that the probabilities *p* and *q* in figure 1 have to be equal to 1/6. The local geometry of our trinomial tree is now completely determined.

Assuming that we have fixed the spacing *h* and the corresponding time-step Δ*t*, we can define a discrete-time Markov chain as in equation (2.1), for every time *t* that is an integer multiple of Δ*t*. It is clear that the domain of the chain , at time *t*, is a subset of *μt*+*h* if and only if the starting point *x* of the random walk is an element of *h* (figure 2).

Let us now assume that *x* and *y* are arbitrary elements in *h*. The Markov chain defined above is time homogeneous2 and its evolution is therefore determined uniquely by the bounded operator , which can be expressed using the orthonormal basis3 as follows(2.5)

To simplify notation, we introduce the coordinate expression . In other words definition (2.5) provides a way of expressing the transition probability kernel of the Markov chain as a bounded linear operator on the Hilbert space *l*^{2}(*h*). Furthermore, since the transition probabilities have been uniquely determined by the local geometry of the trinomial tree, we obtain the following coordinate expression for our operator:(2.6)

Recall that the *discrete Laplace operator* Δ_{h} : *l*^{2}(*h*)→*l*^{2}(*h*) can be defined in the following way for every sequence *f* in *l*^{2}(*h*):

A simple yet important observation, based on relation (2.4), is that the operator can be expressed as a linear combination of the identity operator *I* on the Hilbert space *l*^{2}(*h*) and the discrete Laplace operator Δ_{h} in the following way:(2.7)

The easiest way to show that (2.7) holds is by expressing the discrete Laplace operator in coordinates, substituting it into the formula on the right-hand side, applying equation (2.4) and noticing that the result matches coordinate expression (2.6) for the probability kernel . Equality (2.7) will play a central role in obtaining the spectral representation of the probability kernel of the random walk , which will be established in proposition 2.1 and applied in the proof of theorem 3.1.

So far we have concerned ourselves with the evolution of the chain over the short time period Δ*t*. Our next task is to understand the probability kernel(2.8)

The global geometry of the trinomial tree, for fixed values of the drift *μ* and volatility *σ*, is shown in figure 2.

We now need to express probability kernel (2.8) in terms of the operator and find its spectral representation. This is done in the following proposition.

*Let* *be the Markov chain defined above and let* *be its probability kernel given by equation* *(2.8)*. *Let T be a time horizon and* Δ*t a time-step. Then the following holds*:

*The transition probability**equals*,*where δ*_{y}*is the element of the orthonormal basis of l*^{2}(*h*)*that corresponds to the singleton y*∈*h*(*equation**(2.5)**) and the number of time-steps n is T*/Δ*t*.*The spectral representation of the probability kernel**is given by the integral**for any pair of elements x*,*y in h*.

Before proceeding to the proof of proposition 2.1, we should note that the above representation for the probability kernel of the random walk has the property that the time parameter *t* and the space parameters *x*, *y* feature independently. That is, the kernel of the integral in proposition 2.1(b) is a product of two functions, one depending on time *T* and the other on the dislocation (*x*−*y*). This feature of the spectral representation will play a crucial role in the proof of theorem 3.1.

Part (a) of the proposition follows from the representation in equation (2.5), which specifies the evolution of our chain over the time-step Δ*t*, and an iterative application of the Chapman–Kolmogorov equations (see ch. 6 in Grimmett & Stirzaker 2001).

In order to prove part (b), we need to find a spectral representation of the operator in the sense of appendix A. This may be achieved in the following way: find a diagonal representation for the discrete Laplace operator. The corresponding unitary transformations in this representation (see appendix A for the role of unitary transformations in the definition of the spectral representation) can also be used to define a (trivial) representation for the identity operator *I*. Thus, by equation (2.7) we obtain the spectral representation of .

The spectral representation, as defined in appendix A, for the discrete Laplace operator will be given by a unitary transformation _{h} : *l*^{2}(*h*)→*L*^{2}([−*π*/*h*, *π*/*h*]), also known as the *semidiscrete Fourier transform*, defined bywhere an arbitrary element *f* of the Hilbert space *l*^{2}(*h*) is viewed as a function from *h* to .

The usual inner product on *L*^{2}([−*π*/*h*,*π*/*h*]) with respect to the Lebesgue measure on the domain [−*π*/*h*, *π*/*h*] (see appendix A for definition) makes the family of functions , where *m*∈, into an orthonormal basis of *L*^{2}([−*π*/*h*, *π*/*h*]) (for the proof of this fundamental fact see theorem II.9 in Reed & Simon 1980). In particular, this implies that the mapping given byis well-defined and is an inverse of the semidiscrete Fourier transform (both of these facts follow directly from the definition of the inner product on *L*^{2}([−*π*/*h*, *π*/*h*]) and the fact that the functions form an orthonormal basis). It follows from the definitions of the respective inner products on Hilbert spaces *l*^{2}(*h*) and *L*^{2}([−*π*/*h*, *π*/*h*]) that _{h} is a unitary transformation (for definition see appendix A). Since is an inverse of _{h}, it must also be a unitary operator.

The following calculation in the Hilbert space *L*^{2}([−*π*/*h*, *π*/*h*]) yields a spectral representation of the discrete Laplace operator Δ_{h}:

All infinite sums that feature in this calculation are well-defined elements of *L*^{2}([−*π*/*h*, *π*/*h*]) since the coefficients are the elements of the convergent sequence *f*∈*l*^{2}(*h*).

This calculation, together with equality (2.7), implies that the operator has a spectral representation of the form(2.9)where *ϕ* is any element of *L*^{2}([−*π*/*h*, *π*/*h*]) and *n* equals the number of time-steps *T*/Δ*t* in the trinomial tree.

We are now in the position to obtain the spectral representation of the probability kernel of the random walk at a given time horizon *T*, by combining equality (2.9) with part (a) of the proposition. By substituting *ϕ* with in equation (2.9), we find the following:

The proof of part (b) in the proposition can now be concluded by substituting equality (2.4) into the formula we have just obtained. ▪

## 3. The main theorem

In this section, we shall examine the convergence of the discrete probability kernel to the conditional PDF *p*_{T}(*x*, *y*) of the Brownian motion with drift. We will establish the precise convergence rate of to *p*_{T}(*x*, *y*) and prove that it is uniform in the state variables *x* and *y* (theorem 3.1). Before stating this result, we should recall that the PDF of the Brownian motion with drift *Y*_{t}=*x*+*μt*+*σW*_{t} at a given time horizon *T* has a spectral representation of the form(3.1)

For the proof of this standard fact, see section 4 in Albanese & Mijatović (2006).

Before proceeding to the statement of theorem 3.1, let us clarify the limiting procedure as the parameter *h* goes to zero, which involves a passage from a discrete state-space *h* to the continuum of . First of all, we need to fix a strictly decreasing sequence of positive real numbers with the following two properties:

The first property enables us to study the behaviour of transition probabilities as the lattice spacing goes to zero. The second property ensures that the lattice *h*_{j} contains the lattice *h*_{i} for all *j*≥*i*. It is clear that there are many sequences satisfying these requirements. A very simple example is given by *h*_{n}≔(1/2)^{n}.

Let us now fix a sequence that satisfies the above conditions. In all that follows, we shall assume that the distance *h* between two consecutive points in the lattice is equal to one of the elements in the sequence . A limit when the spacing *h* goes to zero is defined as the limit where the parameter *h* visits all the elements of the sequence from some index *N*∈. It is clear that the limit, as *h* goes to zero, yields the same result for any choice of sequence that satisfies the above conditions and is therefore a well-defined concept. We may now state our main theorem.

*Let a positive real number T be a time horizon*. *Let* *be the probability kernel of the random walk* , *given by equation* *(2.8)* *in* *§2*. *For any pair of elements x*, *y*∈*h*, *let p*_{T}*(**x*, *y**)* *denote the conditional PDF of the Brownian motion with drift, given in equation* *(3.1)*. *Then the following holds**and the error term O**(**h*^{4}*)* *is independent of x and y*. *Equivalently, this means that there exist positive constants C and δ such that the inequality* *holds for all h*<*δ and all x, y*∈*h*. *Furthermore, the probability kernel* *does not converge to the density p*_{T}*(**x*, *y**)* *at the rate O**(**h*^{4}*f**(**h**))*, *for any non-constant function f such that* lim_{h→0}*f**(**h**)*=0.

In this theorem, we are comparing the PDFs of the family of the discrete state-space processes , with the density of *Y*_{t} whose state-space is clearly continuous. Since the conditional probability density of the Brownian motion with drift (starting at *x*∈*h*) can be defined by the limit , it is clear that the density of the discrete state-space process should be given by the quotient which is equal to . Moreover, this argument shows that the ‘discrete’ density converges to the ‘continuous’ density *p*_{T}(*x*, *y*).

Before proceeding the proof of theorem 3.1 recall that, by definition, a function *k*(*h*) is of type *O*(*g*(*h*)) if and only if it is bounded above by *Mg*(*h*), for a positive constant *M*, on some interval [0, *ϵ*) where *ϵ*>0, i.e. lim sup_{h↘0}|(*k*(*h*))/(*g*(*h*))|≤*M*.

Let us start by choosing an arbitrary starting point *x*∈*h* and an arbitrary terminal point *y*∈*h* (figure 2). Our goal is to find an upper bound on the differenceof the densities for Brownian motion with drift *Y*_{t} and the discrete-time Markov chain at a given time horizon *T*. This will be achieved by comparing the spectral representation for *p*_{T}(*x*,*y*), given by equation (3.1), with the representation of the discrete probability kernel specified in (b) of proposition 2.1.

The basic idea that allows this comparison is as follows. We will start by defining a positive function *K*(*h*) that goes to infinity ‘very slowly’ as the distance *h* between the nodes of the tree approaches zero. This will enable us to decompose the integration domains of the spectral representations mentioned above into disjoint sets [−*K*(*h*), *K*(*h*)] and −[−*K*(*h*), *K*(*h*)], and study the difference *D*(*h*) by expressing it as Riemann integrals over the components of this decomposition. This will yield the desired upper bound. A similar strategy was used in the proof of theorem 5.1 in Albanese & Mijatović (2006; for more on this comparison see figure 3).

Before specifying the function *K*(*h*) let us define the function *f*_{h}(*p*), which contains all the essential information about the spectrum of the operator from equation (2.5) (figure 3), using the formula(3.2)

Using the function *f*_{h}(*p*), we can rewrite the representation given in proposition 2.1 (b) of the probability kernel in the following way(3.3)

Let be the positive function which we will use to decompose the domain of integration. Note that since the inequality *K*(*h*)≤*π*/*h* holds for small *h* (i.e. the interval [−*K*(*h*),*K*(*h*)] is contained in the domain [−*π*/*h*,*π*/*h*] for small *h*), we can bound the difference *D*(*h*) using the integralsin the following way:(3.4)

The factor of 2 in front of *A*_{2}(*h*) and *A*_{3}(*h*) arises because the integrands in *D*(*h*) are symmetric functions on the components of the set −[−*K*(*h*), *K*(*h*)]. Note that in obtaining (3.4), we used the well-known inequality for any integrable function *f* and any Borel measurable set *U* in . Observe also that the right-hand side of equation (3.4) is independent of the coordinates *x* and *y* since they only appear in the expression e^{ip(x−y)} whose absolute value equals one.

Our task now is to prove the inequalities for some positive constants , where the index *j* runs from 1 to 3. Let us start by estimating the integral *A*_{1}(*h*) for small values of the spacing *h*(3.5)

The second inequality follows from the elementary relationship |e^{z}−1|≤e^{|z|}−1 which is valid for all *z* in the complex plane (take the absolute value of the Taylor series for the function *z*↦e^{z}−1 and apply to it the triangle inequality: ). It is clear from equation (3.5) that in order to bound *A*_{1}(*h*), we first need to understand the asymptotic behaviour of the expression |*f*_{h}(*p*)+(*p*^{2}/2)|. The next claim constitutes a central step in the proof of the theorem.

*Claim*. The following inequality holds(3.6)for some positive constant *C* and all *p* in the interval [−*K*(*h*),*K*(*h*)].

To see this, we need to recall that the function *x*↦log(*x*) has a Taylor expansion of the formwhich converges uniformly on compact subsets of the interval (0, 6). Using this fact, we can express the function *f*_{h}(*p*) as an infinite seriessince |cos(*hp*)−1| is always strictly smaller than 3. This expression leads to the following inequality(3.7)

In order to obtain a bound for the infinite series in equation (3.7), we should recall that lim_{h→0}*hK*(*h*)^{m}=0 for any *m*∈, which implies that *hp*<1 for *p*∈[−*K*(*h*), *K*(*h*)] and all small *h*. Therefore, we find that |1−cos(*hp*)|≤*h*^{2}*p*^{2}/2, since the error of a partial sum of an alternating series whose elements form a monotonically decreasing sequence is bounded above by the absolute value of the first element in the series after the partial sum. This implies the following inequalities(3.8)

The last inequality in equation (3.8) is valid for small values of the spacing *h*, which guarantees that the geometric series is bounded above 2.

We are now left with the task of estimating the first summand on the right-hand side of equation (3.7), i.e. *G*(*h*)≔|(*p*^{2}/2)−((1−cos(*hp*))/*h*^{2})−((1−cos(*hp*))^{2}/6*h*^{2})|. To that end we recall the following decomposition of the cosine:

Substituting this expression into the definition of *G*(*h*) we find(3.9)for some constant *G*_{0} and all small *h*. Inequality (3.9) holds because the convergent geometric series , whose value is below 2 for small *h*, dominates each of the infinite series above. Substituting inequalities (3.8) and (3.9) into (3.7) and defining *C*≔*G*_{0}+6 proves the claim.

We can now estimate the integral *A*_{1}(*h*). The elementary fact e^{x}−1≤2*x*, which is valid for small non-negative *x*, tells us that, for small values of *h*, equation (3.5) implies the following estimate(3.10)for some positive constant .

It is clear from figure 3 that the function *f*_{h}(*p*) is decreasing on the interval [0, *π*/*h*]. The quantity *A*_{2}(*h*) is therefore bounded above by the expression . Recall that by definition *K*(*h*) was set to . Since our claim implies that the following holds *f*_{h}(*p*)≤|*f*_{h}(*p*)+(*p*^{2}/2)|−(*p*^{2}/2)≤*Ch*^{4}*p*^{6}−(*p*^{2}/2) for *p*∈[−*K*(*h*),*K*(*h*)], we can conclude that(3.11)

The second inequality in equation (3.11) follows directly from the definition of *K*(*h*) and the property lim_{h→0}*h*^{4}*K*(*h*)^{6}=0.

We are now left with the easy task of estimating the decay rate for *A*_{3}(*h*). A straightforward application of L'Hospital's rule implies that *A*_{3}(*h*) is bounded above by . By substituting the definition of *K*(*h*) we conclude that(3.12)

Inequality (3.4), together with (3.10), (3.11) and (3.12), now yields the necessary convergence estimate and thus proves the first part of the theorem.

We are now left with the task of showing that the rate of convergence cannot be faster than *O*(*h*^{4}). We will do this by assuming that the difference *D*(*h*) converges to zero at the rate *O*(*h*^{4}*f*(*h*)) for some non-constant function *f*(*h*), such that lim_{x→0}*f*(*h*)=0, and then showing that this assumption leads to contradiction.

Note that if we redefine , the same calculation as above implies that the integrals *A*_{2}(*h*) and *A*_{3}(*h*) converge to zero at least at the rate *O*(*h*^{5}). Under our current assumptions this implies that the integraltends to zero at the rate of *O*(*h*^{4}*g*(*h*)), where *g*(*h*)=max{*f*(*h*), *h*}, since it can be bounded above by a linear combination of *D*(*h*), *A*_{2}(*h*) and *A*_{3}(*h*). On the other hand, it is not hard to see that the following holdswhere *k* : (−*ϵ*, *ϵ*)→ is an infinitely differentiable function defined on an interval containing zero (i.e. *ϵ*>0) and has the key property lim_{x→0}*k*(*x*)=0. The elementary inequality e^{x}−1≥*x*, valid for all real numbers *x*, implies the followingfor all small *h* and *p*∈[0, *K*(*h*)] The following calculation yields a contradictionsince the last integral clearly converges to a finite positive value, while the limit lim_{h→0}*g*(*h*) equals zero. This concludes the proof of the theorem. ▪

## 4. Conclusion

In this paper, we examined the convergence rate of the PDF of a random walk on a trinomial tree to the density of a Brownian motion with drift. We found that the rate of convergence is precisely of the order *O*(*h*^{4}) and no faster, the parameter *h* being the distance between the neighbouring nodes of the tree at each time-step.

From the point of view of pricing theory, this result can be interpreted in terms of the convergence of the prices of Arrow–Debreu securities. An *Arrow–Debreu security* is an option whose pay-off at some maturity *T* is the Dirac delta function concentrated at a given point of the domain of the underlying process. It is clear that the PDF can be viewed as a non-discounted value of such an option and that theorem 3.1 can therefore be viewed as a convergence result for securities with a very singular pay-off. This should be contrasted with the well-known fact (Heston & Zhou 2000) that the smoothness of the pay-off function is crucial for proving the convergence estimates for discretization schemes such as trinomial trees and partial differential equation (PDE) methods.

## Acknowledgments

I would like to thank Prof. Eckhard Platen for useful comments.

## Footnotes

↵This process is at the heart of the Black–Scholes model (Black & Scholes 1973) since an exponential of it is a geometric Brownian motion.

↵By definition this means that the following equality holds = for all elements

*x*,*y*∈*h*. In our case this is obvious since all summands in definition (2.1) are identically distributed and the left-hand side equals (*X*_{i}=*μ*Δ*t*+*y*−*x*), for some index*i*, while the right-hand side equals (*X*_{1}=*μ*Δ*t*+*y*−*x*).↵The sequence

*δ*_{y}in*l*^{2}(*h*) takes value 1 at*y*and value 0 everywhere else. It is clear that such sequences for all*y*∈*h*form a basis of the Hilbert space*l*^{2}(*h*) with the inner product given by equation (A 1) in appendix A. The measure (in that definition assigns a weight of 1 to each element in*h*.- Received November 13, 2006.
- Accepted March 14, 2007.

- © 2007 The Royal Society