## Abstract

In this paper, we show how to use sequential Monte Carlo methods to compute expectations of functionals of diffusions at a given time and the gradients of these quantities w.r.t. the initial condition of the process. In some cases, via the exact simulation of the diffusion, there is no time discretization error, otherwise the methods use Euler discretization. We illustrate our approach on both high- and low-dimensional problems from optimal control and establish that our approach substantially outperforms standard Monte Carlo methods typically adopted in the literature. The methods developed here are appropriate for solving a certain class of partial differential equations as well as for option pricing and hedging.

## 1. Introduction

The calculation of expectations of functionals of diffusions at a given time, and of the gradients of these quantities w.r.t. the initial condition, is an important problem with many applications. A short list includes engineering/statistics (optimal control/decision problems), finance (option pricing and hedging) and applied mathematics (solution of partial differential equations (PDEs)). The problem is, given a diffusion

d*X*_{t}=*μ*(*X*_{t}) d*t*+*σ*(*X*_{t}) d*W*_{t}, *X*_{0}=*x*_{0}, (1.1)

where *t*∈[0,*T*], *μ* and *σ* are the drift and diffusion coefficients, and {*W*_{t}}_{0≤t≤T} is a *d*-dimensional standard Wiener process, to calculate the quantities

E_{x_0}[*g*(*X*_{T})] and ∇_{x_0}E_{x_0}[*g*(*X*_{T})], (1.2)

for some suitable function *g*, with the expectation w.r.t. the law of the diffusion started at *x*_{0}. In most cases, the expectations cannot be calculated in closed form, and numerical methods have to be used.

In practice, Euler approximation and standard Monte Carlo (as in ch. 3 and 4 of Robert & Casella 2004) techniques are applied. In the context of expectations, the diffusion is Euler discretized and the latter is simulated to yield a Monte Carlo (MC) estimate of the expectation (see Lapeyre *et al*. (2001) for a book-length summary). Various extensions can be found in Gobet & Maire (2005) and Zou & Skeel (2004). When calculating the derivatives, there are two primary families of methods, based on Euler approximation and on Malliavin calculus. In terms of the former, approaches include the likelihood ratio method (or importance sampling (IS); e.g. Glasserman 2003), resampling methods (e.g. Glasserman & Yao 1992), pathwise methods (Yang & Kushner 1991) and score methods (Glynn 1986). More recently, ideas based on Malliavin calculus have been introduced (e.g. Gobet & Munos 2005; Detemple & Rindisbacher 2007).
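To make the score-method idea concrete, the following minimal sketch (a hypothetical one-dimensional Gaussian example, not one of the diffusion settings above) estimates ∂/∂*x*_{0} E[*g*(*X*)] for *X*∼N(*x*_{0},*σ*^{2}) via the identity E[*g*(*X*)∇_{x_0} log *p*(*X*|*x*_{0})]; the function name and parameter values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_gradient(x0, sigma, g, n=200_000):
    """Likelihood-ratio (score) estimate of d/dx0 E[g(X)], X ~ N(x0, sigma^2).

    Uses E[g(X) * score] with score = d/dx0 log p(X | x0) = (X - x0) / sigma^2.
    """
    x = rng.normal(x0, sigma, size=n)
    score = (x - x0) / sigma**2
    return float(np.mean(g(x) * score))

# For g(x) = x^2, E[g(X)] = x0^2 + sigma^2, so the true gradient is 2 * x0.
est = lr_gradient(x0=1.5, sigma=1.0, g=lambda x: x**2)
```

The estimator is unbiased, but its variance can be large for diffuse transition densities and long horizons, which is part of the motivation for the SMC-based alternatives below.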

A recent development in the simulation of diffusion processes can be found in the work of Beskos *et al*. (2006). In that paper, it is shown how to sample, with no time-discretization error, from a certain class of diffusion processes. Such methodology allows the development of MC methods that can compute the quantities above, with no time-discretization error (e.g. Casella & Roberts 2008).

Intrinsically, most of the above techniques rely on rather standard MC and IS. For large *T* (or *d*) and ‘complex’ functions, the variance of MC or IS estimators will often be very high. That is, as *T* (or *d*) grows, it would be expected that direct MC or IS estimates will be highly inaccurate (e.g. Robert & Casella 2004). A solution discussed in Kappen (2005) is to use a Laplace approximation of the transition density of the diffusion as an IS distribution; this approach is unreliable for large *d* and *T*.

In reference to the connection with the solution of PDEs, Zou & Skeel (2004) present a method that can be interpreted as a special case of the algorithms described in this paper. However, we clarify several key aspects of their methodology, such as an appropriate sequence of target distributions and the use of a fixed, rather than random, number of samples.

### (a) Proposed solutions and contribution of the article

As stated above, standard MC methods often lead to estimators with very high variance. In this paper, we propose two techniques that are able to:

(i) in low dimensions, calculate the quantities in equation (1.2) with no time-discretization error,

(ii) in high dimensions, via Euler approximations, accurately compute equation (1.2).

Both solutions rely upon sequential Monte Carlo (SMC) methods. These approaches have been applied to a variety of challenging problems in statistics and engineering to compute high-dimensional integrals w.r.t. a sequence of probability measures. Such ideas can be used in contexts where standard MC methods are destined to fail. A brief review is provided in §3.

For low-dimensional problems, under assumptions (A1)–(A3), we will adopt the exact simulation of diffusions methodology along with ideas from continuous-time SMC methods (Rousset 2006). Note that assumptions (A2) and (A3) are quite restrictive, but in general, we are interested in high-dimensional problems; in such scenarios, under assumption (A1), our approach can be applied to any class of diffusions.

The contribution of this article is as follows.

(i) We provide an MC-based solution for computing the expressions in equation (1.2). This solution relies on non-standard methodology, and is more likely to work than ordinary MC methods.

(ii) We provide specialists in engineering/statistics/finance/applied mathematics with tailored methods for diffusions.

(iii) We give a result on the convergence of a commonly used estimate, not currently present in the literature.

### (b) Structure of paper

In §2, our assumptions and notation are given; in §3, we provide a review of the simulation methods adopted, and an additional convergence result, missing from the literature; in §4, our approach is detailed; §5 features numerical simulations, in the context of optimal control, and in §6, the paper is concluded. The appendix contains the assumptions of proposition 3.1; the proof is available from http://stats.ma.ic.ac.uk/a/aj2/public_html/.

## 2. Notation and assumptions

### (a) Notation

Let denote a measurable space. For *n*≥1, the product space *E*_{1}×···×*E*_{n} is written *E*_{[1,n]} and . The *i*th element of *d*-dimensional vector *x*_{t} is written as *x*_{i,t} and for any vector *x*_{1:n}:=(*x*_{1},…,*x*_{n}). If *x*∈*E*^{d} (resp. ), *x*_{i} (resp. *x*_{i,n}) is the *i*th element of the vector. *I*_{d} is the *d*×*d* identity matrix. For a matrix *Σ*, *Σ*_{j} is the *j*th column of *Σ* and (*Σ*)_{ij} is the (*i*,*j*)th element. denotes the *d*-dimensional normal distribution of mean *μ* and covariance *Σ*, and the corresponding density, evaluated at *x*, is written as *ϕ*_{d}(*x*;*μ*,*Σ*). is the Poisson distribution of parameter *ρ* and is the uniform distribution on the Borel set *A*.

The expectation operator is written as and conditional expectations of continuous-time Markov processes are , with the natural filtration. The Wiener measure is written as and corresponding (resp. conditional) expectations as (resp. , *t*>0 or for a Brownian bridge, *s*<*t*). For a discrete-time Markov chain, the operator is used to denote expectation w.r.t. the law of the chain, started at *x*. For any *π*-integrable function *g*, . For a set *A*, the indicator function is written as .

The Euclidean norm is denoted as |·|. is the space of functions on that are continuous, have compact support and continuous first and second derivatives. is the class of bounded measurable functions on *E*, with the associated sup-norm written as . For any differentiable function *g*(*x*,*y*), and *x* in *d*-dimensional space, the gradient is written as ∇_{x}*g*(*x*,*y*):=(∂*g*/∂*x*_{1}(*x*,*y*),…,∂*g*/∂*x*_{d}(*x*,*y*)), with ∇^{2} the Laplacian.

### (b) Assumptions

Throughout the paper, the following assumptions are invoked at various places, given equation (1.1). Note that assumptions (A2) and (A3) are not needed in our discretized solution below.

(A1) (*Existence of solution*): *μ* and *σ* satisfy Lipschitz and growth conditions, sufficient to guarantee the existence and uniqueness of a strong solution to equation (1.1) (e.g. Øksendal 1995).

(A2) (*Transformation to unit diffusion*): There exists a diffeomorphism *η* such that, for *Y*_{t}=*η*(*X*_{t}),

d*Y*_{t}=*α*(*Y*_{t}) d*t*+d*W*_{t}. (2.1)

(A3) (*Properties of the drift*): There exists *A* such that ∇*A*=*α*. Also, *α* satisfies the Novikov condition (e.g. Øksendal 1995).

## 3. SMC methods

In this section, we review SMC methods.

### (a) The algorithm

SMC methods are techniques for approximating high-dimensional integrals w.r.t. a sequence of probability measures (Del Moral 2004). For the non-expert, it should be noted that this task is possible using ordinary MC; however, it is computationally inefficient and in many cases will lead to estimates with high/infinite variance.

Let *ι*≥1 and let {*π*_{n}}_{1≤n≤ι} be a sequence of probability measures. It is assumed that *π*_{n}(*d**x*_{1:n}) is defined on and admits a density w.r.t. some dominating measure *d**x*_{1:n}. Additionally, , , , , *x*_{i}∈*E*_{i}′ and *d**x*_{1:n}=*d**x*_{1:n−1}×*d**x*_{n}, *n*≥2.

SMC methods approximate the distributions by a large number (*N*) of random samples—termed particles—that are propagated over time using a combination of IS and resampling mechanisms. The algorithm requires one to specify an importance density *M*_{1}(*x*_{1}) on to initialize the recursion at time 1 and a family of transition kernels with associated densities {*M*_{n}(*x*_{1:n−1},*x*_{n})}_{n≥2} to ‘extend’ *x*_{1:n−1}∈*E*_{n−1} by sampling conditional upon *x*_{1:n−1} at time instants *n*=2,…,*ι*. These kernels are chosen such that it is easy to sample from them.

Now introduce, for *n*≥2, the incremental weight functions {*G*_{n}}. Under weak assumptions, it can be justified theoretically (Del Moral 2004) that the algorithm in figure 1 can approximate, as *N*→∞, expectations w.r.t. the sequence {*π*_{n}}.

In step 1 of figure 1, sampling from the density
can be achieved using standard multinomial sampling. However, other *resampling* approaches are possible (see Doucet *et al*. (2001) for details); in this article, the systematic resampling method is used. This step is required because the variance of the product of the {*G*_{n}} increases over time, and without resampling the estimation procedure suffers from high variance (see Doucet *et al*. 2001).

An important point is that it is not necessary to resample the particles at every time instant of the algorithm; it may be performed in a dynamic fashion, e.g. using the effective sample size (see Doucet *et al*. 2001). Note that it is possible to calculate normalizing constants of {*π*_{n}}, as a byproduct of the algorithm.
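The generic algorithm just described (propagate particles through *M*_{n}, reweight by *G*_{n}, resample when the effective sample size falls) can be sketched as follows. This is a minimal illustration, not the implementation used in the simulations; the toy potentials and proposals in the final check are hypothetical, chosen so that the normalizing constant E[∏_{n} *G*_{n}(*X*_{n})]=0.5^{3} is known exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

def systematic_resample(w, rng):
    """Systematic resampling: one uniform draw, stratified over N equal slots."""
    n = len(w)
    positions = (rng.uniform() + np.arange(n)) / n
    c = np.cumsum(w)
    c[-1] = 1.0  # guard against round-off at the upper boundary
    return np.searchsorted(c, positions)

def log_mean_exp(lw):
    """Numerically stable log of the mean of exp(lw)."""
    m = lw.max()
    return m + np.log(np.mean(np.exp(lw - m)))

def smc_normalizing_constant(m1, m, potentials, n=10_000, ess_frac=0.5):
    """Generic SMC: sample from M_1, extend via M_n, weight by G_n, and
    resample (systematically) whenever the effective sample size drops.

    Returns an estimate of Z = E[prod_n G_n(X_n)] under the proposal dynamics.
    """
    x = m1(n)
    logw = np.zeros(n)  # accumulated log-weights since the last resampling
    log_z = 0.0
    for step, G in enumerate(potentials):
        if step > 0:
            x = m(x)  # 'extend' the particles
        with np.errstate(divide="ignore"):
            logw = logw + np.log(G(x))
        w = np.exp(logw - logw.max())
        w = w / w.sum()
        if 1.0 / np.sum(w**2) < ess_frac * n:  # effective sample size test
            log_z += log_mean_exp(logw)  # fold weights into Z before resetting
            x = x[systematic_resample(w, rng)]
            logw = np.zeros(n)
    return float(np.exp(log_z + log_mean_exp(logw)))

# Toy check: i.i.d. N(0,1) proposals and G_n(x) = 1{x > 0}, so Z = 0.5**3.
z_est = smc_normalizing_constant(
    lambda k: rng.normal(size=k),
    lambda x: rng.normal(size=x.shape),
    [lambda x: (x > 0).astype(float)] * 3,
    n=100_000,
)
```

The running product of segment-wise mean weights is the usual normalizing-constant estimate under adaptive resampling; the effective sample size threshold `ess_frac` plays the role of the dynamic resampling criterion mentioned above.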

In our application, {*G*_{n}}_{1≤n≤ι} only depend upon the sub-sequence *x*_{n−1:n}, with *x*_{0} given, so that *G*_{n}(*x*_{1:n})=*G*_{n}(*x*_{n−1:n}). Similarly, *M*_{n}(*x*_{1:n−1},·)=*M*_{n}(*x*_{n−1},·), and we write the algorithm accordingly, as it is only required to store *x*_{n−1:n} for the next step of the algorithm. This is assumed from here on.

### (b) Marginal approximations

In many applications, it is of interest to recursively compute approximations of signed measures associated to our target
3.1
with *θ*_{1}(*d**x*_{1}):=*f*_{1}(*x*_{1}) d*x*_{1}, , , *n*≥2; throughout it is assumed that {*f*_{n}} are explicitly known (or can be approximated).

### Example

To illustrate why this is of interest, consider calculating
3.2
with given, some potential functions and
for a given Markov transition *p*. This is an example of computing the derivative of an expectation as in §4*a*(ii). Now, it is not sensible to use SMC to approximate the above equation, owing to what is known as the path degeneracy problem; typically, it is only possible to accurately calculate expectations w.r.t. the most recent *k* coordinates, for some small *k*≥1 (see Andrieu *et al*. 2005). As a result, if one wishes to compute equation (3.2), one should not use the SMC methods in §3*a*. However, by defining the sequence of signed measures appropriately, SMC can be used (as demonstrated below) to approximate this sequence and to sample from it. Note that *θ*_{ι}(*g*) is equal to equation (3.2).

It will be computationally efficient to use the same particles, which approximate {*π*_{n}}, to approximate the signed measure. At time 1, this can be achieved by IS
with . Now, for *n*≥2, given an empirical approximation of *θ*_{n−1} at time *n*−1, the following point-wise approximation of the signed density is also available:
Now sample particles from , then, using IS, the approximation
3.3
is obtained, where
3.4
and the indicator function in the weight is omitted (given *x*_{0}).
Such recursions are often adopted in applications, e.g. in the computation of the filter derivative (see Poyiadjis (2006) for a nice review). Note that the computational cost of calculating the weights is *O*(*N*^{2}).

#### (i) Convergence of marginal approximation

Before we continue, a theoretical result is given. This shows that the marginal approximations converge, in probability, to the true values, along with bounds on the rate of convergence. To our knowledge, this has not been proved in the literature. The assumptions are in appendix A; recall that the proof is available from http://stats.ma.ic.ac.uk/a/aj2/public_html/. Note that the expectation below is taken w.r.t. the stochastic process that is simulated by the SMC algorithm, and *θ*^{N} (resp. *θ*) is to be interpreted as a signed measure. Recall that *θ*_{n} is as in equation (3.1) and its particle approximation is given in equation (3.3).

#### Proposition 3.1

*Assume assumptions* (*B1*)–(*B4*). *Then for any n*≥1, , *such that*

For diffusion processes, especially when one uses approximations on bounded sub-spaces of , it will be possible to verify assumptions (B1)–(B4).

### (c) Random weight SMC

One concept that will be used in this paper is the random weight SMC algorithm (see Rousset & Doucet 2006; Fearnhead *et al*. 2008). The basic idea is the following: suppose that the importance weight can be written as an expectation w.r.t. λ, the probability density of a random variable *Y*, with *α*_{n} and *β* both non-negative functions that are known exactly, while the expectation itself is unknown. Consequently, it is impossible to compute the importance weights appearing in the SMC algorithm described earlier. However, a key observation of Fearnhead *et al*. (2008) and Rousset & Doucet (2006) is that it is not required to know this weight exactly: an unbiased estimate of it suffices. In other words, a target density extended with the auxiliary variable is adopted. Hence, marginally, it is possible to approximate expectations w.r.t. *π*_{n}.

## 4. SMC-based solutions

In this section, our simulation methods for computing the quantities in equation (1.2) are introduced. Recall that there is an expectation and a derivative of an expectation, which are termed the ‘expectation’ and the ‘derivative’, respectively, for the duration of the paper. It should be noted that a possible drawback of the methods is their dependence on *x*_{0}; this is discussed in more detail in §6. From here on, *ι*Δ*t*=*T*.

### (a) Exact solution

In this section, our exact solution is described. Note that assumptions (A1)–(A3) are required here.

#### (i) Expectation

To compute the expectation, the target at time *n* corresponds to
4.1
for some (potentially *γ*_{n}≡1), with
the exact transition density of the diffusion (Dacunha-Castelle & Florens-Zmirou 1986). It is possible to use the random weight SMC algorithm combined with the Poisson estimator (Wagner 1988; Beskos *et al*. 2006; Fearnhead *et al*. 2008) to deal with the expectation w.r.t. the Brownian bridge. It is easily seen that the expectation
4.2
(*c*>0,*ρ*>0), w.r.t. , , conditional upon a path of a Brownian bridge, is an unbiased estimator of
Taking expectations w.r.t. the Brownian bridge tied down at and yields the desired property. In other words, it suffices to sample the skeleton of a Brownian bridge at those times *ψ*_{1:k}, instead of entire sample paths. The actual estimation is performed as in §4*b*(i) (see that section for a discussion of this aspect).
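A minimal sketch of the Poisson estimator in its simplest setting may be helpful: for a deterministic intensity λ, the routine below returns one unbiased draw of exp(−∫_{0}^{T} λ(*s*) d*s*); in the algorithm above, λ would instead be evaluated along a Brownian-bridge skeleton at the times *ψ*_{1:κ}. The constants *c* and *ρ* are tuning parameters, and the values in the sanity check are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def poisson_estimator(lam, T, c, rho, rng):
    """One unbiased draw of exp(-int_0^T lam(s) ds).

    Sample kappa ~ Poisson(rho * T) and psi_1:kappa ~ U(0, T); then
        exp((rho - c) * T) * prod_j (c - lam(psi_j)) / rho
    has the required expectation (condition on kappa and expand the
    Poisson series to verify).
    """
    kappa = rng.poisson(rho * T)
    psi = rng.uniform(0.0, T, size=kappa)
    return np.exp((rho - c) * T) * np.prod((c - lam(psi)) / rho)

# Sanity check with a constant intensity lam = 0.5: the truth is exp(-0.5).
draws = np.array(
    [poisson_estimator(lambda s: 0.5 + 0.0 * s, 1.0, 1.0, 2.0, rng)
     for _ in range(100_000)]
)
est = float(draws.mean())
```

Note that the estimator is unbiased for any *c*>0 and *ρ*>0, but its variance depends strongly on these choices, which is the trade-off discussed in the simulations of §5.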

#### (ii) Derivative

For the derivative in equation (1.2), a standard result from stochastic calculus is used (e.g. Gikhman & Skorohod 1969),

4.3

where the matrix-valued random variable *Z*_{t} solves the ordinary differential equation d*Z*_{t}=∂*α*(*Y*_{t})*Z*_{t} d*t* with *Z*_{0}=*I*_{d}, and ∂*α*(*Y*_{t}) is the matrix of partial derivatives of *α*. More generally, for a diffusion of the form (1.1) (and given (A1)), the gradient process satisfies
with ∂*σ*_{j} the matrix of partial derivatives of the *j*th column of *σ*. Let us focus upon the case *d*=1. Given {*Y*_{s}}_{t≤s≤T},
thus the task is to compute this conditional expectation. Using a straightforward application of lemma 1.4.1 of Rousset (2006), it follows that
In the notation of Rousset (2006), let to verify the result. For illustration, suppose *γ*_{n}=1 for each , then
with and
This coincides with the example in §3*b*. In this case (or indeed when the {*γ*_{n}} are not unit functions), the variance of an SMC-based estimate will often increase exponentially with the time parameter. As a result, to approximate the derivative, the marginal procedure in §3*b* is used. The recursion is
In other words
Since the expectation w.r.t. the Brownian bridge is unknown, an MC approximation of the importance weight is used,
4.4
(see equation (4.2) for details).

### (b) Approximate case

In the approximate case, the following scheme is proposed. A standard Euler discretization is used
*X*_{(n+1)Δt}=*X*_{nΔt}+*μ*(*X*_{nΔt})Δ*t*+*σ*(*X*_{nΔt})(*W*_{(n+1)Δt}−*W*_{nΔt}), (4.5)

with Δ*t*>0, *ι*Δ*t*=*T*; denote the associated Markov transition as . Let *μ*_{Δt}(*x*)=*x*+*μ*(*x*)Δ*t*.
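The Euler scheme can be sketched as follows for *d*=1; the Ornstein–Uhlenbeck drift used in the check is a hypothetical stand-in (not one of the examples of §5), chosen because E[*X*_{T}|*x*_{0}]=*x*_{0}e^{−θT} is known exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

def euler_terminal(x0, mu, sigma, dt, n_steps, n_paths, rng):
    """Terminal values X_T under the Euler scheme
    x <- x + mu(x) dt + sigma(x) sqrt(dt) xi, with xi ~ N(0, 1)."""
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        x = x + mu(x) * dt + sigma(x) * np.sqrt(dt) * rng.normal(size=n_paths)
    return x

# Ornstein-Uhlenbeck: dX = -theta X dt + s dW, so E[X_T | x0] = x0 exp(-theta T).
theta, s, T, dt = 1.0, 0.5, 1.0, 0.01
xT = euler_terminal(2.0, lambda x: -theta * x, lambda x: s + 0.0 * x,
                    dt, int(T / dt), 100_000, rng)
mc_mean = float(xT.mean())
```

The bias of the estimate is of order Δ*t* (weak order one), which is why the choice of Δ*t* is revisited in the numerical comparisons of §5.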

#### (i) Expectation

The expectation is approximated by

4.6

where the Euler transition densities replace the exact ones; see Stramer & Yan (2007) for a summary of methods to approximate the transition density of a diffusion.

To compute this expectation, we would like to be able to sample from the optimal IS density (given the {*G*_{n}} could be computed exactly) using SMC sequentially; this choice would correspond to the following sequence of intermediate target densities:
4.7
The problem is that it is often impossible to compute these densities point-wise and hence to use the SMC methods above. While it may be possible to use random weight SMC ideas, we favour a simpler approach: construct a sequence of densities that can be computed point-wise and then develop SMC algorithms to sample from them. The following sequence of target densities is used:
4.8
with *φ*∈(0,1],*ι*Δ*t*=*T* and *x*_{0}=*x*. Thus the weight functions are
4.9
The normalizing constant of *π*_{ι} is the (discretized) expectation of interest; this idea has appeared in the context of rare event estimation (Del Moral & Garnier 2005). It is possible to adopt the harmonic mean estimator and use the estimate (when is used for *M*_{n})
4.10
This can have lower variance than standard SMC techniques to calculate the normalizing constant. The estimate (4.10) can be straightforwardly modified for different proposals . Note that the choice *γ*_{n}=|*g*|^{φ} is also used for the target density (4.1).

An important point is the choice of the sequence of distributions that is used.

#### Remark

If the Euler discretized Markov chain exhibits ergodicity properties, then for large *n*
where is as in equation (4.8) with *φ*=1 and κ the invariant measure of the chain. In other words, if we are interested in computing marginal quantities, then it is very similar to integrating w.r.t. the invariant measure. As a result, in simulations it will be computationally beneficial to use as the target density close to time *ι*; before that the underlying chain can be used as the target distribution.

#### (ii) Derivative

Assume (A1)–(A3); the idea is to approximate the exact solution (4.11) by adopting a trapezium-rule approximation of the integral in the exponent. This solution is of interest when comparing approximate and exact methods, i.e. in terms of accuracy. The signed measures of interest are written in a non-recursive form, so that the final measure is equal to equation (4.11) once the integral in the exponent has been replaced by a trapezium-rule approximation. The weight at time 1 is obtained when sampling from the proposal. The recursions can be performed as in equation (3.4).

More generally, let us focus on the appropriate version of equation (4.3), i.e. assumptions (A2) and (A3) are not needed. Conditional upon *X*_{n}, suppose the Wiener increment is known; this is possible if

(i) the Euler approximation is sampled from, or

(ii) it is possible to invert the approximated stochastic differential equation (SDE) via analytic or numerical methods.

Then (note the index of the product starting at *ι* and ending at 1) where *A*_{i,Δt} is the *d*×*d* matrix
and is the Wiener increment obtained from equation (4.5). To obtain the gradient, each element of the *d*×*d* matrix is approximated via a signed measure. That is, at time 1
. Then at time *n*,
which can be estimated as above. Integrating ∇*g*(*X*_{T}) gives an estimate of equation (4.3).
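For *d*=1, the gradient process above can be propagated alongside the Euler state by reusing the Wiener increments: *Z*_{n+1}=(1+*μ*′(*X*_{n})Δ*t*+*σ*′(*X*_{n})Δ*W*_{n})*Z*_{n}. The sketch below uses a hypothetical geometric Brownian motion (*μ*(*x*)=*rx*, *σ*(*x*)=*σx*), chosen because ∂/∂*x*_{0} E[*X*_{T}]=e^{rT} is available for checking:

```python
import numpy as np

rng = np.random.default_rng(4)

def euler_state_and_gradient(x0, r, sig, dt, n_steps, n_paths, rng):
    """Euler paths of dX = r X dt + sig X dW plus Z_n = dX_n/dx0.

    For this linear model mu'(x) = r and sigma'(x) = sig, so the state and the
    gradient process share the per-step multiplier (1 + r dt + sig dW).
    """
    x = np.full(n_paths, float(x0))
    z = np.ones(n_paths)  # Z_0 = dX_0/dx0 = 1
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.normal(size=n_paths)
        factor = 1.0 + r * dt + sig * dw
        x = x * factor  # Euler step for the state
        z = z * factor  # same multiplier, by linearity of mu and sigma
    return x, z

x_T, z_T = euler_state_and_gradient(1.0, 0.05, 0.2, 0.01, 100, 100_000, rng)
grad_est = float(z_T.mean())  # estimates d/dx0 E[X_T] = exp(r * T), T = 1
```

For a nonlinear model, `factor` would be replaced by 1 + *μ*′(*x*)Δ*t* + *σ*′(*x*)Δ*W* evaluated at the current state, and *Z* becomes a *d*×*d* matrix in higher dimensions.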

In the case that neither solution above can be adopted, it is possible to use the likelihood ratio method; we have found the convergence of this method to be quite slow.

## 5. Simulations

Our approaches are now applied to a problem from optimal control. It is required that, from here on, the following set-up holds.

### (a) Problem formulation

Consider the following set-up: let be a probability space on which we define the following -valued diffusion

5.1

with the control {*u*_{t}}.

Our objective is to compute the optimal control {*u*_{t}} associated to the following path functionals of the diffusion:
5.2
where and {*u*_{t}} is adapted to the natural filtration. It is possible to establish, under some assumptions, that the optimal *cost-to-go* function that satisfies the Hamilton–Jacobi–Bellman (HJB) equation is given by

### (b) A probabilistic representation

Under assumption (A1), it can be established (e.g. Mitter 1982; Runggaldier 1998) that the problem (5.2) can be reduced to computing , where expectations are w.r.t. laws of diffusions of the form (1.1) and *u*_{t}=∂*V*_{t}/∂*x*_{0}(*x*_{0}). Under assumptions (A1)–(A3), , with *g*_{η}=*g*∘*η*^{−1} and *u*_{t}=(∂*η*^{−1}/∂*y*_{0})(∂*V*_{t}/∂*y*_{0})(*y*_{0}).

This can be established via an application of the Feynman–Kac formula (e.g. Karatzas & Shreve 1988, theorem 7.6) and transformation of the resulting PDE. In other words, our quantities of interest (1.2) can be estimated by considering expectations w.r.t. the uncontrolled diffusion.

### (c) Example 1

#### (i) Diffusion and terminal cost

In the following example, our approximate and exact approaches are investigated when considering the uncontrolled sine diffusion d*X*_{t}=sin(*X*_{t}) d*t*+d*W*_{t}, where *t*∈(0,1], with a terminal cost depending on a parameter ϖ>0. The objective of the example is to investigate the accuracy of the approximate solution and to illustrate some aspects of SMC algorithms that are important for their implementation. In terms of the former, we look for the level Δ*t* at which the approximate solution is close to the exact one, but the computational cost is not too high.

#### (ii) Implementation details

*N*=1000 particles are simulated, with the MC approximation in equation (4.4) and *φ*=0.9. The same target distributions are used for the entire simulation. It was found, for a fixed *N*, that changing these settings did not yield significantly different conclusions.

For both the approximate and exact solutions, the importance density was used. Note that under an Euler approximation, and constraining the state to some ‘large’ set (which occurs on any computer implementation), the assumptions of proposition 3.1 hold; the estimate of the control converges. In the exact case and , also *ω*=*π* and *ζ*=100.

#### (iii) Simulation results

We look at the cost-to-go and control on a grid of five equally spaced points, corresponding to five SMC algorithms, for both the exact and approximate approaches. The results, averaged over 10 replications, for

(i) approximate solution: Δ*t*=0.5 (CPU time 4.5 s) and Δ*t*=0.05 (CPU time 76.5 s), and

(ii) exact solution: Δ*t*=0.05 (CPU time 88.5 s),

can be seen in figure 2.

In figure 2*a*, it can be observed that, even at discretization level Δ*t*=0.5, it is possible to approximate the exact (cost-to-go) solution quite accurately; this is unsurprising, as the variance in the Euler approximation (i.e. Δ*t*) is quite small. However, in figure 2*b*, it appears that the approximation of the control is not as accurate. This is to be expected, because it is a functional of two expectations and is thus inherently more difficult to approximate.

Tables 1 and 2 provide more insight into the SMC procedure implemented here. In table 1, the variance of the estimate of the cost-to-go, for each initial point, averaged over 10 runs of the algorithm, can be seen. These values indicate that there is no significant variability in the estimate, but they raise an important point of discussion: as Δ*t* gets bigger, the variance of the Poisson estimator (which is finite in this example) gets larger, but the number of time steps of our SMC algorithm falls, so the weight degeneracy incurred per step grows while the diversity of the particle collection over the whole run can improve. That is, there is a trade-off in setting Δ*t* for the exact solution: too large, and the variance of the Poisson estimator might be very high; too small, and the variance of the SMC procedure might be too high. In this particular example, it is better to have Δ*t* large, as this requires only two steps of the algorithm; this will not be the case in general.

In table 2, the relative variance of the estimators of the cost-to-go of our approximate algorithm, against the exact algorithm, can be seen. The variances for Δ*t*=0.05 are uniformly smaller than those in the exact case. That this occurs is quite unsurprising: the algorithms are run for the same number of time-steps on a smaller state-space. In the case Δ*t*=0.5, the variances are not always smaller, although the smaller number of time-steps in the algorithm may mean the point above does not have as much impact.

This example demonstrates that the Euler approximation can yield quite similar answers to an exact approach with less coding effort and slightly lower CPU times. This is encouraging, as this approach will be typically required in high-dimensional problems.

### (d) Example 2

In the following example, we demonstrate that our approach can produce better results than standard MC methods for the same computational cost.

#### (i) Diffusion and terminal cost

The problem we investigate is a financial application associated with the Black–Scholes model: d*X*_{t}=*X*_{t}(*r* d*t*+*σ*(*t*) d*W*_{t}), where the components of *X*_{t} are equities, *r*∈(0,1) is the risk-free interest rate and *σ*(*t*) is the volatility function. The terminal cost function is
with *ς*_{i}>0, for each *i*. Note that, while the exact transition dynamics are known for the Black–Scholes model, we look at Euler approximations. The reason is to further investigate the approximate case; this is typically applied in practice.

The financial interpretation of the problem is as follows (see Björk (2004) for definitions). The computation of the optimal cost-to-go corresponds to the arbitrage price process; note that we are taking expectations w.r.t. the equivalent martingale measure. The control is the dividend that we are able to set; the control, *u*_{t}, in the diffusion (5.1) can be interpreted as the risk premium. In effect, the problem can be thought of as a large company that is able to pay out dividends and that seeks to maximize the expected log of the terminal cost, with a cost on the path space, as reflected in (5.2). Since the function *g* is maximized when *S*_{ς,d}(*x*_{1:d,T})=0, *ς* can be thought of as a sequence of target values of the average of the share prices.

#### (ii) Implementation details

The parameters are: *d*=5, Δ*t*=0.05, *T*=10 (so *ι*=200), *r*=0.01 and *ω*=2*π*. The form of *σ*(*t*) is chosen to represent a ‘seasonal’ effect in the covariance structure. In the terminal cost, *ζ*=1 and *ς*=(1,3/2,2).

The target density is defined via the potential function *g*(*x*_{1:d,T})^{φ}. The proposal
with was used. This is an approximation of the locally optimal importance distribution (e.g. Doucet *et al*. 2000) and is selected because *h* has a similar behaviour to *g* and *M*_{n} can be normalized (which is required to compute the weights). We set *ρ*=1. In addition, was introduced as a target density at time 175 of the algorithm.

*N*=1000 particles are generated for the SMC method, with *φ*=0.015. The selected value of *φ* was based on the rate of resampling; if resampling occurs too often, then the sequence of densities is too complex to sample using the methods here. In addition, if *φ* is too large in high-dimensional problems, then it is likely that the discrepancy between the target densities (4.7) and (4.8) is large. As a result, the resampling mechanism may remove particles that would be consistent with the optimal importance distribution. In this example, we resample using an effective sample size criterion of *N*/3, and this occurred approximately twice per run of the algorithm. The algorithms are run from three starting points with *S*_{ς,d}(*x*_{1:d,T})=(1.05,1,1.95).

#### (iii) Simulation results

In tables 3 and 4, we can observe our results. Note that, to compute the cost-to-go for a given *N*, the computational cost is similar. However, to compute the control, the CPU time for the SMC algorithm is approximately 9000 s (*N*=1000) against 15 s (for MC) for the same number of particles. Thus, for the control estimated via MC, separate runs with 500 000 particles are reported (approx. CPU time 9000 s).

In table 3, the estimated optimal cost-to-go can be seen for each of the three starting points. For some of the MC runs of the algorithm, none of the samples contribute to equation (1.2), that is to say, *g*=0 for each sample, and the estimate is not defined. Conversely, for our SMC method, we always record a value *g*>0, and the variance of our estimate is not substantial; moreover, it can be reduced by using larger particle sizes. The results illustrate that SMC methods can be used in scenarios where standard MC simulations exhibit a high variance, simply by adopting sensible target densities and proposal mechanisms.

In table 4, the estimated control for both the algorithms can be seen. The SMC algorithm produces estimates with much smaller standard deviations.

## 6. Summary

In this paper, we have considered SMC solutions to calculate expectations and sensitivities w.r.t. laws of diffusions. It was demonstrated that it is possible to find solutions with no time-discretization error, as well as biased approaches that can be used in high-dimensional problems. Note that the approach can straightforwardly be extended to jump-diffusion and killed-diffusion problems.

Extensions to the ideas that are presented are as follows. Firstly, any of the extensions of SMC methods (e.g. Doucet *et al*. 2006) can be used for estimation, and would improve upon the ideas presented here.

Secondly, as noted in §4, we have run many algorithms for different *x*_{0}; this can be computationally expensive. However, such a problem occurs in other applications, such as population genetics (see Fearnhead & Donnelly (2001) for a potential solution).

Thirdly, it is feasible to extend the ideas here to solve PDEs connected to Markov processes via the Feynman–Kac formula (Gobet & Maire 2005). The SMC ideas, along with different target densities or instrumental distributions, are likely to provide efficient solutions in this context. A full investigation is the subject of current work.

Fourthly, the ideas here can be extended to more complex control problems, e.g. the optimal portfolio problem in mathematical finance (e.g. Björk 2004). In this context, the process can be discretized and the problem is to compute . In some cases, *u*_{t} is one-dimensional and the expectation above can be approximated using SMC techniques coupled with an approach in Pitt (2002).

Finally, the problems looked at here coincide with a large number seen in other areas, e.g. option pricing. The SMC methods developed here can substantially improve over methods used in that literature (e.g. Glasserman 2003) and a full investigation in this application area is warranted.

## Appendix A

The assumptions for proposition 3.1 are as follows.

(B1) (*Finiteness of Hahn decomposition*): For any *n*≥1, and .

(B2) (*Boundedness of f*_{n}): and for any *n*≥2, and fixed *v*_{n}∈*E*_{n}′, .

(B3) (*Uniform ergodicity of M*_{n}): For any *n*≥2, ∃ ϵ_{n}∈(0,1) and a probability density ϑ_{n}(*v*_{n})>0 ∀*v*_{n}∈*E*_{n}′ such that

A 1

(B4) (*Integrability of* ): For any *n*≥2, with ϑ_{n}(*v*_{n}) as in equation (A 1).

## Footnotes

- Received April 19, 2009.
- Accepted August 11, 2009.

- © 2009 The Royal Society