## Abstract

This article addresses the weak convergence of numerical methods for Brownian dynamics. Typical analyses of numerical methods for stochastic differential equations focus on properties such as the weak order which estimates the asymptotic (stepsize

## 1. Introduction

Stochastic gradient systems are stochastic differential equations in *d* dimensions having the form
*V* (*x*), *x*∈**R**^{d}, is a potential energy function, and *σ*>0 is a constant which characterizes the strength of the additive noise, here described by a standard *d*-dimensional Wiener process *w*(*t*). These systems originate in the work of Einstein [1,2] to describe the motion of Brownian particles. They arise in mathematical models for chemistry, physics, biology and other areas, when the cumulative effect of unresolved degrees of freedom must be incorporated into a model to ensure its physical relevance. Under mild conditions on *V* , system (1.1) is ergodic [3,4] and has the unique invariant distribution *β*=2*σ*^{−2}. Numerical methods for solving equation (1.1) compute a discrete sequence of states *X*_{1},*X*_{2},… by iteratively approximating the short-time evolution. The error in the numerical solution is typically quantified in either a *strong* sense (accuracy with respect to a particular stochastic path associated to (1.1)) or by reference to an evolving distribution (*weak* error or error in averages); the latter is the focus of this article. Ideally, the discrete states are ultimately distributed in a way that is consistent with the invariant distribution, but for complex applications the introduction of error in the numerical process is inevitable. In this article, we examine the asymptotic (

Undoubtedly, the most common numerical method for solving (1.1) is the Euler–Maruyama method which approximates *X*(*t*_{k}), *t*_{k}=*hk*, by the iteration
*i*=1,…,*d*, *k*=1,…, are i.i.d. random variables with the law *τ*], with *τ*=*hN*. The probability measure associated to (1.1) is described by a probability density *ρ*(*t*,*x*) that evolves according to the Fokker–Planck equation
*L*_{2} sense) of the generator for (1.1), defined by
*ρ*(*t*,*x*) evolves from an initial probability distribution *ρ*(0,*x*) to the steady state *φ* be a test function (such as an element of the Schwartz space of *φ* at time *τ* may be taken to be

The discretization scheme (1.3) may also be viewed as giving rise to an evolving probability distribution, and thus one may think of the iterates in (1.3), *X*_{1},*X*_{2},…, as being characterized by densities *ρ*_{1},*ρ*_{2},…. If stepsize *h* is used, then the average at time *τ*=*Nh* is given by
*h*. We refer to this as the *weak error*. For the Euler–Maruyama method, it is known (e.g. [5–7]) that
*Ch*, where *C* is a constant that is independent of the stepsize. A better way to write this is
*C* depends inherently on the time interval. This formula can be seen as a consequence of an asymptotic expansion of the weak error, as proposed by Talay & Tubaro [8] (see also [5–7]). We note that *C* also depends on the distribution of the initial state of the system, i.e. *ρ*(0,*x*), as well as the particular observable, but we suppress these aspects in our notation. The asymptotic (*C* describes the performance of the numerical method for computing averages with respect to the invariant distribution. For the Euler–Maruyama method, one finds that *C* is bounded as

In order to calculate averages in systems with complicated potentials and/or a large number of variables, one often must perform numerical calculations with a very long time interval. It is then desirable to use as large a timestep as is reasonable in the interest of reducing the computational effort, which is typically quantified in terms of the number of force evaluations. Weak first-order methods like Euler–Maruyama can be inefficient in practice. Schemes such as the second-order stochastic Heun method [6,7] can have greater efficiency: the stochastic Heun method uses two evaluations of the force −∇*V* at each timestep, thus, in comparison to Euler–Maruyama, it must introduce less than about half the error at a given stepsize to be deemed superior. The alternative method discussed in this paper has been proposed in [9]
*i*=1,…,*d*, *k*=1,…, are i.i.d. random variables with the law *superconvergence* property in the long-time limit. Moreover, we show that this convergence is exponential in *τ*.

We note that there are several recent papers (see [11–13] and references therein), where the idea of modified differential equations is exploited in order to construct higher order schemes for computing ergodic limits. This approach provides the possibility of modifying schemes which are of weak order one on finite-time intervals to provide second-order approximations in ergodic limits. However, such modified schemes require either to evaluate derivative of forces or to perform two force evaluations [11], i.e. their computational cost is at least as high as for the Heun scheme and substantially higher than for (1.7). Furthermore, although the theoretical approaches in our paper and in [11] share some similarities, the results of [11] are not applicable to the non-Markovian approximation (1.7) and they do not also include an analysis demonstrating that the leading term in the error of their modified schemes goes to zero exponentially fast.

## 2. Preliminaries

Let *w*(*t*) from (1.1) is defined as well as the sequence of random variables *ξ*_{1},*ξ*_{2},… from the method (1.7). We suppose that all components of random variables *ξ*_{k} arising in (1.7) and the Wiener process *w* are independent. This assumption will allow us to use Ito integrals of the form *t*≥*t*_{k}, where *b*(*s*,*x*) is a deterministic ‘good’ function (also note that in this paper we are considering the weak-sense convergence only). In what follows, **E**(⋅) denotes expectation with respect to the measure *P*.

We use the following notation for the solution of (1.1): *X*(*t*)=*X*_{t0,x}(*t*) when *X*(*t*_{0})=*x*, *t*≥*t*_{0}, and also we will write *X*_{x}(*t*) when *t*_{0}=0. Recall (e.g. [3]) that the process *X*(*t*) is exponentially ergodic if for any *x*∈**R**^{d} and any function *φ* with a polynomial growth there are *C*(*x*)>0 and λ>0 such that
*X*(*t*) of (1.1) is exponentially ergodic with the Gibbs invariant density
*c*_{0}∈**R** and *c*_{1}>0 such that
*p*≥1
*K*>0 and 0<λ≤*c*_{1} depend on *p* (e.g. [3,4]).

Introduce the operator *L*
*p*(*t*,*x*,*y*) for (1.1) satisfies the Fokker–Planck (forward Kolmogorov) equation
*ρ*(*x*) satisfies the stationary Fokker–Planck equation

## 3. Main result

We start with a simple illustrative example.

### Example 3.1

Let *a*(*x*)=−*αx* with *α*>0, then *X*(*t*) from (1.1) is the Ornstein–Uhlenbeck process, which is Gaussian with **E***X*_{x}(*t*)=*x* *e*^{−αt} and *Cov*(*X*_{x}(*s*),*X*_{x}(*t*))=(*σ*^{2}/2*α*)(*e*^{−α(t−s)}−*e*^{−α(t+s)}) for *s*≤*t*. It is not difficult to calculate that for the Euler scheme (1.3)
*K*>0 independent of *τ*, and for scheme (1.7)

In what follows, we will assume the following.

### Assumption 3.2

*The potential* *V* (*x*)∈*C*^{7}(**R**^{d}), *its first-order derivatives grow not faster than a linear function at infinity and higher derivatives are bounded. The relations* (1.2) *and* (2.3) *hold. The function* *φ*(*x*) *in* (2.6) *lies* ∈*C*^{6}(**R**^{d}) *and it and its derivatives grow not faster than a polynomial function at infinity*.

The most restrictive condition in assumption 3.2 is the requirement for *a*(*x*)=−∇*V* to be globally Lipschitz
*K*>0 is independent of *x*∈**R**^{d}. (Refer to remark 3.6 and the example presented in §5*b*.)

Introduce the multi-index **i**=(*i*_{1},…,*i*_{d}) and *u*(*t*,*x*) of (2.6) belongs to *t* and *x* which have continuous partial derivatives through order 8 in the spatial variables. Moreover, for some constant _{u}>0 (e.g. [14])
**i—**≤8 and 0≤*j*≤2.

The proof of the following lemma (which is an analogue of the moments bound (2.4) for scheme (1.7)) is rather standard and is omitted here.

### Lemma 3.3

*Assume that* (2.3) *and* (3.1) *hold. Let X*_{k} *be defined by the scheme* (1.7). *Then for all sufficiently small* *h*>0 *for all* *p*≥1, *there is* *γ*∈(0,2*c*_{1}) *and* *K*>0 *such that*

We prove the following convergence and error expansion theorem for scheme (1.7).

### Theorem 3.4

*Let assumption 3.2 hold. Then scheme* (1.7) *is first-order weakly convergent and for all sufficiently small h>0 its error has the form*
*where*
*for some K>0,* *and λ>0 independent of h and τ.*

### Proof.

Note that we shall use the letters *K*, *h*, *t*, *τ* and *x*. We will exploit ideas from [7], ch. 2 and, in particular, from the proof of theorem 2.2.5 on the Talay–Tubaro expansion. It is convenient to introduce the additional notation for *X*_{k} from (1.7): *X*_{k−1}.

As *X*_{0}=*x*, we have *X*_{t1,X0,X0(t1)}(*τ*)=*X*_{x}(*τ*), and, using independence of *X*_{k} and *w*(*t*)−*w*(*t*_{k}), *t*≥*t*_{k} for all *k*, we can write *X*_{tk,Xk}(*τ*)=*X*_{tk+1,Xtk,Xk(tk+1)}(*τ*), *k*=0,…,*N*−1. Then (cf. [7], p. 101)

Using the definition (2.5) of *u*(*t*,*x*) and independence of *X*_{k} and *w*(*t*)−*w*(*t*_{k}), *t*≥*t*_{k}, we get
*u*(*τ*,*x*)=*φ*(*x*), we have **E***φ*(*X*_{tN−1,XN−1}(*τ*))=**E***u*(*t*_{N},*X*_{tN−1,XN−1}(*t*_{N})) and

Expanding *h* around *X*_{k} by the usual Taylor formula, we obtain
*K*>0, *h*, *x*, *t* and *τ*. To derive the estimate (3.9), we used (3.3), the assumptions on *a*(*x*) and its derivatives from assumption 3.2, and (3.4).

Introduce the auxiliary process
*r*_{i}(*t*_{k},*x*), *i*=2,…,5, satisfy estimates of the form (3.9), which are derived using the same facts as in the case of *r*_{1}(*t*_{k},*x*).

By lemma 2.1.9 from [7], p. 99 and again using independence of *X*_{k} and *w*(*t*)−*w*(*t*_{k}), *t*≥*t*_{k}, we get

We have for the second term in (3.14):
*r*_{i}(*t*_{k},*x*), *i*=6,7,8, satisfy estimates of the form (3.9), which are derived using the same facts as in the case of *r*_{1}(*t*_{k},*x*) except *r*_{6}(*t*_{k},*x*) where (2.4) was also used.

Let

Owing to the properties of *u*(*t*,*x*) (see (3.2) and (3.3)) and of *a*(*x*) (see assumption 3.2), we have
*K*>0 and *h*, *x*, *t* and *τ*. Using (3.19) and (3.4), we obtain from (3.17) that
*K*>0, *h*,*x* and *τ*, i.e. scheme (1.7) is of first weak order.

It remains to prove expansion (3.5). Consider now the (*d*+1)-dimensional system
*C*_{0}(*τ*,*x*) is equal to
*K*>0 does not depend on *x*, *t* and *τ*. Using the demonstrated first-order convergence of (1.7) (cf. (3.20)), it is not difficult to obtain that

Now we prove that in the limit of *h*.

### Theorem 3.5

*Let assumption 3.2 hold. Then the coefficient C*_{0}(*τ,x*) *from* (3.6) *goes to zero as* *for some constants K>0,* *and λ>0, i.e. over a long integration time scheme* (1.7) *is of order two up to exponentially small correction.*

### Proof.

We have
*p*(*t*,*x*,*y*) is the transition density for (1.1) (see (2.7)) and *ρ*(*y*) is the invariant density. Using integration by parts and (1.2), it is not difficult to verify that for any 0≤*t*≤*τ*
*X*(*t*) (cf. (2.1)), we have for *K*>0, _{B}>0 independent of *x*, *t* and *τ*.

Using (3.30), we obtain for some λ>0 and all sufficiently large *τ*>0

## 4. Discussion

(1) We emphasize that the fact that the average of *B*_{0}(*t*,*x*) with respect to the invariant measure is equal to zero (see (3.29)) is the reason why scheme (1.7) is second-order accurate in approximating ergodic limits (see theorem 3.5).

(2) In the case of the Euler scheme (1.3), we get the same error expansion as (3.5) for scheme (1.7) but with a different

(3) Let a one-step weak approximation *X*_{t,x}(*t*+*h*) of (1.1) generate a method of order *p*. Then, according to the Talay–Tubaro expansion [8] (see also [7], §2.2.3), the global error of the method has the form
*n*∈**N** (*n* can be arbitrarily large if the potential *V* (*x*) belongs to *C*_{0}(*τ*,*x*),…,*C*_{n}(*τ*,*x*) are independent of *h*. It follows from the proof of theorem 2.2.5 in [7] that the coefficients *C*_{i} in (4.1) can be presented in the form
*B*_{0}(*s*,*x*) is the coefficient at the leading term in the one-step error expansion of the method analogous to *B*(*s*,*x*) in theorem 3.4. The other *B*_{i}(*s*,*x*), *i*≥1, consists of the coefficient at *h*^{p+i+1} from the the one-step error expansion of the method (analogously to as *B*_{0}(*s*,*x*) does at *h*^{p+1}) and of the coefficients at *h*^{p+i+1} from one-step error expansions for approximations of *C*_{j} with *j*<*i* (see details in [7], §2.2.3). Furthermore, one can deduce from the proof of theorem 3.5 that if the averages of *B*_{i}(*s*,*x*) 0≤*i*≤*q*≤*n*, with respect to the invariant measure are equal to zero then in the limit of *p*+*q* order of accuracy in *h*. Hence, such a detailed one-step error analysis is the basis for discovering long-time integration properties of numerical schemes and can serve as a guide in the construction of highly efficient numerical methods for computing ergodic limits for diffusions.

We observe that higher order methods for sampling from the Gibbs distribution have recently been constructed based on the idea of modified differential equations (e.g. in [11]) but they are Markovian in comparison with scheme (1.7) considered here. To our knowledge (see also the recent paper [13]), a non-Markovian scheme with order higher than 2 in computing ergodic limits has not yet been proposed. We note in particular the remarkable simplicity of scheme (1.7), which requires just a single evaluation of force per step but nevertheless has second-order accuracy in the computation of ergodic limits.

## 5. Numerical experiments

We compare the sampled distributions for the Euler–Maruyama scheme (1.3) with the second-order (in the sense of approximating ergodic limits) scheme (1.7), with both methods equal in cost (measured in terms of evaluations of the force). We also compare the sampled distributions with Heun's method, a second-order scheme requiring two evaluations of *a*(*x*)=−∇*V* (*x*)
*V* , it is necessary to consider anharmonic models in order to capture the representative behaviour of the scheme.

### (a) Anharmonic scalar model

We consider solutions to (1.1) using the one-dimensional potential energy function
*x*∈[0,2*π*).

#### (i) Error in infinite time

We sample the configurational distribution ^{8}].

We note that the weak-sense convergence results are proved in §3 under the assumption that test functions *φ*(*x*) are sufficiently smooth and they and their derivatives grow not faster than polynomial functions at infinity (see assumption 3.2). This is a usual assumption in stochastic numerics [5–7]. At the same time, this assumption is not sufficient to guarantee convergence in distribution of scheme (1.7), which would require to consider *φ*(*x*) being step functions. In [18], first-order weak-sense convergence of the Euler scheme and the corresponding Talay–Tubaro error expansion were proved in the case of *φ*(*x*) being measurable bounded functions, which, in particular, implies convergence in distribution of the Euler scheme. Further, first-order convergence for density of the Euler scheme was proved in [19]. Ideas from Bally & Talay [18,19] can be exploited to extend the convergence results obtained in §3 for scheme (1.7) to include the case of non-smooth *φ*(*x*). Here, we show and compare convergence in distribution of scheme (1.7) and the other two tested methods experimentally.

For each scheme, we divide [0,2*π*] into 100 equal histogram bins to approximate the sampled distribution and compare the observed density of bin *i* (denoted *i* (denoted *ρ*_{i}) computed to high precision using a numerical solver. The error in the distribution is then reported as either the approximate *L*_{2} difference in the sampled distributions or as the relative entropy (or Kullback–Leibler divergence [20]) of the two distributions, defined by *h*=0.2 and subsequent timesteps are increased by 10%. The distributions are averaged over 32 independent realizations per timestep, and the overall errors are plotted in figure 1.

The results match the analysis given in §3 for the large-time regime. In the case of the *L*_{2} error, the Euler–Maruyama scheme gives a first-order error in the computed distribution, whereas the other schemes give second-order errors. For the computation of relative entropy, we see a doubled rate of convergence (from first to second order, or from second to fourth order). Writing *ε* is a small parameter and *k* scheme, then we find that the relative entropy is proportional to *h*^{2k}. In practice, we observe that Heun's method and the method (1.7) give a fourth-order relationship with the stepsize, whereas the Euler–Maruyama scheme has relative entropy proportional to *ε*^{2}. The non-Markovian method gives approximately an order of magnitude improvement in this example.

#### (ii) Error in finite time

We consider the weak accuracy of the Euler–Maruyama scheme (1.3), Heun's method (5.1) and the method (1.7). In order to realize the evolving distribution computed for each scheme, we average over 2.56×10^{9} independent trajectories with initial points drawn from a normal distribution with mean *π* and variance 1 (where the tails of the distribution outside the periodic region are cut off). We divide [0,2*π*] into 21 histogram bins and run over *t*∈[0,9].

As the exact solution is unknown, we compute a baseline solution using Heun's method with *h*=0.04 over the time interval. This solution is compared to the evolving distributions for *h*=0.16, 0.24, 0.32 and 0.48. The growth of the error at multiples of *t*=0.96 is plotted at the top and bottom of figure 2, along with guidelines to indicate the order of accuracy.

We plot the error after time *t* for each scheme, using *h*=0.16, in the central plot of figure 2. Initially, the error in scheme (1.7) reduces like *t*=4. This is due to the behaviour described in §3, where only the first-order component has an exponentially decreasing prefactor. The stabilization occurs when the *h*^{2} part of the error begins to dominate the observed error.

### (b) Lennard–Jones box

As a more challenging problem, we compute the error in the radial distribution function for *r*∈(0,6) for a 6×6×6 periodic box of 64 Lennard–Jones particles, with interaction potential
*q*_{i} denotes the position of particle *i*, i.e. *x* in (1.1) and (1.2) is 3×64=192-dimensional. We chose, arbitrarily, *β*=10 and estimate the radial distribution function during simulation by dividing the interval (0,6) into 120 histogram bins of equal length, with the error computed as the *L*_{2} difference between the exact and computed radial distributions.

We observe that the Lipschitz condition (3.1) is not, formally, satisfied for many molecular dynamics potentials (including Lennard–Jones potentials) due to the presence of singularities. Nonetheless it is likely that, due to energetic considerations it would be possible to create a modified domain (a) in which typical solutions remain and (b) in which the Lipschitz condition (3.1) can be verified. The numerical example presented here strongly suggests that the global Lipschitz condition could be relaxed. More directly, the assumption (3.1) can be verified if the potential is replaced by one without singularities, for example, by using instead Morse potentials, or by a smoothly truncated singular potential, or by a smooth Gaussian approximation of the singular potential [21].

Owing to the size and complexity of the problem, we cannot use standard numerical solvers to compute the exact solution. Therefore, we compute a baseline solution using scheme (1.7) to compute 368 realizations of a 10^{7} step trajectory (after a 10^{6} step equilibration period), with a small stepsize of *h*=0.0016.

We next compute the radial distribution functions computed using the three schemes in §5*a*, at 10 different timesteps beginning at *h*=0.002 and with subsequent timesteps increasing by 10%. The trajectories were all taken over a constant time window of [0,20 000], with sampling beginning after a 10^{6} step equilibration.

We plot the error for all three schemes in figure 3. For both the Euler–Maruyama scheme and Heun's method, we average over 32 realizations for each timestep that we consider. This was sufficient to resolve the error introduced by these discretization methods. However, scheme (1.7) proved to be sufficiently accurate that further computation was required to discern the leading error term, with the error at each timestep computed using 256 realizations to reduce the sampling error.

The results show good agreement with the theory presented in §3. The method (1.7) demonstrates an order of magnitude improvement in the long-time error of averages compared to Heun's method, while at the same time requiring half the cost (in terms of force evaluations).

## 6. Summary

In this article, we have closed the gap in understanding between the typical weak error analysis of numerical discretization methods and the invariant measure accuracy of e.g. [9–11], demonstrating in particular that the non-Markovian numerical integration method (1.7) makes an exponentially rapid transition from first-order weak accuracy to second-order accuracy as

- Received February 12, 2014.
- Accepted July 15, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.