This article addresses the weak convergence of numerical methods for Brownian dynamics. Typical analyses of numerical methods for stochastic differential equations focus on properties such as the weak order, which estimates the asymptotic (stepsize h→0) convergence behaviour of the error of finite-time averages. Recently, it has been demonstrated, by study of Fokker–Planck operators, that a non-Markovian numerical method generates approximations in the long-time limit with higher accuracy order (second order) than would be expected from its weak convergence analysis (finite-time averages are first-order accurate). In this article, we describe the transition from the transient to the steady-state regime of this numerical method by estimating the time-dependency of the coefficients in an asymptotic expansion for the weak error, demonstrating that the convergence to second order is exponentially rapid in time. Moreover, we provide numerical tests of the theory, including comparisons of the efficiencies of the Euler–Maruyama method, the popular second-order Heun method, and the non-Markovian method.
Stochastic gradient systems are stochastic differential equations in d dimensions having the form

  dX = a(X) dt + σ dw(t),  X(0) = x,   (1.1)

where

  a(x) = −∇V(x),   (1.2)

V(x), x∈R^d, is a potential energy function, and σ>0 is a constant which characterizes the strength of the additive noise, here described by a standard d-dimensional Wiener process w(t). These systems originate in the work of Einstein [1,2] to describe the motion of Brownian particles. They arise in mathematical models for chemistry, physics, biology and other areas, when the cumulative effect of unresolved degrees of freedom must be incorporated into a model to ensure its physical relevance. Under mild conditions on V, system (1.1) is ergodic [3,4] and has the unique invariant distribution with the Gibbs density ρ(x) ∝ exp(−βV(x)), where β=2σ⁻². Numerical methods for solving equation (1.1) compute a discrete sequence of states X1,X2,… by iteratively approximating the short-time evolution. The error in the numerical solution is typically quantified in either a strong sense (accuracy with respect to a particular stochastic path associated to (1.1)) or by reference to an evolving distribution (weak error or error in averages); the latter is the focus of this article. Ideally, the discrete states are ultimately distributed in a way that is consistent with the invariant distribution, but for complex applications the introduction of error in the numerical process is inevitable. In this article, we examine the asymptotic (τ→∞) behaviour of the weak error.
Undoubtedly, the most common numerical method for solving (1.1) is the Euler–Maruyama method, which approximates X(t_k), t_k = hk, by the iteration

  X_{k+1} = X_k + h a(X_k) + σ√h ξ_{k+1},   (1.3)

where ξ_k = (ξ_k¹,…,ξ_k^d)ᵀ and ξ_k^i, i=1,…,d, k=1,…, are i.i.d. random variables with the law N(0,1). For analysis of the weak error, one considers a finite time interval [0,τ], with τ=hN. The probability measure associated to (1.1) is described by a probability density ρ(t,x) that evolves according to the Fokker–Planck equation

  ∂ρ/∂t = L*ρ,

where L* is the adjoint (in the L2 sense) of the generator L for (1.1), defined by

  L = (σ²/2)Δ + a(x)·∇.   (1.4)

The solution ρ(t,x) evolves from an initial probability distribution ρ(0,x) to the steady state ρ(x). Let φ be a test function (such as an element of the Schwartz space of functions rapidly decaying at infinity). Then the average of φ at time τ may be taken to be

  Eφ(X(τ)) = ∫_{R^d} φ(x) ρ(τ,x) dx.   (1.5)
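The iteration (1.3) is straightforward to implement. The following minimal Python sketch applies Euler–Maruyama to a gradient system; the harmonic potential V(x) = x²/2 and all parameter values are illustrative assumptions, not taken from the article.

```python
import numpy as np

def euler_maruyama(grad_V, x0, h, n_steps, sigma=1.0, seed=0):
    """Iterate (1.3): X_{k+1} = X_k - h*grad_V(X_k) + sigma*sqrt(h)*xi_{k+1}."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps + 1,) + x.shape)
    traj[0] = x
    for k in range(n_steps):
        xi = rng.standard_normal(x.shape)               # i.i.d. N(0,1) increments
        x = x - h * grad_V(x) + sigma * np.sqrt(h) * xi
        traj[k + 1] = x
    return traj

# Illustrative harmonic potential V(x) = x^2/2, so grad_V(x) = x.
traj = euler_maruyama(lambda x: x, x0=[0.0], h=0.01, n_steps=1000)
print(traj.shape)
```

One force evaluation (here, `grad_V`) is required per step, which is the cost unit used throughout the article.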
The discretization scheme (1.3) may also be viewed as giving rise to an evolving probability distribution, and thus one may think of the iterates in (1.3), X1,X2,…, as being characterized by densities ρ1,ρ2,…. If stepsize h is used, then the average at time τ=Nh is given by

  Eφ(X_N) = ∫_{R^d} φ(x) ρ_N(x) dx.   (1.6)

It is natural to compare (1.5) and (1.6) as a means of quantifying the error as a function of h. We refer to this as the weak error. For the Euler–Maruyama method, it is known (e.g. [5–7]) that

  Eφ(X_N) − Eφ(X(τ)) = O(h).

The Landau notation means that the given quantity is bounded, for all sufficiently small h, by Ch, where C is a constant that is independent of the stepsize. It would be more precise to write C = C(τ), as C depends inherently on the time interval. This formula can be seen as a consequence of an asymptotic expansion of the weak error, as proposed by Talay & Tubaro (see also [5–7]). We note that C also depends on the distribution of the initial state of the system, i.e. ρ(0,x), as well as the particular observable, but we suppress these aspects in our notation. The asymptotic (τ→∞) behaviour of C describes the performance of the numerical method for computing averages with respect to the invariant distribution. For the Euler–Maruyama method, one finds that C is bounded as τ→∞; thus, one obtains first-order approximation of averages both at finite time and in the long-time limit.
In order to calculate averages in systems with complicated potentials and/or a large number of variables, one often must perform numerical calculations with a very long time interval. It is then desirable to use as large a timestep as is reasonable in the interest of reducing the computational effort, which is typically quantified in terms of the number of force evaluations. Weak first-order methods like Euler–Maruyama can be inefficient in practice. Schemes such as the second-order stochastic Heun method [6,7] can have greater efficiency: the stochastic Heun method uses two evaluations of the force −∇V at each timestep; thus, in comparison to Euler–Maruyama, it must introduce less than about half the error at a given stepsize to be deemed superior. The alternative method discussed in this paper is

  X_{k+1} = X_k + h a(X_k) + σ√h (ξ_k + ξ_{k+1})/2,   (1.7)

where ξ_k = (ξ_k¹,…,ξ_k^d)ᵀ and ξ_k^i, i=1,…,d, k=0,1,…, are i.i.d. random variables with the law N(0,1). This method is very similar in form to the Euler–Maruyama method (1.3) and is as easy to implement, but the sums of successive random increments are not statistically independent, so the method is fundamentally non-Markovian in nature. The scheme was motivated by an analysis of Langevin dynamics algorithms. The same method, along with some alternatives, was subsequently analysed from the perspective of the invariant measure, providing a rigorous foundation for the statement that the error in long-time averages computed using (1.7) is of order two, i.e. O(h²). The remarkable feature of this estimate is that the second-order accuracy is achieved with only a single evaluation of the force at each timestep. This result relies on study of the invariant distribution and a Baker–Campbell–Hausdorff expansion of the generator of the process. Such an operator-based approach does not clarify the progression from finite-time averaging to infinite-time averaging and, in particular, nothing is demonstrated in [9,10] about the weak accuracy of the method.
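In code, the only change relative to Euler–Maruyama is that each Gaussian vector is reused in two consecutive steps, so successive noise increments are correlated. The sketch below assumes the averaged-noise update X_{k+1} = X_k + h a(X_k) + σ√h (ξ_k + ξ_{k+1})/2; the function name and parameter values are our own.

```python
import numpy as np

def non_markovian_scheme(grad_V, x0, h, n_steps, sigma=1.0, seed=0):
    """Iterate the non-Markovian scheme: the increment xi_{k+1} is stored
    and reused in the following step, so only one fresh Gaussian vector
    (and one force evaluation) is needed per step."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    xi_prev = rng.standard_normal(x.shape)          # xi_0
    for _ in range(n_steps):
        xi_next = rng.standard_normal(x.shape)      # fresh increment xi_{k+1}
        x = x - h * grad_V(x) + sigma * np.sqrt(h) * 0.5 * (xi_prev + xi_next)
        xi_prev = xi_next                           # reused in the next step
    return x

# Illustrative harmonic potential V(x) = x^2/2.
x = non_markovian_scheme(lambda x: x, x0=[0.0], h=0.01, n_steps=1000)
```

Note that the per-step cost (one force evaluation, one Gaussian vector) is identical to Euler–Maruyama; only one extra state vector must be carried between steps.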
In this article, we address this issue directly, studying the way that the finite-time averages obtained using the numerical scheme (1.7) converge, as τ→∞, to steady-state averages of the numerical method. To do this, we compute the Talay–Tubaro expansion at finite time and show that

  Eφ(X_N) − Eφ(X(τ)) = C₀(τ,x)h + O(h²).

Then we demonstrate that C₀(τ,x) → 0 as τ→∞, implying a superconvergence property in the long-time limit. Moreover, we show that this convergence is exponential in τ.
We note that there are several recent papers (see [11–13] and references therein) where the idea of modified differential equations is exploited in order to construct higher-order schemes for computing ergodic limits. This approach provides the possibility of modifying schemes which are of weak order one on finite-time intervals to provide second-order approximations of ergodic limits. However, such modified schemes require either evaluation of derivatives of the forces or two force evaluations per step, i.e. their computational cost is at least as high as for the Heun scheme and substantially higher than for (1.7). Furthermore, although the theoretical approaches in our paper and in these works share some similarities, their results are not applicable to the non-Markovian approximation (1.7), nor do they include an analysis demonstrating that the leading term in the error of their modified schemes goes to zero exponentially fast.
Let (Ω, F, {F_t}_{t≥0}, P) be a sufficiently rich filtered probability space on which the F_t-measurable Wiener process w(t) from (1.1) is defined, as well as the sequence of random variables ξ1,ξ2,… from the method (1.7). We suppose that all components of the random variables ξk arising in (1.7) and the Wiener process w are independent. This assumption will allow us to use Itô integrals of the form ∫_{t_k}^{t} b(s, X(s)) dw(s), t ≥ t_k, where b(s,x) is a deterministic ‘good’ function (note also that in this paper we consider weak-sense convergence only). In what follows, E(⋅) denotes expectation with respect to the measure P.
We use the following notation for the solution of (1.1): X(t) = X^{t0,x}(t) when X(t0) = x, t ≥ t0, and we will also write Xˣ(t) when t0 = 0. Recall that the process X(t) is exponentially ergodic if for any x∈R^d and any function φ with polynomial growth there are C(x)>0 and λ>0 such that

  |Eφ(Xˣ(t)) − φ̄| ≤ C(x) e^{−λt},   (2.1)

where

  φ̄ = ∫_{R^d} φ(x) ρ(x) dx.   (2.2)

The solution X(t) of (1.1) is exponentially ergodic with the Gibbs invariant density under the following condition (e.g. [3,4]): there exist c0∈R and c1>0 such that

  (a(x), x) ≤ c0 − c1|x|²  for all x∈R^d.   (2.3)

Under this condition, for all p ≥ 1,

  E|Xˣ(t)|^{2p} ≤ K(1 + |x|^{2p} e^{−λt}),   (2.4)

where K>0 and 0<λ≤c1 depend on p (e.g. [3,4]).
Recall the generator L for (1.1) defined in (1.4). The function

  u(t,x) = Eφ(X^{t,x}(τ))   (2.5)

satisfies the Cauchy problem for the backward Kolmogorov equation

  ∂u/∂t + Lu = 0,  t < τ,  u(τ,x) = φ(x),   (2.6)

so that u(0,x) = Eφ(Xˣ(τ)). The transition density p(t,x,y) for (1.1) satisfies the Fokker–Planck (forward Kolmogorov) equation

  ∂p/∂t = L*_y p,   (2.7)

where L* is the adjoint of L, and the invariant density ρ(x) satisfies the stationary Fokker–Planck equation

  L*ρ = 0.   (2.8)
3. Main result
We start with a simple illustrative example.
Let a(x) = −αx with α>0; then X(t) from (1.1) is the Ornstein–Uhlenbeck process, which is Gaussian with EXˣ(t) = x e^{−αt} and Cov(Xˣ(s), Xˣ(t)) = (σ²/2α)(e^{−α(t−s)} − e^{−α(t+s)}) for s ≤ t. It is not difficult to calculate that the Euler scheme (1.3) produces Gaussian iterates with EX_k = x(1−αh)^k and stationary variance σ²/(α(2−αh)) = σ²/(2α) + O(h), with finite-time errors of the moments bounded by Kh with K>0 independent of τ, while scheme (1.7) produces Gaussian iterates whose stationary distribution is exactly the invariant distribution of the Ornstein–Uhlenbeck process, with mean zero and variance σ²/(2α). We see that although both schemes have first-order accuracy on finite-time intervals, the ergodic limit of scheme (1.7) is exact, while the ergodic limit of the Euler scheme approximates the ergodic limit of the Ornstein–Uhlenbeck process with order one, which is usually the case for weak schemes of order one [14–16].
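The exactness of the ergodic limit of the non-Markovian scheme for the Ornstein–Uhlenbeck process can be checked by simulation. In this sketch (all parameter values are illustrative, and the biased Euler value σ²/(α(2−αh)) follows from a direct calculation for the Euler recursion), both schemes are run with a deliberately large stepsize on an ensemble of independent chains:

```python
import numpy as np

alpha, sigma, h = 1.0, 1.0, 0.25
M, n_steps = 200_000, 500                 # independent chains; enough steps to equilibrate
rng = np.random.default_rng(42)

x_em = np.zeros(M)                        # Euler-Maruyama (1.3) chains
x_lm = np.zeros(M)                        # non-Markovian (1.7) chains
xi_prev = rng.standard_normal(M)
for _ in range(n_steps):
    xi = rng.standard_normal(M)
    x_em += -h * alpha * x_em + sigma * np.sqrt(h) * xi
    x_lm += -h * alpha * x_lm + sigma * np.sqrt(h) * 0.5 * (xi_prev + xi)
    xi_prev = xi

exact = sigma**2 / (2 * alpha)                   # OU stationary variance, 0.5
biased = sigma**2 / (alpha * (2 - alpha * h))    # Euler stationary variance, ~0.571
print(x_lm.var(), x_em.var())
```

With 2×10⁵ chains the sample variance of the (1.7) chains should match σ²/(2α) = 0.5, and that of the Euler chains the biased value, to within sampling error (roughly half a per cent).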
In what follows, we will assume the following.
The potential V(x) ∈ C⁷(R^d), its first-order derivatives grow no faster than a linear function at infinity, and its higher derivatives are bounded. The relations (1.2) and (2.3) hold. The function φ(x) in (2.6) lies in C⁶(R^d), and it and its derivatives grow no faster than a polynomial function at infinity.
The most restrictive condition in assumption 3.2 is the requirement for a(x) = −∇V to be globally Lipschitz,

  |a(x) − a(y)| ≤ K|x − y|,  x, y ∈ R^d,   (3.1)

where K>0 is independent of x and y. (Refer to remark 3.6 and the example presented in §5b.)
Introduce the multi-index i = (i1,…,id), |i| = i1 + ⋯ + id, and the corresponding spatial derivative D_x^i. Under assumption 3.2, we have the following. The solution u(t,x) of (2.6) belongs to the space of functions of t and x which have continuous partial derivatives through order 8 in the spatial variables and through order 2 in time. Moreover, for some constants K_u>0 and λ_u>0, the exponentially decaying estimates (3.2) and (3.3) hold for all 1 ≤ |i| ≤ 8 and 0 ≤ j ≤ 2.
We prove the following convergence and error expansion theorem for scheme (1.7).
Let assumption 3.2 hold. Then scheme (1.7) is first-order weakly convergent and for all sufficiently small h>0 its error has the form

  Eφ(X_N) − Eφ(Xˣ(τ)) = C₀(τ,x)h + O(h²),   (3.5)

where the coefficient C₀(τ,x) satisfies the bound (3.6) for some K>0 and λ>0 independent of h and τ.
Note that we shall use the letters K and λ to denote various constants which are independent of h, t, τ and x. We will exploit ideas from ch. 2 of the monograph of Milstein & Tretyakov and, in particular, from the proof of theorem 2.2.5 there on the Talay–Tubaro expansion. It is convenient to introduce additional notation for X_k from (1.7) which explicitly expresses the dependence on X_{k−1}.
As X0 = x, we have X^{t1, X^{0,X0}(t1)}(τ) = Xˣ(τ), and, using the independence of X_k and w(t) − w(t_k), t ≥ t_k, for all k, we can write X^{t_k, X_k}(τ) = X^{t_{k+1}, X^{t_k, X_k}(t_{k+1})}(τ), k = 0,…,N−1. Then (cf. Milstein & Tretyakov, p. 101)
Using the definition (2.5) of u(t,x) and the independence of X_k and w(t) − w(t_k), t ≥ t_k, we obtain the corresponding one-step identities. Also, as u(τ,x) = φ(x), we have Eφ(X^{t_{N−1}, X_{N−1}}(τ)) = Eu(t_N, X^{t_{N−1}, X_{N−1}}(t_N)). Combining the above expressions, we arrive at the following identity for the global error of the method (1.7): (3.7)
Expanding in powers of h around X_k by the usual Taylor formula, we obtain (3.8), where the remainder r1(t_k,x) satisfies the estimate (3.9) for some K>0 and λ>0 independent of h, x, t and τ. To derive the estimate (3.9), we used (3.3), the assumptions on a(x) and its derivatives from assumption 3.2, and (3.4).
Introduce an auxiliary process and expand by Taylor's formula around it: we get (3.10) for the second term in (3.8), (3.11) for the third term, (3.12) for the fourth term and (3.13) for the fifth term. The functions ri(t_k,x), i=2,…,5, satisfy estimates of the form (3.9), which are derived using the same facts as in the case of r1(t_k,x).
By lemma 2.1.9 of Milstein & Tretyakov, p. 99, and again using the independence of X_k and w(t) − w(t_k), t ≥ t_k, we get (3.14).
We have (3.15) for the second term in (3.14) and (3.16) for the third term in (3.14). The functions ri(t_k,x), i = 6,7,8, satisfy estimates of the form (3.9), which are derived using the same facts as in the case of r1(t_k,x), except for r6(t_k,x), where (2.4) was also used.
Owing to the properties of u(t,x) (see (3.2) and (3.3)) and of a(x) (see assumption 3.2), we have (3.19) for some K>0 independent of h, x, t and τ. Using (3.19) and (3.4), we obtain from (3.17) that (3.20) holds for some constants K>0 and λ>0 independent of h, x and τ, i.e. scheme (1.7) is of first weak order.
It remains to prove the expansion (3.5). Consider now the (d+1)-dimensional system (3.21). Solving (3.21) by scheme (1.7), we get (3.22), where C0(τ,x) is equal to (3.23) and the remainder is given by (3.24). Introduce the quantity (3.25), for which we have (cf. (3.19)) a bound with K>0 that does not depend on x, t and τ. Using the demonstrated first-order convergence of (1.7) (cf. (3.20)), it is not difficult to obtain (3.26). The equality (3.17) together with (3.18) and (3.22)–(3.26) implies (3.5) and (3.6). ▪
Now we prove that in the limit τ→∞, scheme (1.7) has second order of accuracy in h.
Let assumption 3.2 hold. Then the coefficient C0(τ,x) from (3.6) goes to zero as τ→∞,

  |C0(τ,x)| ≤ K(1 + |x|^κ) e^{−λτ},   (3.27)

for some constants K>0, κ>0 and λ>0, i.e. over a long integration time scheme (1.7) is of order two up to an exponentially small correction.
We have (3.28), where p(t,x,y) is the transition density for (1.1) (see (2.7)) and ρ(y) is the invariant density. Using integration by parts and (1.2), it is not difficult to verify that for any 0 ≤ t ≤ τ

  ∫_{R^d} B0(t,y) ρ(y) dy = 0.   (3.29)

Further, using the geometric ergodicity of X(t) (cf. (2.1)), we have, for the function from (3.25), the bound (3.30) for some constants K>0 and λ_B>0 independent of x, t and τ.
(1) We emphasize that the fact that the average of B0(t,x) with respect to the invariant measure is equal to zero (see (3.29)) is the reason why scheme (1.7) is second-order accurate in approximating ergodic limits (see theorem 3.5).
(2) In the case of the Euler scheme (1.3), we get the same error expansion as (3.5) for scheme (1.7) but with a different leading coefficient (see §2.2.3 of Milstein & Tretyakov). The average of the corresponding function B0 with respect to the invariant measure is not equal to zero and, consequently, the Euler scheme (1.3) approximates ergodic limits with order one, the same order as its weak convergence over a finite-time interval (see also example 3.1).
(3) Let a one-step weak approximation of the solution X^{t,x}(t+h) of (1.1) generate a method of order p. Then, according to the Talay–Tubaro expansion (see also §2.2.3 of Milstein & Tretyakov), the global error of the method has the form (4.1), where n∈N (n can be arbitrarily large if the potential V(x) is smooth, its first-order derivatives grow no faster than a linear function at infinity and its higher derivatives of any order are bounded) and the functions C0(τ,x),…,Cn(τ,x) are independent of h. It follows from the proof of theorem 2.2.5 in that monograph that the coefficients Ci in (4.1) can be presented in integral form. The function B0(s,x) is the coefficient of the leading term in the one-step error expansion of the method, analogous to B(s,x) in theorem 3.4. The other Bi(s,x), i≥1, consist of the coefficient at h^{p+i+1} from the one-step error expansion of the method (analogously to B0(s,x) at h^{p+1}) and of the coefficients at h^{p+i+1} from one-step error expansions for approximations of Cj with j<i (see details in §2.2.3 of the monograph). Furthermore, one can deduce from the proof of theorem 3.5 that if the averages of Bi(s,x), 0≤i≤q≤n, with respect to the invariant measure are equal to zero, then in the limit τ→∞ the scheme has order p+q of accuracy in h. Hence, such a detailed one-step error analysis is the basis for discovering long-time integration properties of numerical schemes and can serve as a guide in the construction of highly efficient numerical methods for computing ergodic limits for diffusions.
We observe that higher-order methods for sampling from the Gibbs distribution have recently been constructed based on the idea of modified differential equations (e.g. in the works cited in the introduction), but they are Markovian, in contrast to scheme (1.7) considered here. To our knowledge, a non-Markovian scheme with order higher than 2 in computing ergodic limits has not yet been proposed. We note in particular the remarkable simplicity of scheme (1.7), which requires just a single evaluation of the force per step but nevertheless has second-order accuracy in the computation of ergodic limits.
5. Numerical experiments
We compare the sampled distributions for the Euler–Maruyama scheme (1.3) with those for the second-order (in the sense of approximating ergodic limits) scheme (1.7), with both methods equal in cost (measured in terms of evaluations of the force). We also compare the sampled distributions with those for Heun's method, a second-order scheme requiring two evaluations of a(x) = −∇V(x) per step:

  X_{k+1} = X_k + (h/2)[a(X_k) + a(X_k + h a(X_k) + σ√h ξ_{k+1})] + σ√h ξ_{k+1}.   (5.1)

As scheme (1.7) computes exact long-time averages for all quadratic potential energy functions V, it is necessary to consider anharmonic models in order to capture the representative behaviour of the scheme.
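In code, one common additive-noise form of the stochastic Heun method is a predictor–corrector pair that evaluates the force twice per step while reusing a single Gaussian increment. This is a sketch under that assumption; the function name and parameter values are illustrative.

```python
import numpy as np

def heun_step(grad_V, x, h, sigma, xi):
    """One step of stochastic Heun (5.1) for additive noise."""
    noise = sigma * np.sqrt(h) * xi
    a0 = -grad_V(x)
    x_pred = x + h * a0 + noise              # Euler predictor (force evaluation 1)
    a1 = -grad_V(x_pred)                     # force evaluation 2
    return x + 0.5 * h * (a0 + a1) + noise   # trapezoidal corrector

# Illustrative harmonic potential V(x) = x^2/2.
rng = np.random.default_rng(0)
x = np.zeros(1)
for _ in range(1000):
    x = heun_step(lambda y: y, x, h=0.01, sigma=1.0, xi=rng.standard_normal(1))
```

The two calls to `grad_V` per step are what make Heun twice as expensive per step as (1.3) or (1.7) in the cost measure used here.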
(a) Anharmonic scalar model
We consider solutions to (1.1) using a one-dimensional anharmonic potential energy function V(x), periodic on x ∈ [0, 2π).
(i) Error in infinite time
We sample the configurational distribution using trajectories generated by the Euler–Maruyama scheme (1.3), Heun's method (5.1) and the method (1.7), where each trajectory runs over a fixed time interval [0, 2×10⁸].
We note that the weak-sense convergence results are proved in §3 under the assumption that the test functions φ(x) are sufficiently smooth and that they and their derivatives grow no faster than polynomial functions at infinity (see assumption 3.2). This is a usual assumption in stochastic numerics [5–7]. At the same time, this assumption is not sufficient to guarantee convergence in distribution of scheme (1.7), which would require consideration of φ(x) being step functions. In [18], first-order weak-sense convergence of the Euler scheme and the corresponding Talay–Tubaro error expansion were proved in the case of φ(x) being measurable bounded functions, which, in particular, implies convergence in distribution of the Euler scheme. Further, first-order convergence for the density of the Euler scheme was proved in [19]. Ideas from Bally & Talay [18,19] can be exploited to extend the convergence results obtained in §3 for scheme (1.7) to include the case of non-smooth φ(x). Here, we show and compare the convergence in distribution of scheme (1.7) and the other two tested methods experimentally.
For each scheme, we divide [0,2π] into 100 equal histogram bins to approximate the sampled distribution and compare the observed weight of bin i (denoted ρ̂i) to the exact canonical weight of bin i (denoted ρi), computed to high precision using a numerical solver. The error in the distribution is then reported either as the approximate L2 difference between the sampled and exact distributions or as the relative entropy (Kullback–Leibler divergence) of the two distributions, defined by

  D(ρ̂ ∥ ρ) = Σi ρ̂i log(ρ̂i/ρi).

The relative entropy gives a measure of the information lost in approximating one probability distribution by another. We compute the configurational distribution using each scheme at 16 different timesteps, where the smallest is h=0.2 and subsequent timesteps are increased by 10%. The distributions are averaged over 32 independent realizations per timestep, and the overall errors are plotted in figure 1.
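The two error measures can be computed from binned data along the following lines (a sketch; the function name is our own, and the exact density is assumed to be supplied by the caller):

```python
import numpy as np

def distribution_errors(samples, exact_density, n_bins=100, lo=0.0, hi=2*np.pi):
    """Approximate L2 difference and relative entropy D(rho_hat || rho)
    between sampled bin weights and exact bin weights."""
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(samples, bins=edges)
    rho_hat = counts / counts.sum()                  # observed bin weights
    centres = 0.5 * (edges[:-1] + edges[1:])
    rho = exact_density(centres)
    rho = rho / rho.sum()                            # exact bin weights
    l2 = np.sqrt(np.sum((rho_hat - rho) ** 2))
    nz = rho_hat > 0                                 # empty bins contribute zero
    kl = np.sum(rho_hat[nz] * np.log(rho_hat[nz] / rho[nz]))
    return l2, kl

# Sanity check: exactly uniform samples against a uniform density give zero error.
centres = np.linspace(0, 2*np.pi, 101)[:-1] + np.pi/100
l2, kl = distribution_errors(np.repeat(centres, 5), lambda x: np.ones_like(x))
print(l2, kl)
```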
The results match the analysis given in §3 for the large-time regime. In the case of the L2 error, the Euler–Maruyama scheme gives a first-order error in the computed distribution, whereas the other schemes give second-order errors. For the computation of relative entropy, we see a doubled rate of convergence (from first to second order, or from second to fourth order). Writing ρ̂i = ρi(1 + εfi), where ε is a small parameter and Σi ρi fi = 0 (conservation of total probability), we have

  D(ρ̂ ∥ ρ) = (ε²/2) Σi ρi fi² + O(ε³).

In the discrete context, if ε is proportional to h^k for an order-k scheme, then we find that the relative entropy is proportional to h^{2k}. In practice, we observe that Heun's method and the method (1.7) give a fourth-order relationship with the stepsize, whereas the Euler–Maruyama scheme has relative entropy proportional to h². The non-Markovian method gives approximately an order of magnitude improvement in this example.
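The quadratic relation between the perturbation size ε and the relative entropy can be verified numerically; the mean-zero perturbation f below is an arbitrary illustrative choice.

```python
import numpy as np

n = 100
rho = np.ones(n) / n                             # uniform reference weights
f = np.cos(2 * np.pi * np.arange(n) / n)         # mean-zero perturbation under rho

ratios = []
for eps in (1e-2, 1e-3, 1e-4):
    rho_hat = rho * (1 + eps * f)                # perturbed weights, still summing to 1
    kl = np.sum(rho_hat * np.log(rho_hat / rho))
    ratios.append(kl / (0.5 * eps**2 * np.sum(rho * f**2)))
print(ratios)                                    # approaches 1 as eps -> 0
```

The ratio of the measured relative entropy to the predicted leading term (ε²/2)Σρf² tends to 1 as ε decreases, confirming the expansion above.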
(ii) Error in finite time
We consider the weak accuracy of the Euler–Maruyama scheme (1.3), Heun's method (5.1) and the method (1.7). In order to realize the evolving distribution computed for each scheme, we average over 2.56×10⁹ independent trajectories with initial points drawn from a normal distribution with mean π and variance 1 (where the tails of the distribution outside the periodic region are cut off). We divide [0,2π] into 21 histogram bins and run over t∈[0,9].
As the exact solution is unknown, we compute a baseline solution using Heun's method with h=0.04 over the time interval. This solution is compared with the evolving distributions for h=0.16, 0.24, 0.32 and 0.48. The growth of the error at multiples of t=0.96 is plotted in the top and bottom panels of figure 2, along with guidelines indicating the order of accuracy.
We plot the error after time t for each scheme, using h=0.16, in the central plot of figure 2. Initially, the error in scheme (1.7) decreases exponentially with t, but it stabilizes after t=4. This is due to the behaviour described in §3, where only the first-order component of the error has an exponentially decreasing prefactor. The stabilization occurs when the h² part of the error begins to dominate the observed error.
(b) Lennard–Jones box
As a more challenging problem, we compute the error in the radial distribution function for r∈(0,6) for a 6×6×6 periodic box of 64 particles interacting through the Lennard–Jones pair potential, where qi denotes the position of particle i; thus x in (1.1) and (1.2) is 3×64=192-dimensional. We chose, arbitrarily, β=10 and estimate the radial distribution function during simulation by dividing the interval (0,6) into 120 histogram bins of equal length, with the error computed as the L2 difference between the exact and computed radial distributions.
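The binning of pair distances in a periodic box can be sketched as follows, using the minimum-image convention; the function and parameter names are our own.

```python
import numpy as np

def pair_distance_histogram(q, box=6.0, r_max=6.0, n_bins=120):
    """Histogram of all pairwise minimum-image distances for positions q (N x 3)."""
    n = len(q)
    d = q[:, None, :] - q[None, :, :]          # all displacement vectors
    d -= box * np.round(d / box)               # minimum-image convention
    r = np.sqrt((d ** 2).sum(axis=-1))
    r = r[np.triu_indices(n, k=1)]             # each pair counted once
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    return counts, edges

# Two particles 1.02 apart fall in the bin covering r = 1.02 (bin width 0.05).
q = np.array([[0.0, 0.0, 0.0], [1.02, 0.0, 0.0]])
counts, edges = pair_distance_histogram(q)
print(counts.sum(), counts[20])
```

Normalizing these counts by shell volumes and particle density would yield the radial distribution function g(r); only the binning step is shown here.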
We observe that the Lipschitz condition (3.1) is not, formally, satisfied for many molecular dynamics potentials (including Lennard–Jones potentials), owing to the presence of singularities. Nonetheless, it is likely that, owing to energetic considerations, it would be possible to identify a modified domain (a) in which typical solutions remain and (b) in which the Lipschitz condition (3.1) can be verified. The numerical example presented here strongly suggests that the global Lipschitz condition could be relaxed. More directly, the assumption (3.1) can be verified if the potential is replaced by one without singularities, for example by using Morse potentials instead, a smoothly truncated singular potential, or a smooth Gaussian approximation of the singular potential.
Owing to the size and complexity of the problem, we cannot use standard numerical solvers to compute the exact solution. Therefore, we compute a baseline solution using scheme (1.7), generating 368 realizations of a 10⁷-step trajectory (after a 10⁶-step equilibration period) with a small stepsize of h=0.0016.
We next compute the radial distribution functions using the three schemes of §5a, at 10 different timesteps beginning at h=0.002, with subsequent timesteps increasing by 10%. The trajectories were all taken over a constant time window of [0, 20 000], with sampling beginning after a 10⁶-step equilibration.
We plot the error for all three schemes in figure 3. For both the Euler–Maruyama scheme and Heun's method, we average over 32 realizations for each timestep that we consider. This was sufficient to resolve the error introduced by these discretization methods. However, scheme (1.7) proved to be sufficiently accurate that further computation was required to discern the leading error term, with the error at each timestep computed using 256 realizations to reduce the sampling error.
The results show good agreement with the theory presented in §3. The method (1.7) demonstrates an order of magnitude improvement in the long-time error of averages compared to Heun's method, while at the same time requiring half the cost (in terms of force evaluations).
In this article, we have closed the gap in understanding between the typical weak error analysis of numerical discretization methods and the invariant measure accuracy of, e.g., [9–11], demonstrating in particular that the non-Markovian numerical integration method (1.7) makes an exponentially rapid transition from first-order weak accuracy to second-order accuracy as τ→∞. Our results are confirmed in several numerical experiments, with the ultimate conclusion being that scheme (1.7) is typically superior to the Euler–Maruyama and Heun's methods in terms of accuracy and efficiency for the purpose of averaging in the long term (in the transient region, the other methods may of course be better, depending on the problem).
- Received February 12, 2014.
- Accepted July 15, 2014.
- © 2014 The Author(s) Published by the Royal Society. All rights reserved.