## Abstract

We investigate the algebraic structure underlying the stochastic Taylor solution expansion for stochastic differential systems. Our motivation is to construct *efficient* integrators. These are approximations that generate strong numerical integration schemes that are more accurate than the corresponding stochastic Taylor approximation, independent of the governing vector fields and to all orders. The sinhlog integrator introduced by Malham & Wiese (2009) is one example. Herein, we show that the natural context to study stochastic integrators and their properties is the convolution shuffle algebra of endomorphisms; establish a new whole class of efficient integrators; and then prove that, within this class, the sinhlog integrator generates the *optimal* efficient stochastic integrator at all orders.

## 1. Introduction

We consider the strong simulation of stochastic differential systems. We assume our system has the form

$$ y_t = y_0 + \sum_{i=0}^{d}\int_0^t V_i(y_\tau)\,\mathrm{d}W^i_\tau. $$

This system is driven by a *d*-dimensional Wiener process (*W*^{1},…,*W*^{d}) and governed by a drift vector field *V*_{0} and diffusion vector fields *V*_{1},…,*V*_{d}. We use the convention *W*^{0}_{τ}≡*τ* and interpret the stochastic integrals in the Stratonovich sense. Hereafter, we will assume *t* lies in the interval of existence of the solution. In general, we also suppose that the vector fields *V*_{i} for *i*=0,…,*d* are sufficiently smooth and non-commuting. We focus on solution series and their use in strong simulation schemes. The stochastic Taylor expansion for the flowmap *φ*_{t}: *y*_{0}↦*y*_{t}, taking the data *y*_{0} at time *t*=0 to the solution *y*_{t} at time *t* for the stochastic differential system above, is given by (see Baudoin 2004 or Lyons & Victoir 2004)

$$ \varphi_t = \sum_{w\in\mathbb{A}^*} J_w(t)\,V_w. $$

Here *w*=*a*_{1}…*a*_{n} is a word with letters *a*_{1},…,*a*_{n} chosen from the alphabet $\mathbb{A}=\{0,1,\ldots,d\}$. The sum is over all possible words *w* in $\mathbb{A}^*$, the free monoid on $\mathbb{A}$. All the stochastic information is encoded in the scalar random variables (Stratonovich integrals)

$$ J_w(t) = \int_0^t\!\int_0^{\tau_n}\!\cdots\int_0^{\tau_2} \mathrm{d}W^{a_1}_{\tau_1}\cdots\,\mathrm{d}W^{a_n}_{\tau_n}. $$

The partial differential operators *V*_{w}:=*V*_{a_1}○⋯○*V*_{a_n}, constructed by composing the vector fields, encode all the geometric information.

Strong numerical integration schemes for stochastic differential systems are based on truncating the stochastic Taylor expansion and applying the resulting approximate flowmap over successive small computation subintervals spanning the global time interval of interest. More generally, across a computation interval [0,*t*], for an invertible map *f* defined in §2, we can:

— construct the series *σ*_{t}=*f*(*φ*_{t});

— truncate the series *σ*_{t} to $\hat\sigma_t$ according to a grading g(*w*) on the words *w*; and

— compute $f^{-1}(\hat\sigma_t)$ and use this as the basis of a numerical scheme.

For example, suppose *f*=id, the identity map. Then, *σ*_{t} is just the stochastic Taylor expansion *φ*_{t}, which we split according to the grading g(*w*) as follows

$$ \varphi_t = \sum_{g(w)\le n/2} J_w\,V_w + \sum_{g(w)>n/2} J_w\,V_w, $$

for *n*∈ℕ. A stochastic Taylor numerical scheme of strong order *n*/2 would be the first term on the right shown; the remainder is the last term. The grading g(*w*) here is determined by the variance of the stochastic integrals *J*_{w}; zero letters in *w* contribute a count of one, while non-zero letters contribute a count of one-half towards g(*w*). An important technicality is to include in the integrator the *expectation* of the terms in the remainder at leading order. This is because not including them would decrease the expected global order by one-half (an explanation can be found in Buckwar *et al.* (2012) or Malham & Wiese (2009)). In other words, a stochastic Taylor integrator of strong order *n*/2 is

$$ \hat\varphi_t = \sum_{g(w)\le n/2} J_w\,V_w + \sum_{g(w)=(n+1)/2} (\mathbb{E}J_w)\,V_w, $$
where the expectations of the *J*_{w}, here denoted $\mathbb{E}J_w$, are known analytically. The *Euler–Maruyama* and *Milstein* numerical methods correspond to the cases *n*=1 and *n*=2, respectively, applied on successive computation subintervals with the vector fields evaluated on the initial data on each subinterval. Stochastic Runge–Kutta methods are constructed by replacing the partial differential operators *V*_{w} by finite differences.
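As an illustrative aside, the per-subinterval structure of such a scheme can be sketched in code. The following is a minimal sketch of the Euler–Maruyama step loop for a scalar Itô system; the function names and the scalar setting are our own illustrative choices, not the paper's, and recall that a Stratonovich system must first be converted to its Itô form before this scheme applies.

```python
import numpy as np

def euler_maruyama(a, b, y0, t, n_steps, dW=None, rng=None):
    """One path of the Euler-Maruyama scheme (strong order 1/2) for the
    Ito SDE dy = a(y) dt + b(y) dW on [0, t].

    Pass `dW` (shape (n_steps,)) to reuse the same Wiener increments
    across different schemes when comparing their strong errors."""
    h = t / n_steps
    if dW is None:
        rng = rng or np.random.default_rng(0)
        dW = rng.normal(0.0, np.sqrt(h), size=n_steps)
    y = float(y0)
    for k in range(n_steps):
        # vector fields evaluated at the initial data of each subinterval
        y = y + a(y) * h + b(y) * dW[k]
    return y

# Deterministic sanity check: with zero noise increments the scheme
# reduces to explicit Euler for dy = lam*y dt, i.e. y_N = (1 + lam*h)^N y0.
lam, t, N = -0.5, 1.0, 100
y = euler_maruyama(lambda y: lam * y, lambda y: 0.3 * y, 1.0, t, N,
                   dW=np.zeros(N))
assert abs(y - (1 + lam * t / N) ** N) < 1e-12
```

The Milstein scheme (*n*=2) would additionally retain the length-two words, which requires the corresponding double integrals.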

Another example is *f*=log, so that *σ*_{t}=log *φ*_{t}. This is the exponential Lie series, which is the basis of the Castell–Gaines method (see Castell & Gaines 1995, 1996; also see Fliess 1981; Azencott 1982; Ben Arous 1989; Castell 1993). Truncating the exponential Lie series to $\hat\sigma_t$ generates a Lie polynomial in the Lie algebra of vector fields. We assume that we can suitably accurately simulate the multiple integrals *J*_{w} retained in the truncation (more on this presently). Hence, $\hat y_t=\exp(\hat\sigma_t)\circ y_0$, and our approximation to the solution *y*_{t} across [0,*t*] can be generated as follows. For a given realization of the *J*_{w} terms retained, we simply solve the ordinary differential system *u*′(*τ*)=$\hat\sigma_t$○*u*(*τ*) for *u*=*u*(*τ*), for *τ*∈[0,1] and *u*(0)=*y*_{0}. Though we might achieve this analytically, more often one has to use a suitably accurate ordinary differential integrator. In either case, we have $\hat y_t=u(1)$.
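To make this construction concrete, here is a hedged sketch for a linear system with two noncommuting matrix diffusion vector fields and no drift. The helper names, the order-one truncation $\psi = J_1A_1 + J_2A_2 + \tfrac12(J_{12}-J_{21})[A_1,A_2]$ and the sign convention for the Lévy-area term are our own illustrative assumptions; for linear fields the ordinary differential system for *u* can be solved exactly by a matrix exponential.

```python
import numpy as np

def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for
    the small, well-scaled matrices used in this sketch)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def lie_series_step(A1, A2, J1, J2, J12, J21, y0):
    """One step of an order-one exponential Lie series (Castell-Gaines
    style) approximation for the linear Stratonovich system
        dy = A1 y dW^1 + A2 y dW^2,
    using the truncated Lie polynomial
        psi = J1 A1 + J2 A2 + (1/2)(J12 - J21) [A1, A2].
    The exact flow of u' = psi u over [0, 1] is expm(psi)."""
    bracket = A1 @ A2 - A2 @ A1
    psi = J1 * A1 + J2 * A2 + 0.5 * (J12 - J21) * bracket
    return expm(psi) @ y0

# Sanity check: for commuting (diagonal) fields the bracket vanishes and
# the step is exact: y = exp(J1 A1 + J2 A2) y0.
A1, A2 = np.diag([0.3, -0.2]), np.diag([0.1, 0.4])
y = lie_series_step(A1, A2, 0.7, -0.3, 0.11, 0.05, np.ones(2))
assert np.allclose(y, np.exp(0.7 * np.diag(A1) - 0.3 * np.diag(A2)))
```

For noncommuting fields the Lévy-area combination *J*_{12}−*J*_{21} must itself be simulated, which is where the bulk of the computational effort lies.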

We measure the accuracy of a strong order integrator by the root mean square of its local remainder

$$ r_t := \varphi_t - \hat\varphi_t. $$

More precisely, we measure ∥*r*_{t}○*y*_{0}∥_{L2} for each *y*_{0}, i.e. the square root of the expectation of the squared Euclidean norm of *r*_{t}○*y*_{0}. Let us now clarify an important issue. The order of a strong numerical method is determined by the set of multiple Wiener integrals simulated and included. Since the set of all multiple (Stratonovich) Wiener integrals is generated by those based on Lyndon words (see Reutenauer 1993, p. 111 and Gaines 1994), we need only simulate the multiple Wiener integrals indexed by Lyndon words. The other multiple Wiener integrals of that order can be computed as linear combinations of products of the appropriate Lyndon word multiple integrals of that order or less. However, Lyndon word multiple integrals of the same order cannot be generated as such (or from each other). Hence, more correctly, the order of a method is determined by the set of Lyndon word multiple Wiener integrals included. Consequently, in general, with this multiple integral set, a more accurate method can only have a better error constant and an improvement in scaling is not possible. Throughout this article, we assume that we can suitably approximate/simulate the Lyndon word multiple Wiener integrals up to the order required. The bulk of computational effort in higher order strong simulation methods is devoted to this task, though there have been some recent advances on this front (see Wiktorsson 2001; Lyons & Victoir 2004; Levin & Wildon 2008; Malham & Wiese 2011).
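The Lyndon words of each order can be enumerated by Duval's algorithm; the following sketch, with our own function name and digit-string encoding of words, lists them in lexicographic order.

```python
def lyndon_words(alphabet, max_len):
    """Generate all Lyndon words over `alphabet` (a sorted sequence of
    letters) of length <= max_len, in lexicographic order, via Duval's
    algorithm."""
    k = len(alphabet)
    w = [-1]
    out = []
    while w:
        w[-1] += 1
        out.append(''.join(alphabet[c] for c in w))
        m = len(w)
        while len(w) < max_len:        # extend the word periodically
            w.append(w[len(w) - m])
        while w and w[-1] == k - 1:    # strip trailing maximal letters
            w.pop()
    return out

# Over a two-letter alphabet there are 2 + 1 + 2 = 5 Lyndon words of
# length <= 3; only these index multiple integrals needing direct
# simulation, the rest follow from shuffle relations.
assert lyndon_words('12', 3) == ['1', '112', '12', '122', '2']
```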

Castell & Gaines (1995, 1996) proved that their simulation method of strong order one-half was *asymptotically efficient* in the sense of Newton (1991): it ‘minimizes the leading coefficient in the expansion of the mean-square errors as power series in the sample step-size’. This property extends to their strong order one method when the diffusion vector fields commute. However, Malham & Wiese (2009) demonstrated that when the stochastic differential system above is driven by a multi-dimensional driving Wiener process and the governing diffusion vector fields do not commute, then a strong numerical simulation based on the exponential Lie series is not asymptotically efficient (independent of the vector fields); also see Lord *et al.* (2008). Malham & Wiese (2009) proved, in the absence of drift, that a strong simulation method generated by taking the sinhlog of the flowmap, truncating the resulting series and then taking the inverse sinhlog, is *efficient* to all orders. This means that the error of the sinhlog integrator is always smaller than the error of the corresponding stochastic Taylor integrator in the mean-square sense, independent of the vector fields. In this paper, we:

(i) show that the natural context to study stochastic integrators and their properties is the convolution algebra of endomorphisms on the Hopf shuffle algebra of words;

(ii) establish a new class of efficient stochastic integrators using this algebraic structure (we include drift and grade according to word length); and

(iii) prove that, within this class, the sinhlog integrator generates the *optimal* efficient stochastic integrator to all orders. By this, we mean that the error of the integrator realizes its smallest possible value when compared with the error of the corresponding stochastic Taylor integrator, in the mean-square sense.

Our paper is structured as follows. In §2, we demonstrate the direct relation between stochastic expansions and the convolution shuffle algebra of endomorphisms on the Hopf shuffle algebra of words. We define an inner product structure on the convolution shuffle algebra of endomorphisms that is based on the correlation measure between multi-dimensional stochastic processes in §3. We also demonstrate some natural useful identities and orthogonality properties of endomorphisms therein. In §4, we prove results (ii) and (iii) stated above. Finally in §5, we discuss the implications of our results and provide some concluding remarks.

## 2. Stochastic expansions and the convolution shuffle algebra

We introduce the convolution shuffle algebra and show that it is the natural context to study stochastic expansions. For the moment, we proceed formally as the basic material can be found in the monograph by Reutenauer (1993); also see remark 2.1 below. Dropping the *J*'s and *V*'s, we can represent the stochastic Taylor series for the flowmap by

$$ \varphi = \sum_{w\in\mathbb{A}^*} w, $$
which lies in the product algebra of formal word series over the underlying commutative ring. We are interested in the following two Hopf algebra structures on this algebra. One has the shuffle product ⧢ as product and deconcatenation Δ as coproduct (acting on the left below). The other has concatenation as product and deshuffle as coproduct (acting on the right below). The unit, counit and antipode are the same for both Hopf algebras. We denote the empty word by **1**. The product of two terms is

$$ (u\otimes u')\,(v\otimes v') = (u\,⧢\,v)\otimes u'v', $$

where we concatenate the words on the right, representing the composition of vector fields. On the left, *u*⧢*v* represents the sum of all possible shuffles of the words *u* and *v*, representing the product of two multiple integrals.
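For illustration, the shuffle product of two words can be computed recursively on their leading letters; the digit-string encoding and function name below are our own devices.

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words, returned as a Counter mapping each
    word to its (integer) coefficient."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    out = Counter()
    for w, c in shuffle(u[1:], v).items():   # first letter taken from u
        out[u[0] + w] += c
    for w, c in shuffle(u, v[1:]).items():   # first letter taken from v
        out[v[0] + w] += c
    return out

# Under the word-to-integral map of remark 2.1 below, these encode
# J_1 J_2 = J_12 + J_21 and J_1^2 = 2 J_11:
assert shuffle('1', '2') == Counter({'12': 1, '21': 1})
assert shuffle('1', '1') == Counter({'11': 2})
assert sum(shuffle('12', '345').values()) == 10   # C(5,2) shuffles
```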

### Remark 2.1

The homomorphism from the Hopf shuffle algebra of words to the free associative algebra of vector fields is established as follows (see Malham & Wiese 2009, §2*c*). Let the coefficient ring be that generated by the multiple Stratonovich integrals and the constant random variable 1, with pointwise multiplication and addition. Also, consider the set of all vector fields, and the subsets of vector fields *V*_{w} with *w* of length *n*; the flowmap *φ* defined in §1 lies in the corresponding completed algebra. The linear *word-to-vector field map* given by *κ*: *w*↦*V*_{w} is a concatenation homomorphism, i.e. *κ*(*uv*)=*κ*(*u*)*κ*(*v*) for any words *u* and *v*. And the linear *word-to-integral map* given by *μ*: *w*↦*J*_{w} is a shuffle homomorphism, i.e. *μ*(*u*⧢*v*)=*μ*(*u*)*μ*(*v*) for any words *u* and *v* (see Lyons *et al.* 2007, p. 35 or Reutenauer 1993, p. 56; this also underlies our choice to use Stratonovich integrals rather than Itô integrals, which satisfy a quasi-shuffle relation). Hence, the composite map is a Hopf algebra homomorphism that naturally extends to the free associative algebra of vector fields.

Suppose we apply a polynomial or power series function to the flowmap *φ*, say *f*=*f*(*φ*). For example, suppose *f* has a simple power series expansion

$$ f(x) = \sum_{k\ge0} c_k\,(x-1)^k, $$

with coefficients *c*_{k}. Then the product above implies that, after rearrangement, we can always express the result in the form

$$ f(\varphi) = \sum_{w\in\mathbb{A}^*} (F\circ w)\otimes w, $$

where *F*○*w* is given by (|*w*| denotes the length of the word)

$$ F\circ w = \sum_{k=0}^{|w|} c_k\, J^{☆k}\circ w. $$

See Reutenauer (1993, p. 58) or Malham & Wiese (2009) for more details. This suggests that we can encode the action of any such function *f* of the flowmap *φ* by an endomorphism *F*. Indeed, the embedding given by *f*↦*F* is an algebra homomorphism for the non-commutative convolution product defined for all endomorphisms *X*,*Y* by

$$ X\,☆\,Y := \,⧢\,\circ(X\otimes Y)\circ\Delta. $$

Since the coproduct Δ here is deconcatenation, which takes a word *w* and produces the sum of all possible two-partitions *u*⊗*v* of *w*, we see that explicitly

$$ (X\,☆\,Y)\circ w = \sum_{uv=w} (X\circ u)\,⧢\,(Y\circ v). $$
Thus, we can encode and study the structure and properties of functions of the flowmap through endomorphisms, with convolution as product. For convenience, we henceforth refer to the convolution shuffle algebra of endomorphisms, which is a unital associative non-commutative algebra. The unit *ν* is the composition of the unit and counit. Indeed, *ν* sends non-empty words to 0 and the empty word to itself. For any Hopf algebra, by definition, the antipode *S* is the inverse of the identity endomorphism with respect to the convolution. Thus, we have

$$ S\,☆\,\mathrm{id} = \mathrm{id}\,☆\,S = \nu. $$

On the Hopf shuffle algebra, the antipode is given by *S*(*a*_{1}…*a*_{n}):=(−1)^{n}*a*_{n}…*a*_{1}, i.e. it is the sign-reversing endomorphism.
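The defining property *S*☆id=*ν* can be checked mechanically. The sketch below, with our own encoding of endomorphisms as maps from a word to a formal sum of words, implements the deconcatenation coproduct and the convolution product.

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words as a Counter of coefficients."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    out = Counter()
    for w, c in shuffle(u[1:], v).items():
        out[u[0] + w] += c
    for w, c in shuffle(u, v[1:]).items():
        out[v[0] + w] += c
    return out

def antipode(w):
    """S(a1...an) = (-1)^n an...a1."""
    return Counter({w[::-1]: (-1) ** len(w)})

def identity(w):
    return Counter({w: 1})

def conv(X, Y):
    """Convolution X * Y = shuffle o (X x Y) o Delta, where Delta is the
    deconcatenation coproduct (all two-part splittings w = uv)."""
    def XY(w):
        out = Counter()
        for i in range(len(w) + 1):
            for wu, cu in X(w[:i]).items():
                for wv, cv in Y(w[i:]).items():
                    for s, cs in shuffle(wu, wv).items():
                        out[s] += cu * cv * cs
        return Counter({k: c for k, c in out.items() if c})
    return XY

# The antipode is the convolution inverse of the identity, S * id = nu:
# every non-empty word is annihilated, the empty word is fixed.
S_conv_id = conv(antipode, identity)
assert S_conv_id('12') == Counter() and S_conv_id('112') == Counter()
assert S_conv_id('') == Counter({'': 1})
```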

### Remark 2.2

There is a dual-convolution concatenation algebra which we could alternatively use (see Reutenauer 1993, §1.5).

### Remark 2.3

There is a natural compatibility between convolution and composition. For an algebra homomorphism *Z*, one verifies that *Z*○(*X*☆*Y*)=(*Z*○*X*)☆(*Z*○*Y*). For a coalgebra homomorphism *Z*, we have (*X*☆*Y*)○*Z*=(*X*○*Z*)☆(*Y*○*Z*).

As the shuffle product is commutative, one can show that if *X* and *Y* are algebra homomorphisms, then *X*☆*Y* is also an algebra homomorphism. The subset of algebra homomorphisms forms the group of characters, with unit *ν*. The inverse of a character *X* is given by *X*^{☆(−1)}:=*X*○*S*. The group of characters is a subgroup of the Lie group of endomorphisms *X* satisfying *X*(**1**)=**1**. The corresponding Lie algebra of infinitesimal characters is a Lie subalgebra of the Lie algebra of endomorphisms *X* satisfying *X*(**1**)=0. The inverse in this Lie group is given by

$$ X^{☆(-1)} = \sum_{k\ge0}\,(\nu - X)^{☆k}. $$

Note that we denote by *X*^{☆k} the *k*-factor convolution product *X*☆*X*☆⋯☆*X* for any *k*∈ℕ. Observe that if *X* annihilates the empty word and *w* has length less than *k*, then *X*^{☆k}○*w* returns 0. Hence, since *ν*−*X* annihilates the empty word whenever *X*(**1**)=**1**, the formal sum for *X*^{☆(−1)} makes sense. See Patras (1994), Patras & Reutenauer (2002) and Manchon (2008) for more details.

An important endomorphism is the *augmented ideal projector* given by

$$ J := \mathrm{id} - \nu. $$

Thus, *J* sends non-empty words to themselves, but the empty word to 0. Note, for example, that *J*^{☆k} takes a word *w* with |*w*|≥*k* and creates a sum of all possible *k*-partitions of *w* shuffled together (with the empty word an excluded partition). On the other hand, id^{☆k} also includes all partitions involving the empty word. Now observe that the endomorphism *F* corresponding to the function *f* of the flowmap above, with the power series coefficients {*c*_{k}}, is the endomorphism defined by the series

$$ F = \sum_{k\ge0} c_k\,J^{☆k}. $$
More generally, we can consider functions of endomorphisms. For example, for *X* with *X*(**1**)=*ϵ***1**, we can construct an endomorphism *f*^{☆}(*X*) through a series expansion about the multiple *ϵν* of the unit *ν* as follows:

$$ f^{☆}(X) := \sum_{k\ge0} c_k\,(X-\epsilon\nu)^{☆k}, $$

where *c*_{k}:=*f*^{(k)}(*ϵ*)/*k*!. Of particular interest are the bijective logarithm and exponential maps defined for appropriate *X* by

$$ \log^{☆}(X) := \sum_{k\ge1} \frac{(-1)^{k-1}}{k}\,(X-\nu)^{☆k} \qquad\text{and}\qquad \exp^{☆}(X) := \sum_{k\ge0} \frac{1}{k!}\,X^{☆k}. $$

The sinhlog and coshlog maps defined for such *X* by

$$ \mathrm{sinhlog}^{☆}(X) := \tfrac12\big(X - X^{☆(-1)}\big) \qquad\text{and}\qquad \mathrm{coshlog}^{☆}(X) := \tfrac12\big(X + X^{☆(-1)}\big) $$

also have series representations in powers of (*X*−*ν*). These maps and their compositional inverses underlie our main result. For all appropriate *X* and *Y*, set *h*^{☆} to be

$$ h^{☆}(X,Y) := \big(X^{☆2} + Y\big)^{☆(1/2)}, $$

for which the square root exists. Then, we have the compositional inverses

$$ \mathrm{sinhlog}^{-1}(X) = X + h^{☆}(X,+\nu) \qquad\text{and}\qquad \mathrm{coshlog}^{-1}(X) = X + h^{☆}(X,-\nu), $$

i.e. we have sinhlog^{−1}○sinhlog^{☆}(*X*)=*X* and coshlog^{−1}○coshlog^{☆}(*X*)=*X*.
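The real-valued analogues behind these definitions are elementary and can be checked directly; writing sinhlog(*x*)=sinh(log *x*)=(*x*−1/*x*)/2 and similarly for coshlog (our reading of the definitions above, with our own function names):

```python
import math

def sinhlog(x):       # sinh(log x) = (x - 1/x)/2
    return 0.5 * (x - 1.0 / x)

def coshlog(x):       # cosh(log x) = (x + 1/x)/2
    return 0.5 * (x + 1.0 / x)

def sinhlog_inv(y):   # y + sqrt(y^2 + 1): inverse of sinhlog on x > 0
    return y + math.sqrt(y * y + 1.0)

def coshlog_inv(y):   # y + sqrt(y^2 - 1): inverse of coshlog on x >= 1
    return y + math.sqrt(y * y - 1.0)

for x in (0.5, 1.0, 2.0, 3.7):
    assert abs(sinhlog_inv(sinhlog(x)) - x) < 1e-12
    assert abs(sinhlog(x) + coshlog(x) - x) < 1e-12  # additive split of x
for x in (1.0, 2.0, 3.7):
    assert abs(coshlog_inv(coshlog(x)) - x) < 1e-12
```

Note the additive split in the assertions mirrors the decomposition id=sinhlog^{☆}(id)+coshlog^{☆}(id) used repeatedly below.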

To illustrate this new perspective and its natural connection to stochastic expansions, consider three examples. First, we observe that the stochastic Taylor expansion for the flowmap is simply the identity endomorphism id. Second, the sinhlog function considered by Malham & Wiese (2009) is given by

$$ \mathrm{sinhlog}^{☆}(\mathrm{id}) = \tfrac12\,(\mathrm{id} - S), $$

since *S*=id^{☆(−1)}. In other words, applying the sinhlog function to the flowmap *φ* corresponds to applying sinhlog^{☆} to the identity endomorphism. Third, the Eulerian idempotent, which is a Lie idempotent from the free associative algebra to the free Lie algebra, is given by

$$ \log^{☆}(\mathrm{id}) = \sum_{k\ge1} \frac{(-1)^{k-1}}{k}\,J^{☆k}. $$

This is the exponential Lie series or Chen–Strichartz formula (see Magnus 1954; Chen 1957; Strichartz 1987; Baudoin 2004; Burgunder 2008). To see this, we use that *J*^{☆k} can be expressed as a sum over permutations with a prescribed descent set; see Reutenauer (1993, p. 65). Note also that, as the antipode is the inverse of the identity with respect to the convolution product, then

$$ S = (\nu + J)^{☆(-1)} = \sum_{k\ge0} (-1)^k\,J^{☆k}. $$

Hence, we can also express the sinhlog endomorphism as

$$ \mathrm{sinhlog}^{☆}(\mathrm{id}) = \sum_{k\ge0} J^{☆(2k+1)}. $$

These three examples belong to the subalgebra of endomorphisms generated by the unit *ν* and augmented ideal projector *J*.
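As a small computational check of the second and third examples, the convolution powers *J*^{☆k} and the truncated Eulerian idempotent can be evaluated on short words; the encodings and function names below are our own.

```python
from collections import Counter
from fractions import Fraction

def shuffle(u, v):
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    out = Counter()
    for w, c in shuffle(u[1:], v).items():
        out[u[0] + w] += c
    for w, c in shuffle(u, v[1:]).items():
        out[v[0] + w] += c
    return out

def J_power(k, w):
    """J^{*k}(w): sum of shuffles over ordered k-part deconcatenations
    of w into non-empty blocks (J annihilates the empty word)."""
    if k == 0:
        return Counter({w: 1}) if w == '' else Counter()
    if k == 1:
        return Counter({w: 1}) if w else Counter()
    out = Counter()
    for i in range(1, len(w)):                # first non-empty block w[:i]
        for rest, cr in J_power(k - 1, w[i:]).items():
            for s, cs in shuffle(w[:i], rest).items():
                out[s] += cr * cs
    return out

def eulerian(w):
    """Eulerian idempotent sum_{k>=1} (-1)^{k-1}/k J^{*k}(w); exact for
    |w| letters since J^{*k}(w) vanishes for k > |w|."""
    out = Counter()
    for k in range(1, len(w) + 1):
        coeff = Fraction((-1) ** (k - 1), k)
        for s, c in J_power(k, w).items():
            out[s] += coeff * c
    return Counter({s: c for s, c in out.items() if c})

# On the word 12 the Eulerian idempotent returns (1/2)(12 - 21), i.e.
# half the commutator [V_1, V_2] after the word-to-vector-field map.
assert eulerian('12') == Counter({'12': Fraction(1, 2),
                                  '21': Fraction(-1, 2)})
assert eulerian('1') == Counter({'1': 1})
```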

### Remark 2.4

Note that the sinhlog and coshlog endomorphisms are projectors with respect to composition, as sinhlog^{☆}(id)○sinhlog^{☆}(id)=sinhlog^{☆}(id) and coshlog^{☆}(id)○coshlog^{☆}(id)=coshlog^{☆}(id).

We conclude this section by defining some endomorphisms and their properties useful in our subsequent analysis.

### Definition 2.5 (reversing and sign endomorphisms)

These endomorphisms are defined for any word *w*=*a*_{1}…*a*_{n} as follows: (i) the reversing endomorphism |*S*|: *w*↦*a*_{n}…*a*_{1}; and (ii) the sign endomorphism *D*: *w*↦(−1)^{n}*w*.

Note, for example, that *D*○*D*≡id and *S*=*D*○|*S*|=|*S*|○*D*. Observe also that, since *D* and |*S*| commute, we have *S*○*S*≡id.
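These identities are easy to verify mechanically on sample words; the sketch below uses our own encoding of the action of such an endomorphism as a word together with a sign.

```python
def reverse_endo(w):          # |S|: w -> a_n ... a_1
    return (w[::-1], 1)

def sign_endo(w):             # D: w -> (-1)^n w
    return (w, (-1) ** len(w))

def antipode(w):              # S = D o |S| = |S| o D
    return (w[::-1], (-1) ** len(w))

def compose(f, g, w):
    """Apply g then f, multiplying the accumulated signs."""
    v, c = g(w)
    v2, c2 = f(v)
    return (v2, c * c2)

for w in ('', '1', '120', '3141'):
    assert compose(sign_endo, reverse_endo, w) == antipode(w)
    assert compose(reverse_endo, sign_endo, w) == antipode(w)
    assert compose(sign_endo, sign_endo, w) == (w, 1)       # D o D = id
    assert compose(antipode, antipode, w) == (w, 1)         # S o S = id
```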

## 3. Convolution shuffle algebra with expectation inner product

We have seen that classes of endomorphisms correspond to functions of the flowmap. Our goal here is to define an appropriate inner product on the convolution shuffle algebra of endomorphisms and analyse its properties. This will be modelled on the mean-square measure of a vector-valued stochastic process, constructed as follows (hereafter, the real parameter *t*>0 is fixed).

### (a) Expectation endomorphism

Let $\mathbb{A}^*$ denote the free monoid of words on the alphabet $\mathbb{A}=\{0,1,\ldots,d\}$.

### Definition 3.1 (expectation map and endomorphism)

The *expectation map* is the linear map which, in the case the alphabet indexes *d* independent Wiener processes, is given by

$$ \mathrm{E}(w) = \begin{cases} \dfrac{t^{n(w)}}{2^{\mathrm{d}(w)}\,n(w)!}, & \text{if } w \text{ is a product of the blocks } 0 \text{ and } aa \text{ with } a\ne0,\\[1ex] 0, & \text{otherwise}. \end{cases} $$

Here d(*w*) is the number of non-zero consecutive pairs in *w* from the alphabet and *n*(*w*)=*z*(*w*)+d(*w*), where *z*(*w*) counts the number of ‘0’ letters *w* contains.

We define the *expectation endomorphism* as *E*: *w*↦E(*w*) **1**, which maps each word to the corresponding multiple of the empty word.

Note that the endomorphisms *X*−*E*○*X*≡(id−*E*)○*X* have zero expectation. Indeed, (id−*E*) lies in the kernel of *E* as *E*○*E*=*E*.

### Remark 3.2

Strictly, the expectation map takes values in the ring of polynomials in *t*, where *t* is a parameter commuting with all of the word algebra, and (id−*E*)○*X* then takes values in the span of words with coefficients in this ring.

### Remark 3.3

The values quoted for the expectation map for any word are the expectations of the corresponding multiple Stratonovich integrals; see Kloeden & Platen (1999, eqns (5.2.34) and (5.7.1)) or Buckwar *et al.* (2012). Briefly, every Stratonovich integral labelled by a word *w* is a linear combination of Itô integrals. The Itô integrals concerned cycle through the set of words which consist of *w* and all words *u* obtained by successively replacing any two adjacent non-zero equal indices in *w* by 0. Each replacement contributes a factor one-half to the coefficient of the Itô integral in the linear combination. Since the expectation of any Itô integral is zero unless all its labelling letters are 0, the expectation of the Stratonovich integral is given by the deterministic integral remaining after the replacement process (and zero if there is none).
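A sketch of the resulting closed form follows, as we read definition 3.1 and this remark; the single-digit letter encoding and the function name are our own conventions, and the expectation is returned symbolically as a coefficient and a power of *t*.

```python
from fractions import Fraction
from math import factorial
import re

def expectation(w):
    """E[J_w] as a pair (coefficient, power), meaning coefficient * t^power.

    Implements the closed form as we read definition 3.1: the
    expectation is non-zero exactly when w factors into blocks '0' and
    'aa' with a non-zero, i.e. every maximal run of a repeated non-zero
    letter has even length; then E[J_w] = t^n / (2^d n!) with d the
    number of non-zero pairs, z the number of '0' letters and n = z + d."""
    z = w.count('0')
    d = 0
    for m in re.finditer(r'([1-9])\1*', w):   # maximal non-zero runs
        run = m.group(0)
        if len(run) % 2:
            return (Fraction(0), 0)           # unmatched letter: mean zero
        d += len(run) // 2
    n = z + d
    return (Fraction(1, 2 ** d * factorial(n)), n)

assert expectation('0') == (Fraction(1), 1)        # E[t] = t
assert expectation('11') == (Fraction(1, 2), 1)    # E[W^2/2] = t/2
assert expectation('1111') == (Fraction(1, 8), 2)  # E[W^4/24] = t^2/8
assert expectation('1122') == (Fraction(1, 8), 2)
assert expectation('111') == (Fraction(0), 0)      # no full pairing
assert expectation('12') == (Fraction(0), 0)       # reduces to an Ito term
```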

### (b) Inner product of endomorphisms

Let {V_{u}} denote a given set of indeterminate vectors indexed by words *u*. We use both (*u*,*v*) and V_{u,v} to denote the inner product of the vectors V_{u} and V_{v}. Let V denote the infinite square matrix indexed by the words with entries V_{u,v}.

### Definition 3.4 (inner product)

We define the *inner product* of *X*,*Y* with respect to V to be

$$ \langle X, Y \rangle := \sum_{u,v} \mathrm{E}\big((X\circ u)\,⧢\,(Y\circ v)\big)\,\mathsf{V}_{u,v}. $$

The norm of an endomorphism *X* is ∥*X*∥:=〈*X*,*X*〉^{1/2}.

Let us motivate this definition and provide some equivalent characterizations. Suppose we apply two separate functions to the flowmap *φ* which are characterized by the endomorphisms *X* and *Y*. We assume the governing vector fields and driving Wiener processes are given, as well as some data *y*_{0}. With a slight abuse of notation, we express the two stochastic processes *x*_{t} and *y*_{t} associated with *X* and *Y* as follows:

$$ x_t = \sum_{w} \mu(X\circ w)\,V_w\circ y_0 \qquad\text{and}\qquad y_t = \sum_{w} \mu(Y\circ w)\,V_w\circ y_0. $$

Our definition is based on the *L*^{2}-inner product 𝔼(*x*_{t}·*y*_{t}). We would like our inner product to be independent of the data *y*_{0}; hence, we replace the vector fields evaluated at the data by the set of indeterminates {V_{w}}.

An equivalent characterization of the inner product is as follows. The action of an endomorphism *X* on a word *w* can be written *X*○*w*=Σ_{u} X_{w,u} *u*, for some scalar coefficients X_{w,u}. In other words, we can represent *X* by a matrix X indexed by the words. We can order the indexing by word length and then lexicographically within each length (for example). Now, let W denote the symmetric matrix of values E(*u*⧢*v*) over all words *u*,*v* and V the symmetric matrix defined above. Then, using our definition above, we observe that

$$ \langle X, Y \rangle = \mathrm{tr}\,\big(\mathsf{X}\,\mathsf{W}\,\mathsf{Y}^{\dagger}\,\mathsf{V}\big), $$

where † denotes matrix transpose. Note that both W and V are positive definite. Hence, we can view the definition of the inner product above as defined with respect to the weights W and V, in which the stochastic and geometric information are, respectively, encoded. All results we subsequently establish will hold independent of V. Finally, we note that our inner product is: (i) symmetric, as the shuffle product and vector inner product are commutative; (ii) bilinear, as the endomorphisms and expectation are linear; and (iii) positive definite, as the matrix of expectations W is positive definite.

### Remark 3.5

We assume that the solution to our stochastic differential system for any data *y*_{0} is *L*^{2}-valued on the time interval of interest. This is equivalent to saying that ∥id∥ is finite, which is equivalent to assuming tr (WV) is finite.

### (c) Graded class subspace

We will often be concerned with endomorphisms that act on subspaces of words selected according to a grading. Recall that the Hopf shuffle algebra is connected and graded by the length of words.

### Definition 3.6 (grading map)

This is the linear map given by g: *w*↦|*w*|, where |*w*| denotes the length of the word *w*. The empty word has length zero, i.e. |**1**|=0.

### Remark 3.7

There is another natural grading on words given by the variance of the multiple integrals *J*_{w}, as mentioned in §1. This is determined by the exponent of *t* when computing the root-mean-square deviation from the expectation of *J*_{w}, i.e. the square root of 𝔼(*J*_{w}−𝔼*J*_{w})^{2}. Briefly, for any word *w*, zero letters contribute a count of one, while non-zero letters contribute a count of one-half towards the grade value. The nuances of the two gradings are revealed in §5.
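The two gradings are easy to compare in code; the function names and the digit-string encoding of words are our own.

```python
from fractions import Fraction

def grade_length(w):
    """Grading by word length, g(w) = |w|."""
    return len(w)

def grade_variance(w):
    """Grading by variance scaling: '0' (drift) letters count one,
    non-zero (Wiener) letters count one-half, matching the t-exponent
    of the root mean square of J_w."""
    return sum(Fraction(1) if a == '0' else Fraction(1, 2) for a in w)

assert grade_length('012') == 3
assert grade_variance('012') == 2          # 1 + 1/2 + 1/2
assert grade_variance('11') == 1           # J_11 = W^2/2 scales like t
assert grade_variance('') == 0
```

In particular the two gradings agree on drift-free words only after doubling, which is the source of the technicality discussed in remark 4.7 below.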

### Definition 3.8 (graded class subspace)

For a given *n*, let the *n*th graded component denote the span of all words *w* of length g(*w*)=*n*. A subspace is a *graded class subspace* if, for a given *n*, it is the span of all words of length *n*, of length at most *n*, or of length greater than *n*. For any graded class subspace, we denote by *π* the canonical projection from the space of all words onto that subspace.

Hereafter, for any graded class subspace and any endomorphism *X*, we write *π*○*X* for the correspondingly projected endomorphism. We carefully state the subspace in all instances, so no confusion should arise.

### (d) Properties of the sinhlog and coshlog endomorphisms

The following lemma from Malham & Wiese (2009) is a crucial ingredient in what follows. We restate it here and discuss an extension we rely upon for clarity.

### Lemma 3.9 (Malham & Wiese 2009, lemma 4.3)

*For any pair of words u and v, we have* E(*u*⧢*v*)=E((|*S*|○*u*)⧢(|*S*|○*v*)).

Though the context for this lemma was the alphabet {1,…,*d*}, it in fact extends to {0,1,…,*d*}. The proof detailed in Malham & Wiese relies on two results. First, any Stratonovich integral can be expressed as a linear combination of Itô integrals, as described in remark 3.3 (also see Kloeden & Platen 1999, eqn (5.2.34)). Importantly, reversing the word associated with a Stratonovich integral generates a mirror linear combination of Itô integrals with their respective words reversed (with the same coefficients). Second, the expectation of the product of any two Itô integrals depends only on the number of non-zero letters and the lengths of subwords containing only ‘0’ letters (see Kloeden & Platen 1999, lemma 5.7.2). These characteristic quantities are invariant under reversal of the words concerned. Both of these results from Kloeden & Platen are stated for the alphabet {0,1,…,*d*}.

### Remark 3.10

Importantly, note that if *u* and *v* have the same length, then we can replace |*S*| by *S*, as the result is then invariant to the sign in the antipode *S*. This observation underlies the restriction to the graded class subspace of words of a single length when the alphabet includes the drift letter ‘0’ in lemma 3.11 below.

We now state the main lemma of this section, outlining properties of the principal endomorphisms we have thus far highlighted, in particular the endomorphisms id, *S*, |*S*|, *E*, *J*, sinhlog^{☆}(id) and coshlog^{☆}(id). The results herein are used to establish the optimal efficiency properties of the sinhlog integrator in §4.

### Lemma 3.11

*We assume either that* 𝔸={0,1,…,*d*} *and the graded class subspace consists of words of a single length n, or that* 𝔸={1,…,*d*} *and the graded class subspace is arbitrary. Then for any X*,*Y* *(and for any* V*), we have the following properties*:

1. 〈*X*,*Y*〉=〈|*S*|○*X*,|*S*|○*Y*〉;
2. 〈|*S*|,|*S*|〉=〈*S*,*S*〉=〈id,id〉;
3. 〈sinhlog^{☆}(id),coshlog^{☆}(id)〉=0;
4. ∥id∥^{2}=∥sinhlog^{☆}(id)∥^{2}+∥coshlog^{☆}(id)∥^{2};
5. 〈*X*,*E*○*Y*〉=〈*E*○*X*,*E*○*Y*〉;
6. 〈*E*○id,*E*○id〉=〈*E*○|*S*|,*E*○|*S*|〉=〈*E*○*S*,*E*○*S*〉;
7. 〈*E*○sinhlog^{☆}(id),*E*○coshlog^{☆}(id)〉=0;
8. 〈|*S*|,*J*^{☆n}〉=〈id,*J*^{☆n}〉,

*where property 8 only holds on the graded class subspace of words of a single length n, independent of the alphabet. Property 3 indicates that sinhlog*^{☆}(id) *and coshlog*^{☆}(id) *are orthogonal with respect to the inner product*.

### Proof.

We establish properties 1–8 in order. Using the linearity of the expectation map and of the reversing endomorphism |*S*|, together with lemma 3.9, property 1 follows. Property 2 is a special case of property 1. Using property 2 and lemma 3.9, we deduce that

$$ 4\,\langle \mathrm{sinhlog}^{☆}(\mathrm{id}),\, \mathrm{coshlog}^{☆}(\mathrm{id})\rangle = \langle \mathrm{id},\mathrm{id}\rangle - \langle S,S\rangle = 0, $$

thus establishing property 3. Property 4 follows directly when we observe that sinhlog and coshlog additively decompose the identity, i.e. id=sinhlog^{☆}(id)+coshlog^{☆}(id). Properties 5–7 now follow from the properties of the expectation endomorphism. Property 8 follows directly from the commutativity of the shuffle product.

## 4. Sinhlog and the optimal efficient integrator

Recall our construction of a stochastic integrator for a given formal power series *f* that we outlined in the steps in §1. In terms of the convolution shuffle algebra and the corresponding endomorphism *f*^{☆}(id), those steps, and the concepts therein, have natural concise translations. Proceeding through the construction, by direct analogy with §1, a general stochastic integrator has the form *f*^{−1}(*P*), where *P* denotes the truncation of *f*^{☆}(id) according to the grading. The error associated with this approximation is *R*:=id−*f*^{−1}(*P*). One integrator will be more accurate than another if the norm of its associated error is smaller than that of the other.

### Remark 4.1

In light of our comments in §1, that the set of Lyndon word multiple integrals included in an integrator determines its order, we will restrict ourselves to functions *f* that are grade-preserving. All the functions herein, such as the logarithm, exponential, sinhlog, coshlog or any power series in *J*, are grade-preserving (their action on words generates linear combinations of words of the same length).

### Remark 4.2

Also, as mentioned in §1, we need to include in the integrator the expectations of the terms in the remainder. These are in general encoded by *E*○*Q*, where *Q* is the pre-remainder defined below, so the effective *leading order terms* in the remainder are (id−*E*)○*Q*. We highlight this at the appropriate junctures.

### Definition 4.3 (pre-remainder)

We define the *pre-remainder* *Q* to be the remainder terms after applying the endomorphism *f*^{☆}(id) and then truncating, i.e. it is *Q*:=*f*^{☆}(id)−*P*, where *P* denotes the retained truncation of *f*^{☆}(id).

The relationship between *R* and *Q* plays a key role in our subsequent analysis.

We now examine the properties of integrators based on the sinhlog endomorphism, indeed of a whole class of endomorphisms to which it belongs, as well as the coshlog endomorphism. First, we focus on the sinhlog endomorphism and re-establish that it is an efficient integrator, in the context of the convolution shuffle algebra. As we do so, we will see natural questions that emerge, motivating us to dig deeper.

### Definition 4.4 (efficient integrator)

A numerical approximation to the solution of a stochastic differential equation is an *efficient integrator* if it generates a strong numerical integration scheme that is more accurate in the mean-square sense than the corresponding stochastic Taylor integration scheme of the same strong order (recall we grade according to word length), independent of the governing vector fields and to all orders. In other words, if *R* denotes the remainder of an integrator, it is efficient if, for any *n* and for any V:

The sinhlog endomorphism of interest has the expansion

$$ \mathrm{sinhlog}^{☆}(\mathrm{id}) = J + J^{☆3} + J^{☆5} + \cdots. $$

To construct a sinhlog integrator of strong order *n*/2, we start by applying the projection operator onto words of length at most *n* to sinhlog^{☆}(id); call the result *P*. Since *J*^{☆(n+1)} is zero for any words *w* with |*w*|<*n*+1 and each of the terms in the expansion above is grade-preserving, we see that only finitely many terms of the expansion contribute to *P*. The pre-remainder is thus given by *Q*=sinhlog^{☆}(id)−*P*. The compositional inverse of the sinhlog endomorphism is given by sinhlog^{−1}○*X*=*X*+*h*^{☆}(*X*,+*ν*), where *h*^{☆}(*X*,+*ν*):=(*ν*+*X*^{☆2})^{☆(1/2)} has the expansion

$$ h^{☆}(X,+\nu) = \nu + \tfrac12\,X^{☆2} - \tfrac18\,X^{☆4} + \cdots. $$
By definition, the error *R* in the approximation is given by

$$ R = \mathrm{id} - \mathrm{sinhlog}^{-1}(P) = \mathrm{sinhlog}^{-1}(P+Q) - \mathrm{sinhlog}^{-1}(P). $$

The leading order term in *R* is the leading order term in *Q*. We justify this as follows. Consider the term *P*☆*Q*. Since at leading order sinhlog^{☆}(id)=*J*, we see that at leading order *P*☆*Q*=*J*☆*Q*, which only generates terms of strictly higher grade than the leading order terms in *Q*. This follows from the fact that the two factors annihilate any lower order terms in the two-part deconcatenation implied by the convolution, and that *J* annihilates the empty word. We also only retain the leading order terms in *Q* by applying the projection onto the leading grade.

### Remark 4.5

We shall use the big notation such as or above to denote endomorphisms that only generate terms involving words that are of higher order with respect to the grading *g* than those generated by the preceding endomorphisms.

Thus, using lemma 3.11(3) with the decomposition id=sinhlog^{☆}(id)+coshlog^{☆}(id), we have

$$ \|\mathrm{id}\|^{2} = \|\mathrm{sinhlog}^{☆}(\mathrm{id})\|^{2} + \|\mathrm{coshlog}^{☆}(\mathrm{id})\|^{2} $$

on the graded class subspace concerned. Now using id=sinhlog^{☆}(id)+coshlog^{☆}(id) and lemma 3.11(5,7), we deduce that the analogous decomposition holds with the expectations removed. Recall from remark 4.2 that we need to include in our integrators the expectation of the leading order terms in the remainder. This means that the effective remainders for the stochastic Taylor expansion and sinhlog integrators are their respective leading order terms with the expectations removed, as at leading order *R*=*Q*. We have thus established that integrators based on the sinhlog endomorphism are at least as accurate as corresponding stochastic Taylor integrators at leading order, i.e. they are efficient.

We can prove the following extension of corollary 4.2 in Malham & Wiese (2009) to the alphabet {0,1,…,*d*}; indeed, we assume this to be the alphabet hereafter.

### Lemma 4.6

*With this alphabet, if n is odd, then the error of the sinhlog integrator is optimal; it realizes its smallest possible value when compared with the error of the corresponding stochastic Taylor integrator*.

### Proof.

To prove this, we perturb the sinhlog integrator order by order as follows. For a fixed *n*, perturb the coefficient of *J*^{☆(n+1)} in the sinhlog expansion so that

$$ \mathrm{sinhlog}^{☆}_{\epsilon}(\mathrm{id}) := \mathrm{sinhlog}^{☆}(\mathrm{id}) + \epsilon\,J^{☆(n+1)}, $$

where *ϵ*∈ℝ is a parameter. Then, repeating the procedure above, we have *Q*_{ϵ}=*Q*+*ϵ* *J*^{☆(n+1)} at leading order. Since the leading order behaviour of the inverse of sinhlog^{☆} is unaffected, we have at leading order *R*_{ϵ}=*Q*_{ϵ}. As above, truncate *Q*_{ϵ} to its leading order terms. Then we see
where in the penultimate step we used the orthogonality property of sinhlog and coshlog. When we include the expectations of the leading order terms in the remainder (cf. remark 4.2), this relation becomes
The linear term in *ϵ* is zero if *n* is odd, by lemma 3.11(8). Hence, in this case, the mean-square excess, the terms on the right other than ∥*Q*_{ϵ}∥^{2}, is optimized when *ϵ*=0.

### Remark 4.7

This result can be found in Malham & Wiese (2009) for the alphabet {1,…,*d*}. It extends to {0,1,…,*d*} using lemma 3.11, though under the proviso that we grade according to word length. This means that we endeavour to include some multiple integrals in our stochastic integrator involving the drift ‘0’ index that we would not ordinarily include if we were grading according to variance (of the multiple Wiener integrals). However, we take the perspective here that there are far fewer of these terms at each grading according to length, and the computational effort associated with their simulation is a small fraction of that overall. If we include them, we guarantee efficiency. See §5 for an example. Of course, if the alphabet is {1,…,*d*} only, then the two notions of grading coincide and this technicality is redundant.

Some natural questions now arise. It is apparent that the first key result in the argument above to prove efficiency was lemma 3.11(4), showing that the norm-square of the identity endomorphism decomposes into the sum of the norm-squares of the sinhlog and coshlog endomorphisms. The second key result, to prove that the sinhlog integrator was optimal when *n* is odd, was lemma 3.11(8). First, with regard to efficiency, since there is no immediate apparent difference, it would seem that we could choose coshlog as our integrator endomorphism. Further, the result of lemma 3.11(4) essentially relied on the fact that 〈*S*,*S*〉=〈id,id〉. However, we also know from lemma 3.11(2) that 〈|*S*|,|*S*|〉=〈id,id〉. Thus, in principle, we could also consider ½(id−|*S*|) as an integrator (or indeed, any *X* for which 〈*X*,*X*〉=〈id,id〉). Note that ½(id−|*S*|) is the equivalent of applying the sinhlog endomorphism at even orders and the coshlog endomorphism at odd orders. Second, with regard to optimal efficiency, it would also seem that ½(id−|*S*|) would deliver optimality at both even and odd orders, as the linear term in *ϵ* above would be zero. However, this is not the case. The key is the relation between the error *R* and the pre-remainder *Q*.

We define the following class of endomorphisms: set
for any *ϵ*. Then, we see that *f*^{☆}(id;+1) is the sinhlog endomorphism and *f*^{☆}(id;−1) is the coshlog endomorphism. Note that the compositional inverse of *f*^{☆}(*X*;*ϵ*) is
where *h*^{☆}=*h*^{☆}(*X*,*Y*) is the convolutional square root of *X*^{☆2}+*Y* given in §2. Our main result is as follows.
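Before stating it, it may help to record a real-valued guide to this compositional inverse. Assuming, purely as an illustrative scalar analogue consistent with the two special cases just identified (it is our assumption, not a formula from the text), that *f*(*x*;*ϵ*)=½(*x*−*ϵx*^{−1}), solving *y*=*f*(*x*;*ϵ*) on the branch *x*>0 gives

```latex
y = \tfrac{1}{2}\bigl(x - \epsilon x^{-1}\bigr)
\iff x^{2} - 2yx - \epsilon = 0
\implies x = y + \sqrt{y^{2} + \epsilon} = y + h(y,\epsilon).
```

At *ϵ*=+1 this reproduces the inverse sinhlog^{−1}(*σ*)=*σ*+(id+*σ*^{2})^{1/2} recorded in §5.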

### Theorem 4.8

*For every ϵ>0, the class of integrators f*^{☆}(id;ϵ) *is efficient. When ϵ=1, the error of the integrator f*^{☆}(id;1) *realizes its smallest possible value compared with the error of the corresponding stochastic Taylor integrator, i.e. if R*_{ϵ} *denotes the remainder of the integrator, we have that the mean-square excess*
*is positive and maximized at ϵ=1. Thus, a strong stochastic integrator based on the sinhlog endomorphism is optimally efficient within this class.*

### Proof.

We prove the theorem in four steps. Note that, for any *X*, we have:
The integrator of interest, of strong order *n*/2, is *P*, given by
The pre-remainder is *Q*. Further, the resulting approximation has an error *R* given by
Our goal now is to carefully examine this relationship between *R* and *Q*, and in particular, the difference shown on the right-hand side.

*Step* 1. A thorough understanding of the function *h*^{☆} is thus required, and we motivate our analysis by considering the corresponding real-valued function given by
The function *h*=*h*(*x*,*y*) is real-analytic for *y*>−*x*^{2}, zero on *y*=−*x*^{2}, where its tangent plane is orthogonal to the (*x*,*y*)-plane, and complex-valued elsewhere. Hence, we can choose a point inside the open region *y*>−*x*^{2} about which the Taylor series for *h*=*h*(*x*,*y*) has a non-zero radius of convergence. In particular, for *y*>−*x*^{2}, the difference *h*(*x*+*q*,*y*)−*h*(*x*,*y*) at leading order in *q* is given by
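Explicitly, since *h*^{☆} is the convolutional square root of *X*^{☆2}+*Y*, the real-valued counterpart is *h*(*x*,*y*)=(*x*^{2}+*y*)^{1/2}, and expanding to first order in *q* gives

```latex
h(x+q,\,y) - h(x,\,y)
= \sqrt{(x+q)^{2} + y} - \sqrt{x^{2} + y}
= \frac{x}{\sqrt{x^{2} + y}}\,q + O(q^{2}).
```

Note that the coefficient of *q* blows up as *y*↓−*x*^{2}, matching the tangent plane becoming orthogonal to the (*x*,*y*)-plane there.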

*Step* 2. By analogy with step 1, we see that in the convolution algebra we can expand
Further, we note that, at leading order,
and thus, for all *ϵ*>−1, we have
Substituting these expressions into the expression for *R* above, we obtain, at leading order,

*Step* 3. Let us now compare the mean-square error of *P* with the corresponding stochastic Taylor integrator . We see that on we have
If we include the expectations of the leading order terms in the integrator (again cf. remark 4.2), then we have
The comparison above becomes

*Step* 4. We see that the mean-square excess is
Note that, for any *X* and *Y*, we have the Cauchy–Schwarz inequality |〈*X*,*Y*〉|≤∥*X*∥∥*Y*∥. Using lemma 3.11, we see that
Hence, we observe that 〈(id−*E*)○id,(id−*E*)○|*S*|〉≤∥(id−*E*)○id∥^{2}, and thus
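Spelled out, and assuming (our reading of the lemma 3.11 identity invoked above) that ∥(id−*E*)∘|*S*|∥=∥(id−*E*)∘id∥, the step here is

```latex
\bigl\langle (\mathrm{id}-E)\circ \mathrm{id},\; (\mathrm{id}-E)\circ |S| \bigr\rangle
\;\le\; \bigl\lVert (\mathrm{id}-E)\circ \mathrm{id} \bigr\rVert\,
        \bigl\lVert (\mathrm{id}-E)\circ |S| \bigr\rVert
\;=\; \bigl\lVert (\mathrm{id}-E)\circ \mathrm{id} \bigr\rVert^{2},
```

the inequality being Cauchy–Schwarz.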
Note that all our statements above hold for any governing vector fields. Hence, for all *ϵ*>0, the mean-square excess is positive and thus *P* is an efficient integrator, establishing the first statement of the theorem. When *ϵ*=0, the mean-square excess is zero, as expected. For *ϵ*<0, it is negative. Further, we see that, for *ϵ*>0, the mean-square excess is largest when *ϵ*=1, corresponding to the sinhlog endomorphism. We have thus established the second statement of the theorem and the proof is complete.
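As a sanity check on this *ϵ*-dependence, note that any excess profile of the form *μ*(*ϵ*)=2*cϵ*/(1+*ϵ*)^{2} with *c*>0 (an illustrative shape only, chosen by us; the actual excess involves the norm terms elided above) exhibits exactly the behaviour claimed:

```latex
\mu(\epsilon) = \frac{2c\,\epsilon}{(1+\epsilon)^{2}},
\qquad
\mu'(\epsilon) = \frac{2c\,(1-\epsilon)}{(1+\epsilon)^{3}},
```

so *μ*(0)=0, *μ*(*ϵ*)>0 for *ϵ*>0 with a unique maximum at *ϵ*=1, *μ*(*ϵ*)<0 for *ϵ*<0 (*ϵ*≠−1), and *μ* is undefined at *ϵ*=−1 (cf. the *ϵ*=−1 discussion below).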

Note that when *ϵ*=−1, the mean-square excess in step 4 of the proof above is undefined. We can explain this as follows. The first equation in step 2 becomes
However, now the leading order behaviour in *P* is . If we denote , then we see that
Proceeding formally, we substitute these last two expansions into the expression for *R* above. Retaining leading order terms, we find
In other words, the term in the error *R* corresponding to the difference of *h*^{☆} evaluated at *P*+*Q* and *P* generates the term . The inverse of *J* is not an element of the group . It does have the formal expansion *J*^{☆(−1)}=*S*+*S*^{☆2}+*S*^{☆3}+⋯ , but this does not have a finite evaluation on the empty word. Hence on , the term is not finite. However, it is finite on —the inverse contracts the number of deconcatenations—but this now introduces terms in the remainder of the same order as those we retain in the integrator, i.e. we have a form of order reduction.

## 5. Concluding remarks

We established that the sinhlog integrator is optimally efficient when we grade according to word length. For example, the sinhlog integrator specified by g(*w*)≤2 on the computation interval [*t*_{n},*t*_{n+1}] is given by
where
Whatever the vector fields are, this is guaranteed to be more accurate than
which is the corresponding integrator based on the stochastic Taylor expansion with the grading g(*w*)≤2. (A thorough numerical investigation confirming this conclusion in the diffusion-only case is presented in Malham & Wiese (2009). Further, Lord *et al.* (2008) demonstrate that high-accuracy, higher order stochastic integrators can deliver greater accuracy for a given computational effort than the Euler–Maruyama scheme, despite the cost associated with simulating the multiple Wiener integrals included.) First, we note that, at this order, the integrator above also coincides with the form of the exponential Lie series for g(*w*)≤2 (this is not true at any higher order), and the exponential Lie series is not an efficient integrator. However, at this order, the two integrators are distinguished when we compute their inverse endomorphisms to generate the corresponding approximations. Second, the inverse of the sinhlog endomorphism is sinhlog^{−1}(*σ*)=*σ*+(id+*σ*^{2})^{1/2}. If the vector fields are linear, *σ* is a matrix and we can construct sinhlog^{−1}(*σ*) by computing the matrix square root. If the vector fields are nonlinear, we can expand the square root to terms of sufficiently high degree (see Malham & Wiese 2009, remark 5.3). Third, the stochastic Taylor-based scheme above is a modification of the Milstein method, in which we additionally include terms involving *J*_{i0}, *J*_{0i} and *J*_{00} for *i*=1,…,*d*.
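The linear case admits a quick numerical sketch. Assuming *σ* is a real matrix (e.g. scaled by a small stepsize, so that id+*σ*^{2} is well conditioned), the inverse sinhlog^{−1}(*σ*)=*σ*+(id+*σ*^{2})^{1/2} can be formed with a matrix square root; the helper name `sinhlog_inverse` below is ours, not the paper's:

```python
import numpy as np
from scipy.linalg import sqrtm

def sinhlog_inverse(sigma):
    """Return X = sigma + (I + sigma^2)^{1/2}, so that sinh(log X),
    i.e. (X - X^{-1})/2, recovers sigma."""
    n = sigma.shape[0]
    return sigma + sqrtm(np.eye(n) + sigma @ sigma)

# Sanity check: (X - X^{-1})/2 == sigma.  This holds because the
# square-root factor S commutes with sigma, so
# (sigma + S)(S - sigma) = S^2 - sigma^2 = I, i.e. X^{-1} = S - sigma.
rng = np.random.default_rng(0)
sigma = 0.1 * rng.standard_normal((4, 4))  # small, as on a short step
X = sinhlog_inverse(sigma)
assert np.allclose((X - np.linalg.inv(X)) / 2, sigma)
```

The same identity underlies the nonlinear case, where the square root is instead expanded to terms of sufficiently high degree.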

We considered here a comparison under the mean-square measure of the local errors. How the local errors accumulate to contribute to the global error was considered in Malham & Wiese (2009), where it was shown that the local error comparison transfers to the global error.

If we know further structure in the stochastic differential system concerned, then the class of endomorphisms that are efficient widens. For example, if the diffusion vector fields commute with themselves, but not with the drift vector field, then we can deduce that the exponential Lie series is an efficient integrator, again under the proviso that we grade according to word length. This is because, in the remainder at leading order, the largest terms, according to root-mean-square scaling with respect to stepsize, would involve words with non-zero letters; however, these are in fact absent because the diffusion vector fields commute (see further Malham & Wiese 2008).

Our results have only relied on the symmetry properties of the expectation map (see lemma 3.11) and that it realizes positive values on words with a scaling according to word length. In particular, they do not depend on the coefficients explicitly. Hence, the sinhlog integrator is optimally efficient within the whole class of possible coefficients. In a slightly different direction, our result also holds for weak approximations of the multiple integrals, as long as the *L*^{2}-norm of (sums of) integrals is preserved. We assume here that the error is still being measured in the mean-square sense.

Though, in general, the Eulerian idempotent is not efficient, it is a natural object in the construction of geometric integrators. Recently, the Dynkin and Klyachko Lie algebra idempotents have proved to be important in the context of geometric integration (see Patras & Reutenauer 2002; Munthe-Kaas & Wright 2008; Chapoton 2009; Lundervold & Munthe-Kaas 2011*a*,*b*). An interesting question here, besides the extension of their use to stochastic differential equations, is which of these projectors generates the more accurate and efficient geometric integrator.

## Acknowledgements

All the authors would like to thank the London Mathematical Society, through its Scheme 4, as well as the Edinburgh Mathematical Society, for supporting a visit by K.E.-F., A.L. and H.M.K. to Heriot-Watt in December 2009, when a large part of the research herein was initiated. We also thank ICMAT and CSIC for financial support during the research trimester Combinatorics and Control in Madrid in 2010. During this research trimester, A.L. received financial support from the NILS Mobility Project, UCM-EEA Abel Extraordinary Chair 2010. We thank Steven Gray, Luis Duffaut Espinosa and Dominique Manchon for lively and useful conversations during the trimester. A.W. and S.J.A.M. would like to thank the Institute of Mathematics and its Applications and the Centre for Numerical Analysis and Intelligent Software, respectively, for financial support. S.J.A.M. and A.W. thank Charles Curry for useful conversations. Finally, we are also grateful to the anonymous referees, whose comments and suggestions helped to significantly improve the original paper.

- Received January 16, 2012.
- Accepted March 13, 2012.

- This journal is © 2012 The Royal Society