We present an analytical technique for solving Fokker–Planck equations that have a steady-state solution by representing the solution as an infinite product rather than, as usual, an infinite sum. This method has many advantages: automatically ensuring positivity of the resulting approximation, and by design exactly matching both the short- and long-term behaviour. The efficacy of the technique is demonstrated via comparisons with computations of typical examples.
In this paper, we propose a novel analytical procedure for solving diffusion equations of the form 1.1 where the steady-state solution is well defined, i.e. a normalizable probability density (we call this the ‘stable’ case):
Generally, this partial differential equation (PDE) describes diffusion in the presence of a potential. For example, if it governs the concentration of diffusing charged ions in an electrostatic potential well it is generally referred to as the Nernst–Planck equation . If the potential is not quadratic then one has (1.1) with A nonlinear and the equation is analytically intractable; techniques are therefore required to deal with it so that one does not have to fall back on full numerical solutions of the PDE.
More often, it arises as the forward (Fokker–Planck, FP) equation associated with the mean-reverting diffusion process 1.2 in dimensionless form.
It is perhaps worth clarifying what we mean by ‘nonlinear diffusion’ as the term is ambiguous. A diffusion process such as (1.2) is said to be nonlinear if the drift term is nonlinear or the volatility term is non-constant; hence whenever A is nonlinear we have a nonlinear diffusion process. But the FP equation (1.1) that arises from it is always a linear diffusion equation (linear in f). That does not mean, however, that it is easy to solve. What is unusual about this paper is that we will be modifying (1.1) through a nonlinear change of the dependent variable f so that the diffusion equation does become nonlinear, and our thesis is that this nonlinear PDE is actually easier to handle than its linear counterpart.
In general, (1.2) is an important model of physical phenomena such as electronic noise and kinetics , electronic circuits with nonlinear resistance , and systems with overdamped Langevin dynamics  or from nonlinear friction [5,6]. Another example of application has been a model of variability of chemical concentration in ice-core data as a proxy of climate variability , where it is necessary to model large excursions. If A(y)=−y, we are left with the familiar Ornstein–Uhlenbeck (OU) process , or Langevin equation , but the nonlinear examples have A(y)≠−y and these require attention.
An analytical approximation to a PDE has considerable practical value aside from providing an immediate insight into the form of the PDE's solution as there is a considerable improvement in computation speed over the standard numerical grid-based methods. This is particularly true in the context of statistical signal processing techniques such as Markov chain Monte Carlo , where one wishes to sample from the distribution of Yt2 given Yt1, for t2>t1; knowing the density, one can sample using the rejection method . Then an approximate solution that captures the short- and long-term behaviour is considerably more practical than having to regenerate the solution by numerically solving the PDE at each step. The time saved in that calculation can then be spent on running more Monte Carlo simulations. In fact, we suggest that even our leading-order term (3.23) is sufficiently accurate for the vast majority of practical applications, whatever the discipline.
Although there are physics applications of (1.2), as we have said, in fact it was mathematical finance that provided the main impetus for this research. The process (1.2) is a model of mean reversion and in the case when A(y)=−y it is the OU process which in dimensional coordinates is 1.3 But this has an important disadvantage, in that its steady-state distribution is Normal, so that large excursions are very unlikely in the model. Despite the fact that this has been known for years, the Normal distribution is still used in risk management in areas where it should not be. For example, investigation of JP Morgan Chase's ‘London whale’ trading losses, estimated at at least $5bn, shows that the Normal distribution function was used to transform a number of standard deviations into a loss probability even at high levels of confidence.1 Before that, the demise of Long Term Capital Management (LTCM) in 1998 can be directly attributed to over-leveraged ‘convergence trades’ . In reality, the following two mechanisms occur when the market is far from equilibrium, and these cause such excursions to occur much more often than in the simple OU model.
The first mechanism is that the volatility is likely to be higher. This is seen in the so-called basis risks, in which the price difference between two closely related financial instruments should be zero: in times of market stress, when the distance from equilibrium is large, market liquidity is lower and the volatility higher, so large excursions become likely. One recent example is the so-called credit default swap (CDS) index basis, which is the difference between the index level of a CDS index contract and the level implied by its constituents, which should theoretically be nearly zero (see  for a general discussion on this). Another is the CDS-bond basis during the 2007–2009 financial crisis, where the credit spread of credit-risky cash bonds deviated violently from the level implied by the CDS market (e.g. , fig. 1, , fig. 1). The effect is clear in figure 1a,b, with pronounced departure from the mean during the 2007–2009 financial crisis. A convenient formulation increasing the volatility away from equilibrium is 1.4 Models such as this, in which the volatility depends deterministically on the spot price Xt and/or time, are called local volatility models (as distinct from stochastic volatility models; e.g. ). In fact (1.4) is precisely the same recipe as suggested in the aforementioned climate change paper . It is also encountered as one of the Pearson–Wong diffusions [18,19] but has received less attention than the more commonly encountered OU and square-root processes also in that class, because it is less analytically tractable.
The second mechanism is that the reversion speed may decline. This is seen in examples where there is no strict requirement that prices must converge, and arises if the market volume directed towards a convergence bet declines, even if the total liquidity remains the same. An example is figure 1c, which shows the deviation between interest swap rates of three different tenors; this is the so-called ‘butterfly’ trade (e.g. ). We calculate the 5Y rate minus the weighted average, weights 0.3 and 0.7, respectively, of the 2Y and 10Y rates, after which the 10-year exponentially weighted moving average is subtracted so as to take out the long-term average; not the length of the excursion from 2009 (post-crisis) onwards. This second mechanism is captured by the following models: 1.5 and 1.6 and also by (1.4) after transformation by , , giving 1.7 The same effects can also easily be seen in simulation (figure 2a–d).
We have distinguished two mechanisms, but although the distinction is important for motivating an underlying model, it is unimportant to the issue of solving the forward equation. Indeed, as we just did with (1.4)(1.7), we can transform the process to arrange for the volatility to be constant and thereby work with the canonical form (1.2). Examples (1.5) and (1.6) simply require a trivial rescaling ; also τ=κt. In general, the transformation Y =η(X) with does this, provided σX is non-zero. The canonical forms are therefore (where we have non-dimensionalized (1.5) and (1.6) to use rather than X) and the corresponding invariant densities are (B and Kν denoting as usual the Beta function and the modified Bessel function of the second kind; see ) and throughout so that the non-dimensional parameter or ν measures the deviation from the OU model (, ).
In each case, the density is fatter tailed than Gaussian. In (1.6), the density is fatter tailed than in (1.5) because the reversion disappears completely as the diffusion process moves far from equilibrium, and so it is left to wander around aimlessly.
There is a reasonable amount of similarity, at least visually, between figures 2b and 1a,b. We therefore tested the model (1.4), using the standard likelihood-ratio test, with the null hypothesis against the alternative . The method yields a maximum-likelihood estimate for the parameter in question and an estimate of the relative likelihood of H0 versus H1 given the data (the p-value). For figure 1a, we found ν=3.8 with p-value 1×10−8, and for figure 1b we found ν=3.2 with p-value 1×10−32. The rejection of the basic OU model is unsurprising, given the huge excursions from equilibrium in both datasets. Similarly, figure 2d bears a resemblance to figure 1c. Testing (1.6) in the same way on figure 1c, we found ν=4.4 with p-value 5×10−6. This suggests that (1.6) is preferable to the simple OU model. (For the purposes of risk management, it is clearly prudent to use any of (1.5)–(1.7) in place of the basic OU, even if there is no firm statistical evidence.)
We round off this section by returning to physics, with an example that is considerably further from the OU process, qualitatively, at least in the sense of not being in a one-parameter deformation of the OU model. This is the double-well potential, where the invariant density is bimodal. This has been studied by Caroli et al. , who used the Wentzel–Kramers–Brillouin (WKB) approximation to study the behaviour in different time regimes; here we shall demonstrate a method of solution valid on all time scales simultaneously. For the sake of concreteness, we confine ourselves to a particular example 1.8 where we require c1>1 (so that there are two wells), c2≠0 (for asymptotic stability). Figure 3 shows typical realizations of the corresponding Ito process, for σ=2, κ=2, c1=4, , τ=κt. In some realizations such as figure 3a the process hops from one well to the other (these are centred at ); but it can spend a long time in one well (figure 3b). In non-dimensional form, this is 1.9 in general, , with the polynomial factorizable as a product of strictly positive quadratics, is a form that will construct multi-modal examples of arbitrary complexity.
(b) Infinite products
The main thrust of this paper is that one benefits significantly from studying the logarithmic derivative, in the spatial direction, of the solution. An immediate advantage is that upon integration and re-exponentiation a positive solution must be obtained (there is still a time-dependent normalizing factor to obtain, but as we show later it is easy to ensure that this is positive).
Next, the logarithmic derivative is an algebraically simpler construction. Indeed, for the OU model one has 1.10 whereas (In the case of an arithmetic Brownian motion with no reversion and starting from the origin, this would simply read x/σ2t.) We wish to replicate this exactly when A is linear, and this solution is a starting point in our analysis. Note that the form of (1.10) and the stochastic representation of the OU process as a scaled time-transformed Wiener process  suggest a substitution q=e−2κt, and indeed we use this as an ansatz for mean-reverting processes in general.
The next issue is that we want a technique in which short- and long-time behaviour can be specified explicitly, and that these be replicated. This is because the former must be a Gaussian distribution as t→0, while the latter is the already-identified invariant density. Thus arrives the notion of an infinite product expansion, in the sense of writing the logarithmic derivative as the sum of a term that replicates the short-time behaviour, another that replicates the long-time behaviour, and a series that corrects the middle.
The infinite product is a significant departure from ‘standard’ techniques for solving parabolic PDEs; these have in common that they are all infinite sums. Three such are: spectral methods (Fourier transformation in the spatial coordinate ); orthogonal expansions (expand the spatial dependence as a time-weighted sum of eigenfunctions of the infinitesimal generator); and Laplace transformation in time. All suffer from the problem that it is difficult to represent the initial condition effectively. As a case in point, take the Mehler series expansion , in terms of Hermite polynomials Her(x), to the OU PDE: 1.11 For κt≪1, it is clear from the sum that the convergence is very slow, because the basis functions, which are oscillatory, are required as t→0 to sum to give a delta function; truncation therefore generates oscillatory artefacts. Figure 4 shows the situation for t=0; indeed, the truncated Mehler series (taking only terms 0≤r<N) can be approximated around the spike x=X0 by the sinc function This behaviour is not specific to the OU process, and the slowness of convergence and the concomitant Gibbs phenomenon are a consequence of the Riemann–Lebesgue lemma. Yet for an infinite product, it is easy to make the short-time solution zero at all x≠X0, and hence like a delta function. One simply arranges for one of the terms in the product to be zero for t=0, x≠X0, and then it does not matter what the rest of the terms are: so another can be responsible for modelling the long-term behaviour, and the others for the middle-time region—as we intimated above. Even if one does not bother to correct the ‘middle-time’ behaviour, the results are remarkably good: this is seen later in equation (3.23) and in §3e and accompanying figures.
Probabilistically, infinite products have the attractive interpretation that each term in the product is the Radon–Nikodym derivative of the successive partial products. In statistical physics, the entropy is an important concept, and an infinite product represents this much more effectively than does a sum; from the point of view of an approximation, it is clearly disastrous to have a negative probability density anywhere. Note also that under a change of ‘spatial coordinate’ X, to Y =η(X) say, a multiplicative factor of |η′(X)| enters, so multiplicative representations are preserved under change of variable (because the Jacobian is just another term in the product) in a way that additive ones are not. Accordingly, it is quite incorrect to view an infinite product as a minor variant on an infinite sum: rather, we argue, the infinite product is fundamentally different and also a clearer way of thinking about the problem. On the other hand, the introduction of a logarithm causes a quadratic nonlinearity to appear in the FP equation, making the algebra considerably more difficult. In fact, the series expansion of its solution generates a quadratic recurrence of self-convolutive type somewhat analogous to that discussed in .
By substituting or and working with g instead, we arrive at the adjoint FP equation. This is similar to (1.1): 2.1 and is also known as the backward equation (a term that we do not use here, because the equation arises as a forward equation: τ represents calendar time). This development is useful because we are going to ensure that, when we approximate gY (and hence gX also), its long-term asymptote is unity. By so doing, we ensure that the approximated f tends to the correct invariant density.
As motivated in the Introduction, we deal with the logarithmic derivative of gY rather than with gY itself. We therefore introduce a function hY, so that hY satisfies an equation analogous to Burgers': 2.2 Conventionally, the Hopf–Cole transformation  is used to convert the (nonlinear) Burgers equation into the (linear) diffusion equation, which is regarded as being more easily analysed. We are doing it backwards here, which may seem perverse: but note that this is also how the WKB approximation works , so the logarithmic transformation to a nonlinear equation has a well-established precedent.
The apparent similarity to the WKB expansion is worth discussing briefly. The similarity in form of the recursion is similar, as the logarithmic-derivative transformation also gives rise to what is, in essence, a Riccati equation. However, WKB is most naturally applied to problems in which the coefficient of the second derivative—here, the volatility (noise) term—is small. It gives rise to a singular expansion, because for zero noise the order of the differential equation reduces from two to one. That is not what we do: our approach is to take the OU solution for hY(τ,y) and add terms to correct it. Thus whereas WKB expands the solution around a deterministic system, our techniques expand it around a known stochastic one. If the noise is very small, WKB may be more effective, but we have not observed a general failure of our methods in that limit.
3. Infinite product expansion
(a) Solution of nonlinear partial differential equation by series expansion
We now seek a series expansion as a solution of (2.2). In the regular OU case, A(y)=−θy, we have Now for general A, we still have a diffusion starting from Y0, in which the distribution's variance initially grows with time as 2τ, so if we expand around τ=0, i.e. q=1, in powers of (1−q), we are led to consider a development where θ is an arbitrary constant, and will be set later. This can also be established by dominant balance in (2.2), as on the RHS the term predominates as τ→0. The basic OU solution is corrected by a series expansion thus: 3.1 with RN denoting the remainder. Note that N=0, which we call the leading-order approximation, means that no (br) terms enter. Many of the manipulations in the first part of this section serve to exchange terms of the form 1, , q, the point simply being that any two of these differ by o(1−q) in the vicinity of q=1.
The form of (3.1) is very important. Any truncation of the series must vanish as , which is what we need, because all terms contain a factor of q or . Note also that the truncated error is uniformly bounded in τ, because the expansion is in q(1−q)r and q lies in the compact interval [0,1]. Thus, for each N and each y the error in hY(τ,y) is bounded in . Such uniformity is obviously not shared by, for example, a series expansion in powers of τ, which is why we dismissed that idea out of hand.
The functions (br), and the remainder, are dependent on Y0, and should therefore be thought of as br(y | Y0): for reasons of conciseness we just write br(y) or just br, and when we write br′ it means that the y-dependence, not the Y0-dependence, is being differentiated.
We are shortly going to equate coefficients of powers of (1−q), and for that reason the term in (3.1) is unwelcome; we therefore exchange for q plus a Taylor series around q=1. Writing and using the binomial theorem, we find 3.2 Note also that and , which is needed later. Thus, the revised series is 3.3 with a modified remainder term (the term now contains all of RN, plus the r≥N part of the infinite sum in (3.2)). The next step is to compute the (br) by comparing coefficients in q(1−q)r. Note that the recurrence we are about to derive for the (br) will be obtained from (3.3), but for computational purposes we use (3.1), of course omitting the remainder term.
It is convenient to write b−1(y)≡θ(y−Y0) so that the first term can be absorbed into the summations, and also δ−1=0. We then substitute (3.3) into 3.4 and equate terms in q(1−q)r. (Note that terms of the form q2(1−q)r, which arise from the quadratic term in (3.4), have to be written as q(1−q)r−q(1−q)r+1.) This gives For r=−1, we deduce so 3.5 with the last part being discarded as it is singular at y=Y0. Note that b0 happens not to depend on Y0, but all the others generally will. Thus, for r≥0, 3.6 (The summation is void if r<1.) Accordingly, 3.7 When y=Y0, this is just 1/θ(r+2)×r.h.s.(3.6). The lower limit in the integral must be Y0 as otherwise br+1 becomes singular at y=Y0 in a similar manner to (3.5). Successive terms may be extracted recursively; and fortunately, just as with the WKB expansion, the (differential) equation for br+1 is first-order linear, even though it is nonlinear in the ‘known’ terms . Thus, the recurrence involves no more than a succession of integrals.
(b) The parameter θ
The parameter θ allows us to expand the FP equation at hand around that of any of a one-parameter family of OU equations in which θ governs the reversion speed. We want to know what is the best θ to use for any given A, and intuitively it is clear that one should use a θ that best approximates, in some sense, the (y-dependent) rate of mean reversion that A gives. In so doing, we will have matched not just the short- and long-time behaviour of the function h, but also the rate of transition from one regime to the other.
The bound states of (1.1) are the normalizable eigenfunctions of . In the OU case with Y0=0, it is plain from the Mehler expansion (1.11) that the eigenfunctions ψr(y) and associated eigenvalues λr are Importantly, up to normalization one has ψr+1=ψr′.
Now in the general case we know ψ0, which is the invariant density, and λ0=0, but we do not know any of the other eigenfunctions or eigenvalues. However, let us assume that the first eigenfunction is approximately ψ0′, and write for this approximated eigenfunction. Then Let us assume this to be roughly , i.e. λ1ψ0′. Then integrating once, we have A′ψ0≈λ1ψ0, and then integration from to gives 3.8 given that in the OU case λ1=−θ, we use 3.9 This has considerable intuitive appeal, as 〈−A′〉 is, in a sense, the average rate of mean reversion, and by construction is necessarily positive. Of the ‘OU deformations’, it is easily calculated in two of the four cases, but (1.5) requires an integral, to which we provide the first term in a Padé approximation: 3.10 In the double-well example, 3.11 with Φ the cumulative Normal distribution function.
The above discussion gives an unambiguous prescription for θ, but it should not be thought that the value is critical. For example, one can expand the OU model A(y)=−θ*y using any θ>0, thereby picking up a series of correction terms that arise from the inequality of θ and θ*. It is easily verified that the resulting series is convergent for |1−q|<1, i.e. all τ≥0.
(c) Initial results; the remainder term
For a numerical demonstration, we set , i.e. ν=5. Figure 5 shows the functions br(y) for r=0,1,2,… for each of the three models ((1.5)–(1.7)), and with Y0=0,−2 in each. (From (3.10) we have θ=0.727,0.50,1.04, respectively.)
Apparently, the partial sums converge at power law in r, i.e. br(y)∝r−λ, as is corroborated by figure 5c. This is consistent with hY having a singularity of the form qλ at q=0, which in turn implies exponential decay in t as (which seems reasonable, given our previous discussion on spectral theory). There is no universal scaling law, so λ depends on the problem at hand.
An estimate of λ, even if rudimentary, can allow us to estimate the discarded part of the summation, at least to the extent that we get a better estimate than assuming it to be zero. Using the result given in appendix A, the assumption br(y)∝r−λ leads to the approximation 3.12 In the examples we have investigated, we have found λ to lie between 0.2 and 0.6. One can use an empirical estimate, but we suggest using throughout—a principle to which we will have recourse later—and applying (3.12) when .
We have also computed the partial sums hY with the solution obtained from a PDE solver to satisfy ourselves that they converge to the correct result. Wherever we refer to a PDE solver, we have used the method of lines with finite differences in space and DASSL (see ) as the time solver; this is a well-established, highly effective, and often used approach for numerically solving potentially highly nonlinear and stiff diffusion-like equations in fluid mechanics and elsewhere . To compute hY, we compute gY and then take the logarithmic derivative, rather than solving the nonlinear PDE (2.2) directly.
(d) Derivation of the normalizing factor
We now know hY(τ,y), but this only allows us to reconstruct gY(τ,y) up to an arbitrary multiplicative time-dependent factor nY(τ) say, which we must now obtain: 3.13 (The lower limit of the integral is arbitrary, and taken as Y0 for convenience.) Inserting this into (2.1) gives a first-order linear differential equation for nY: and its solution must be, as , Note that the r.h.s. seems to depend on y, but does not actually do so, because hY obeys (2.2). Thus any y can be chosen, and using the same value as the lower limit of the y-integral causes the term to vanish: 3.14 Substituting (3.1) into (3.14) and performing the time integral (note dτ=−dq/2θq) gives 3.15 where the coefficients (βr), (β′r) are defined by 3.16 Remember, as emphasized by the ⋅ | Y0 notation in (3.16), that the functions (br), for r≥1, depend parametrically on Y0. Thus, if Y0 is altered then one must recalculate all the (br) from r=1 upwards, using (3.7), for (3.15) to remain correct: it is insufficient to keep the same (br) and evaluate them at a different point. The symbol B(a,b;q) denotes the incomplete Beta function, Note that and whereas requires a recurrence: In the OU case, A(y)≡−θy, the only effective terms in (3.15) are the (1−q)−1/2 prefactor and the first exponential term, with the in it; this is as it should be. In the special case where A is an odd function, as it is with all of our examples, and Y0=0 also, this simplifies to 3.17
Now (3.15) and its special case (3.17) are rather slowly convergent for small τ, i.e. q→1. This difficulty can in fact be finessed, as we now show. Multiply both sides in (3.13) by , and set y=Y0 and τ→0, getting The * symbol signifies the following: when r=0 the Beta function B(a,r) is undefined, but is taken to mean the value of in the limit q↗1, which is well defined and in fact equals . But by direct analysis of the FP equation, we find By comparison of these two results, we are able to establish two things, or, rather, one thing that can be used in two ways. First, we have an infinite product representation of the invariant density at the point Y0: 3.18 As we said above, altering Y0 requires a complete recalculation of the (βr). But in fact only one point is needed to determine for all y, since In the symmetrical case where A is an odd function, it is obvious to choose Y0=0, and then (3.18) simplifies to 3.19
However, is often known, as we said earlier with (1.5)–(1.7), or else can be calculated as an exponential integral followed by normalization to unit probability mass. The second idea, therefore, is to reverse the previous logic and use to obtain ρN(0): 3.20 or, when A is odd, 3.21 The first term in either of the above is the logarithm of the quotient of the Normal distribution (mean 0, variance 1/θ) and the true invariant density.
We can use this (exact) expression for ρN(0) to sharpen the approximation in (3.15), thereby making it match at τ=0; recall that it already does so at . Judging from (3.17) and the discussion surrounding (3.12), βr′=O(r−λ) as for some positive λ, and the expression defines an analytic function of q that is of order qλ as q↘0, by the Tauberian theorem. Accordingly, we are motivated to write 3.22 Thus, for all N. As we can easily compute ρN(0), we reduce the truncation error in (3.15) from ρN(τ) to . As any λ>0 will exactly remove the truncation error at τ=0, we are free to choose throughout, as previously intimated.
(e) Leading-order expansion
We are now ready to put everything together, and in the first instance we use no correction terms, thereby ignoring all the (br). We have in this approximation, by (3.1), and by (3.15), (3.20) and (3.22), Multiplying these two to get gY(τ,y), and then by , gives us what we are after: 3.23 In effect, this is the invariant density multiplied by a Gaussian of time-dependent width and centre; we recall that θ comes from (3.9) and (3.10), and we are standardizing on .
We are now in a position to explore the efficacy of the expansion scheme versus full numerical solutions and we proceed to do so in figures 6–8. Each shows a numerical simulation of the full PDE, using the same parameters as in §3c, i.e. , for Y0=0,−2; the shift in the source position illustrates that, in each case, the solution drifts back to the origin as it simultaneously diffuses outward. In each case, we illustrate for f, on log axes to accentuate any error, the numerical solution of the PDE for τ=0.1,1,5 versus the N=0 (just the leading-order) solution (3.23). The solutions are virtually indistinguishable. As a more stringent, and demanding, test upon the methodology the double-well example (1.9) is also evaluated both numerically and via the expansion; the results shown in figure 9 are again remarkably accurate, particularly considering that it is just the N=0 approximation that is shown.
(f) Higher order development
We can, however, pursue the approximation to higher order in an attempt to squeeze the error down further. This we illustrate by plotting the difference between the numerical solution for the density and the approximation (summarized by equations (3.1), (3.13), (3.15), (3.20) and (3.22)) in figure 10. The numerical solution is evaluated using a highly accurate implicit time solver and uses central differences that are accurate to O(δy3)∼O(10−6), where δy is the gridspacing that is 10−2 in the simulations shown; thus the observed difference of O(10−6) is to be expected. Figure 10 shows the N=0 approximation together with increasing values of N (2,5,10,20,50) and it appears that the error converges to zero as . For reasons of space, we just show the results for model (1.5), but the picture is similar for the other cases.
The integrals in (3.7) are algebraically intractable in general, and by way of numerical techniques we suggest the use of Chebyshev approximation. Set up an interval , approximate br at the Chebyshev nodes (yk), which allows its derivatives to be evaluated at any point as a function of the expansion coefficients ; then evaluate at each k the integral (3.7) by Gauss–Legendre quadrature (which, apart from the term containing A, will be exact provided the quadrature order is , where m is the degree of the Chebyshev approximation,2 and r* is the highest value of r desired in (3.7)), and use these values br(yk) as the samples of an interpolating Chebyshev approximation. The recurrence is initialized by approximating b0. Incidentally, this method was used to generate the results in figure 5 using degree 21 on the interval [−10,10]. Note that y↦hY(τ,y) is approximated as a polynomial, for each τ; the integral in (3.13) can be done by Gaussian quadratures again or directly from the Chebyshev expansion of the (br).
(g) Divergent (unstable) diffusions
An intriguing question is whether the methods described here also work for ‘unstable’ (non-reverting) diffusions, i.e. those with no invariant density. Hongler & Desai  point out that the so-called repulsive Wong model admits a closed-form solution in one isolated case. This is whose FP equation solves for α=1 as Now the above diffusion is not reconcilable with (1.7), on account of the sign in the drift being wrong. Inasmuch as it is related to the OU process, one has to take (1.4) with negative mean reversion (κ=−1, σ=1, γ=1), and then transform by . There is, of course, no invariant density. There is an unphysical steady-state solution to the FP equation, which is . However, division of fY by this to get the solution to the adjoint forward equation does not give a solution that tends to 1 as , so the methods given in this paper, and in particular (3.23), fall apart. Note that the form of the solution is two diverging blobs of probability mass. This is important: whereas mean-reverting diffusions do look like the standard OU process, allowing the solution to the FP equation to be expanded around the solution of the OU process, divergent ones may exhibit structural features particular to themselves and cannot be regarded in the same way.
That said, the quantity is simple, and it does have a large-time limit, . But this limit is not obtainable from the logarithmic derivative of the unphysical steady-state solution, as that is We may fairly state the following conclusions: (i) the expansion shown here is inapplicable to divergent diffusions; (ii) an alternative form of expansion of the logarithmic derivative of the density may well prove useful; and (iii) it is an interesting avenue for further research.
The series (3.1) is derived as an expansion in the limit t→0. Thus, it may or may not be convergent in the classical sense, i.e. for each τ,y. For the examples we have considered, the empirical evidence is that br(y) asymptotically decays at power order in r as , and this is sufficient for the purpose.
Clearly, some conditions must be obeyed by A for convergence to hold. While the N=0 approximation (3.23) is generally applicable, the usefulness of higher order terms is predicated on differentiability of A, because, as the order of the approximation (N) is increased, successively higher derivatives of A are invoked, on account of the br′′ term. It seems likely that a necessary condition on A is that it be analytically continuable to the open strip |Im y|<η, for some η>0. What further conditions are required for hY to have the posited Laurent expansion (3.1) convergent in the punctured disc |q−1|<1 is a matter for further research.
We reiterate that, whether the series is classically convergent or not, the error is uniformly bounded in τ, by contrast with an expansion in powers of τ.
(i) Fusion with existing techniques
We argue that it is possible to combine the best features of the series expansion shown here with those of classical methods of solving PDEs. If (3.1) is non-convergent, it will be impossible to obtain arbitrary accuracy (in general, this is a perennial problem with asymptotic expansions: they give excellent accuracy in certain regions but cannot achieve arbitrary accuracy at any given point)—and, even if it does converge, its convergence may be inconveniently slow. Classical techniques and the existence theorems that relate to them are not reliant on analyticity, but can potentially obtain arbitrary accuracy. (From a more general context: on a compact interval any continuous function may be approximated to arbitrary accuracy by a polynomial or a Fourier series—but neither constitutes any sort of power series expansion of the function, as it may well not be analytic. In approximation theory, one does not want to be forced to assume analyticity of the function being approximated.)
Consider an approximate solution f♯(τ,y), such as the N=0 approximation though any other N would do, and define the relative error ψY via the substitution then writing down the PDE obeyed by ψY. As we have extracted the τ=0 and behaviour correctly in , the initial condition is ψY≡0 (and we expect ψY to vanish as ). The resulting equation for ψY, though nonlinear, can still be solved by grid-based methods or by spectral methods. But whereas, before, we said that spectral methods have difficulty coping with a singular initial condition, they are now being applied to the slowly varying function ψY, and so their performance will be much better. In essence, as far as numerics are concerned, all the hard work has been done by extracting .
This idea has been used in different guises for many decades. For example in , it is common to see special functions approximated similarly. After studying the expansion's behaviour and/or singularities, apply an appropriate transformation to take these out and map the domain to [−1,1]. The transformed function, which by construction is slowly varying and properly behaved at the endpoints, is then ideally approximated as a Chebyshev series—and this is the kind of problem for which spectral methods are ideal.
We have shown how to expand the solution of the forward and backward equations of modified OU processes ‘around’ the OU case in a way that respects the characteristics of the problem. The method generalizes to other processes, and our conclusions are summarized as follows.
— It is more convenient to work in normalized coordinates, hence the transformation from X to Y and the time change from t to τ. By this transformation, we can assume that the volatility term is constant. Throughout, q=e−2θτ so that q∈[0,1] and we expand in q(1−q)r. The suggested value of θ is that given by (3.9).
— The solution to the forward equation is given by , where gY solves the adjoint forward equation (also known as the backward equation).
— The case with no correction terms, N=0, is given by (3.23).
In terms of computation, we have found that this method improves upon the use of a standard PDE solver. One reason for this is that, in common with spectral methods, it gives arbitrary spatial resolution, whereas to get a spatial resolution of O(δx) using a PDE solver with, say, standard finite differences typically requires a time step of O(δx)2. We have found that the approximation, with a handful of correction terms, typically represents an improvement in speed of a couple of orders of magnitude over the PDE solver; using N=0 is obviously even faster.
5. Further developments
This section discusses possible avenues for further research, which we think are numerous and varied.
— Multi-dimensional analogues. These would require the expansion of a multi-dimensional problem around its associated multi-dimensional OU ansatz.
— Time-varying analogues. In principle, the logarithmic derivative of the associated FP equation has a useful structure even when the coefficients of the underlying stochastic differential equation are time-varying, but it is less clear how to proceed as a steady-state solution can no longer be identified: thus one might have to work directly with the FP equation rather than its adjoint.
— If the process is strictly positive, as happens in certain financial models, then the natural ansatz is no longer the OU process, but instead the square-root process for which the FP equation has a known solution (e.g. ). This would allow such processes as to be analysed. A further development is when the process is bounded on both sides, such as
— The representation of the solution to the forward equation arising from a Lévy process.
— Barrier problems. These fall into two broad categories. One possibility is to solve the FP equation with a delta-function initial condition, in the presence of one or more absorbing barriers. In that case, an infinite product solution can be thought of as a term corresponding to the initial condition, a set of terms that vanish on the boundary(ies), and the exponential of a remainder series. An alternative is to solve the backward equation, giving the density of the stopping-time conditionally on starting from some point X0. This would allow insight to be gained into such problems as the first passage time of an OU process through a barrier: no simple solution is available, but approximate analytical forms are known, and an infinite product expansion should allow the errors in these to be quantified and corrected in a sensible way.
This work did not involve any research on humans or animals.
The datasets supporting this article have been uploaded as part of the electronic supplementary material.
R.J.M. framed and developed the problem, and did some of the numerical simulations, the data gathering and some of the write-up. R.V.C. did all simulations involving the PDE solver, and those that required comparison with it, and the corresponding part of the write-up. M.J.K. worked with R.J.M. on most of the algebra, and assisted in the outer sections of the write-up. All the authors give approval for publication.
We declare we have no competing interests.
R.V.C. thanks the EPSRC for support under grant no. EP/J009636/1.
The authors thank the referees for their helpful comments and suggestions for improvement.
Appendix A. Remainder term approximation
Let We derive the approximation, in the limits q→0, :
Following an argument of Riemann employed in a similar context, we use the identity summing over r, we find, on writing ξ=Nx, It is necessary to approximate the term in the denominator for , and matching the function and its first derivative at 0 gives this allows the integral to be performed, with the desired result.
- Received February 7, 2015.
- Accepted May 22, 2015.
© 2015 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.