## Abstract

Computing averages over a target probability density by statistical re-weighting of a set of samples with a different distribution is a strategy which is commonly adopted in fields as diverse as atomistic simulation and finance. Here we present a very general analysis of the accuracy and efficiency of this approach, highlighting some of its weaknesses. We then give an example of how our results can be used, specifically to assess the feasibility of high-order path integral methods. We demonstrate that the most promising of these techniques—which is based on re-weighted sampling—is bound to fail as the size of the system is increased, because of the exponential growth of the statistical uncertainty in the re-weighted average.

## 1. Introduction

Averaging is to the theoretical and computational sciences what measuring is to empirical science, and it is hard to imagine a problem with more general implications. Formally, the problem of computing an expectation value for a quantity *a*(*x*) over a given probability distribution *p*(*x*) can be stated as
1.1

Very often it turns out that *x* spans a very high-dimensional space, which makes it impossible to evaluate the integral on a grid of points. Instead, importance sampling algorithms (Metropolis *et al.* 1953; Hastings 1970; Frenkel & Smit 2002) are used which generate a set of points distributed in accordance with the target probability *p*. Then, the expectation value can be computed as an average over that set of points,
1.2

The error in the estimate (1.2) decreases with *n*^{−1/2}, *provided that the sample points are uncorrelated*. In a number of circumstances obtaining uncorrelated samples from *p* is an exceedingly difficult problem, either because generating an individual point is computationally demanding, or because the sampling algorithm produces strongly correlated points, so that each one contributes very little to the reduction of the statistical error.

In these cases, one is led to explore the possibility of performing re-weighted sampling, i.e. generating points according to a different distribution *p*_{0}(*x*), which is easier to sample efficiently, and then computing the average relative to *p* using a correction weight *w*(*x*)=*p*(*x*)/*p*_{0}(*x*) involving the ratio of the probability densities:
1.3Many methods have been developed that are fundamentally based on the re-weighted average (1.3), which has found applications in physical chemistry (Duane *et al.* 1987; Kumar *et al.* 1992; Bonomi *et al.* 2009), theoretical physics (Barbour *et al.* 1998; Fodor & Katz 2002), economics and statistics (Hastings 1970; Smith & Roberts 1993). Section 2 will be devoted to a thorough analysis of the statistical properties of re-weighted sampling, and we will discuss under quite general assumptions the conditions required to apply this technique successfully.

Our analysis allows one to assess the efficacy of re-weighting in each of the contexts in which it is applied. One instance of an important problem which has recently been tackled by re-weighted sampling concerns the use of high-order path integration techniques in computer simulations that allow for the quantum nature of atomic motion. Conventional path integral (PI) methods (Feynman & Hibbs 1964; Parrinello & Rahman 1984; Ceperley 1995) involve a significant overhead compared with a purely classical treatment, and a considerable effort has been devoted to the quest for less computationally demanding approaches. High-order discretizations of the path have been available for some time (Takahashi & Imada 1984; Suzuki 1995; Chin 1997), but it has only recently become apparent that one can circumvent the calculation of the bothersome second derivatives of the physical potential that arise in these discretizations by using re-weighted sampling (Jang *et al.* 2001; Yamamoto 2005). If successful, this approach would make high-order path integration generally applicable and very attractive for many applications.

To provide an example of the utility of the results established in §2, we shall therefore proceed in §3 to focus on the role of re-weighted sampling in high-order PI simulations. We shall show in particular that this form of path integration is eventually bound to fail because of the reduced statistical efficiency introduced by the re-weighting. Simulations of small clusters and/or mildly quantum mechanical problems constitute a niche in which this approach might be beneficial (Yamamoto 2005), but its performance is bound to degrade for larger systems.

## 2. Statistics of re-weighting

A key issue that is often overlooked is that averaging according to equation (1.3) will have a lower statistical efficiency than sampling *p* directly as in equation (1.2), i.e. for the same number of *uncorrelated* sample points, the statistical error in the re-weighted average will often be larger. Working out a simple, analytical expression for this drop in sampling efficiency would be extremely useful, as it would allow one to make an informed choice as to whether the computational gain of sampling *p*_{0} is overshadowed by the concomitant statistical inefficiency. A similar analysis has been performed in the related case of free-energy perturbation methods, resulting in guidelines that are useful to assess the reliability of simulations and to optimize the use of computing time (Pohorille *et al.* 2010).

In order to evaluate such an estimate, let us first simplify the notation of equation (1.3) by labelling *a*_{i} the *i*th sample for the observable *a*, and *w*_{i} the ratio of the probability densities *p*(*x*_{i})/*p*_{0}(*x*_{i}). Averages with respect to the target distribution *p* will be written as 〈⋅〉_{p}, while averages relative to *p*_{0} will be written as 〈⋅〉. With this notation,
2.1Our objective is to unravel the statistical properties of the weighted average (2.1) for a finite number *n* of sample points, and to compare them with those resulting from sampling the target distribution directly. We will assume that different samples are completely uncorrelated, since different sampling distributions and algorithms can be compared on the basis of the computational cost of generating a new uncorrelated sample point.

It is shown in the appendix that an asymptotic expression for the expectation value and the variance of can be obtained under very weak assumptions about the joint probability distribution
of *a* and *w*. To first order in *n*^{−1},
2.2and
2.3Equation (2.2) shows that for any finite number of samples, the re-weighted average is a biased estimator of 〈*a*〉_{p}, i.e. that is affected by a systematic error which decreases with *n*^{−1}, while equation (2.3) shows that the statistical error in the re-weighted mean decreases asymptotically as *n*^{−1/2}.

In order to perform a comparison between the weighted and the un-weighted sampling strategies, it is necessary to evaluate the coefficients which enter equations (2.2) and (2.3). To do this, we introduce an additional assumption, namely that *a* and are correlated Gaussian variates, with means 〈*a*〉 and 0, variances *σ*^{2}(*a*) and *σ*^{2}(*h*), and covariance 〈*ah*〉, all with respect to *p*_{0}. We introduce *h* because in many cases—certainly the vast majority of those occurring in the physical sciences—the weight will be proportional to the exponential of an extensive state function (e.g. a difference Hamiltonian). It is not difficult to justify the assumption of Gaussian statistics, since the observable and the logarithm of the weight are often computed by summing a large number of fluctuating contributions, and will therefore have nearly Gaussian statistics by virtue of the central limit theorem. In any case, the objective here is simply to discuss the qualitative features of re-weighted sampling, and the Gaussian limit provides a convenient framework to do this.

Under the assumption of a Gaussian joint probability distribution for (*a*,*h*), all expectation values of the form 〈*a*^{p}*w*^{q}〉 can be computed analytically in terms of the parameters of the distribution. Equations (2.2) and (2.3) then simplify to
2.4and
2.5These equations convey in a more transparent way the message that is inherent in their more general counterparts. The systematic bias depends on the cross correlation between the observable *a* and , and both the systematic and statistical error grow exponentially with *σ*^{2}(*h*). Whenever the fluctuations of *h* are smaller than 1, there will be little difference between the efficiency of sampling directly based on the target distribution and that of the re-weighted method. However, as soon as *σ*(*h*) becomes larger than one, both the systematic bias and the statistical error of the re-weighted approach will require a much larger number of samples for convergence to within a given threshold.

In figure 1, we present the results of some numerical experiments in which we generated correlated Gaussian variables *a* and *h*, and computed the weighted average of *a* according to equation (1.3). The results confirm the asymptotic nature of our estimates, which describe the behaviour of and in the large-*n* regime. The small-*n* limit can instead be pin-pointed very easily by considering the *n*=1 case, in which the average converges to the un-weighted value 〈*a*〉, and the variance is just *σ*^{2}(*a*).

While it is common knowledge that the fluctuations of the ‘difference Hamiltonian’ must be small in order to apply re-weighting successfully, we believe it will come as a surprise to many that fluctuations of the order of a few units in *h* will decrease the sampling efficiency by orders of magnitude. The message here is that in re-weighted simulations the joint probability distribution of *a* and *h* should be monitored carefully. The value of provides a direct indication of the statistical inefficiency owing to re-weighting, and the cross-correlation 〈*ah*〉 provides an equally direct indication of the systematic bias in . Both of these are straightforward to converge, as they involve un-weighted averages.

While equations (2.4) and (2.5) cannot be used quantitatively in the case of non-Gaussian statistics, they provide useful guidelines to judge the quality of sampling in a re-weighted calculation. Whenever the fluctuations of *h* are large and the weighted average is much closer to 〈*a*〉 than to 〈*a*〉−〈*ah*〉, it is very likely that the numerical results obtained from equation (1.3) will be meaningless. In order to provide an illustration of the utility of these estimates, we will now show that they can be used to predict the situations in which high-order PI schemes based on re-weighted sampling can be used successfully, and those in which they cannot.

## 3. High-order path integration

### (a) Suzuki–Chin path integration

Path integral methods aim at approximating the quantum mechanical partition function, *Z*=*Tr* e^{−βH}, by factoring it in such a way that the resulting expression can be evaluated within a classical framework. The simplest such factorization makes use of the well-known Trotter product formula (Schulman 1996), and results in an expression that is equivalent to a purely classical partition function in an extended phase space consisting of many replicas (beads) of the system, connected by harmonic springs to form a closed path (necklace) (Feynman & Hibbs 1964; Chandler & Wolynes 1981; Parrinello & Rahman 1984; Ceperley 1995). The overall Trotter Hamiltonian reads
3.1where {*q*_{j}} and {*p*_{j}} are the coordinates and momenta of the different beads, and *V* (*q*) is the physical potential. The classical partition function computed with this Hamiltonian at *P* times the physical temperature converges to the fully quantum mechanical partition function *Z*, with an error that is second order in *β*/*P*. The number of beads necessary for convergence is in general a small multiple of , where is the fastest physical vibration in the system. The computational cost therefore becomes very large for low temperatures and/or stiff vibrational modes, and considerable effort has been devoted to the quest for faster converging factorizations in order to reduce this cost.

It has been shown (Chin 2006) that factorizations with an error that is of an order higher than two in *β*/*P* must contain composite operators which have the form of nested commutators between the kinetic energy operator and the potential . The complexity of the nested commutators increases with the order of the factorization and is only really manageable up to fourth order. Among the different fourth-order factorizations developed so far (Takahashi & Imada 1984; Li & Broughton 1987; Suzuki 1995; Chin 1997; Jang *et al.* 2001), the one proposed by Suzuki (1995) and simplified by Chin (1997) has better convergence properties than most others. It also contains only the double commutator , which is proportional to the square modulus of the force and hence of a manageable complexity.

This Suzuki–Chin (SC) factorization results in a Hamiltonian
3.2which contains a bead-dependent modified potential
3.3The coefficients *w*_{j} and *d*_{j} take different values for odd and even beads, namely *w*_{2j}=2/3, *w*_{2j+1}=4/3, *d*_{2j}=*α*/6 and *d*_{2j+1}=(1−*α*)/12, where *α* is a free parameter that can be varied in the range [0,1]. While tuning *α* can improve the convergence in specific cases (Jang *et al.* 2001; Yamamoto 2005), there is no universally optimal choice, so in the present work we used *α*=0.

The modified Hamiltonian resulting from the SC factorization contains the square modulus of the physical force. A path integral molecular dynamics (PIMD) calculation based on the corresponding classical equations of motion therefore requires one to evaluate the derivative of the squared force term, which contains the Hessian. Unfortunately, this is prohibitively expensive to evaluate for large systems with all but the simplest inter-atomic potentials. One way around this difficulty is to exploit the possibility of sampling based on a quasi-Trotter Hamiltonian, which resembles equation (3.1) but has the physical potential on each bead weighted by *w*_{j} (Jang *et al.* 2001). The statistics corresponding to the SC Hamiltonian can then be recovered by re-weighting configurations with the weighting factor , where the difference Hamiltonian has the form
3.4The effect that re-weighting configurations based on equation (3.4) has on the statistical efficiency of sampling is the fundamental drawback of this approach, as will be discussed in the next subsection, based on the general results of §2.

### (b) The curse of system size

The high-order PI strategy we have just described may appear to be superior to Trotter PI in all respects (Jang *et al.* 2001). However, one must not overlook the question of statistical efficiency, which could imply that longer trajectories are required to obtain the same level of sampling accuracy, a factor that must be considered when comparing the computational viability of different approaches.

Given the analysis in §2, all we need to evaluate the errors owing to finite sampling in this context is an estimate of the variance of Δ*H*. Consider first the case of a simple harmonic oscillator with frequency *ω*, in the strongly quantum regime, where the distribution of configurations is well approximated by the ground state density . Then, the squared fluctuations of |*V* ′(*q*)|^{2} read
Under the simplifying assumption that the statistics of *q* for a finite-*P* PI simulation are the same as those of the quantum oscillator at full convergence, one can write down the squared fluctuations of the temperature-scaled difference Hamiltonian *h*≡*β*Δ*H*/*P* as
3.5where *k* varies between one (if one assumes that there is no correlation between the forces on different beads) and two (if one assumes that the forces on all beads are the same).

Equation (3.5) implies that it is possible to reduce the fluctuations of the weights *w*=e^{−h} at will by increasing the number of replicas. However, one should remember that keeping this number low was the primary reason for attempting a high-order factorization in the first place. Since the relative error in the energy owing to the discretization of the fourth-order PI grows as , it is clear that—for a given discretization error—the sampling problems inherent in re-weighted SC PIMD are bound to get worse as the quantum nature of the problem becomes more pronounced.

Moreover, the issue of statistical efficiency is exacerbated once one considers a system with *f* degrees of freedom. Since the difference Hamiltonian is size-extensive, its variance will scale linearly with *f*, leading to an exponential dependence of the statistical and systematic finite-sampling errors on the size of the system. For a multi-dimensional harmonic system with frequencies *ω*_{i} one can estimate—in the worst-case scenario of complete correlation of the forces on different replicas—that the total-squared fluctuations will be of the order of
3.6

In practice, this means that the high-order PI integration scheme which we have described in the previous section will yield a small statistical uncertainty up to a critical system size, above which the statistical efficiency will decrease exponentially. Our analysis of the harmonic limit yields an estimate of this critical size for a given value of . For instance, flexible water can be simulated successfully by Trotter PI with , and has . For *T*=300 *K* and *P*=32, equation (3.6) suggests that the fluctuations in the difference Hamiltonian will be of the order of one in a simulation of a few tens of water molecules. This is the sort of system size up to which re-weighted SC PIMD is therefore predicted to be advantageous.

### (c) An example: liquid water

In order to verify the predictions of equation (3.6) and those of §2 for a realistic condensed-phase example, we have examined the case of a flexible water model (Habershon *et al.* 2009) simulated at a temperature of 298 K and at the experimental density. We chose to perform simulations with 216 water molecules, so as to be in a regime in which—according to the arguments of §3*b*—the statistical inefficiency of re-weighting should be apparent.

We explored numbers of beads ranging between *P*=4 and *P*=128, and for each set of parameters performed 16 independent simulations, each of which comprised 20 ps for the initial equilibration and 1 ns during which averages were accumulated. The calculations were accelerated using the ring-polymer contraction technique (Markland & Manolopoulos 2008), with the same parameters as we have employed in previous simulations (Ceriotti *et al.* 2011). Canonical sampling was enforced by applying targeted Langevin thermostats to the internal modes of the ring polymer, and a global stochastic velocity rescaling with a time constant of 10 fs to the centroid (Bussi *et al.* 2007; Ceriotti *et al.* 2010). The quantum mechanical kinetic energy of the water molecules was accumulated using the modified form of the centroid virial estimator proposed by Yamamoto (2005). This estimator, which is based on a scaling of fluctuation coordinates (Predescu *et al.* 2003), can be evaluated by finite differences thereby avoiding the need for high-order derivatives of the potential.

As an initial test, we examined the convergence of the average kinetic and potential energies of the liquid as a function of the number of replicas, for both Trotter PI and re-weighted SC PI (figure 2). As expected, and as witnessed in previous calculations with a similar water model (Jang *et al.* 2001), the higher order of the PI factorization does translate into faster convergence. However, for the simulations with , the statistical error in the mean for the re-weighted SC method was found to be considerably larger than that of Trotter PIMD. Even more worrying, for re-weighted SC PIMD there is a significant difference between the weighted average of the kinetic energy and the asymptotic result predicted for Gaussian statistics (the limit of equation (2.4)). These issues will be considered in more detail below. As a final remark, it can be seen from figure 2 that the recently developed stochastic (Ceriotti *et al.* 2011) approach based on a generalized Langevin equation manages to obtain even faster convergence than SC PIMD, without any of the sampling issues that we are about to elucidate.

It remains to be verified whether the scenario presented in §3*b* based on a harmonic model and an analysis of the sampling efficiency in the Gaussian limit corresponds to the numerical finding in this real example. First of all, one needs to verify whether the scaling of the variance of the logarithm of the weights follows equation (3.6). Figure 3 demonstrates that both the scaling with *P* and with the number of degrees of freedom correspond to the predictions we have made, even if the quantitative values for *σ*^{2}(*β*Δ*H*/*P*) are lower by about a factor of 3 than the analytical estimates obtained from the harmonic model. Taking this correction into account increases to about 100 the number of water molecules for which re-weighted SC PIMD is computationally reasonable. This implies that such an approach may provide a marginal advantage over conventional Trotter factorization for *ab initio* PIMD simulations of moderate size.

Having verified that the fluctuations in the difference Hamiltonian closely obey equation (3.6), let us discuss the validity of the relations between these fluctuations and the error in the mean that we have worked out in §2. Figure 4 shows that—for the case of *P*=16, where the variance of the temperature-scaled difference Hamiltonian *h*=*β*Δ*H*/*P* is much larger than one—the trends do indeed correspond to those derived in the limit of Gaussian statistics. Averages computed from short trajectories suffer a large systematic bias, and they slowly approach the limit predicted from the cross correlation between the observable and the difference Hamiltonian as the number of samples is increased. To obtain a converged expectation value and therefore verify the accuracy of the prediction based on Gaussian statistics, a prohibitively long trajectory would be required. Also as anticipated, figure 4*a* clearly demonstrates that the statistical variance in the mean for re-weighted SC PIMD decreases much more slowly than for Trotter PIMD, completely offsetting the benefits of using the higher order propagator.

It is important to remark that we have been able to make these issues stand out clearly because we have considered a relatively simple problem, for which we could run long trajectories and gather extensive statistics. In real applications, one can rarely afford to perform such a detailed error analysis, and in this respect the analytical results we have derived provide a useful litmus test. The fluctuations of the temperature-scaled difference Hamiltonian *h* should be smaller than one, since for larger fluctuations the statistical inefficiency rapidly becomes unmanageable. Even more interesting is the cross correlation between the observable *a* and *h*, which indicates the extent of the systematic bias.

## 4. Conclusions

In this paper, we have performed a careful assessment of the statistical efficiency of re-weighted sampling. Our analysis shows a dramatic degradation in performance when the squared fluctuations of the logarithm of the weight exceed one. While it is common knowledge that large fluctuations in the weights lead to failure of such techniques, and that the size-extensivity of the difference Hamiltonian makes these methods unsuitable for the study of large-scale problems, our analysis shows clearly how dramatic this effect can be, and also that the problem is quite insidious.

In fact, insufficient sampling in a re-weighted calculation is very difficult to detect. It mainly manifests itself through the error in the mean decreasing more slowly than the inverse square root of the number of samples even after a very large number of samples have been accumulated, indicating that the asymptotic regime has not yet been reached (figure 4*a*). An even greater concern is that finite sampling implies a bias in the estimate of the observable, which decreases very slowly and would be exceedingly hard to detect in a more complex—or computationally demanding—problem (figure 4*b*). As a demonstration of how these results can be used, we have examined the applicability of high-order PI factorizations to condensed-phase simulations. The most promising of these techniques relies on re-weighting based on a difference Hamiltonian (Jang *et al.* 2001; Yamamoto 2005). We have worked out the dependence of the fluctuations in this quantity on the properties of the system being studied, the number of degrees of freedom involved and the number of PI beads. We have shown that for large and/or strongly quantum systems, a large number of replicas is required, not so much in order to converge the asymptotic expectation values of quantum observables, but to keep the fluctuations in the re-weighted average under control. Our analytical results allow one to predict the break-even point at which the advantages of high-order PIs are lost owing to statistical inefficiency. These predictions have been qualitatively verified by numerical experiments on the simulation of a flexible model of water under ambient conditions. For this system, high-order SC PIMD is advantageous for simulations of up to around 100 water molecules, but no more.

While a niche for the use of Hessian-free high-order PIMD therefore exists for the study of small clusters and for *ab initio* PIMD, one must consider that accelerated PIMD based on stochastic dynamics (Ceriotti *et al.* 2011) exhibits comparable or faster convergence of quantum observables, without being affected by any of the statistical issues we have focused on here. We firmly believe that this stochastic acceleration is therefore a far more promising approach to large-scale PI simulations.

## Acknowledgements

The authors gratefully acknowledge an insightful discussion with Michele Parrinello, and funding from the Wolfson Foundation, the Royal Society, and the Swiss National Science Foundation.

## Appendix A. An asymptotic expression for re-weighted averages

In this appendix, we present a derivation of the asymptotic expansion of the expectation value of a re-weighted average of *n* terms from a general distribution. As usual, we say that the formal sum is an asymptotic expansion for *f*_{n} if for any fixed *k* we have as . This does not imply that the sum converges. Our main result is as follows:

### Theorem A.1

*Let* *be n independent samples from a (correlated) joint distribution p*_{0}*(a,w). Suppose that w takes only positive values, and that all moments of a, w and w*^{−1} *are finite. Let
*
A1*Then* *and* *have asymptotic expansions in n whose coefficients are determined by the joint moments of p*_{0}*(a,w), with the leading terms in these expansions being given by equations (2.2) and (2.3).*

The argument below will show how to calculate the higher order coefficients of the expansion, although there does not seem to be a simple explicit formula for the general coefficient *c*_{j}.

Multiplying the distribution *w* by a constant (i.e. multiplying all *w*_{i} by the same constant) does not affect , so we assume in the proof that 〈*w*〉=1. With this assumption, equations (2.2) and (2.3) simplify to
A2and
A3Explicit calculations can be further simplified by subtracting a constant from *a* (and hence from ) to ensure that 〈*aw*〉=0, but we shall not do this in the following.

Since all the samples are independent and identically distributed, one can write
A4where is the sum of *n* independent copies of *w*, and *z*_{n−1} (but not *z*_{n}) is independent of the correlated pair (*a*,*w*).

Let and be the centralized versions of *w* and *z*_{n}. We first consider the properties of . Even in the (canonical) special case, where *w* is log-normal, not much can be obtained analytically about the distribution of (Fenton 1960). However, one can obtain its moments using the general formula
A5relating the cumulants *κ*^{(k)} and moments *μ*^{(k)} of a distribution. When *μ*^{(1)}=0 (and hence *κ*^{(1)}=0), as is the case for the centred variables and , equations (A5) can be written as
A6We therefore first calculate the cumulants of the distribution of using equation (A6). The cumulants of are simply , and the moments of can be recovered by inverting equation (A6). By induction on *k* we find that is a polynomial in *n* of degree ⌊*k*/2⌋; this can also be seen directly by expanding the expectation of the *k*th power of the sum of the .

An asymptotic expansion for can now be developed by expanding the denominator in equation (A4) about its mean. Let . Using the identity , we have
A7Since , and is independent of (*a*_{n},*w*_{n}), the terms in the sum can be written out in terms of the moments of :
A8For each *j*, the term described by equation (A8) is a (Laurent) polynomial in *n* with powers between *n*^{−⌈j/2⌉} and *n*^{−j}. We claim that
A9so the residual term shrinks with *n*. Moreover, summing equation (A8) over *j*≤2*d* and collecting coefficients gives the first terms of the asymptotic expansion of down to order *n*^{−d}. For example, setting *d*=1 and discarding the terms with an *n*^{−2} dependence yields equation (A2).

The bound equation (A9) is perhaps most easily seen by applying the Cauchy–Schwartz inequality twice, or Hölder's inequality once, to show that in general . Here *A*_{1}=*a*_{n} and *A*_{2}=*w*_{n} contribute constant factors to the product. With , we have , so has the required order . Finally, . Since the function *x*^{−4} is convex, and *z*_{n}/*n* is just the average of the *w*_{i}, we have , so , which is by assumption a finite constant, and the bound equation (A9) follows.

For the variance, the argument is very similar, now using the fact that and the identity A similar method applies to higher moments of .

Note that the assumption that all negative moments of *w* are finite was only used to control the tail behaviour of 1/(1+*X*)=(*z*_{n}/*n*)^{−1} near 0. In fact, this has much better tail behaviour than *w*^{−1}, so much weaker conditions suffice. For example, splitting the sum *z*_{n} into groups of *m* and noting that , one can check that the result holds assuming only that 〈*w*^{−α}〉 is finite for *some* positive *α*.

Finally, we should emphasize that one cannot expect the infinite sum corresponding to equation (A7), or the asymptotic expansion of , to converge for any fixed *n*. Indeed, in the log-normal case where *w*=e^{−h}, we have , which becomes after normalizing so that 〈*w*〉=1. Moreover, the *k*th central moment also contains a component proportional to (it is a linear combination of this and lower moments). Hence, equation (A8) will contain a component proportional to . For any *n*, this eventually increases with *j*, so the infinite sum does not converge. Similar considerations strongly suggest that the first few terms in the asymptotic expansion will be a good approximation only if *σ*^{2}(*h*) is small compared with (at most a certain constant times) .

- Received July 7, 2011.
- Accepted August 10, 2011.

- This journal is © 2011 The Royal Society