## Abstract

Covariance inflation is an ad hoc treatment that is widely used in practical real-time data assimilation algorithms to mitigate covariance underestimation owing to model errors, nonlinearity, and/or, in the context of ensemble filters, insufficient ensemble size. In this paper, we systematically derive an effective ‘statistical’ inflation for filtering multi-scale dynamical systems, from a moderate scale gap to the case of no scale gap, in the presence of model errors, through reduced dynamics from rigorous stochastic subgrid-scale parametrizations. We will demonstrate that for linear problems, an effective covariance inflation is achieved by a systematically derived additive noise in the forecast model, producing superior filtering skill. For nonlinear problems, we will study an analytically solvable stochastic test model, mimicking turbulent signals in regimes ranging from a turbulent energy transfer range to a dissipative range to a laminar regime. In this context, we will show that multiplicative noise naturally arises in addition to additive noise in a reduced stochastic forecast model. Subsequently, we will show that a ‘statistical’ inflation factor that involves a mean correction in addition to covariance inflation is necessary to achieve accurate filtering in the presence of intermittent instability in both the turbulent energy transfer range and the dissipative range.

## 1. Introduction

An integral part of each numerical weather forecasting scheme is the estimation of the most accurate possible initial conditions given a forecast model with possible model error and noisy observations at discrete observation intervals; this process is called *data assimilation* [1,2]. The presence of the often chaotic multi-scale nature of the underlying nonlinear dynamics significantly complicates this process. Each forecast estimate, or *background state*, is an approximation of the current atmospheric state, with a whole range of sources for error and uncertainty such as model error, errors owing to the nonlinear chaotic nature of the model, the unavoidable presence of unresolved subgrid-scales, as well as errors caused by the numerical discretization (e.g. [3]).

An attractive method for data assimilation is the ensemble Kalman filter introduced by Evensen [4,5] which computes a Monte Carlo approximation of the forecast error covariance on-the-fly as an estimate for the degree of uncertainty of the forecast caused by the model dynamics. The idea behind ensemble-based methods is that the nonlinear chaotic dynamics of the underlying forecast model and the associated sensitivity to initial conditions cause an ensemble of trajectories to explore sufficiently large parts of the phase space in order to deduce meaningful statistical properties of the dynamics. A requirement for a reliable estimate is an adequate size of the ensemble [6–8]. However, all currently operational ensemble systems suffer from insufficient ensemble sizes causing sampling errors. The associated underdispersiveness implies that the true atmospheric state is on average outside the statistically expected range of the forecast or analysis [9,10]. An underdispersive ensemble usually underestimates the forecast error covariance which potentially leads to filter divergence whereby the filter trusts its own forecast and ignores the information provided by the observations. This filter divergence is caused by ensemble members aligning with the most unstable direction of the forecast dynamics as shown by Ng *et al.* [11]. To mitigate this underdispersiveness of the forecast ensemble and avoid filter divergence, the method of covariance inflation was introduced [12] whereby the prior forecast error covariance is artificially increased in each assimilation cycle. Covariance inflation can either be done in a multiplicative [13] or in an additive fashion [14,15]. This is usually done globally and involves careful and computationally expensive tuning of the inflation factor (for recent methods on adaptive estimation of the inflation factor from the innovation statistics, see [16–19]). 
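The two inflation variants described above can be sketched in a few lines. This is a minimal illustration, not the operational implementations cited in the text: the ensemble size, state dimension and inflation parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10-member ensemble of 3-dimensional forecast states.
ensemble = rng.standard_normal((10, 3))

def forecast_covariance(ens):
    """Monte Carlo estimate of the forecast error covariance."""
    anom = ens - ens.mean(axis=0)          # ensemble anomalies
    return anom.T @ anom / (ens.shape[0] - 1)

P = forecast_covariance(ensemble)

# Multiplicative inflation: scale the anomalies by a factor r > 1,
# which multiplies the estimated covariance by r**2.
r = 1.1
inflated_mult = ensemble.mean(axis=0) + r * (ensemble - ensemble.mean(axis=0))

# Additive inflation: perturb each member with independent noise of a
# chosen covariance (here a small multiple of the identity).
q = 0.01
inflated_add = ensemble + rng.multivariate_normal(np.zeros(3), q * np.eye(3), size=10)
```

Multiplicative inflation preserves the ensemble mean and the covariance structure while rescaling its magnitude, whereas additive inflation injects variance in directions the ensemble may not span; both leave the tuning of `r` and `q` to the user, which is the expensive step mentioned above.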
Although covariance inflation is widely used in the context of ensemble-based filters, it can also be used to mitigate covariance underestimation in the presence of model errors in a non-ensemble-based setting. For example, Majda & Harlim [2] discuss an effective covariance inflation using an information theory criterion to compensate model errors owing to numerical discretization in linear filtering problems. From the perspective of linear Kalman filter theory, the covariance inflation improves the linear controllability condition for accurate filtered solutions [20]. Law & Stuart [21] inflate the static background covariance in a three-dimensional variational (3DVAR) setting to stabilize the filter.

The goal of this paper is to compute an effective ‘statistical inflation’ to achieve accurate filtering in the presence of model errors. In particular, we study filtering multi-scale dynamics with model errors arising from a misrepresentation of unresolved sub-grid scale processes in the regime of moderate time-scale separation. The filtering problem, where the fast degrees of freedom are not observable, has been considered by Pavliotis & Stuart [22], Zhang [23], Mitchell & Gottwald [24] and Imkeller *et al.* [25], with a large scale gap assumption, *ϵ*≪1. Several numerical methods to address the same problem for moderate scale gap were developed by Harlim [26], Kang & Harlim [27] and Harlim & Majda [28]. While these methods produce encouraging results on non-trivial applications, unfortunately, they are difficult to justify in a rigorous fashion. Averaging and homogenization techniques developed by Khasminskii [29], Kurtz [30] and Papanicolaou [31] provide a rigorous backbone for developing reduced slow dynamics for certain classes of multi-scale systems. Following the idea of Mitchell & Gottwald [24], we will study how these stochastic model reductions can be used in data assimilation and how they act as a dynamically consistent way of statistical inflation. Our particular emphasis here is to understand the role of additive and multiplicative noise in stochastic model reduction in improving filtering skill. We employ theoretically well-understood stochastic model reductions such as averaging, formulated in the framework of singular perturbation theory and stochastic invariant manifold theory. We consider multi-scale systems whose averaging limit does not generate additional stochastic diffusion terms; as an innovative feature, we reintroduce stochastic effects induced by the fast dynamics by going one order higher in the time-scale separation parameter *ϵ* in the classical averaging theory. 
The question we address here is: in what way do these stochastic effects improve the process of data assimilation for multi-scale systems?

The outline of the paper is the following. In §2, we consider a linear model and show how additive inflation naturally arises as judicious model error for the resolved slow scales. We will show that while singular perturbation expansion allows us to improve the estimates of the statistical moments when compared with the averaged system, the reduced Fokker–Planck equation for the slow variable does not support a non-negative probability density function. Consequently, this method does not provide any means to estimate the temporal evolution of the mean and the variance needed for data assimilation. As a remedy, we will derive an approximate reduced stochastic model via invariant manifold theory [32–35]. This yields a Langevin equation for the slow variable, the variance of which is one order less accurate than that obtained with the singular perturbation theory, but which provides an explicit expression for the temporal evolution that can be used to propagate the mean and variance in the data assimilation step. The reduced stochastic differential equation (SDE) obtained through stochastic invariant manifold theory produces significantly better filtering skill when compared with the classical averaged equation. The additional additive noise correction of the reduced stochastic system provides an effective covariance inflation factor. Nonlinear problems will be discussed in §3. There, we will consider an analytically solvable nonlinear stochastic test model, mimicking turbulent signals in regimes ranging from the turbulent energy transfer range to the dissipative range to a laminar regime [2,36]. We will show that the approximate reduced model involves both additive and multiplicative noise, and improves the accuracy in the actual error convergence rate of solutions when compared with the classical averaged equations.
For data assimilation applications, we will derive an analytically solvable non-Gaussian filter based on the approximate reduced model with additive and multiplicative noise corrections which effectively inflates the first- and second-order statistics. We will demonstrate that this non-Gaussian filter produces accurate filtered solutions comparable with those of the true filter in the presence of intermittent instability in both the turbulent energy transfer range and the dissipative range. We conclude with a short summary and discussion in §4. We accompany this article with an electronic supplementary material that discusses the detailed calculations.

## 2. Linear model

We first consider the linear multi-scale model
2.1 $$\mathrm{d}x = (a_{11}x + a_{12}y)\,\mathrm{d}t + \sigma_x\,\mathrm{d}W_x$$
2.2 $$\mathrm{d}y = \frac{1}{\epsilon}(a_{21}x + a_{22}y)\,\mathrm{d}t + \frac{\sigma_y}{\sqrt{\epsilon}}\,\mathrm{d}W_y,$$
for a slow variable *x* and a fast variable *y*. Here, *W*_{x} and *W*_{y} are independent Wiener processes and the parameter *ϵ* characterizes the time-scale gap. We assume throughout that *σ*_{x},*σ*_{y}≠0 and that the eigenvalues of the matrix
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
are strictly negative, to assure the existence of a unique invariant joint density. Furthermore, we require $\hat a := a_{11} - a_{12}a_{21}/a_{22} < 0$ to assure that the leading-order slow dynamics supports an invariant density (cf. (2.3)).

### (a) Singular perturbation theory: averaging and beyond

The multi-scale linear system (2.1) and (2.2) is amenable to averaging. A recent exposition of the theory of averaging and its applications is provided, for example, in Givon *et al.* [37] and Pavliotis & Stuart [38]. The goal of this section is to apply singular perturbation theory to obtain one order higher than the usual averaging limit. We will derive an approximation for the joint probability density function *ρ*(*x*,*y*,*t*) of the full multi-scale system (2.1) and (2.2), when marginalized over the fast variable *y*. The resulting higher-order approximation of the probability density will turn out not to be non-negative everywhere, which implies that it is not a proper probability density function and therefore cannot be associated with an SDE for the slow variable *x*. However, it will enable us to find corrections to the mean and variance and all higher-order moments.

The standard effective averaged slow dynamics,
2.3 $$\mathrm{d}X = \hat a X\,\mathrm{d}t + \sigma_x\,\mathrm{d}W_x, \qquad \hat a = a_{11} - \frac{a_{12}a_{21}}{a_{22}},$$
is obtained by averaging the slow equation (2.1) over the conditional invariant measure induced by the fast dynamics, in which the slow variable *x* is held fixed, in the limit of infinite time-scale separation *ϵ*→0. Formally, the effective averaged slow dynamics can be derived by finding a solution of the Fokker–Planck equation associated with (2.1) and (2.2) with the following multi-scale expansion:
2.4 $$\rho(x,y,t) = \rho_0 + \epsilon\rho_1 + \epsilon^2\rho_2 + \cdots,$$
and the reduced system in (2.3) is the Langevin equation corresponding to the marginal distribution of *ρ*_{0} over the fast variable (see the electronic supplementary material, appendix A). Here, we are looking for a higher-order correction through the contribution from *ρ*_{1}(*x*,*y*,*t*). In order for *ρ*_{0}+*ϵρ*_{1} to be a density, we require
2.5 $$\int \rho_1(x,y,t)\,\mathrm{d}x\,\mathrm{d}y = 0.$$
Again, see the electronic supplementary material, appendix A (and equation (A 14)) for a solution *ρ*_{1} that satisfies (2.5).

We find that for each value of *ϵ* the expression *ρ*=*ρ*_{0}+*ϵρ*_{1} is not a strictly positive function of *x* and *y*. This implies that, contrary to *ρ*_{0}, the next-order approximation *ρ*=*ρ*_{0}+*ϵρ*_{1} does not satisfy a Fokker–Planck equation which describes the time evolution of a probability density function. As such, it is not possible to find a reduced Langevin equation corresponding to *ρ*=*ρ*_{0}+*ϵρ*_{1}. This indicates that, in general, memory effects are too significant to allow for a decorrelation at that order. In the next subsection, we will explore a different approximative approach which allows us to derive approximate slow Langevin equations. This, as we will see, will come at the cost of a less accurate estimate of the variance of the slow variable.

Despite the lack of a probabilistic interpretation of the Fokker–Planck equation associated with *ρ*_{0}+*ϵρ*_{1}, we can use the higher-order approximation of the probability density function to calculate $\mathcal{O}(\epsilon)$-corrections to the variance of the slow variable. We find that $\int x\rho_1\,\mathrm{d}x\,\mathrm{d}y = 0$, which implies that *ρ*_{1} does not contribute to the mean solution, and
2.6 $$\mathrm{Var}(x) \approx Q_0 + \epsilon\,(Q_1 + Q_2).$$
Note that for sufficiently small values of *ϵ*, the approximation to the variance is always positive. In figure 1, we show how the approximation to the variance (2.6) converges with *ϵ* to the variance of the slow variable *x* of the full multi-scale system (2.1) and (2.2). We plot the absolute error of the variances using only the averaged system (i.e. *Q*_{0}), with partial correction (i.e. *Q*_{0}+*ϵQ*_{1}), and with full higher-order correction (i.e. *Q*_{0}+*ϵ*(*Q*_{1}+*Q*_{2})). As expected from the singular perturbation theory, we observe linear scaling behaviour for the averaged system without the *ϵ*(*Q*_{1}+*Q*_{2}) correction and quadratic scaling behaviour with the higher-order correction in (2.6). We should also point out that the corrections *Q*_{1}+*Q*_{2} are not Gaussian statistics (one can verify that the kurtosis is not equal to three times the square of the variance for a generic choice of parameters), and therefore it is mathematically unclear how to use this statistical correction in the data assimilation setting, since we have no information about the temporal evolution of these non-Gaussian statistics without a Fokker–Planck equation which supports a probabilistic solution.
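The convergence of the slow variance with *ϵ* can be checked numerically by solving the stationary Lyapunov equation of a linear fast–slow system and comparing the slow-variable variance with the averaged prediction *Q*_{0}. The following sketch assumes an illustrative linear model of the form described in this section; the coefficients `a11`–`a22` and the noise amplitudes are assumptions chosen for demonstration, not the parameters used in figure 1.

```python
import numpy as np

# Illustrative linear fast-slow model (assumed parameter values):
#   dx = (a11 x + a12 y) dt + sx dWx
#   dy = (a21 x + a22 y)/eps dt + (sy/sqrt(eps)) dWy
a11, a12, a21, a22 = -1.0, 0.5, 1.0, -1.0
sx, sy = 1.0, 1.0

def slow_variance(eps):
    """Slow-variable stationary variance from the Lyapunov equation
    A P + P A^T + B = 0, solved by vectorization."""
    A = np.array([[a11, a12], [a21 / eps, a22 / eps]])
    B = np.diag([sx**2, sy**2 / eps])
    M = np.kron(np.eye(2), A) + np.kron(A, np.eye(2))
    P = np.linalg.solve(M, -B.flatten()).reshape(2, 2)
    return P[0, 0]

a_hat = a11 - a12 * a21 / a22        # averaged drift coefficient
Q0 = sx**2 / (2.0 * abs(a_hat))      # stationary variance of the averaged model

for eps in (0.1, 0.01):
    print(eps, abs(slow_variance(eps) - Q0))   # error shrinks roughly linearly in eps
```

Shrinking *ϵ* tenfold reduces the variance error by roughly a factor of ten, consistent with the linear scaling of the uncorrected averaged system reported above.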

### (b) Stochastic invariant manifold theory: an approximate reduced slow diffusive model

In this section, we will derive an approximate reduced SDE for the general linear system (2.1) and (2.2). The resulting equation will produce inferior second-order statistics compared with the singular perturbation theory described in §2a (cf. (2.6)), but will be advantageous in the context of data assimilation as we will demonstrate in §2c.

We employ here invariant manifold theory [32–35]. We perform a formal perturbation theory directly with equations (2.1) and (2.2). Ignoring terms of $\mathcal{O}(\epsilon)$ in (2.2), we obtain
$$y\,\mathrm{d}t = h(x,t)\,\mathrm{d}t - \sqrt{\epsilon}\,\frac{\sigma_y}{a_{22}}\,\mathrm{d}W_y, \qquad h(x,t) = -\frac{a_{21}}{a_{22}}\,x.$$
The slow stochastic invariant manifold *h*(*x*,*t*) is hyperbolic, satisfying the requirements outlined in Fenichel [33]. Substituting into (2.1) and ignoring the $\mathcal{O}(\epsilon)$ contribution, we arrive at a reduced equation for the slow variable
2.7 $$\mathrm{d}X = \hat a X\,\mathrm{d}t + \sigma_x\,\mathrm{d}W_x - \sqrt{\epsilon}\,\frac{a_{12}\sigma_y}{a_{22}}\,\mathrm{d}W_y,$$
with $\hat a$ the averaged drift coefficient of (2.3), which contains an additional additive noise term when compared with the averaged reduced equation (2.3). Note that, whereas in the framework of the Fokker–Planck equation used in singular perturbation theory the diffusion and drift terms of the fast equation are of the same order in *ϵ*, invariant manifold theory works directly with the SDE and hence allocates different orders of *ϵ* to the drift and diffusion terms.

We now show that solutions of (2.7) converge to solutions *x*^{ϵ}(*t*) of the full system (2.1) and (2.2) in a pathwise sense. There exists a vast literature on stochastic invariant manifold theory (see, for example, the monograph by Berglund & Gentz [32]). There are, however, not many convergence results which establish the scaling with the time-scale separation *ϵ*. In the electronic supplementary material, appendix B, we show that the error is bounded for finite time *T* by
2.8
This states that the rate of convergence of solutions of the stochastically reduced system (2.7) is one order better than for the averaged system (2.3). Figure 2 illustrates the convergence of solutions of the averaged model (2.3) and the reduced model (2.7) obtained from stochastic invariant manifold theory, confirming the estimate in (2.8). Figure 1, however, shows that this superior scaling of the solutions does not imply better scaling in the convergence of the variances. Recall that the variance of the reduced system (2.7) constitutes a truncation of the $\mathcal{O}(\epsilon)$-correction (2.6); note, however, the improvement in the actual error when compared with the averaged system (again, see figure 2).

### (c) Data assimilation for the linear model

In this section, we compare numerical results of filtering the linear system in (2.1) and (2.2), assimilating noisy observations,
2.9 $$v_m = G\,u(t_m) + \eta_m, \qquad \eta_m \sim \mathcal{N}(0, r^o),$$
of the slow variable *x* at discrete times *t*_{m} with constant observation time interval *Δt*=*t*_{m+1}−*t*_{m}. For convenience, we define *u*=(*x*,*y*)^{T} and the observation operator *G*=[1,0]. The observations in (2.9) are corrupted by unbiased Gaussian noise with variance *r*^{o}. In our numerical simulations below, we define the signal-to-noise ratio, SNR=*Var*(*x*)/*r*^{o}, such that SNR^{−1} indicates the relative observation error variance compared with the temporal (climatological) variance of *x*.

Suppose that *u*^{−}_{m} and *R*^{−}_{m} are the prior mean and error covariance estimates at time *t*_{m}. In this Gaussian and linear setting, the optimal posterior mean, *u*^{+}_{m}, and error covariance, *R*^{+}_{m}, are obtained by the standard Kalman filter formula [39]
2.10 $$u^+_m = u^-_m + K_m\,(v_m - G u^-_m), \qquad K_m = R^-_m G^{T}\,(G R^-_m G^{T} + r^o)^{-1}, \qquad R^+_m = (I - K_m G)\,R^-_m.$$
To complete one cycle of filtering (or data assimilation), the next prior statistical estimates are obtained by propagating the posterior according to
2.11 $$u^-_{m+1} = F\,u^+_m$$
and
2.12 $$R^-_{m+1} = F\,R^+_m F^{T} + Q,$$
where for the true filter, *F* and *Q* are 2×2 matrices associated with the dynamical propagation operator, corresponding to the full linear multi-scale system (2.1) and (2.2), and the model error covariance, respectively. We will compare the true filter with two reduced filtering strategies: (i) the reduced stochastic filter (RSF), which implements a one-dimensional Kalman filter on Gaussian prior statistics produced by the standard averaging model (2.3), and (ii) the reduced stochastic filter with additive correction (RSFA), which implements a one-dimensional Kalman filter on Gaussian prior statistics produced by model (2.7) obtained from stochastic invariant manifold theory. For both reduced filters (RSF and RSFA), we implement the same formulae (2.10) and (2.11), replacing *u* with *x*, *G*=[1,0] with *G*=1, and using the corresponding scalar propagation operator. The difference between RSF and RSFA is in the covariance update, where *Q*=*Q*_{0} for RSF and *Q*=*Q*_{0}+*ϵQ*_{1} for RSFA. Therefore, RSFA is nothing but RSF with an effective additive covariance inflation factor, *ϵQ*_{1}.

To measure the filter accuracy, we compute a temporally averaged root mean square (r.m.s.) error between the posterior mean estimate, *x*^{+}_{m}, and the underlying true signal, *x*(*t*_{m}), for numerical simulations up to time *T*=10 000. In figure 3, we show the average r.m.s. errors as functions of *ϵ* for *Δt*=1 and SNR^{−1}=0.5 (figure 3*a*); as functions of SNR^{−1} for *ϵ*=1 and *Δt*=1 (figure 3*b*) and as functions of *Δt* for *ϵ*=1 and SNR^{−1}=0.5 (figure 3*c*). Note that the average r.m.s. error for RSFA is almost identical to that of the true filter. More importantly, the additive inflation in RSFA improves the filtered accuracy compared with RSF.

## 3. Nonlinear model

In this section, we consider solutions of the nonlinear stochastic parametrized extended Kalman filter (SPEKF) model as true signals. This nonlinear test model was introduced in Gershgorin *et al.* [40,41] for filtering multi-scale turbulent signals with hidden intermittent instabilities. In particular, the SPEKF model is given by the following system of SDEs:
3.1 $$\begin{aligned}\mathrm{d}u &= \left[(-\gamma(t) + \mathrm{i}\omega)\,u + b(t) + f(t)\right]\mathrm{d}t + \sigma_u\,\mathrm{d}W_u,\\ \mathrm{d}b &= -\frac{\lambda_b}{\epsilon}\,(b - \hat b)\,\mathrm{d}t + \frac{\sigma_b}{\sqrt{\epsilon}}\,\mathrm{d}W_b,\\ \mathrm{d}\gamma &= -\frac{d_\gamma}{\epsilon}\,(\gamma - \hat\gamma)\,\mathrm{d}t + \frac{\sigma_\gamma}{\sqrt{\epsilon}}\,\mathrm{d}W_\gamma,\end{aligned}$$
with *λ*_{b}=*γ*_{b}−*iω*_{b}. Here, *u* represents a resolved mode in a turbulent signal with the nonlinear mode-interaction terms replaced by an additive complex-valued noise term and a multiplicative stochastic damping, as is often done in turbulence modelling [42–46]. In (3.1), *W*_{u} and *W*_{b} are independent complex-valued Wiener processes and *W*_{γ} is a real-valued standard Wiener process. The nonlinear stochastic system in (3.1) involves the following parameters: the stationary mean damping $\hat\gamma$ and mean bias $\hat b$; two oscillation frequencies *ω* and *ω*_{b} for the slow mode *u*; two damping parameters *γ*_{b} and *d*_{γ} for the fast variables *b* and *γ*; three noise amplitudes *σ*_{u},*σ*_{b},*σ*_{γ}>0; and a deterministic forcing *f*(*t*) of the slow variable *u*. Here, the two stochastic parameters *b*(*t*) and *γ*(*t*) are modelled with Ornstein–Uhlenbeck processes, rather than treated as constants or as Wiener processes [47,48]. For our purpose, we consider the temporal scales of *b* and *γ* to be of order *t*/*ϵ*, which leaves the system amenable to averaging for *ϵ*≪1.

The nonlinear system in (3.1) has several attractive features as a test model. First, it has exactly solvable statistical solutions which are non-Gaussian, allowing study of the non-Gaussian prior statistics conditioned on the Gaussian posterior statistics in a Kalman filter. This filtering method is called SPEKF. Note that it is different from the classical extended Kalman filter, which produces a Gaussian prior covariance matrix through the corresponding linearized tangent model [1,39]. Second, a recent study by Branicki *et al.* [36] suggests that the system (3.1) can reproduce signals in various turbulent regimes, such as intermittent instabilities in a turbulent energy transfer range and in a dissipative range, as well as laminar dynamics. In figure 4, we show pathwise solutions *Re*[*u*(*t*)] of the system in (3.1) without deterministic forcing (*f*(*t*)=0), unbiased $\hat b = 0$, frequencies *ω*=1.78 and *ω*_{b}=1, for a non-time-scale-separated situation with *ϵ*=1, in the three turbulent regimes considered in Branicki *et al.* [36]:

I. *Turbulent energy transfer range*. The dynamics of *u*(*t*) is dominated by frequent rapid transient instabilities. The parameters are: *γ*_{b}=0.5, *d*_{γ}=20, *σ*_{u}=0.5, *σ*_{b}=0.5 and *σ*_{γ}=20. Here, the fast variables decay faster than *u*.

II. *Dissipative range*. The dynamics of *u*(*t*) exhibits intermittent bursts of transient instabilities followed by quiescent phases. The parameters are: *γ*_{b}=0.4, *d*_{γ}=0.5, *σ*_{u}=0.1, *σ*_{b}=0.4 and *σ*_{γ}=0.5. Here, *u* and the fast variables have comparable decay time scales.

III. *Laminar mode*. Here, *u*(*t*) has no transient instabilities (almost surely). The parameters are: *γ*_{b}=0.5, *d*_{γ}=0.25, *σ*_{u}=0.25, *σ*_{b}=0.5 and *σ*_{γ}=1. Here, *u* decays much faster than the fast variables.

We remark that regimes I and II exist only for sufficiently large values of *ϵ*. For smaller *ϵ*, the solutions in these two regimes qualitatively look like a laminar mode. Furthermore, for *ϵ*=1, the following inequality is satisfied:
3.2 $$\Xi_n > 0$$
for *n*=1 in all three regimes; this is a sufficient condition for stable mean solutions [36]. For *n*=2, the condition in (3.2) is only satisfied in regimes I and III. In regime I, where the fast variables decay much faster than *u*, this condition implies bounded first- and second-order statistics (see appendix D of [49]).

Next, we will find a reduced stochastic prior model corresponding to the nonlinear system in (3.1). Subsequently, we will discuss the strong error convergence rate. Finally, we will compare the filtered solutions from the proposed reduced stochastic model with those of the true filter, which uses exact statistical prior solutions, and with various classical approximate filters.

### (a) Reduced stochastic models

The goal of this section is to derive an approximate higher-order correction to the averaged dynamics for the stochastic model in (3.1). As in the linear case, the leading-order dynamics is given by the averaged dynamics, where the average is taken over the unique invariant density generated by the fast dynamics of *b* and *γ*, which is
3.3 $$\mathrm{d}u = \left[(-\hat\gamma + \mathrm{i}\omega)\,u + \hat b + f(t)\right]\mathrm{d}t + \sigma_u\,\mathrm{d}W_u.$$
To recover stochastic effects of the fast dynamics neglected in (3.3), we use invariant manifold theory. We do not pursue singular perturbation theory here, as it does not yield information on the temporal evolution of the statistics, as already encountered in the linear case of the previous section. We will show that invariant manifold theory naturally produces order-*ϵ* correction factors that consist of both additive and multiplicative noise.

Proceeding as in §2*b*, we consider an approximation of the invariant manifold of the fast subsystem,
$$(b - \hat b)\,\mathrm{d}t = \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b}\,\mathrm{d}W_b, \qquad (\gamma - \hat\gamma)\,\mathrm{d}t = \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\,\mathrm{d}W_\gamma.$$
Inserting into the equation for the slow complex-valued variable *u* in (3.1), we obtain an approximate reduced model,
3.4 $$\mathrm{d}u = \left[(-\hat\gamma + \mathrm{i}\omega)\,u + \hat b + f(t)\right]\mathrm{d}t + \sigma_u\,\mathrm{d}W_u + \sqrt{\epsilon}\,\frac{\sigma_b}{\lambda_b}\,\mathrm{d}W_b - \sqrt{\epsilon}\,\frac{\sigma_\gamma}{d_\gamma}\,u\,\mathrm{d}W_\gamma.$$
Note that the reduced system (3.4) can be written as the averaged system (3.3) with added additive and multiplicative noise. This model allows for a complete analytical description of the statistics. In the electronic supplementary material, appendix C, we present explicit expressions for the first and second moments of (3.4). In figure 5, we show the numerically simulated errors of solutions of the approximate reduced models when compared with solutions of the full multi-scale system (3.1). All reduced models (the classical averaged model (3.3), the reduced stochastic model (3.4) and the reduced model (3.4) with only additive noise) exhibit linear scaling with *ϵ*, although the absolute error is smaller for the higher-order models (3.4). Note that the multiplicative noise does not contribute much here to the closeness of solutions; however, we will see below that it is significant for filtering turbulent signals when *ϵ*=1. In the electronic supplementary material, appendix D, we present a convergence proof for solutions of the reduced model (3.4).

### (b) Data assimilation for the nonlinear model

In this section, we report numerical results from filtering the SPEKF model (3.1). We assume partial observations *v*_{m} of *u* only,
3.5 $$v_m = u(t_m) + \eta_m, \qquad \eta_m \sim \mathcal{N}(0, r^o),$$
at discrete times *t*_{m} with observation time interval *Δt*=*t*_{m+1}−*t*_{m}. We choose *Δt*=0.5 in regimes I and II so that it is shorter than the decorrelation times, 0.833 and 1.81, respectively. In regime III, since the decorrelation time is much shorter, 0.12, we choose *Δt*=0.05. The observations in (3.5) are corrupted by unbiased Gaussian noise with variance *r*^{o}. We report on observation noise variances corresponding to SNR^{−1}=0.5, that is, *r*^{o} is 50 per cent of the climatological variance of *u*. We have checked that our results are robust when changing to smaller observational noise with SNR^{−1}=0.1.

We will investigate the performance of our reduced models as forecast models in the Kalman filter for strong, moderate and no time-scale separation. We recall that the intermittent regime II only exists for sufficiently large values of *ϵ*. We compare four filtering schemes, one perfect and three imperfect model experiments:

I. *The true filter*. This filtering scheme applies the classical Kalman filter formula (2.10) to update the first- and second-order non-Gaussian prior statistical solutions at each data assimilation step. These non-Gaussian prior statistics are semi-analytical statistical solutions of the nonlinear SPEKF model in (3.1) (see [2,41] for the detailed statistics).

II. *RSF*. This approach implements a one-dimensional Kalman filter on Gaussian prior statistical solutions of the averaged system (3.3).

III. *RSFA*. This method implements a one-dimensional Kalman filter on Gaussian prior statistical solutions of the reduced stochastic model in (3.4), ignoring the multiplicative noise term. Technically, this method inflates the prior covariance of the reduced filtering approach in (II) with an additive correction factor associated with the higher-order additive noise term in (3.4).

IV. *Reduced stochastic filter with combined, additive and multiplicative, inflations (RSFC)*. This filtering scheme applies a one-dimensional Kalman filter formula to update the non-Gaussian statistical solutions (see the electronic supplementary material, appendix C) of the reduced stochastic model in (3.4). The presence of the multiplicative noise correction term yields a statistical correction on both the mean and covariance. This is what we refer to as ‘statistical inflation’ in the abstract and in §1.
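The distinctive effect of the multiplicative term in scheme IV can be seen already in a scalar moment calculation: for an Itô model of the form du = a u dt + σ dW + c u dW′, the second moment obeys d E[u²]/dt = (2a + c²) E[u²] + σ², so the multiplicative coefficient c inflates the forecast variance in an amplitude-dependent way. The sketch below propagates this exact moment equation over one observation interval; the parameter values are illustrative assumptions, not those of (3.4).

```python
import numpy as np

# Assumed scalar model with multiplicative noise (Ito):
#   du = a u dt + sigma dW + c u dW'
a, sigma, dt = -0.5, 0.4, 0.5

def prior_second_moment(m2, c):
    """Propagate E[u^2] over one observation interval dt using the exact
    moment equation d/dt E[u^2] = (2a + c^2) E[u^2] + sigma^2."""
    lam = 2 * a + c**2
    return np.exp(lam * dt) * m2 + sigma**2 * (np.exp(lam * dt) - 1) / lam

m2_plain = prior_second_moment(1.0, 0.0)   # no multiplicative noise
m2_mult = prior_second_moment(1.0, 0.6)    # with multiplicative noise
```

With the same initial second moment, the multiplicative term yields a strictly larger prior second moment, i.e. a state-dependent covariance inflation rather than a fixed additive one.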

In figure 6, we show the average r.m.s. error as a function of the time-scale separation parameter *ϵ* in regimes I and III. For sufficiently small values of *ϵ*, all filters perform equally well, consistent with the asymptotic theory; here, the higher-order correction is negligible since *ϵ* is small. For moderate time-scale separation with *ϵ*=0.5, the two higher-order filters produce better skill than RSF, which is based on classical averaging theory. RSFC, which includes multiplicative noise, performs slightly better than RSFA; here, the transient instabilities have smaller magnitude compared with those for *ϵ*=1 shown in figure 4, and the r.m.s. error suggests that RSFA has relatively high filtering skill. We do not show regime II for smaller *ϵ* since the solutions look like a laminar mode.

In the remainder of this section, we consider the three turbulent regimes in a tough test bed without time-scale separation, with *ϵ*=1. In the turbulent energy transfer regime I, the fast variables decay much faster than *u*. The numerical results suggest that RSFC, including multiplicative noise, produces significantly better filtered estimates than all other filters, with r.m.s. errors comparable with the true filter. Note that for this regime, the optimal inflation factor for which the smallest r.m.s. errors are obtained is unphysically large, with a value of 7.1, and still produces r.m.s. errors 10 per cent larger than those of the true filter (results not shown). This is analogous to the situation in ensemble filters found by Mitchell & Gottwald [24]. In figure 7, we show the posterior estimates for regime I for observation error corresponding to SNR^{−1}=0.5, compared with the true signal. Note that the estimates from RSFC and the true filter are nearly identical; they capture almost every intermittent instability in this turbulent energy transfer range. The other two methods, RSF and RSFA, do not capture these intermittent instabilities. In this regime, we also see an improved covariance estimate with RSFC compared with RSF and RSFA (results not shown).

In the dissipative regime II, we consider an observation error corresponding to SNR^{−1}=0.5 of the temporal average over the time interval *t*∈[0,400], ignoring the immense intermittent burst that occurs in the time interval [540,580] (figure 8). This regime is an extremely difficult test bed, since the variance for the chosen parameters is unstable (i.e. *Ξ*_{2}<0) and, furthermore, the decay time scales for *u* and the fast variables are comparable. Nevertheless, RSFC performs extremely well for observation times that are not too large when compared with the growth rate of the instability, *Ξ*_{2}. RSFC exhibits an r.m.s. error of 0.70, which is close to the true filter with an r.m.s. error of 0.58 and much lower than the observation error. The filters without multiplicative noise, RSF and RSFA, have r.m.s. errors an order of magnitude higher, with 14.9 and 13.3, respectively. In figure 8, we show the posterior estimates for regime II. The multiplicative, i.e. amplitude-dependent, nature of the noise clearly enables the filter to follow abrupt large-amplitude excursions of the signal. This illustrates well the necessity of incorporating multiplicative noise when modelling highly non-Gaussian intermittent signals.

In the laminar regime III, all reduced filters do equally well, with r.m.s. errors lying between those of the true filter and the observational noise (figure 6). The filtered estimate from RSFC is slightly less accurate than that of the true filter; we suspect that this deterioration occurs because *u* decays much faster than the fast variables in this parameter regime, and hence the reduced stochastic systems are not dynamically consistent with the full dynamics. We should also mention that for all three regimes with *ϵ*=1, we also simulated the classical extended Kalman filter with a linear tangent model; its solutions diverge to infinity in finite time, owing to the practical violation of the observability condition in filtering intermittent unstable dynamics in these turbulent regimes (see [2], ch. 3, in a simpler context).

## 4. Summary and discussion

In this paper, we presented a study of filtering partially observed multi-scale systems with reduced stochastic models obtained from a systematic closure of the unresolved fast processes. In particular, we considered stochastic reduced models derived by singular perturbation theory as well as by invariant manifold theory. Here, we not only showed convergence of solutions in the limit of large time-scale separation, but also tackled the question of how the stochasticity induced by the unresolved scales can enhance the filtering skill, and how its diffusive behaviour can be translated into an effective inflation. Our work suggests that, by incorporating results from stochastic perturbation theory, one may guide inflation strategies, which so far have been mostly ad hoc, to filter multi-scale systems more efficiently in regimes where the asymptotic theory usually fails.

We focused here on the realistic case of moderate to non-existent time-scale separation in our data assimilation experiments, and showed how judicious model error, in particular additive and multiplicative noise, can enhance filter performance by providing a sufficient, dynamically adaptive statistical inflation. We systematically derived a higher-order additive noise in a reduced stochastic model that not only improves the pathwise estimates of solutions, but also improves the filtered estimates through a covariance inflation in the Kalman update, both in the linear case and in a nonlinear example that mimics a laminar mode of a turbulent signal. We also systematically derived multiplicative noise, which implies both mean and covariance corrections in the Kalman update, yielding a significant improvement of the filtering skill for intermittent nonlinear dynamics with sudden large-amplitude excursions.
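The covariance-inflation mechanism in the linear case can be sketched with a one-dimensional Kalman forecast–analysis cycle: an additive forecast noise of variance `Q` enters the forecast covariance directly, and so plays the role of the inflation. All numbers below are illustrative and not taken from the paper's experiments.

```python
def kalman_step(m, P, y, F, Q, H, R):
    """One forecast-analysis cycle of a scalar Kalman filter.

    Q is the forecast-model noise variance: an additive noise in the
    reduced forecast model enters here as extra Q, i.e. as a
    covariance inflation of the forecast (background) state.
    """
    # forecast step
    m_f = F * m
    P_f = F * P * F + Q
    # analysis step (Kalman update)
    K = P_f * H / (H * P_f * H + R)
    m_a = m_f + K * (y - H * m_f)
    P_a = (1.0 - K * H) * P_f
    return m_a, P_a

# same cycle with and without the additive-noise contribution to Q
m0, P0, y, F, H, R = 0.0, 0.1, 1.0, 0.9, 1.0, 0.25
_, P_no_infl = kalman_step(m0, P0, y, F, Q=0.0, H=H, R=R)
_, P_infl = kalman_step(m0, P0, y, F, Q=0.05, H=H, R=R)
print(P_no_infl < P_infl)  # prints True
```

The inflated forecast covariance yields a larger Kalman gain, so the analysis trusts the observation more; this counteracts the covariance underestimation that would otherwise cause the filter to ignore the data.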

The main message of this work is that reduced stochastic models can be viewed as a dynamically consistent way to introduce covariance inflation as well as mean correction, guiding the filtering process. As already found by Mitchell & Gottwald [24] in the context of ensemble filters, the improved skill of reduced stochastic models stems not from their ability to accurately reproduce the slow dynamics of the full system, but from the additional covariance inflation they provide. It is pertinent to mention, though, that the skill improvement is not observed in regimes where the stochastic reduced model fails to sufficiently approximate the statistics of the full dynamics. For example, the stochastic reduced model (3.4) produces inferior skill to the classical averaged system (3.3) for large values of *ϵ* in the laminar regime, in which the resolved variables decay much faster than the unresolved variables. Hence, it is an important feature of the effective stochastic inflation that it is dynamically consistent with the full dynamics and state-adaptive, in contrast to ad hoc inflation techniques.
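The contrast with ad hoc inflation can be made concrete: classical schemes rescale the forecast covariance by a fixed, state-independent factor, whereas the statistical inflation discussed above adds variance that depends on the current state amplitude. The sketch below is a schematic comparison only; both function names and the specific state-dependent form are assumptions for illustration, not formulas from the paper.

```python
def adhoc_inflation(P_f, r=1.1):
    """Classical ad hoc multiplicative inflation: scale the forecast
    covariance by a fixed, state-independent factor r > 1."""
    return r * P_f

def statistical_inflation(P_f, u, sigma_add, sigma_mult, dt):
    """State-adaptive inflation in the spirit of the reduced stochastic
    models (illustrative form): the added variance combines an additive
    contribution with a multiplicative one scaling with the amplitude u,
    so the inflation grows during large excursions of the signal."""
    return P_f + (sigma_add**2 + (sigma_mult * u) ** 2) * dt
```

The first form inflates identically in quiescent and bursting phases; the second inflates strongly only when the state is large, which is what makes it dynamically consistent with intermittent instability.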

Finally, we should mention that, although the study in this paper demonstrates the importance of a statistical inflation in the form of additive and multiplicative stochastic forcings for optimal filtering, it does not tell us how to choose the noise parameters (*ϵ*,*γ*_{b},*σ*_{b},*d*_{γ},*σ*_{γ}) for real applications. This issue will be addressed in future work.

## Acknowledgements

The authors thank Andrew Stuart for insightful discussions. G.A.G. acknowledges support from the Australian Research Council. The research of J.H. was partially supported by the ONR grant no. N00014-11-1-0310 and the ONR MURI grant no. N00014-12-1-0912.

- Received February 15, 2013.
- Accepted April 15, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.