## Abstract

In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach.

## 1. Introduction

Computational modelling is an indispensable tool for analysis, design, optimization and control. For applications that require a high number of model evaluations at different inputs (e.g. uncertainty analysis and inverse parameter estimation) the computational expense of a computer model is often prohibitive. In such cases, the original computer model can be replaced with a *surrogate model* (or *emulator*) [1]. The simplest approach to surrogate modelling consists of simplifying the mathematical model or numerical formulation, e.g. by assuming spatial homogeneity or using coarse grids.

The other two main approaches are based on: (i) supervised machine learning methods to learn the model input–output relationship (so-called data-driven models) or (ii) Galerkin projection schemes, which yield *reduced-order models* (ROMs). The projecting basis in ROMs can be obtained through balanced truncation, Krylov subspaces or proper orthogonal decomposition (POD) (for a recent survey we refer the reader to [2]). For partial differential equation (PDE) models, the Galerkin projection can be performed on the original equations (in strong or weak form) or on the spatially discretized system. The final form is an algebraic system for steady problems or an ordinary differential equation (ODE) system in time for dynamical problems.

The most widely used technique for PDE systems is POD [3–5], in which the approximating subspace is obtained from solutions (called *snapshots*) generated by the discretized full-order model (FOM) at selected time instances. Application of POD to dynamic, nonlinear parametrized PDEs presents a number of challenges: (i) constructing a basis that is valid across parameter space; (ii) dealing with high-dimensional parameter spaces; (iii) using data parsimoniously; and (iv) efficiently computing the reduced-order system matrices and reduced-order nonlinearities in the state variable during use of the surrogate, i.e. the so-called *online* phase (we may also mention the development of stable POD schemes to overcome instabilities in the original formulations).

There are several approaches to incorporating parametric dependence: (i) to use a global basis (meaning across parameter space); (ii) interpolation of local bases (meaning for particular parameter values); and (iii) interpolation of local system matrices. For linear time-invariant systems, the system matrices often take the form of affine combinations of constant matrices with parameter-dependent coefficients. In such cases, the reduced-order system is quickly and easily assembled for a new parameter value [6,7]. Affine forms can also be realized by using a Taylor series expansion [8] or an empirical interpolation strategy [9]. Global basis methods extract a single basis from multiple local snapshot matrices [6,10,11]. Obvious drawbacks are the violation of POD optimality and the growth in the size of the global matrix with the number of samples. There are, however, efficient sampling strategies for constructing global bases, such as the greedy approach of [6] or by using a local sensitivity analysis [12].

An alternative approach is interpolation of local bases or local reduced-model matrices. Lieu *et al.* [13] used the principal angles between two POD bases, pertaining to different Mach numbers, to linearly interpolate a local basis for intermediate Mach numbers in a linearized fluid–structure ROM. This method is restricted to single-parameter systems and small parameter changes. Amsallem & Farhat [14] considered local bases as members of a Grassmann manifold, the set of all subspaces (of a chosen low dimension) of the state space. The local bases are mapped to a tangent space of the Grassmann manifold using a logarithmic map, and Lagrange interpolation is performed in the tangent space. The exponential map then provides the required local bases.

Interpolation methods can also be used to approximate the reduced-order system matrices, in order to circumvent the problem of computing these matrices for each new parameter value. Degroote *et al.* [11] proposed two methods: element-wise direct spline interpolation of the reduced-order matrices or spline interpolation of the matrices in a tangent space of a Riemannian manifold on which the matrices are assumed to lie (a similar method was proposed in [15]). When a global basis is not used to build the ROM, a straightforward interpolation is not possible because the reduced-dimensional coordinates do not (in general) have the same physical meaning from one local basis to another. Thus, a congruency transformation to a common basis is required before direct interpolation [16] or interpolation in a tangent space [17].

Lieberman *et al.* [18] used a greedy algorithm to construct projections for both the state variable and the parameters simultaneously, minimizing a measure of the error between the ROM and FOM outputs at each iteration (different error measures were considered in [19]). Hay *et al.* [20] used sensitivities (derivatives) of the POD basis with respect to (w.r.t.) the parameters either to linearly extrapolate the POD basis for a new parameter value or to expand the POD basis by augmenting it with the corresponding sensitivities. The growth in the basis dimension with the number of parameters is a limitation of this approach.

The computational cost of evaluating a strong (high-order polynomial or non-polynomial) nonlinearity in the state variable in a ROM depends on the dimension of the original state space. Linearization methods [21,22] are only applicable to weak nonlinearities or confined regions of state space. Moreover, the computational cost grows exponentially with the order of the approximating expansion. Recently, a number of *hyper-reduction* methods have been developed to overcome the limitations of linearization approaches (see also the tensorial POD approach recently developed in [23]). An early method was developed by Astrid *et al.* [24], based on selecting a subset of the FOM equations corresponding to heuristically chosen spatial grid points, followed by a Galerkin projection of the resulting reduced system onto the POD basis.

The empirical interpolation method interpolates the nonlinear function at selected spatial locations using an empirically derived basis, and is applied directly to the governing PDE [7], while the *discrete empirical interpolation method* (DEIM) is applicable to general ODE or algebraic systems arising from a spatial discretization [25]. Both methods construct a subspace for the approximation of the nonlinear term and use a greedy algorithm to select interpolation points. An extension of the DEIM [26] generates several local subspaces via clustering and uses classification in the online phase to select one of the subspaces. These approaches can also be used for approximating (vectorized) non-affine system matrices [2]. The Gauss–Newton with approximated tensors method operates at the fully discrete level in space and time, and is based on satisfying consistency and discrete-optimality conditions by solving a residual-minimization problem [27]. This leads to a Petrov–Galerkin (rather than Galerkin) problem with a test basis that depends on the residual derivatives w.r.t. the state variable.

In this paper, we introduce an extension of POD for dynamic, parametrized, linear and nonlinear PDEs. The method we develop involves a computationally efficient approximation of the POD basis and the nonlinearity for new parameter values. It can be used in conjunction with many of the methods described above, e.g. greedy sampling and methods for approximating non-affine system matrices. In order to avoid inconsistencies and to reduce the loss of information in the construction of new bases, we take the approach of approximating the snapshots rather than the bases or system matrices directly. The snapshots, however, lie in high-dimensional spaces so that direct approximations are computationally unfeasible. We overcome this issue by using manifold learning techniques [28] to map the snapshots to a low-dimensional feature space. We then use Gaussian process emulation (GPE) to infer values of the mapped snapshots for new parameter values, followed by an inverse map to obtain the snapshots in physical space. For nonlinear problems, we extend DEIM by using the same emulation approach to approximate snapshots of the nonlinearity at desired locations in parameter space.

In the next section, we outline the procedures for generating ROMs and POD bases. We provide brief details of the DEIM and explain the issues associated with parametrized and/or nonlinear problems. In §3, we present the snapshot emulation strategy and summarize our approach to linear and nonlinear parametrized problems. In §4, we present one linear and one nonlinear example, comparing the results with a global basis approach.

## 2. Reduced-order models for parametrized dynamic partial differential equations using proper orthogonal decomposition

### (a) Problem definition and Galerkin projection

Let $\mathbf{x}=(x_1,\ldots,x_L)$ denote a point in a bounded, regular domain $\Omega\subset\mathbb{R}^L$ ($L=1,2,3$), let $t\in[0,T]$ denote time and let $u(\mathbf{x},t;\boldsymbol{\xi})$ denote the solution of a dynamic PDE, written generically as

$$\frac{\partial u}{\partial t}=\mathcal{L}(u;\boldsymbol{\xi})+g(\mathbf{x};\boldsymbol{\xi}),\qquad \mathbf{x}\in\Omega,\ t\in(0,T],\tag{2.1}$$

where $\mathcal{L}$ is a (possibly nonlinear) spatial differential operator. The inputs $\boldsymbol{\xi}$ may appear in the coefficients of $\mathcal{L}$, in the source term $g(\mathbf{x};\boldsymbol{\xi})$ or in the initial/boundary conditions. For each $\boldsymbol{\xi}$, we assume that $t\mapsto u(\cdot,t;\boldsymbol{\xi})$ is a Lebesgue measurable map from $(0,T)$ to an appropriate solution space. A spatial discretization of (2.1) leads to a system of ODEs:

$$\dot{\mathbf{u}}(t;\boldsymbol{\xi})=\mathbf{A}(\boldsymbol{\xi})\,\mathbf{u}(t;\boldsymbol{\xi})+\mathbf{f}(\mathbf{u};\boldsymbol{\xi}),\tag{2.2}$$

with $\mathbf{u}(t;\boldsymbol{\xi})=(u_1(t;\boldsymbol{\xi}),\ldots,u_d(t;\boldsymbol{\xi}))^{\mathrm{T}}$, which we call the solution vector. Here $d$ is the number of degrees of freedom, e.g. the number of grid points in a finite-difference (FD) approximation, the number of cells in a cell-centred finite-volume (FV) approximation or the number of nodes (basis functions) in a finite-element (FE) approximation. The matrix $\mathbf{A}(\boldsymbol{\xi})$ represents the discretized linear part of the spatial operator, while the vector $\mathbf{f}(\mathbf{u};\boldsymbol{\xi})$ incorporates the source term $g(\mathbf{x};\boldsymbol{\xi})$ and possibly the boundary conditions; $\mathbf{f}$ is nonlinear in $\mathbf{u}$ for nonlinear problems.

The precise relationship between $\mathbf{u}(t;\boldsymbol{\xi})$ and $u(\mathbf{x},t;\boldsymbol{\xi})$, the forms of $\mathbf{A}(\boldsymbol{\xi})$ and $\mathbf{f}(\mathbf{u};\boldsymbol{\xi})$, and the incorporation of boundary conditions depend on the method used. For an FD approximation, problem (2.1) is solved directly and the boundary conditions are incorporated in $\mathbf{f}(\mathbf{u};\boldsymbol{\xi})$. In an FE approximation, a weak form is solved with test functions in a suitable space, and the boundary conditions enter through the weak form and/or the definition of $\mathbf{f}$. The structure of $\mathbf{A}(\boldsymbol{\xi})$ is determined by the dependence of the spatial operator on $\boldsymbol{\xi}$. The simplest case is an affine form

$$\mathbf{A}(\boldsymbol{\xi})=\sum_{i} c_i(\boldsymbol{\xi})\,\mathbf{A}_i,$$

in which the coefficients $c_i(\boldsymbol{\xi})$ are known and the matrices $\mathbf{A}_i$ are constant.

For FD, FV and nodal-basis FE discretizations, the coefficients $u_i(t;\boldsymbol{\xi})$ of $\mathbf{u}(t;\boldsymbol{\xi})$ correspond to the values of $u(\mathbf{x},t;\boldsymbol{\xi})$ at locations $\mathbf{x}_i$, $i=1,\ldots,d$, i.e. $u_i(t;\boldsymbol{\xi})=u(\mathbf{x}_i,t;\boldsymbol{\xi})$. We will assume this to be the case throughout. A numerical solution of (2.2) yields solution vectors $\mathbf{u}_i(\boldsymbol{\xi}):=\mathbf{u}(t_i;\boldsymbol{\xi})$ at selected times $t_1,\ldots,t_m\in[0,T]$; each such vector is called a *snapshot*.

For a fixed input $\boldsymbol{\xi}$, let $\mathbf{v}_j(\boldsymbol{\xi})$, $j=1,\ldots,r$, be an orthonormal basis for a subspace in which we seek an approximation of $\mathbf{u}$, i.e. $\mathbf{u}\in\operatorname{span}(\mathbf{v}_1(\boldsymbol{\xi}),\ldots,\mathbf{v}_r(\boldsymbol{\xi}))$. We write

$$\mathbf{u}(t;\boldsymbol{\xi})\approx\mathbf{V}_r(\boldsymbol{\xi})\,\mathbf{a}(t;\boldsymbol{\xi}),\tag{2.3}$$

where $\mathbf{a}=(a_1(t;\boldsymbol{\xi}),\ldots,a_r(t;\boldsymbol{\xi}))^{\mathrm{T}}$ and $\mathbf{V}_r(\boldsymbol{\xi})=[\mathbf{v}_1(\boldsymbol{\xi})\ \cdots\ \mathbf{v}_r(\boldsymbol{\xi})]$. The Galerkin projection of equations (2.2) onto the basis vectors $\mathbf{v}_i(\boldsymbol{\xi})$, $i=1,\ldots,r$, yields (replacing $\mathbf{u}$ with $\mathbf{V}_r\mathbf{a}$)

$$\dot{\mathbf{a}}(t;\boldsymbol{\xi})=\mathbf{A}_r(\boldsymbol{\xi})\,\mathbf{a}(t;\boldsymbol{\xi})+\mathbf{f}_r(\mathbf{a}(t;\boldsymbol{\xi});\boldsymbol{\xi}),\tag{2.4}$$

where $\mathbf{A}_r(\boldsymbol{\xi}):=\mathbf{V}_r(\boldsymbol{\xi})^{\mathrm{T}}\mathbf{A}(\boldsymbol{\xi})\mathbf{V}_r(\boldsymbol{\xi})$ and $\mathbf{f}_r(\mathbf{a}(t;\boldsymbol{\xi});\boldsymbol{\xi}):=\mathbf{V}_r(\boldsymbol{\xi})^{\mathrm{T}}\mathbf{f}(\mathbf{V}_r(\boldsymbol{\xi})\mathbf{a}(t;\boldsymbol{\xi});\boldsymbol{\xi})$. Equations (2.4) represent a system of $r$ ODEs in time for the coefficients $a_i(t;\boldsymbol{\xi})$. The basic goal of POD (outlined below) is the construction of a basis of low dimension $r\ll d$ that accurately captures the snapshot data.
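As an illustration of the projection step, the following sketch builds and integrates a reduced system of the form (2.4) from a generic semi-discrete model of the form (2.2). The operator `A`, the cubic nonlinearity `f` and the random orthonormal basis `V_r` are invented for the example; they do not correspond to the models used later in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def galerkin_rom(A, f, V_r, a0, t_span, t_eval):
    """Integrate the reduced system obtained by projecting du/dt = A u + f(u)."""
    A_r = V_r.T @ A @ V_r                        # reduced operator V_r^T A V_r
    def rhs(t, a):
        u = V_r @ a                              # lift to the full space
        return A_r @ a + V_r.T @ f(u)           # reduced dynamics (eq. (2.4))
    return solve_ivp(rhs, t_span, a0, t_eval=t_eval).y

# Toy full-order model: d = 50 damped diffusion-like system with a cubic nonlinearity.
d, r = 50, 5
rng = np.random.default_rng(0)
A = -np.eye(d) + 0.1 * (np.diag(np.ones(d - 1), 1) + np.diag(np.ones(d - 1), -1))
f = lambda u: -0.01 * u**3
V_r, _ = np.linalg.qr(rng.standard_normal((d, r)))   # stand-in orthonormal basis
u0 = rng.standard_normal(d)
a = galerkin_rom(A, f, V_r, V_r.T @ u0, (0.0, 1.0), np.linspace(0.0, 1.0, 11))
u_approx = V_r @ a                                   # d x 11 reconstructed trajectory
```

Note that only the $r$-dimensional system is integrated; for a strong nonlinearity, however, each evaluation of `V_r.T @ f(V_r @ a)` still costs $O(d)$, which is the issue the DEIM addresses later.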

### (b) Proper orthogonal decomposition

POD is presented in a number of ways (e.g. error minimization, ‘variance’ maximization) in the literature and often under different names. In this section, we provide a brief description of the motivation and practical (discrete) implementation. A complete summary of the underlying theory, alternative approaches, the links between the various interpretations and the optimality of the chosen basis can be found in appendix A.

For a fixed $\boldsymbol{\xi}$, consider the snapshots $\mathbf{u}_j(\boldsymbol{\xi})$ of the solution $u(\mathbf{x},t;\boldsymbol{\xi})$. The solution $u(\mathbf{x},t;\boldsymbol{\xi})$ can be regarded as a realization of a stationary (w.r.t. $t$) random field indexed by $(\mathbf{x},t)$ [3,4,29]. Applying Karhunen–Loève (KL) theory [30] *for a fixed* $t$ yields an expansion in terms of deterministic spatial functions $v_i(\mathbf{x};\boldsymbol{\xi})$, which form an orthonormal set of eigenfunctions of the *spatial autocovariance function* $C(\mathbf{x},\mathbf{x}';\boldsymbol{\xi})$.

In practice, we must work within a finite-dimensional setting. Defining the snapshot matrix $\mathbf{U}(\boldsymbol{\xi}):=[\mathbf{u}_1(\boldsymbol{\xi})\ \cdots\ \mathbf{u}_m(\boldsymbol{\xi})]$, the *spatial variance–covariance matrix* is given by $\mathbf{C}(\boldsymbol{\xi})=m^{-1}\mathbf{U}(\boldsymbol{\xi})\mathbf{U}(\boldsymbol{\xi})^{\mathrm{T}}$, and the discrete analogue of the KL eigenvalue problem reads $\mathbf{C}(\boldsymbol{\xi})\mathbf{v}_i(\boldsymbol{\xi})=\lambda_i(\boldsymbol{\xi})\mathbf{v}_i(\boldsymbol{\xi})$ for eigenvectors $\mathbf{v}_i(\boldsymbol{\xi})$ and eigenvalues $\lambda_i(\boldsymbol{\xi})>0$, $i=1,\ldots,d$, arranged in decreasing order. The first $r$ of these vectors define the POD subspace and are taken as the basis vectors $\mathbf{v}_i(\boldsymbol{\xi})$ in (2.3). In appendix A, we provide details of the *method of snapshots* and the *singular value decomposition* (SVD), the latter of which we use in practice.
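In practice, the left singular vectors of the snapshot matrix are the eigenvectors of $\mathbf{C}(\boldsymbol{\xi})$, so the POD basis can be read directly off the SVD. A minimal sketch, with an invented toy snapshot family in place of a real simulation:

```python
import numpy as np

def pod_basis(U, r):
    """POD basis from a snapshot matrix U (d x m) via the SVD.

    The left singular vectors of U are the eigenvectors of the covariance
    matrix C = (1/m) U U^T, so the first r of them span the optimal
    rank-r subspace for the snapshots.
    """
    V, s, _ = np.linalg.svd(U, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)     # captured 'energy' per rank
    return V[:, :r], energy

# Toy snapshots: d = 200 spatial points, m = 30 time instances.
x = np.linspace(0.0, 1.0, 200)
t = np.linspace(0.0, 1.0, 30)
U = np.sin(np.pi * np.outer(x, t)) + 0.1 * np.outer(x, t**2)
V_r, energy = pod_basis(U, r=5)
U_proj = V_r @ (V_r.T @ U)                      # rank-5 reconstruction
rel_err = np.linalg.norm(U - U_proj) / np.linalg.norm(U)
```

The cumulative `energy` curve is the usual diagnostic for choosing $r$: one truncates once the retained singular values account for a prescribed fraction of the total.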

## 3. Basis emulation and discrete empirical interpolation method extension

For each input/parameter $\boldsymbol{\xi}$, the snapshot matrix $\mathbf{U}(\boldsymbol{\xi})$ is obtained from the FOM and the basis $\mathbf{V}_r(\boldsymbol{\xi})$ is constructed according to §2b. Repeating this procedure for every input required in an analysis is computationally prohibitive. A global basis across the parameter space of interest [10] can instead be constructed by computing a set of snapshot matrices $\mathbf{U}(\boldsymbol{\xi}_j)$, $j=1,\ldots,n$; the basis vectors are then extracted from the global snapshot matrix $[\mathbf{U}(\boldsymbol{\xi}_1)\ \cdots\ \mathbf{U}(\boldsymbol{\xi}_n)]$.

The global basis method uses information only from the ‘truth approximation’, i.e. the FOM. The optimality of the POD method, on the other hand, is violated, since the snapshots used to derive the basis do not pertain to the parameter value of interest (the particular dynamical system under consideration) during the online phase. Furthermore, the range of validity of the global basis could be limited for complex mappings between the parameters and the outputs [13]. Interpolation methods (and the method we propose) violate the truth approximation in the sense that the snapshots or quantities derived from them are not obtained from the original model. In contrast to the global basis, however, these methods attempt to construct more accurate ROMs during the online phase. The main limitation is the accuracy of the interpolation or emulation, which depends on the data available and on the method itself. Moreover, it may not be possible to obtain sharp error bounds using such methods (in cases where the underlying PDE problem is amenable to a rigorous analysis).

Another problem associated with the standard POD–Galerkin approach is that the computational efficiency is compromised when $\mathbf{f}_r$ in equation (2.4) has a computational complexity that depends on $d$ [31]. The DEIM [25] seeks a set of vectors $\mathbf{w}_i(\boldsymbol{\xi})\in\mathbb{R}^d$, $i=1,\ldots,s$, such that the subspace $\operatorname{span}(\mathbf{w}_1(\boldsymbol{\xi}),\ldots,\mathbf{w}_s(\boldsymbol{\xi}))$, $s\ll d$, well approximates $\mathbf{f}(\mathbf{u}(t;\boldsymbol{\xi});\boldsymbol{\xi})$ for an arbitrary $t$. That is, we seek an approximation

$$\mathbf{f}(\mathbf{u}(t;\boldsymbol{\xi});\boldsymbol{\xi})\approx\mathbf{W}(\boldsymbol{\xi})\,\mathbf{h}(t;\boldsymbol{\xi}),$$

where $\mathbf{W}(\boldsymbol{\xi})=[\mathbf{w}_1(\boldsymbol{\xi})\ \cdots\ \mathbf{w}_s(\boldsymbol{\xi})]$ and $\mathbf{h}(t;\boldsymbol{\xi})$ is a coefficient vector. The vectors $\mathbf{w}_i(\boldsymbol{\xi})$ are obtained from snapshots of the nonlinearity, $\mathbf{f}_i(\boldsymbol{\xi})=\mathbf{f}(\mathbf{u}_i(\boldsymbol{\xi});\boldsymbol{\xi})$, from which we form the matrix $\mathbf{F}(\boldsymbol{\xi})=[\mathbf{f}_1(\boldsymbol{\xi})\ \cdots\ \mathbf{f}_m(\boldsymbol{\xi})]$. A PCA on $\mathbf{F}(\boldsymbol{\xi})\mathbf{F}(\boldsymbol{\xi})^{\mathrm{T}}$ or an SVD of $\mathbf{F}(\boldsymbol{\xi})$ yields the $\mathbf{w}_i$.

Since the system $\mathbf{f}(\mathbf{u}(t;\boldsymbol{\xi});\boldsymbol{\xi})=\mathbf{W}(\boldsymbol{\xi})\mathbf{h}(t;\boldsymbol{\xi})$ is overdetermined in $\mathbf{h}(t;\boldsymbol{\xi})$, the DEIM selects $s$ of the $d$ equations to obtain an ‘optimal’ solution. Let us introduce the matrix $\mathbf{P}=[\mathbf{e}_{p_1}\ \cdots\ \mathbf{e}_{p_s}]$, where $\mathbf{e}_{p_i}$ is the standard Euclidean basis vector in $\mathbb{R}^d$ with a 1 in the $p_i$th coordinate. Assuming $\mathbf{P}^{\mathrm{T}}\mathbf{W}(\boldsymbol{\xi})$ is non-singular, we obtain

$$\mathbf{h}(t;\boldsymbol{\xi})=\big(\mathbf{P}^{\mathrm{T}}\mathbf{W}(\boldsymbol{\xi})\big)^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{f}(\mathbf{u}(t;\boldsymbol{\xi});\boldsymbol{\xi}),$$

so that only $s$ components of $\mathbf{f}$ need to be evaluated when $\mathbf{f}(\cdot;\boldsymbol{\xi})$ acts pointwise. The indices $p_i\in\{1,2,\ldots,d\}$, $i=1,\ldots,s$, are specified by a greedy algorithm [25] that satisfies the following error bound (for a given $s$):

$$\big\|\mathbf{f}-\mathbf{W}(\mathbf{P}^{\mathrm{T}}\mathbf{W})^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{f}\big\|_2\le\big\|(\mathbf{P}^{\mathrm{T}}\mathbf{W})^{-1}\big\|_2\,\big\|(\mathbf{I}-\mathbf{W}\mathbf{W}^{\mathrm{T}})\mathbf{f}\big\|_2.$$

This estimate is valid for a given $t$ (considering $\mathbf{f}$ as a function of $t$) by virtue of the second factor on the r.h.s., which is the error in the best 2-norm approximation of $\mathbf{f}$ in $\operatorname{range}(\mathbf{W}(\boldsymbol{\xi}))$.
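A minimal sketch of this construction: greedy selection of the interpolation indices $p_i$, followed by evaluation of $\mathbf{W}(\mathbf{P}^{\mathrm{T}}\mathbf{W})^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{f}$ using only the selected entries of $\mathbf{f}$. The snapshot family below is invented for illustration.

```python
import numpy as np

def deim_points(W):
    """Greedy DEIM point selection from a basis W (d x s) [25].

    Each new index is where the residual of interpolating the next basis
    vector at the already-selected points is largest in magnitude.
    """
    d, s = W.shape
    p = [int(np.argmax(np.abs(W[:, 0])))]
    for i in range(1, s):
        c = np.linalg.solve(W[p, :i], W[p, i])  # interpolate w_i at current points
        res = W[:, i] - W[:, :i] @ c            # residual over all d points
        p.append(int(np.argmax(np.abs(res))))
    return np.array(p)

def deim_approx(W, p, f_at_p):
    """f ~ W (P^T W)^{-1} P^T f, using only the s entries f_at_p = f[p]."""
    return W @ np.linalg.solve(W[p, :], f_at_p)

# Toy test: a smooth one-parameter snapshot family on d = 100 points.
x = np.linspace(0.0, 1.0, 100)
F = np.column_stack([np.exp(-mu * x) * np.sin(3 * x) for mu in np.linspace(1, 5, 20)])
W = np.linalg.svd(F, full_matrices=False)[0][:, :6]
p = deim_points(W)
f = np.exp(-2.5 * x) * np.sin(3 * x)            # a new member of the same family
f_deim = deim_approx(W, p, f[p])
```

Only the entries `f[p]` of the nonlinearity are evaluated, which is the source of the DEIM speed-up when `f` acts pointwise.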

In this paper, we introduce a systematic and rigorous method to approximate the local basis and the nonlinearity by first approximating the snapshots at new values of $\boldsymbol{\xi}$ using Bayesian nonlinear regression. These snapshots lie in very high-dimensional spaces, and thus we use a recently developed method that exploits manifold learning to yield a computationally feasible Gaussian process (GP) model. Below, we describe the components of this emulation method and subsequently explain how it can be used for a POD analysis of parametrized, dynamic problems.

### (a) Formulation and solution of the learning problem

For an arbitrary input $\boldsymbol{\xi}$, consider the mapping $\boldsymbol{\eta}:\boldsymbol{\xi}\mapsto\mathbf{y}:=\operatorname{vec}(\mathbf{U}(\boldsymbol{\xi}))\in\mathbb{R}^{md}$, which takes an input to the corresponding vectorized snapshot matrix. We can define a similar map $\mathbf{y}^f=\boldsymbol{\eta}^f(\boldsymbol{\xi})$ for snapshots of the nonlinearity.

We aim to approximate the mapping $\boldsymbol{\eta}(\cdot)$ given *training points* $\mathbf{y}_j=\boldsymbol{\eta}(\boldsymbol{\xi}_j)$ at *design points* $\boldsymbol{\xi}_j$, $j=1,\ldots,n$. One of the main methods for dealing with such high-dimensional outputs is to define approximate outputs in a $q$-dimensional subspace ($q\ll md$) using PCA and to independently emulate the $q$ coefficients of the points in this subspace [32]. Shah and co-workers [33,34] extended the latter method by replacing PCA with manifold learning methods, making it applicable to a broader class of output spaces. Here, kernel PCA (kPCA) is used to map the training outputs to a $q$-dimensional feature space. The coordinates $z_i(\mathbf{y})$ of the mapped points $\boldsymbol{\phi}_q(\mathbf{y})$ in feature space define maps $\boldsymbol{\xi}\mapsto z_i(\boldsymbol{\eta}(\boldsymbol{\xi}))$, $i=1,\ldots,q$. We place independent GP priors over these maps, justified by the properties of kPCA.

The approximation of $z_i(\boldsymbol{\eta}(\boldsymbol{\xi}))$, $i=1,\ldots,q$, at a new input $\boldsymbol{\xi}$, given the training data, is inferred from scalar GPE (outlined in appendix C) as the mean of a posterior distribution. Given the predicted feature-space point, an approximate inverse map (a solution of the pre-image problem) yields the snapshot prediction in the original space. A high output dimension $md$ is not a limitation for the manifold learning methods employed in this paper, in which the eigenvalue problems depend primarily on the number of training points $n$.
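To make the strategy concrete, the following sketch uses plain PCA in place of kPCA, i.e. the variant of [32] (the GP step is unchanged; kPCA additionally requires a pre-image map). The toy snapshot family, the fixed kernel length scale `ell` and the helper names are assumptions for illustration only; in the paper the hyperparameters are fitted by MLE.

```python
import numpy as np

def rbf(X1, X2, ell):
    """Squared-exponential kernel matrix."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def emulate_snapshots(Xi, Y, xi_new, q=4, ell=0.3, noise=1e-8):
    """Emulate vectorized snapshots y = eta(xi) at a new input xi_new.

    PCA maps the n training outputs (rows of Y, dimension m*d) to q
    coefficients; each coefficient is emulated by a zero-mean GP and the
    posterior-mean prediction is mapped back to the output space.
    """
    mean = Y.mean(axis=0)
    Vt = np.linalg.svd(Y - mean, full_matrices=False)[2][:q]  # q principal directions
    Z = (Y - mean) @ Vt.T                                     # n x q coefficients
    K = rbf(Xi, Xi, ell) + noise * np.eye(len(Xi))
    alpha = np.linalg.solve(K, Z)                             # shared for all q GPs
    z_new = rbf(xi_new[None, :], Xi, ell) @ alpha             # posterior means, 1 x q
    return mean + (z_new @ Vt).ravel()                        # back to snapshot space

# Toy problem: snapshots of u(x, t; xi) = sin(pi x) exp(-xi t), vectorized.
x = np.linspace(0, 1, 50); t = np.linspace(0, 1, 10)
eta = lambda xi: np.outer(np.sin(np.pi * x), np.exp(-xi * t)).ravel()
Xi = np.linspace(0.2, 2.0, 15)[:, None]                       # design points
Y = np.array([eta(xi[0]) for xi in Xi])
y_pred = emulate_snapshots(Xi, Y, np.array([1.15]))
rel_err = np.linalg.norm(y_pred - eta(1.15)) / np.linalg.norm(eta(1.15))
```

Because the $q$ GPs share the same covariance matrix over the design points, the training cost is a single $n\times n$ factorization, independent of the output dimension $md$.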

### (b) Main algorithm

Once the snapshots have been emulated for a new input $\boldsymbol{\xi}$, POD can be performed in the usual manner (with the extended DEIM for nonlinear problems). The entire procedure is outlined in algorithm 1. We mention that kPCA can be replaced with other manifold learning methods, e.g. diffusion maps or Isomap [33,34]. We introduce the terminology ‘kGPE-POD’ to denote the method of algorithm 1 without the extended DEIM (i.e. steps 1a–7a alone). Similarly, we use ‘kGPE-POD-DEIM’ to denote the method of algorithm 1 with the extended DEIM (steps 1a–7a *and* steps 1b–7b together).

## 4. Results and discussion

### (a) Two-dimensional contaminant transport

We consider the transport of a contaminant governed by a convection–diffusion equation. This model can be used, for example, for real-time prediction or for quantifying uncertainty in the concentration to support decision-making [11]. In the problem specification, $u(\mathbf{x},t;\boldsymbol{\xi})$ denotes the contaminant concentration (mol m^{−3}), $\mathbf{q}$ is the fluid velocity (m s^{−1}) and $\mu$ is the contaminant diffusion coefficient (m^{2} s^{−1}). The input $\boldsymbol{\xi}$ is defined below. The initial concentration is specified in terms of the points $\mathbf{x}_1=(0.2,0.2)^{\mathrm{T}}$, $\mathbf{x}_2=(0.2,0.8)^{\mathrm{T}}$ and $\mathbf{x}_3=(0.8,0.8)^{\mathrm{T}}$ and the constants $k_0=0.01$, $k_1=1$, $k_2=2$ and $k_3=3$. The magnitude of the velocity field is inversely proportional to the distance from a fixed point; $\mathbf{e}_1$ and $\mathbf{e}_2$ denote unit vectors in the $x_1$- and $x_2$-directions, respectively. We set $a_1=a_2=1$ and $\mu=1$, and consider variations in the input $\boldsymbol{\xi}$.

The problem was discretized in space using a cell-centred FV method with $d=2500$ square cells (control volumes). Central differencing was used for the diffusive term and a first-order upwind scheme for the convective term, with the velocity values defined on a staggered grid. A fully implicit Euler method was used to solve the resulting semi-discrete linear problem with 100 equal time steps in $t\in[0,T]$, $T=0.2$ s. A total of 500 inputs $\boldsymbol{\xi}_j$, $j=1,\ldots,500$, was generated using a Sobol sequence [36]. For each input, the FOM was solved to yield solution vectors (snapshots) $\mathbf{u}(t_i;\boldsymbol{\xi}_j)$, $i=1,\ldots,100$, $j=1,\ldots,500$. The data points (vectorized snapshots) $\mathbf{y}_j=\boldsymbol{\eta}(\boldsymbol{\xi}_j)$, $j=1,\ldots,500$, were obtained using equation (3.3). Of the 500 data points, $n_t=300$ were reserved for testing; training points were selected from the remaining 200 data points ($n\le 200$).

A Gaussian kernel was used for kPCA. The free parameter $s^2$ of the kernel was taken to be the average square distance between observations in the original space [37], and $N_n=n$ was used (i.e. all training points). For the GPE, we used a squared exponential covariance function and a zero mean function (after centring). The hyperparameters were found using a maximum-likelihood estimate (MLE) (gradient descent). Errors in the predictions of the vectorized snapshots $\mathbf{y}_j=\boldsymbol{\eta}(\boldsymbol{\xi}_j)$, $j=1,\ldots,n_t$, obtained using steps 1a–6a of algorithm 1, were measured using the normalized error $\epsilon=\|\widehat{\mathbf{y}}_j-\mathbf{y}_j\|_2/\|\mathbf{y}_j\|_2$, where $\widehat{\mathbf{y}}_j$ denotes the prediction. Errors in the kGPE-POD/kGPE-POD-DEIM predictions of the snapshots $\mathbf{u}_i(\boldsymbol{\xi}_j)$ were measured using a relative error $\epsilon_r$ defined analogously.

We first examine the normalized errors $\epsilon$ in the predictions of the test data points $\mathbf{y}_j=\boldsymbol{\eta}(\boldsymbol{\xi}_j)$, $j=1,\ldots,n_t$. Using $m=10$ of the snapshots (selecting every 10th), figure 1 shows Tukey box plots of $\epsilon$ for the $n_t=300$ test cases as the manifold dimension $q$ is increased, using $n=80$ training points. Outliers are plotted individually using a ‘+’ symbol. We note that when predicting the training set in this case using $q=10$, the maximum value of $\epsilon$ was around $10^{-11}$, while the median was around $10^{-12}$. As a comparison, we also include the result for Isomap (replacing kPCA in algorithm 1). The best results were obtained with kPCA, for which the errors converge after $q=6$ dimensions (negligible further decrease). Diffusion maps were also tested and gave results similar to kPCA. The same pattern was observed at $n=40$, 120 and 200 training points, and also for all values of $m$ up to 100. Based on these results, the approximating manifold dimension was set to $q=10$ for all values of $n$ and $m$ (using kPCA).

Figure 2 compares kGPE-POD with a global basis method for increasing POD dimension *r*. In the global basis method, the snapshot matrices constituting the global snapshot matrix corresponded to the *n*=80 training points used for kGPE-POD. An SVD was performed on the global matrix before extracting the POD basis. For *n*=40, the results were similar to the results depicted in figure 2, with a slight decrease in accuracy for both methods. Using *m*=10 snapshots, the decrease in the relative errors *ϵ*_{r} in kGPE-POD is negligible for *r*>15, while the global basis method continues to improve beyond *r*=50. In principle, kGPE-POD uses the correct bases for the test parameter values. It is possible, therefore, that kGPE-POD would approach the true result for a smaller value of *r* than the global basis approach, which uses a single basis extracted from snapshots that do not pertain to the test parameter values.

For *m*=10, kGPE-POD exhibits a minimum *ϵ*_{r} that is lower by more than an order of magnitude, while the maximum *ϵ*_{r} for both methods is roughly the same (0.04 for *r*≥15). At *r*=15 in figure 2*a*,*b*, the value of *ϵ*_{r} using kGPE-POD is lower than the minimum *ϵ*_{r} in the global basis method in 109 of the 300 test cases. For the global basis at *r*=15, there are 131 cases with an error below the median (3.9×10^{−3}), while for kGPE-POD 217 cases have errors below this value. kGPE-POD clearly exhibits a broader range of *ϵ*_{r} values, with a higher median for *r*>25. Figure 3 shows histograms of *ϵ*_{r} for the two methods in the case of *r*=15, *m*=10. The broader range of *ϵ*_{r} is due to the much lower minimum and to the presence of a greater number of cases with *ϵ*_{r}>0.012. The number of such cases (13) is, however, small. For *m*=100 snapshots, both methods improve, with the global basis method exhibiting the greater improvement (e.g. the maximum *ϵ*_{r} is decreased by around an order of magnitude, whereas for kGPE-POD the decrease is by a factor of 4 at *r*=15). The global basis method has a lower median *ϵ*_{r} for *r*≥20, but also again a considerably higher minimum (more than an order of magnitude at *r*=25). At *r*=30, for example, there are 77 cases in kGPE-POD with a lower *ϵ*_{r} than the minimum for the global basis.

To gain an indication of the actual quality of the predictions for different $\epsilon_r$, figure 4 compares the predicted kGPE-POD concentration fields in two test cases: (i) near the median ($\epsilon_r\approx 0.0021$) and (ii) near the upper whisker ($\epsilon_r\approx 0.0127$) at $r=10$ in figure 2*a*. The change in the profiles from one input to the other is well captured. Figure 4*e*,*f* shows the absolute pointwise errors for the two examples, revealing localized regions of high error. For the first case ($\boldsymbol{\xi}=(0.7382,0.4179)^{\mathrm{T}}$), a comparison of the region of highest error (lower right quadrant) with the test case is shown in figure 5, which clearly highlights the fine-scale differences leading to the error. The trends and general profile (and, in most of the domain, the actual concentration values) are nevertheless well captured, even with a small value of $r$.

In order to assess the generalization accuracy more fully, we considered an uncertainty quantification problem for the contaminant concentration accumulated at the point $\mathbf{x}_c=(0.5,0.5)^{\mathrm{T}}$, by considering $\boldsymbol{\xi}$ to be a random vector distributed according to a density $p(\boldsymbol{\xi})$ with mean $\boldsymbol{\mu}=(0.5,0.5)^{\mathrm{T}}$ and variance $\sigma^2=0.1$. The distribution of the accumulated concentration was approximated using $N_M$ samples $\boldsymbol{\xi}^i$ (this notation is to avoid confusion with the design points) drawn from $p(\boldsymbol{\xi})$. We set $q=6$, $n=80$ and $N_M=3000$, and approximated the distribution using $m=10$ snapshots. The FOM took 55.18 h to complete and yielded $\mu_{ac}=0.011087$ and $\sigma_{ac}=0.001218$ for the mean and standard deviation of the accumulated concentration. For $r=10$, kGPE-POD exhibited reasonable accuracy with regard to $\mu_{ac}$ (within 0.2%) and $\sigma_{ac}$ (within 8.7%), while the global basis method was inaccurate (50% error in $\sigma_{ac}$). For $m=10$, $r=50$, both methods were accurate, with kGPE-POD still providing better estimates of $\mu_{ac}$ and $\sigma_{ac}$. For $m=100$, the results are shown in figure 7: kGPE-POD was again more accurate for $r=10$, while for $r=30$ the two methods exhibited a similar level of accuracy.
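The Monte Carlo estimation of $\mu_{ac}$ and $\sigma_{ac}$ through a cheap surrogate can be sketched as follows; `rom_qoi` is a hypothetical placeholder for the kGPE-POD evaluation of the accumulated concentration, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(42)

def rom_qoi(xi):
    # Placeholder surrogate output (hypothetical): any inexpensive map from
    # the input xi to the scalar quantity of interest would appear here.
    return np.exp(-np.sum((xi - 0.5) ** 2))

N_M = 3000                                       # number of Monte Carlo samples
# xi ~ N(mu, sigma^2 I) with mu = (0.5, 0.5)^T and sigma^2 = 0.1.
samples = rng.normal(loc=0.5, scale=np.sqrt(0.1), size=(N_M, 2))
qoi = np.array([rom_qoi(xi) for xi in samples])
mu_ac, sigma_ac = qoi.mean(), qoi.std(ddof=1)    # sample mean and standard deviation
```

Because each surrogate evaluation is cheap, $N_M=3000$ samples are feasible, whereas the same study with the FOM required over 55 h.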

### (b) Burgers equation

We consider a one-dimensional Burgers equation, with inputs $\boldsymbol{\xi}$ to be defined later:

$$\frac{\partial u}{\partial t}+u\frac{\partial u}{\partial x}=\frac{1}{Re}\frac{\partial^2 u}{\partial x^2}+g(x),\qquad x\in(0,1),\ t\in(0,T],$$

where $u(x,t;\boldsymbol{\xi})$ is the flow velocity, $Re$ is the Reynolds number and $g(x)$ is a source term. We seek a weak solution, defining the bilinear form $a(\varphi_1,\varphi_2):=(\varphi_1',\varphi_2')$, where $(\cdot,\cdot)$ denotes the $L^2$ inner product over $x$. The interval $[0,1]$ is divided into $N+1$ equally sized subintervals $[x_i,x_{i+1}]$, where $x_i=(i-1)/(N+1)$, $i=1,\ldots,d=N+2$, and a standard piecewise linear basis $\{\psi_j\}_{j=1}^{d}$ is used.

The FE approximation of the nonlinear term uses a *group (product) approximation* [38], in which $u^2$ is expanded directly in the FE basis. Setting $u=u^h$ and $v^h=\psi_j$ in (4.5), we obtain the semi-discrete problem (4.6) for $j=1,\ldots,d$. Defining the solution vector $\mathbf{u}(t;\boldsymbol{\xi})=(u_1(t;\boldsymbol{\xi}),\ldots,u_d(t;\boldsymbol{\xi}))^{\mathrm{T}}$, equation (4.6) and the initial condition lead to a system of the form

$$\mathbf{M}\,\dot{\mathbf{u}}(t;\boldsymbol{\xi})=-\frac{1}{Re}\,\mathbf{S}\,\mathbf{u}(t;\boldsymbol{\xi})-\mathbf{b}(\mathbf{u}(t;\boldsymbol{\xi}))+\mathbf{g},$$

where the $(i,j)$th elements of the mass and stiffness matrices $\mathbf{M}$ and $\mathbf{S}$ are given by $(\psi_i,\psi_j)$ and $(\psi_i',\psi_j')$, respectively, and the $j$th components of $\mathbf{u}_0$ and $\mathbf{g}$ are $(u_0,\psi_j)$ and $(g,\psi_j)$, respectively. The nonlinear vector function $\mathbf{b}(\mathbf{u}(t;\boldsymbol{\xi}))$ arises from the second (convective) term in (4.6). We used a Runge–Kutta method with a variable time step to solve the semi-discrete problems in this example.
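A sketch of this semi-discrete system on a uniform grid is given below; the matrix entries follow the standard 1D piecewise-linear element formulas, while the source term and boundary handling are omitted, so this is an illustrative sketch rather than the exact discretization used in the paper.

```python
import numpy as np

def fe_matrices(d):
    """Standard 1D piecewise-linear FE matrices on a uniform grid of d nodes."""
    h = 1.0 / (d - 1)
    # Mass matrix (psi_i, psi_j): tridiagonal h/6 * [1, 4, 1].
    M = h / 6.0 * (4 * np.eye(d) + np.diag(np.ones(d - 1), 1)
                   + np.diag(np.ones(d - 1), -1))
    # Stiffness matrix (psi_i', psi_j'): tridiagonal 1/h * [-1, 2, -1].
    S = 1.0 / h * (2 * np.eye(d) - np.diag(np.ones(d - 1), 1)
                   - np.diag(np.ones(d - 1), -1))
    # Matrix with entries (psi_j', psi_i) for the convective term: [-1/2, 0, 1/2].
    G = 0.5 * (np.diag(np.ones(d - 1), 1) - np.diag(np.ones(d - 1), -1))
    return M, S, G

def burgers_rhs(u, M, S, G, Re):
    """Semi-discrete r.h.s. with the group approximation u^2 ~ sum_j u_j^2 psi_j."""
    b = 0.5 * (G @ (u * u))                 # nonlinear term from (1/2) d(u^2)/dx
    return np.linalg.solve(M, -(1.0 / Re) * (S @ u) - b)

M, S, G = fe_matrices(64)
u = np.sin(np.pi * np.linspace(0, 1, 64))   # sample state
du = burgers_rhs(u, M, S, G, Re=100.0)
```

The group approximation makes the nonlinear term a fixed matrix acting on the componentwise square `u * u`, which is exactly the pointwise structure the DEIM exploits.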

The coefficients $u_{i,j}(\boldsymbol{\xi})$, $j=1,\ldots,d$, of the snapshots $\mathbf{u}_i(\boldsymbol{\xi})=\mathbf{u}(t_i;\boldsymbol{\xi})$, $i=1,\ldots,m$, for an arbitrary value of $\boldsymbol{\xi}$ are the nodal coefficients of the FE solution, and thus correspond to functions in the FE space. The POD basis functions are likewise expanded in the FE basis as $\sum_{i=1}^{d}v_{j,i}(\boldsymbol{\xi})\,\psi_i(x)$, $j=1,\ldots,d$, in which the nodal coefficient $v_{j,i}(\boldsymbol{\xi})$ is the $i$th component of the POD basis vector $\mathbf{v}_j(\boldsymbol{\xi})$, obtained from the eigenvalue problem for $\mathbf{M}^{1/2}\mathbf{C}(\boldsymbol{\xi})\mathbf{M}^{1/2}$. Note that the $\mathbf{v}_j(\boldsymbol{\xi})$ are orthonormal w.r.t. the inner product $\langle\mathbf{v}_j(\boldsymbol{\xi}),\mathbf{v}_i(\boldsymbol{\xi})\rangle_{\mathbf{M}}:=\mathbf{v}_j(\boldsymbol{\xi})^{\mathrm{T}}\mathbf{M}\,\mathbf{v}_i(\boldsymbol{\xi})$. The solution vector is then expanded as in equations (2.3). An alternative is to use $a(\cdot,\cdot)$ as the inner product, with associated semi-norm $|\varphi|_1=a(\varphi,\varphi)^{1/2}$; the POD basis vectors are then obtained from the eigenvalue problem for $\mathbf{S}^{1/2}\mathbf{C}(\boldsymbol{\xi})\mathbf{S}^{1/2}$ and are mutually orthogonal w.r.t. $\langle\cdot,\cdot\rangle_{\mathbf{S}}$. In the present example, this approach gave almost identical results.
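The weighted POD construction can be sketched as follows; the snapshot matrix is random and a diagonal matrix stands in for the FE mass matrix, purely for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm, eigh

def pod_basis_weighted(U, M, r):
    """POD basis orthonormal w.r.t. the M-inner product <v, w>_M = v^T M w.

    Working with M^{1/2} C M^{1/2}, where C = (1/m) U U^T, reduces the
    weighted problem to a standard symmetric eigenvalue problem; the
    basis vectors are recovered as v = M^{-1/2} w.
    """
    m = U.shape[1]
    Mh = np.real(sqrtm(M))                  # symmetric square root of M
    C = (U @ U.T) / m
    lam, W = eigh(Mh @ C @ Mh)              # eigenvalues in ascending order
    W = W[:, ::-1][:, :r]                   # keep the r dominant eigenvectors
    return np.linalg.solve(Mh, W)           # map back: v = M^{-1/2} w

# Check M-orthonormality on a small example.
rng = np.random.default_rng(1)
U = rng.standard_normal((30, 12))
M = np.diag(rng.uniform(0.5, 2.0, 30))      # stand-in for a (lumped) mass matrix
V = pod_basis_weighted(U, M, r=4)
gram = V.T @ M @ V                          # should be close to the identity
```

Replacing `M` by the stiffness matrix `S` gives the $\langle\cdot,\cdot\rangle_{\mathbf{S}}$ variant mentioned above.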

*Case* 1. In the first example, we set $g(x)\equiv 0$ and $k=1$. The FOM, with $d=64$ nodes, was solved for each input $\boldsymbol{\xi}_j$, $j=1,\ldots,500$, to obtain the solution vectors $\mathbf{u}(t_i;\boldsymbol{\xi}_j)$ and nonlinearity snapshots $\mathbf{b}(\mathbf{u}(t_i;\boldsymbol{\xi}_j))$ at times $t_i=0.25i$, $i=1,\ldots,40$ ($m=40$). This yielded the data points (vectorized snapshots) $\mathbf{y}_j=\boldsymbol{\eta}(\boldsymbol{\xi}_j)$ and $\mathbf{y}_j^{f}=\boldsymbol{\eta}^{f}(\boldsymbol{\xi}_j)$, $j=1,\ldots,500$, according to equation (3.3). Of the 500 data points, $n_t=300$ were reserved for testing, and training points were selected from the remaining 200 points. The details of kPCA and GPE were as described in the previous example.

Analysis of the normalized errors *ϵ* for the *n*_{t} test cases with *n*=160 training points showed convergence after *q*=8 dimensions. Isomap gave similar results, while diffusion maps were inferior. We used *q*=9 (kPCA) in the results presented below. Figure 8*a* shows the results of kGPE-POD-DEIM for an increasing *r* (with *s*=*r*). The relative errors converge after *r*=30, i.e. further decreases are negligible. Figure 8*b* compares the predicted velocity profiles at *t*=0,0.5,1,1.5,2,2.5,5,7.5,10 s from kGPE-POD-DEIM and the FOM for a point (*ϵ*_{r}≈0.041) above the upper whisker at *r*=10 in figure 8*a*. The two sets of profiles are very close. The inset in figure 8*b* shows the absolute pointwise error at *t*=2.5, 5 and 10 s. Inspection of the full set of profiles showed that the error grew with time until the front developed, after which the error decayed. The highest absolute error was around 8.62×10^{−4} at *x*=0.703, *t*=5.65 s, for which *u*(*x*,*t*)≈0.103 m s^{−1}. Thus, the maximum error was around 0.84%.

With no approximation of the nonlinearity, a comparison between kGPE-POD and the global basis method exhibited trends similar to those seen in the previous example. For *m*<30 and *n*≤200, kGPE-POD required fewer POD vectors to achieve a given level of accuracy; the lower bound for *ϵ*_{r} at *r*=10 was one order of magnitude smaller for kGPE-POD. Both methods improved with increasing *m*, with the global basis method showing a greater improvement, especially in the lower bound for *ϵ*_{r}. For *m*=30 and *n*=180 the results are illustrated in figure 9, which shows that around *r*=28 both methods exhibit similar levels of accuracy in terms of the maximum, minimum and median *ϵ*_{r}.

*Case* 2. In a second case, we set *g*(*x*)=0.02*e*^{x}, *k*=3 and *c*_{2}=0.2, again with *n*_{t}=300 points reserved for testing. In this case, we used *d*=128 nodes and, after inspection of the normalized errors *ϵ*, we set *q*=9. In contrast to the previous case, a large *m* (*m*>120) was required for accurate results.

Figure 10 shows the trends in the kGPE-POD-DEIM relative error *ϵ*_{r} on the *n*_{t}=300 test points with increasing *s* for two values of *r*, using *n*=180 and *m*=200. For a fixed *r*, the errors decreased with increasing *s*. For a fixed *s*, the errors decreased as *r* was increased up to a certain value; for higher values of *r* the solutions became less stable, with a corresponding increase in the error, which was more pronounced for small values of *s*. The best distribution of errors (in terms of the median, quartiles and extrema) was achieved for values of *s* between 5 and 10 higher than *r*. Similar results for the Burgers equation can be found in [39,40].
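The DEIM approximation referred to above selects *s* interpolation indices greedily from the POD modes of the nonlinearity snapshots, at each step adding the row where the current interpolatory reconstruction errs most. As a point of reference, here is a minimal NumPy sketch of that standard greedy selection (an illustration, not the code used for these results; the function name and array shapes are ours):

```python
import numpy as np

def deim_indices(W):
    """Greedy DEIM point selection.

    W : (d, s) array whose columns are POD modes of the nonlinear
        snapshots. Returns the s interpolation indices.
    """
    d, s = W.shape
    idx = [int(np.argmax(np.abs(W[:, 0])))]
    for j in range(1, s):
        Wj = W[:, :j]              # modes selected so far
        P = np.array(idx)          # current interpolation rows
        # coefficients matching the new mode at the selected rows
        c = np.linalg.solve(Wj[P, :], W[P, j])
        r = W[:, j] - Wj @ c       # residual of the new mode
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)
```

With *r* POD modes for the state and *s* DEIM modes, each evaluation of the nonlinearity then requires only the *s* selected entries, which is what makes the interplay between *r* and *s* in figure 10 possible to explore cheaply.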

For *r*=30 and *s*=40, figure 11*a*,*b* compares the FOM and kGPE-POD-DEIM profiles at *t*=0,0.5,1,1.5,2,2.5,5,7.5,10 s. The first corresponds to a point near the median of the relevant box plot in figure 10*a*, and the second to a point near the upper whisker. Figure 11*c* shows the point with the highest error for the same values of *r* and *s*; instability develops as the front forms but eventually settles. For *r*=50 and *s*=55, the case with the highest error is shown in figure 11*d*, where the solutions at early times are more stable. The observed instability is a common feature of POD models [27,41,42]. Stabilization schemes, e.g. alternative inner products, post-processing steps and modification of the underlying model [42–44], can be incorporated within the framework we have developed in order to eliminate or minimize such problems.

## 5. Conclusion

In this paper, we introduced a new POD-ROM method for dynamic, parameter-dependent linear and nonlinear PDEs. The method uses Bayesian nonlinear regression to infer the basis for new parameter values and is thus potentially applicable to a broader region of parameter space than existing methods. Compared with a global POD method, our method requires extra computational effort on each run to diagonalize the emulated snapshot matrix. The actual influence of this is small (as the uncertainty quantification in the first example demonstrates), since most of the computational time is spent on solving the ROM. In the examples considered (and others not presented), the global basis method required a high value of *r* to reach the same level of performance (in terms of the minimum and maximum relative errors) as our method, particularly for low values of *m*. At such high values of *r*, much of the benefit of the global basis method as a surrogate model would be lost.

Since the method introduced here is a general framework, a number of modifications could easily be made, e.g. changing the manifold learning or machine learning method, or incorporating stabilization schemes, according to the type of problem. The manifold-learning-based GP emulator could be treated as a general data-driven technique to interpolate properties other than the snapshots, as is accomplished in the method of Amsallem & Farhat [14]. For instance, we could employ it to directly learn the POD basis **V**_{r}(**ξ**) in equation (2.3) or the system matrix **A**_{r}(**ξ**) in equation (2.4), both of which would further reduce the computational time.

## Authors' contributions

The idea was conceived by A.A.S., who coordinated the study. The implementation and coding was carried out primarily by W.W.X. and V.T. The manuscript was written by A.A.S. All authors gave final approval for publication.

## Competing interests

We have no competing interests.

## Funding

W.W.X. received funding from the Chinese Scholarship Council. A.A.S. received funding from the Engineering and Physical Sciences Research Council, UK (grant nos. EP/L027682/1, EP/P012620/1 and EP/L018330/1).

## Appendix A. Variants of POD

We regard *u*(**x**,*t*;**ξ**) as a realization of a zero-mean random field indexed by (**x**,*t*) [3,4,29], with an underlying probability space whose sample space is *Ω*. We assume that *u*(**x**,*t*;**ξ**) is continuous in quadratic mean (q.m.) w.r.t. **x** (assumption (i)) and stationary w.r.t. *t* (assumption (ii)). The spatial autocovariance function then takes the form

$$C(\mathbf{x},\mathbf{x}';\boldsymbol{\xi})=\mathbb{E}\left[u(\mathbf{x},t;\boldsymbol{\xi})\,u(\mathbf{x}',t;\boldsymbol{\xi})\right],$$

independent of *t* by assumption (ii). For each fixed *t*∈[0,*T*], *u*(**x**,*t*;**ξ**) defines a *one-parameter* random field indexed by **x**, whose realizations (*ω*∈*Ω*) are deterministic functions of **x** for each *t*∈[0,*T*]. A Karhunen–Loève expansion at each *t* shows that *u*(**x**,*t*;**ξ**) is the q.m. limit of the sequence of partial sums

$$u_N(\mathbf{x},t;\boldsymbol{\xi})=\sum_{i=1}^{N}a_i(t;\boldsymbol{\xi})\,v_i(\mathbf{x};\boldsymbol{\xi})$$

for each *t*. The *v*_{i}(**x**;**ξ**) form an orthonormal set of eigenfunctions of the covariance operator associated with *C*(**x**,**x**′;**ξ**), with eigenvalues ordered so that *λ*_{i}′(**ξ**)>*λ*_{i+1}′(**ξ**). The coefficients *a*_{i}(*t*;**ξ**) satisfy E[*a*_{i}*a*_{j}]=*λ*_{i}′(**ξ**)*δ*_{ij} and, since *t* is arbitrary, they can be interpreted as uncorrelated random processes. The expectation operator is approximated in practice by an average over the snapshots. The 'optimality' of the basis refers to the fact that, among all orthonormal bases, truncation of the expansion above after a fixed number of terms minimizes the mean-square truncation error.

Defining quadrature points *x*_{1},…,*x*_{d} (the grid nodes) for the integral eigenvalue problem leads to the discrete problem **C**(**ξ**)**v**_{i}(**ξ**)=*λ*_{i}(**ξ**)**v**_{i}(**ξ**) for eigenvectors **v**_{i}(**ξ**) and eigenvalues *λ*_{i}(**ξ**), *i*=1,…,*d*. This is a PCA with a spatial variance–covariance matrix

$$\mathbf{C}(\boldsymbol{\xi}):=\mathbf{U}(\boldsymbol{\xi})\,\mathbf{U}(\boldsymbol{\xi})^{\mathrm{T}},\qquad \mathbf{U}(\boldsymbol{\xi}):=[\mathbf{u}_1(\boldsymbol{\xi})\ \ldots\ \mathbf{u}_m(\boldsymbol{\xi})].$$

The *j*th component *v*_{i,j}(**ξ**) of **v**_{i}(**ξ**) can be identified with *v*_{i}(*x*_{j};**ξ**), and the (*i*,*j*)th entry of **C**(**ξ**) approximates *C*(*x*_{i},*x*_{j};**ξ**) as the time average over the *m* snapshots (up to the factor 1/*m*, which rescales the eigenvalues but not the eigenvectors).

Alternatively, expanding *u*(**x**,*t*;**ξ**) and *v*_{i}(**x**;**ξ**) using a standard piecewise linear basis {*ψ*_{j}(**x**)} leads to the generalized eigenvalue problem **C**(**ξ**)**M** **v**_{i}(**ξ**)=*λ*_{i}(**ξ**)**v**_{i}(**ξ**), where **M** is a mass matrix with entries *M*_{ij}=(*ψ*_{i}(**x**),*ψ*_{j}(**x**)). Defining **M**^{1/2} via a symmetric factorization, the eigenvectors of **M**^{1/2}**C**(**ξ**)**M**^{1/2} yield the POD basis vectors.

The *method of snapshots* is used when *m*≪*d*. The temporal autocovariance function *K*(*t*,*t*′;**ξ**) leads to the eigenvalue problem **K**(**ξ**)**a**_{i}(**ξ**)=*λ*_{i}(**ξ**)**a**_{i}(**ξ**), where **K**(**ξ**):=**U**(**ξ**)^{T}**U**(**ξ**) is a kernel matrix with entries *K*_{ij}=**u**_{i}(**ξ**)^{T}**u**_{j}(**ξ**), i.e. the space-discrete form of *K*(*t*_{i},*t*_{j};**ξ**). The eigendecomposition is **K**(**ξ**)=**A**(**ξ**)**Λ**(**ξ**)**A**(**ξ**)^{T}, where **Λ**(**ξ**)=diag(*λ*_{1}(**ξ**),…,*λ*_{m}(**ξ**)) and the columns of **A**(**ξ**) are given by the **a**_{i}(**ξ**). The *j*th component *a*_{i,j}(**ξ**) of **a**_{i}(**ξ**) approximates *a*_{i}(*t*_{j};**ξ**), yielding the discrete-time approximation of **U**(**ξ**), that is, the singular value decomposition

$$\mathbf{U}(\boldsymbol{\xi})=\mathbf{V}(\boldsymbol{\xi})\,\boldsymbol{\Lambda}(\boldsymbol{\xi})^{1/2}\,\mathbf{A}'(\boldsymbol{\xi})^{\mathrm{T}},$$

where the columns of **V**(**ξ**) are given by the **v**_{i}(**ξ**) and **A**′(**ξ**) consists of the columns of **A**(**ξ**) associated with the nonzero eigenvalues. In this context, the columns of **V**(**ξ**) and **A**′(**ξ**), given, respectively, by the eigenvectors of **C**(**ξ**) and **K**(**ξ**), are referred to as the left and right singular vectors. It is straightforward to show that

$$\mathbf{v}_i(\boldsymbol{\xi})=\lambda_i(\boldsymbol{\xi})^{-1/2}\,\mathbf{U}(\boldsymbol{\xi})\,\mathbf{a}_i(\boldsymbol{\xi})$$

for the basis vectors **v**_{i}(**ξ**) corresponding to nonzero eigenvalues.
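In the space-discrete setting, the method of snapshots amounts to an eigendecomposition of the *m*×*m* kernel matrix followed by the rescaling of the temporal eigenvectors into spatial POD vectors. A minimal NumPy sketch (illustrative only; the function name and array conventions are ours, with snapshots stored as columns of U):

```python
import numpy as np

def pod_snapshots(U, r):
    """POD basis via the method of snapshots (for m << d).

    U : (d, m) snapshot matrix with snapshots as columns.
    Returns the first r POD vectors and the corresponding eigenvalues.
    """
    K = U.T @ U                        # (m, m) kernel matrix, K_ij = u_i^T u_j
    lam, A = np.linalg.eigh(K)         # eigh returns ascending eigenvalues
    lam, A = lam[::-1], A[:, ::-1]     # reorder to descending
    V = U @ A[:, :r] / np.sqrt(lam[:r])   # v_i = U a_i / sqrt(lambda_i)
    return V, lam[:r]
```

The returned eigenvalues are the squared singular values of U, so the usual energy criterion for choosing *r* can be applied directly to their partial sums.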

## Appendix B. Kernel principal component analysis and an inverse mapping

**(a) Kernel principal component analysis on the training data**

Given training data **y**_{1},…,**y**_{n}∈ℝ^{D}, kPCA [35] defines a map **ϕ**(⋅) from the data space to a high-dimensional feature space; **ϕ**(⋅) is specified implicitly via a kernel function *k*(**y**_{i},**y**_{j})=**ϕ**(**y**_{i})^{T}**ϕ**(**y**_{j}), e.g. the Gaussian kernel

$$k(\mathbf{y},\mathbf{y}')=\exp\left(-\frac{\lVert\mathbf{y}-\mathbf{y}'\rVert^2}{2s^2}\right),$$

where *s* is a scale factor. Defining a kernel matrix **K** with entries *K*_{ij}=*k*(**y**_{i},**y**_{j}), the feature-space images are centred via $\bar{\mathbf{K}}=\mathbf{H}\mathbf{K}\mathbf{H}$, where **H**=**I**−*n*^{−1}**11**^{T} and **1**=(1,…,1)^{T}.

We assume that each principal direction **w**_{i} of the centred feature-space images lies in their span, i.e. $\mathbf{w}_i=\sum_{j=1}^{n}\alpha_{ji}\,\bar{\boldsymbol{\phi}}(\mathbf{y}_j)$, with coefficient vectors **α**_{i}=(*α*_{1i},…,*α*_{ni})^{T}, *i*=1,…,*n*. The **α**_{i} and the common eigenvalues *β*_{i} of the feature-space covariance problem are obtained from $\bar{\mathbf{K}}\boldsymbol{\alpha}_i=\beta_i\boldsymbol{\alpha}_i$, with the **α**_{i} rescaled so that ∥**α**_{i}∥^{2}=1/*β*_{i} (which enforces **w**_{i}^{T}**w**_{i}=1). The reduced coordinates of a training point **y**_{j} are then

$$z_i(\mathbf{y}_j)=\mathbf{w}_i^{\mathrm T}\bar{\boldsymbol{\phi}}(\mathbf{y}_j)=\boldsymbol{\alpha}_i^{\mathrm T}\bar{\mathbf{k}}_j,$$

where $\bar{\mathbf{k}}_j$ is the *j*th column of $\bar{\mathbf{K}}$, obtained by centring **k**_{j}=(*K*_{1j},…,*K*_{nj})^{T}. Only the first *q*<*n* coordinates are retained, where *q*<*n* is the approximate dimension of the manifold on which the points reside.

The variance in the data along **w**_{i} is proportional to *β*_{i}, which decreases as *i* increases, and the coefficients *z*_{i}(⋅) are mutually uncorrelated [45]. A point **y**_{j} is approximated by the projection of its image onto the first *q* principal directions, $\boldsymbol{\phi}(\mathbf{y}_j)\approx\boldsymbol{\phi}_q(\mathbf{y}_j)=\sum_{i=1}^{q}z_i(\mathbf{y}_j)\,\mathbf{w}_i$, for *j*=1,…,*n*.
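The centring, eigendecomposition and normalization steps can be sketched in a few lines of NumPy. The following is an illustrative version for the Gaussian kernel (the function name and return values are ours; the rescaling of the coefficient vectors enforces unit-norm principal directions, as described above):

```python
import numpy as np

def kpca_fit(Y, s, q):
    """Gaussian-kernel kPCA of the rows of Y.

    Y : (n, D) training data, s : kernel scale, q : retained dimensions.
    Returns the training coordinates Z (n, q), the coefficient
    vectors alpha (n, q) and the uncentred kernel matrix K.
    """
    n = Y.shape[0]
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2.0 * s ** 2))          # K_ij = k(y_i, y_j)
    H = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    Kc = H @ K @ H
    beta, alpha = np.linalg.eigh(Kc)          # ascending eigenvalues
    beta, alpha = beta[::-1][:q], alpha[:, ::-1][:, :q]
    alpha = alpha / np.sqrt(beta)             # ||alpha_i||^2 = 1/beta_i
    Z = Kc @ alpha                            # z_i(y_j) = alpha_i^T kbar_j
    return Z, alpha, K
```

Because the eigenvalues are sorted in descending order, the variance captured by successive coordinates decreases, which is what justifies truncating at *q*.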

**(b) Inverse map**

To define an approximate inverse map, we use the *N*_{n}≤*n* 'neighbouring' points of **y** taken from the dataset: the pre-image of a feature-space point is approximated by a weighted average of the **y**_{j}, with weights *ρ*(**y**_{j}) computed from the distances *d*_{j,*}, *j*=1,…,*n*, between **y** and the **y**_{j}, using an isotropic kernel density *χ*(**y**,**y**′)=*χ*(∥**y**−**y**′∥) to weight the samples [34]:

$$\mathbf{y}\approx\frac{\sum_{j}\chi(d_{j,*})\,\mathbf{y}_j}{\sum_{j}\chi(d_{j,*})},$$

where the sums run over the *N*_{n} neighbouring points. Writing *Φ*=[**ϕ**(**y**_{1}),…,**ϕ**(**y**_{n})], we have *Φ*^{T}*Φ*=**K** and **k**_{j}=*Φ*^{T}**ϕ**(**y**_{j}), so that distances between **ϕ**(**y**_{j}) and **ϕ**(**y**) in feature space can be written entirely in terms of kernel evaluations. Approximating **ϕ**(**y**)≈**ϕ**_{q}(**y**) and substituting equation (B 2) into equation (B 3) then expresses these distances in terms of **K**, the **k**_{j} and the reduced coordinates. For an isotropic kernel normalized such that *k*(**y**′,**y**′)=1, equation (B 3) gives the feature-space distance as a monotonic function of *k*(**y**_{j},**y**). For the Gaussian kernel, therefore, we obtain the input-space distances as

$$d_{j,*}^2=-2s^2\ln k(\mathbf{y}_j,\mathbf{y}).$$
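For the Gaussian kernel, the weighted-neighbour pre-image therefore reduces to two steps: recover input-space distances from kernel values and average the nearest training points with isotropic kernel weights. An illustrative sketch under those assumptions (the function name, the neighbour count default and the clipping guard against log(0) are ours):

```python
import numpy as np

def preimage_gaussian(k_star, Y, s, N_n=10):
    """Approximate pre-image for Gaussian-kernel kPCA [34].

    k_star : (n,) kernel values k(y_j, y) between the training points
             and the sought point y, reconstructed from the reduced
             coordinates; Y : (n, D) training data; s : kernel scale.
    """
    # Gaussian kernel: k = exp(-d^2 / (2 s^2))  =>  d^2 = -2 s^2 ln k
    d2 = -2.0 * s ** 2 * np.log(np.clip(k_star, 1e-300, 1.0))
    nb = np.argsort(d2)[:N_n]              # N_n nearest neighbours
    w = np.exp(-d2[nb] / (2.0 * s ** 2))   # isotropic kernel weights chi
    return (w[:, None] * Y[nb]).sum(0) / w.sum()
```

The pre-image is thus confined to the convex hull of the neighbouring training points, which keeps the inverse map stable but limits extrapolation far from the training set.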

## Appendix C. Gaussian process emulation

We place independent GP priors over the coefficients *z*_{i}(**ξ**), *i*=1,…,*q*. For any value of *i*, let *η*(**ξ**) denote the coefficient as a function of the input, observed at the training inputs **ξ**_{j}, *j*=1,…,*n*, and define **t**:=(*η*(**ξ**_{1}),…,*η*(**ξ**_{n}))^{T}. A univariate GP prior indexed by **ξ** is placed over *η*, namely *η*(⋅)∼GP(*m*(⋅),*c*(⋅,⋅;**θ**)), with mean function *m*(⋅) and covariance function *c*(⋅,⋅;**θ**). We may take *m*(**ξ**)≡0 by centring the data. **θ** is a vector of hyperparameters that appear in the covariance function and typically have to be inferred from the data. We use a square exponential covariance function

$$c(\boldsymbol{\xi},\boldsymbol{\xi}';\boldsymbol{\theta})=\sigma^2\exp\left(-\sum_{k=1}^{l}\theta_k(\xi_k-\xi'_k)^2\right),$$

where *θ*_{1},…,*θ*_{l} are the inverse square correlation lengths. The conditional predictive distribution is obtained from *p*(*η*(**ξ**)|**t**,**θ**) in the form of a Gaussian with mean and variance

$$\mu(\boldsymbol{\xi})=\mathbf{c}(\boldsymbol{\xi})^{\mathrm T}\mathbf{C}^{-1}\mathbf{t}\quad\text{and}\quad\sigma^2(\boldsymbol{\xi})=c(\boldsymbol{\xi},\boldsymbol{\xi};\boldsymbol{\theta})-\mathbf{c}(\boldsymbol{\xi})^{\mathrm T}\mathbf{C}^{-1}\mathbf{c}(\boldsymbol{\xi}),$$

where **C** has entries *C*_{ij}=*c*(**ξ**_{i},**ξ**_{j};**θ**) and **c**(**ξ**)=(*c*(**ξ**,**ξ**_{1};**θ**),…,*c*(**ξ**,**ξ**_{n};**θ**))^{T}. These expressions are conditional on the hyperparameter vector **θ**, which is estimated by maximizing the marginal likelihood *p*(**t**|**θ**).
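The GP predictive equations can be sketched as follows, using a Cholesky factorization of the covariance matrix for numerical stability. This is an illustrative implementation, not the authors' code; the signal variance `sigma2` and the `jitter` regularization term are assumptions of this sketch:

```python
import numpy as np

def gp_predict(Xs, X, t, theta, sigma2=1.0, jitter=1e-10):
    """Zero-mean GP prediction with the squared exponential covariance
    c(x, x') = sigma2 * exp(-sum_k theta_k (x_k - x'_k)^2).

    X : (n, l) training inputs, t : (n,) training targets,
    Xs : (ns, l) test inputs, theta : (l,) inverse square
    correlation lengths. Returns predictive means and variances.
    """
    def cov(A, B):
        D2 = ((A[:, None, :] - B[None, :, :]) ** 2 * theta).sum(-1)
        return sigma2 * np.exp(-D2)

    C = cov(X, X) + jitter * np.eye(len(X))   # training covariance matrix
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))  # C^{-1} t
    Cs = cov(Xs, X)                           # cross-covariances c(xi)^T
    mu = Cs @ alpha                           # predictive mean
    v = np.linalg.solve(L, Cs.T)
    var = sigma2 - (v ** 2).sum(0)            # c(x, x) = sigma2 here
    return mu, var
```

In practice `theta` (and `sigma2`) would be set by maximizing the marginal likelihood before prediction; the sketch takes them as given.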

- Received October 30, 2016.
- Accepted March 29, 2017.

- © 2017 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.