## Abstract

We propose a new framework for design under uncertainty based on stochastic computer simulations and multi-level recursive co-kriging. The proposed methodology simultaneously takes into account multi-fidelity in models, such as direct numerical simulations versus empirical formulae, as well as multi-fidelity in the probability space (e.g. sparse grids versus tensor product multi-element probabilistic collocation). We are able to construct response surfaces of complex dynamical systems by blending multiple information sources via auto-regressive stochastic modelling. A computationally efficient machine learning framework is developed based on multi-level recursive co-kriging with sparse precision matrices of Gaussian–Markov random fields. The effectiveness of the new algorithms is demonstrated in numerical examples involving a prototype problem in *risk-averse* design, regression of random functions, as well as uncertainty quantification in fluid mechanics involving the evolution of a Burgers equation from a random initial state, and random laminar wakes behind circular cylinders.

## 1. Introduction

Progress in perceptibly diverse areas of science, such as numerical analysis and scientific computing, design optimization, uncertainty quantification and statistical learning, has started to carve an emerging trend in engineering design, in which decision-making becomes increasingly more data-driven rather than merely relying on empirical formulae and expert opinion. A set of versatile tools, ranging from experiments to stochastic simulations can offer invaluable input towards not only performance optimization but also risk assessment, cost effectiveness and operational aspects of a design. A smart management of these powerful resources is expected to play a vital role, especially in the case of novel, high-performance unconventional designs where simple empirical extrapolation can provide no guarantee of optimality. However, employing ensembles of high-fidelity experiments or computer simulations for performing data-driven design optimization for realistic cases may easily become a task of prohibitive cost. A viable solution to this problem is offered by considering *surrogate modelling*.

A surrogate model can be simply thought of as an intermediate agent that absorbs information coming from a realization of what we consider to be as the true response of a system (up to measurement error), and is able to perform inexpensive predictions as to what the true response might look like in regions where no *a priori* information is available. Modern statistical learning techniques have opened the path for constructing sophisticated surrogates that are able to accurately capture intrinsic features of the response of complex engineering systems [1–3]. At the heart of surrogate-based modelling, we find classical statistical methods, such as Gaussian process regression, members of which are the so-called *kriging* and *co-kriging* predictors [4,2].

Initially introduced in geostatistics by the pioneering work of Matheron [5] and Krige [6], kriging has been a flexible and reliable tool in spatial statistics and has extensively served a widespread spectrum of applications, ranging from classical data assimilation in environmental sciences to modern machine learning techniques in robotics. However, it was the seminal work of Sacks *et al.* [7] that introduced kriging in the context of design and analysis of computer experiments, and consequently led to a rapidly growing field of application in which kriging has been used as a predictive tool for constructing response surfaces of engineering systems by exploring the spatial correlation between the output of deterministic computer codes. Simpson *et al.* [8] provide a comprehensive summary of the evolution and recent advancements of this metamodelling approach in the context of multidisciplinary design optimization.

Of particular importance to our study is the work of Kennedy & O'Hagan [9] that introduced the use of first-order auto-regressive stochastic models for predicting the output of a complex computer code when fast approximations are available. We recognize that this approach establishes a coherent mathematical framework for blending heterogeneous variable-fidelity information sources, creating a natural setting for multi-fidelity modelling. Forrester *et al.* [10] adopted this methodology to build a two-level co-kriging scheme and successfully applied it for optimizing the drag/dynamic pressure ratio of a transonic aircraft wing.

Although the auto-regressive scheme of Kennedy & O'Hagan [9] has served well many disciplines and applications, it suffers from the same limitations as any kriging/co-kriging method. The biggest practical limitation stems from the need to repeatedly invert large, dense, ill-conditioned covariance matrices at the machine learning stage, where the metamodel is trained on a set of known observations. To address this issue, we have aligned our work with some new interesting findings that can significantly enhance the computational efficiency of kriging/co-kriging. First, the treatise of Le Gratiet & Garnier [11] suggests that co-kriging schemes arising from the Kennedy & O'Hagan model [9] with *s*-levels of variable-fidelity information sources can be effectively decoupled, and equivalently formulated in a recursive fashion as *s*-independent kriging problems. This allows for the construction of predictive co-kriging schemes by solving a sequence of simpler kriging problems that involve covariance matrices of smaller dimensions (and potentially lower condition numbers), which can be learned from the data by performing optimization in lower dimensional spaces (compared with the coupled setting of Kennedy & O'Hagan [9]). Nevertheless, the machine learning problem for fitting a kriging metamodel to the data at each recursive step still requires the factorization of a dense covariance matrix. To this end, we note that the popular choice of Cholesky decomposition scales with the number of *n* observed data points as *n*. An efficient alternative approach for constructing and learning covariance kernels for spatial statistics was recently put forth by Lindgren *et al.* [12]. Their idea uses Hilbert space approximation theory to construct sparse representations of the inverse covariance of a Gaussian–Markov random field (GMRF) that solves a certain stochastic partial differential equation (SPDE). As a consequence, the kriging/co-kriging formulation can be recast in a form that exploits the sparsity in the inverse covariance, leading to significant computational savings at the model learning stage, which now scales as fast as

The aim of this work is to establish a new mathematical framework for surrogate-based building of response surfaces that simultaneously takes into account multi-fidelity in models (e.g. high-fidelity direct numerical simulations versus low-fidelity empirical formulae) as well as multi-fidelity in probability space (e.g. high-fidelity tensor product multi-element probabilistic collocation versus low-fidelity sparse grid quadratures). We incorporate elements of statistical learning in an auto-regressive co-kriging methodology to cross-correlate ensembles of multi-fidelity surrogates from which we can accurately and efficiently extract the response of complex nonlinear stochastic dynamical systems. This framework targets a seamless integration of surrogate-based optimization and uncertainty quantification, providing a launch pad for contemporary engineering design under uncertainty.

This paper is structured as follows. In §2, we outline the basic mathematical concepts upon which the proposed framework is built. In particular, §2a describes the setting of multi-fidelity modelling and introduces the core concept of Gaussian process regression in the form of kriging, co-kriging and multi-level recursive co-kriging. We then define a general auto-regressive setting under which multi-fidelity in models and multi-fidelity in probability space can be addressed in unison. Subsequently, §2c presents an overview of the use of GMRFs through the SPDE approach to construct sparse approximations to the inverse covariance in order to boost the efficiency of machine learning algorithms at the model fitting stage. The capabilities of the proposed methodology are demonstrated through four benchmark problems. First, we consider a prototype problem in risk-averse design, in which we seek the optimal dimensions of a structural column with uncertain material properties that undergoes random bending and axial excitation. Second, we present results on the regression of a random function using an ensemble of multi-fidelity surrogates. Third, we study a prototype problem in multi-fidelity uncertainty quantification involving the evolution of viscous Burgers dynamics from a random initial state. Finally, we conclude with presenting an uncertainty quantification problem in fluid dynamics that involves stochastic incompressible flow past a circular cylinder.

## 2. Methods

We provide a brief overview of the methods we employ for constructing response surfaces by blending information sources of variable fidelity. The core concepts of our methodology lie within the broad topics of Gaussian process regression and machine learning. To this end, we begin our presentation with the univariate regression case, illustrating how spatially correlated variables can be estimated using ordinary kriging. The idea is then extended to the multivariate case, introducing the formulation of ordinary co-kriging. In particular, we focus our attention on the auto-regressive model first put forth by Kennedy & O'Hagan [9] in the context of predicting the output of a complex expensive high-fidelity code by using fast lower fidelity approximations. We show how this model can be generalized to construct a unified framework that can simultaneously address multi-fidelity in models as well as multi-fidelity in probability space. Finally, we conclude our presentation by introducing two concepts that enhance the feasibility and efficiency of computations. The first part relies on the recent contributions of Le Gratiet & Garnier [11], who proved that the *s*-level auto-regressive model of Kennedy & O'Hagan [9] can be equivalently formulated as *s*-independent kriging problems. This could potentially lead to substantial computational savings for cases where one may encounter many information sources of variable fidelity. The second part is based on the findings of Lindgren *et al.* [12], who employed Galerkin projection techniques to construct covariance models of GMRFs. This enables fast computations that leverage on the sparsity of the resulting discrete operators, thus suggesting an efficient algorithmic framework for handling big data at the model inference stage.

### (a) Multi-fidelity modelling

Consider a general system whose response can be evaluated using different models. The response typically depends on a set of design parameters *f*(**Y**(**x**;*ξ*)), on the input design variables **x** and uncertainties *ξ* (figure 1). Design criteria come in different flavours and generally reflect our priorities and objectives in identifying configurations of the design variables **x** that qualify as *optimal*. For example, one may pursue to maximize the performance of a system for a wide range of operating conditions [13], identify a design with a probability density function (PDF) that matches a target performance [14], or minimize the *risk* associated with undesirable realizations [15] (table 1). All concepts introduced in the subsequent sections aim to address single-objective design optimization problems, where the quantity of interest is the output of a scalar function of the inputs. This can serve as a basis for extending the proposed methodology to multi-objective problems, where the quantity of interest could be a multi-dimensional vector, or even a continuous field.

The process usually referred to as multi-fidelity modelling, elaborates on efficiently constructing response surfaces by correlating surrogate models of different fidelity. On one side of the fidelity spectrum, one may have cheaper surrogate models, which are fast to compute but less trustworthy (e.g. potential flow solvers, empirical formulae, etc.), while on the other side we have high-fidelity models that enjoy our outmost trust but can be very expensive to compute (e.g. direct numerical simulations, experiments, etc.). In many cases, exploring the correlation between the two allows us to efficiently construct an accurate representation of the response surface by performing relatively few evaluations of an expensive high-fidelity model and more evaluations of a cheaper surrogate. A linking mechanism that undertakes the task of information fusion naturally arises in the context of multivariate Gaussian regression. Next, we introduce this concept in the context of ordinary kriging and co-kriging.

#### (i) Kriging

Here we present a brief overview of the kriging predictive scheme. The reader is referred to [1,2,4,7] for a detailed exposition of the subject. The main idea here is to model scattered observations of a field *Y* (**x**) as a realization of a Gaussian random field *Z*(**x**). These observations could be corrupted by model error or measurement noise that is described by a Gaussian process *Z*(**x**),*Y* (**x**))^{T} has a multivariate normal distribution
*μ*=*E*[*Z*] is the mean value of *Z*, *Σ* is the covariance of *Z*, *A* is a simple matrix that restricts the random field *Z*(**x**) to the locations of observed data *y*. Additional assumptions on *μ*(**x**) may distinguish a linear spatial prediction model into *simple* Kriging (if *μ* is assumed to be known), *ordinary* Kriging (if *μ* is unknown but independent of **x**), and *universal* Kriging (if *μ* is unknown and is represented as a linear combination of deterministic basis functions) [4].

Starting from the conditional distribution of *Z*|*Y* , we can derive equations for the expectation *μ*_{Z|Y}(**x**), and covariance *Σ*_{Z|Y}(**x**) of the ordinary kriging predictor as
*Σ* and *learned* from the data *y* by maximizing the posterior

The noise process *epistemic* uncertainties that lead to differences between our observables and the real response of the system because of inaccurate or uncertain modelling assumptions. In the experimental setting this may correspond to noise or bias in the measurements, whereas in the numerical setting it may account for round-off or truncation errors of the numerical scheme. In general, modelling the nature of *aleatoric* uncertainties that arise from randomness in the input parameters of the system that will be taken into account by the random vector *ξ* (figure 1 and §2b).

#### (ii) Co-kriging

The idea of kriging is naturally extended to the multivariate setting, in which the observation model for *Y* (**x**) depends on more than one covariates. Here, we present a brief overview of the approach of Kennedy & O'Hagan [9], which introduced a co-kriging model based on a first-order auto-regressive relation between model output of different levels of fidelity.

Suppose we have *s*-levels of variable-fidelity model output *y*_{s}(**x**) denotes the output of the most accurate and expensive model, while *y*_{1}(**x**) is the output of the cheapest least accurate surrogate at our disposal. The auto-regressive co-kriging scheme of Kennedy & O'Hagan [9] reads as
*δ*_{t}(**x**) is a Gaussian field independent of {*Z*_{t−1}(**x**),…,*Z*_{1}(**x**)}, and distributed with expectation *μ*_{δt} and covariance *Σ*_{t}, i.e. *ρ*(**x**) is a scaling factor that quantifies the correlation between the model outputs (*y*_{t}(**x**),*y*_{t−1}(**x**)). In a Bayesian setting, *ρ*(**x**) is treated as a random field with an assigned prior distribution that is later fitted to the data through inference. Here, to simplify the presentation we have assume that *ρ* is a deterministic scalar, independent of **x**, and learned from the data through maximum-likelihood estimation (see §2d).

The derivation of this model is based on the Markov property
*Z*_{t−1}(**x**), we can learn nothing more about *Z*_{t}(**x**) from any other model output *Z*_{t−1}(**x'**), for **x**≠**x'**.

The resulting posterior distribution at the *t*th co-kriging level has a mean and covariance given by
*μ*_{t} is the mean value of *Z*_{t}(**x**), *Σ*_{t} is a covariance matrix comprising *t* blocks, representing all cross-correlations between levels {*t*,*t*−1,…,1} and *t*. Also, *σ*_{t} is the variance of the noise at level *t*, and *A*_{t} is a simple matrix that restricts a realization of the Gaussian field *Z*_{t}(**x**) to the locations of observed data *y*_{t}(**x**) at level *t*. Similar to ordinary kriging, the set of unknown parameters {*μ*_{t},*ρ*_{t−1},…,*ρ*_{1},*θ*_{t},…,*θ*_{1},*σ*_{t},…,*σ*_{1}} are determined from the data using machine learning techniques. To simplify the presentation, we have assumed that the scaling factor *ρ* attains scalar values, hence it does not depend on **x**. In general, we introduce a spatial dependence by using a representation in terms of a basis *ϕ*(*x*), i.e. *M* unknown modal amplitudes *w* can be determined through optimization. For cases that exhibit a smooth distribution of the cross-correlation scaling, only a few number of modal components should suffice for capturing the spatial variability in *ρ*(*x*), thus keeping the computational cost of optimization to tractable levels.

Although this co-kriging framework provides an elegant way of blending variable-fidelity model output, it may easily become computationally infeasible if many levels of fidelity and/or a large number of data observations are available. The computational barrier is imposed by the need to repeatedly invert the full covariance matrix *Σ*_{t} at the model fitting stage, with the number of different fidelity levels resulting to an increasing matrix size, while large observation datasets introduce ill-conditioning. These are well-known limiting factors for kriging but they become even more evident for co-kriging models where the total number of observations is the sum of the observations at all fidelity levels. However, recent findings of Le Gratiet & Garnier [11] provide an equivalent formulation that overcomes the computational complexity issues of the Kennedy and O'Hagan model by decoupling the aforementioned *s*-level co-kriging scheme into *s*-independent kriging problems. The resulting recursive scheme is proved to provide a predictive mean and variance that is identical to the coupled Kennedy and O'Hagan model, although it could potentially lead to a drastic reduction of the size and condition number of the covariance matrices that need to be inverted.

The key idea behind the derivation of Le Gratiet & Garnier [11] is to replace the Gaussian process *Z*_{t−1}(**x**) in equation (2.5) with a process *y*_{t−1},*y*_{t−2},…,*y*_{1}} up to level (*t*−1), while assuming that the corresponding experimental design sets *D*_{1}⊆*D*_{2}⊆⋯⊆*D*_{t−1}. Then, the problem at the *t*th-level reduces to an ordinary kriging predictor with mean and covariance given by

To underline the potential benefits of this approach, we note that the matrix *Σ*_{t} in the Kennedy and O'Hagan model (see equation (2.7)) has size *n*_{t} is the number of observations at the *t*th fidelity level. On the other hand, the recursive co-kriging approach involves the inversion of *s* covariance matrices *n*_{t}×*n*_{t}, where *n*_{t} is the number of observations *y*_{t}(**x**) at level *t* [11]. Moreover, we note that at each recursive level, the number of unknown parameters to be learned from the data reduce to {*μ*_{t},*ρ*_{t−1},*θ*_{t},*σ*_{t}} compared with the large parametric set of {*μ*_{t},*ρ*_{t−1},…,*ρ*_{1},*θ*_{t},…,*θ*_{1},*σ*_{t},…,*σ*_{1}} of the coupled Kennedy and O'Hagan scheme.

### (b) Multi-fidelity in models and in probability space

We can build further upon the presented co-kriging framework to formulate a general methodology that can simultaneously address multi-fidelity in physical models as well as multi-fidelity in probability space. As it is often the case in realistic design scenarios, the output of a system may well be sensitive to a set of inputs *ξ* that exhibit random variability. Consequently, decision-making towards identifying an optimal design is typically informed by exploring the measures of uncertainty that describe the response of the underlying stochastic dynamical system. This response is often characterized by non-Gaussian statistics that can be estimated numerically by using appropriate sampling and integration techniques. The potential non-Gaussianity in the system response should not be confused with the Gaussian nature of the kriging/co-kriging predictors. The former is an inherent property of the dynamical system that generates the observed data, while the latter introduces a modelling framework for information fusion.

Similar to having multi-fidelity in models, methods of different fidelity can also be incorporated in probability space to provide an accurate quantification of uncertainty introduced by random input. This structure is schematically illustrated in figure 2, where *m* models of variable fidelity in physical space are driven by random input, hence producing a random response surface **Y**_{m}(**x**;*ξ*). In return, any uncertainty quantification measure of **Y**_{m}(**x**;*ξ*), such as for e.g. the expectation *p* variable-fidelity methods (figure 2). In our framework, the challenge of high stochastic dimensionality is reflected by the accuracy and efficiency of the approximation methods employed to quantify uncertainty in the system. Essentially, this is a high-dimensional integration problem arising in the computation of quantities such as expectation, variance, risk measures, etc. To this end, the practical scalability of our approach can directly benefit from established techniques for high-dimensional integration, such as MC integration [17] or ANOVA decomposition methods [18] (figure 3).

This construction results in a family of response surfaces that can be organized hierarchically in a *p*×*m* matrix, where physical model fidelity is increased along the columns and probability space model fidelity increases along the rows (figure 4). Then, it is meaningful to allow information fusion along the *k*-index increases with the fidelity of the estimator of *l*-index increases with model fidelity in physical space. Note that the recursive co-kriging scheme allows for sequentially decoupling the inference process to a number of independent steps. The last of these steps should always involve the quantity of interest *k*<*p*,*l*<*m*) by conditioning on observations of a lower level. Essentially, equation (2.9) provides an example of an intermediate step in which the predictor of level (*k*+1,*l*) is enhanced with any correlated information contained in level (*k*,*l*).

This structure gives rise to the very interesting task of identifying an optimal path traversal between different models for building an accurate representation of the target response surface. This is an open question that we plan to address in a future study. A possible way of attacking this problem is through stochastic dynamic programming techniques for guiding an optimal allocation of available computational resources [23]. Alternatively, one could employ random graph theory to identify optimal information/entropy diffusion paths, where each graph node is weighted by the fidelity and corresponding cost of each model, while edge weights represent the degree of correlation between different models.

Here, we underline that the auto-regressive structure essentially explores spatial cross-correlations between two different fidelity levels, with the target goal being the construction of an accurate estimate for the highest fidelity level. This implies that the user should be able to provide a consistent labelling of different fidelity levels, suggesting that information fusion is considered to be meaningful only when moving from a lower to higher fidelity. If the lower fidelity levels cannot provide information that is spatially correlated to the highest fidelity observations, the co-kriging predictor is likely to return an estimate comparable with a standard kriging scheme through the highest fidelity points. However, the proposed auto-regressive framework provides a mechanism for detecting this pathology, and that is done by monitoring the scaling factor *ρ* appearing in the auto-regressive scheme (which is learned from the data). Values of *ρ* away from 1 provide an indication of weak correlation between two levels, thus suggesting revision of the available models.

### (c) Gaussian–Markov random fields and the stochastic partial differential equation approach

Kriging and co-kriging methods provide predictive schemes that are constructed by exploring spatial correlations between variables. A key part of this process is fitting a parametric covariance model to the observed data using machine learning and optimization techniques. The main cost of this procedure is the factorization of dense and often ill-conditioned covariance matrices for estimating the likelihood functions that guide the machine learning algorithm. Here, we provide a brief overview of a method that can effectively reduce the cost of inferring covariance models from input data, leading to efficient kriging and co-kriging predictive schemes.

A popular choice of a kernel characterizing the covariance of a random field *u***(x)** stems from the Matérn family [4,2],
*ν* determines the mean-square differentiability of *u***(x)**, *κ* is a scaling parameter related to the correlation length of *u***(x)**, denoted by *ρ*, and defined as *σ*^{2} is the marginal variance of the process, while *Γ*(⋅) and *K*_{ν}(⋅) are the Euler gamma and modified Bessel function of the second kind, respectively. We note that for *ν*=0.5, the Matérn covariance simply reduces to the exponential covariance kernel, while when

A powerful result by Whittle [24] shows that a random field *u***(x)** with a Matérn covariance is a solution to the fractional SPDE
*τ* is a scaling parameter. Admissible solutions to equation (2.11) are referred to as *Matérn fields* and are proved to be the only *stationary* solutions to this SPDE [24].

A key idea put forth by Lindgren & Rue in [12] is to employ Hilbert space approximation theory to approximate the solution to equation (2.11) using a projection onto a finite dimensional space spanned by basis functions *ϕ*_{1}(**x**),…,*ϕ*_{N}(**x**) as
*w*_{k}} are a set of unknown Galerkin expansion coefficients.

A critical observation of Lindgren & Rue in [12], citing the work of Rozanov [25], states that a random field has a Markov property if and only if the reciprocal of its spectrum is a polynomial. Thereby, the Matérn fields generated by equation (2.11), inherit the Markov property when *α* assumes integer values. Exploiting this Markov property, Lindgren & Rue [12] were able to use the finite dimensional approximation of *u*(**x**) given in equation (2.12) to construct a GMRF with precision matrix *Q* that approximates the Gaussian field *u*(**x**) in the sense that *Q*^{−1} is close to the covariance of *u*(**x**), denoted by *Σ* [12]. The main advantage here is that the Markov property, along with the choice of a suitable basis, results in a sparse precision matrix *Q* (figure 5), and, thus, enables the use of fast numerical methods for sparse matrices. This allows us to recast the kriging/recursive co-kriging formulae (equation (2.3), (2.8)) in terms of sparse GMRF precision matrices obtained from a finite-element discretization of the SPDE (equation (2.11)) in the design space. To this end, the fully discrete form of the SPDE reads as
*H*=*κ*^{2}*M*+*S* is the discrete Helmholtz operator, *M* and *S* are the mass and stiffness matrices, *w* is a vector containing the unknown Galerkin coefficients of the expansion in equation (2.12), while *g* is the projection of the right-hand side Gaussian white noise forcing term. Then, according to Lindgren & Rue [12], the sparse GMRF precision matrix of the finite-element solution to equation (2.11) can be obtained recursively for integer values of *α* as

In what follows, we assume that *α*=2. As the design space for all cases considered in this study is two-dimensional, this corresponds to assuming that the predicted response surfaces have continuous derivatives up to second order in the mean-square sense. Here, we also note that deriving proper stochastic boundary conditions for closing the SPDE of equation (2.11) is an active area of research. In practice, equation (2.11) is typically solved using zero Neumann boundary conditions, although this results in numerical boundary layer errors on the covariance of the solution that decay to zero at a distance equal to the correlation length of the random field *u*(**x**) away from the boundary [12]. This drawback can be alleviated by smoothly extruding the domain boundaries away of the region of interest, and avoiding spatial discretizations with sharp corners.

The main advantage of using GMRFs through the SPDE approach is that the model inference and kriging prediction stages can be completed simultaneously with *m* is the sum of both the number of observed data points as well as the points where kriging/co-kriging predictions are sought. This can be achieved by employing a Cholesky or pivoted Cholesky decomposition for sparse matrices, the efficiency of which depends on the bandwidth of the precision matrix. In order to further exploit this property, we can re-order the degrees of freedom in *Q* using a bandwidth reduction procedure, such as the Cuthill–McKee algorithm [26] (figure 5*b*).

Despite the fact that GMRF-based inference schemes are not a necessary ingredient of the proposed mixed multi-fidelity methodology, they can provide an effective way of handling big data in two and three dimensions, thus circumventing one of the main limitations of the kriging/co-kriging methods, i.e the general

### (d) Sampling strategies and workflow

In this work, we have employed standard infill strategies, mainly relying on uniform random sampling or Latin Hypercube sampling (LHS). In the context of performing inference using GMRFs through the SPDE approach, we construct a tessellation of the input space using triangular/quadrilateral finite-elements in two dimensions, or prismatic/hexahedral/tetrahedral elements in three dimensions, where both sampling and test locations define vertices in the mesh and are chosen with the goal of maintaining a non-degenerate geometric aspect ratio within each element.

In general, a space filling strategy such as LHS can typically serve as a good starting point, followed by a subsequent refinement that is guided by the objectives and problem at hand. For example, if the goal is to locate the global minimum of the response surface one may employ a sampling strategy based on maximizing the expected improvement of the recursive co-kriging predictor [27]. A discussion on possible ways of determining a good allocation of points among different levels has been provided in the treatise of Le Gratiet & Garnier [16], although we believe this is still an open question that is tightly related to the given problem and objectives.

In view of available multi-fidelity data, algorithms 1–4 summarize the main implementation aspects of the proposed multi-level co-kriging framework, formulated using sparse GMRF precision matrices. Namely, we provide pseudocode for the main co-kriging recursion loop (see algorithm 1), the hyperparameter learning through maximum-likelihood estimation (see algorithm 2), the computation of the negative log-likelihood function of the co-kriging posterior (see algorithm 3), and the computation of the co-kriging variance (see algorithm 4).

## 3. Results

### (a) A prototype problem in risk-averse design

Consider a rectangular cross section of a short structural column, with depth *x*_{1} and width *x*_{2}, under uncertain yield stress and uncertain bending and axial loads. Assuming an elastic-perfectly plastic material, a limit state function that quantifies a relationship between loads and capacity is described by the random field [28]
*ξ*_{1}, the axial load *ξ*_{2} and the material yield stress *ξ*_{3} are distributed as
*Y* indicate more favourable capacity-to-load ratios, while high values correspond to critical regimes where failure may occur. Our aim here is to identify an optimal configuration *D*_{1}×*D*_{2}=[2,20]×[2,10] that satisfies a set of prescribed criteria. Here, we let this goal translate to identifying a design that minimizes the area of the column with the associated design *risk* staying below a given threshold *c*. To quantify the risk associated with a given response *Y* (**x**;*ξ*), we employ a *superquantile risk measure*, which for a parameter *α*∈[0,1) is defined as [15]
*α*=0, the superquantile risk measure of a continuously distributed random field *Y* (**x**;*ξ*), reduces to the expectation operator, i.e. *α*∈(0,1), the superquantile risk of *Y* (**x**;*ξ*) returns the average of the (1−*α*)% highest realizations of *Y* (**x**;*ξ*), therefore informing a *risk-averse* decision-making mindset [15].

The design problem can be formulated in a constrained minimization framework which reads as

The expectation operator *ξ*_{1},*ξ*_{2},*ξ*_{3}) using two different models. The high-fidelity model is a probabilistic collocation method (PCM) [20] on a tensor product grid with 10 quadrature points along each random dimension (total of 10^{3} quadrature points), while the low-fidelity model is a Smolyak sparse grid level-2 quadrature (SG-L2) [21] with a total of only 58 quadrature points.

This set-up yields a simple application of the proposed multi-fidelity methodology, where we have a single model in physical space (equation (3.1)) and two levels of fidelity in probability space (PCM, SG-L2). Employing a two-level recursive co-kriging algorithm (see §2a(ii)), we can explore the correlations between the two probabilistic models, and build an accurate representation of the target response surface

In figure 6, we present the computed response surface *x*_{1},*x*_{2}) considering the 20% highest realizations of *Y* . The exact solution corresponds to computing ^{6} MC samples [17]. The resulting response surface becomes singular as

Although this is a pedagogical prototype case in risk-averse design under uncertainty, it does not fully reflect the strengths of the proposed multi-fidelity framework as the two probabilistic methods (PCM, SG-L2) both happen to produce very similar estimations of the mean

### (b) Multi-fidelity regression of a random function

Here, we consider a system with two input design variables **x**=(*x*_{1},*x*_{2}), subject to external uncertainties, described by four standard normal random variables *ξ*=(*ξ*_{1},*ξ*_{2},*ξ*_{3},*ξ*_{4}). Let the response of this system, denoted by *Y* (**x**;*ξ*) be described by the random function

Now, assume that *f*_{e}(**x**;*ξ*) returns the real high-fidelity response but is expensive to evaluate. On the other hand, let *f*_{c}(**x**;*ξ*) be a low-fidelity cheap to evaluate surrogate model that we can sample extensively as

Our goal here is to employ the proposed multi-fidelity framework to construct the response surface of the mean field

Therefore, our multi-fidelity setup consists of two models in physical space (*f*_{e}(**x**;*ξ*),*f*_{c}(**x**;*ξ*)), and two models in probability space (SG-L5, SG-L2). This results in a family of response surfaces *S*_{ij}, that can be organized as

Figure 7 demonstrates the response surface produced using a four-level recursive co-kriging scheme, traversing the available models and data in the order *f*_{e}(**x**;*ξ*) using 10^{6} samples. Evidently, the highly nonlinear response of the mean field is captured remarkably well by just using five observations of the expensive highest fidelity model *S*_{22}, supplemented by a number of inaccurate low-fidelity observations from (*S*_{11},*S*_{12},*S*_{21}). This observation is further confirmed by the uncertainty of the predictor, quantified by the co-kriging variance (figure 7, inset), which is bounded by 10^{−3}. To underline the merits of using the proposed multi-fidelity approach note that the relative ^{−2}. This is in sharp contrast with the corresponding error from fitting a kriging model through the highest fidelity observations *S*_{22}, which here is as high as 0.5.

### (c) Multi-fidelity uncertainty quantification in a stochastic Burgers equation

Let us consider the following prototype initial/boundary value problem for the Burgers equation in one input dimension
*u*_{0}(*x*;*ω*) and the deterministic forcing term *f*(*x*,*t*) are defined as
*u*_{0}(*x*;*ω*), the solution *u*(*x*,*t*;*ω*) takes values in the space *x*.

Our goal here is to use the proposed multi-fidelity framework to get an accurate estimate of the kinetic energy
*N* in the finite-dimensional representation of the solution:
*ξ*_{1},*ξ*_{2}) in equation (3.8) is resolved by different probabilistic approximation methods. Here, we have employed a high-fidelity tensor product PCM, and several low-fidelity approximations based on MC integration using relatively small numbers of samples. In particular, we have considered four levels of mixed fidelity in physical and probability space. For the first level, labelled as *S*_{1}, we have considered a low resolution discretization in physical space using only 10 Fourier modes in equation (3.11). This yields an inaccurate solution of equation (3.7), but results in a very low computational cost per sample, thus motivating the use of a high-fidelity method in probability space that discretizes the parametric space with 1600 PCM quadrature points. At the second level, *S*_{2}, we employ a slightly finer resolution in physical space using 12 Fourier modes, while we decrease the fidelity in probability space by employing MC integration on 60 samples. Similarly, the discretization of the third level, *S*_{3}, output consists of 15 Fourier modes in physical space, and 40 MC samples. Lastly, in the fourth level, *S*_{4}, we employ a costly high-fidelity representation in physical space using 60 Fourier modes, but all probabilistic quantities are inferred by using only 10 MC samples.

Figure 8 presents the resulting four-level recursive co-kriging predictor and variance for the multi-fidelity reconstruction of the kinetic energy *E*_{k}(*t*;*ω*) at *t*=1 as a function of the random variables *ω*=(*ξ*_{1},*ξ*_{2}). The inference scheme at each recursive level is formulated using sparse GMRF precision matrices arising from a regular finite element discretization of the (*ξ*_{1},*ξ*_{2}) plane with 1600 elements. It is evident that the proposed methodology is able to return an accurate estimate primarily based on a large number of inexpensive but inaccurate evaluations, supplemented by just a few realizations of the accurate but expensive high-fidelity model.

In table 2, we present a comparison of the computed errors for the mean and variance of *E*_{k}(*t*;*ω*) for several cases. Here, we would like to underline that the four-level recursive co-kriging predictor (Case III) is able to return an accurate estimate for both quantities of interest, while resulting in substantial computational savings, as it required about *two orders of magnitude* less resources compared with a full blown high-fidelity analysis (Case IV).

### (d) Stochastic incompressible flow

We consider two-dimensional unsteady incompressible flow past a circular cylinder of diameter *D*, subject to a random inflow boundary condition. The flow is governed by the Navier–Stokes equations subject to the incompressibility constraint
**v** is the fluid velocity vector, *p* is the fluid pressure, *t* is time, and *ν* is the kinematic viscosity of the fluid. The system is discretized in space using the spectral/*hp* element method (SEM), according to which the computational domain is decomposed into a set of polymorphic non-overlapping elements [26]. Within each element, the solution is approximated as a linear combination of hierarchical, mixed-order, semi-orthogonal Jacobi polynomial expansions [26]. Temporal integration is based on decoupling the velocity and pressure by applying a high-order time-splitting scheme [26].

The random inflow has a parametric expression of the form
*σ*_{1},*σ*_{2}) are parameters controlling the amplitude of the skewness of the inflow noise (design variables), *y* is the coordinate transverse to the flow and (*ξ*_{1},*ξ*_{2}) are random variables with standard normal distributions.

Our goal here is to construct the response surface for the 0.6-superquantile risk of the base pressure coefficient

The proposed multi-fidelity modelling framework is illustrated here considering a single model in physical space and three models in probability space. The physical model returns realizations of the flow field solution produced by direct numerical simulations of equation (3.12) using the SEM. All simulations are started from a zero-velocity initial condition, and are integrated until a stable limit-cycle state is reached. This state is characterized by the well-known von Karman vortex shedding pattern in which the unsteady separation of the flow around the cylinder gives rise to a time-periodic base pressure signal. We choose three models in probability space to compute the expectation operator in equation (3.2). The highest fidelity model is probabilistic collocation on a tensor product grid (PCM) [20], the intermediate fidelity is a SG-L2 [21] and lowest fidelity model is MC integration [17]. For each model, we construct a uniform grid in the (*σ*_{1},*σ*_{2})-plane, and perform an ensemble of simulations at each grid-point according to the corresponding sampling strategy for integrating out (*ξ*_{1},*ξ*_{2}) (table 3). Figure 10 illustrates the PCM samples of the inflow boundary condition for different values of (*σ*_{1},*σ*_{2}). This uncertainty quantification task results to a total of 5526 Navier–Stokes simulations that were performed in parallel on one rack of IBM BG/Q (16 384 cores) with a CPU time of about 40 min.

Figure 11 shows the response surface for the 0.6-superquantile risk of the base pressure coefficient *σ*_{1},*σ*_{2})-plane. The accuracy of the co-kriging predictor is assessed against an ‘exact’ response surface constructed by using 357 uniformly distributed observations of the highest fidelity model (PCM), returning an error of 0.051 in the ^{−5}. Note that if we only used a one-level kriging scheme trained on the five high-fidelity observations, the resulting

## 4. Discussion

The scope of this work is to define a comprehensive mathematical framework that allows for the seamless integration of surrogate-based optimization and uncertainty quantification in engineering design problems. The proposed methodology targets the construction of response surfaces of complex stochastic dynamical systems by blending multiple information sources via auto-regressive stochastic modelling. We develop a general setting, under which multi-fidelity in models and multi-fidelity in probability space can coexist in tandem. A computationally efficient framework is developed using multi-level recursive co-kriging and GMRFs, resulting in very robust schemes in two- and three dimensions.

The capabilities of our implementation were tested for three different cases. In all cases, we were able to successfully exploit cross-correlations within ensembles of surrogate models and efficiently construct an accurate estimate for the response of the given stochastic system. In particular, we were able to construct the response surface and identify an optimal risk-averse design for a benchmark problem involving a short structural column with random material properties being subject to random loads. Next, we considered the regression of a random function, for which different fidelity models and information sources were available both in physical and probability spaces. An accurate representation is obtained by only using a minimal amount of expensive high-fidelity observations along with many cheap, low-quality realizations of different surrogates. The third benchmark case we considered involved evolving the Burgers equation from a random initial state and tracking the statistics of the kinetic energy of the system at a later time. Our results demonstrate how the proposed multi-fidelity framework can provide an accurate quantification of uncertainty with orders of magnitude of decrease in computational demands. Finally, we presented a realistic uncertainty quantification case for a laminar stochastic wake past a circular cylinder. For this case, we were able to obtain an accurate and reliable estimation of the response surface for the superquantile risk of the base pressure coefficient—a quantity that is known for its high sensitivity to the flow conditions.

A limiting factor of our presentation stems from the fact that all demonstrations were confined to two-dimensional problems. Although this significantly contributes towards enhancing the clarity of the presented concepts, we acknowledge that it does not establish a direct link to realistic high-dimensional design problems. To this end, the proposed framework can be extended to hypercube design spaces, in which covariance models can be similarly trained on the data through the SPDE approach, that can now be discretized using orthogonal tensor product expansion bases. This technique can provide an adequate extension to handle high-dimensional problems, however, its cost and optimality need to be carefully assessed. Our future plans will focus on evaluating the available tools and identifying an optimal strategy for addressing high-dimensionality in both the design and probability spaces.

## Data accessibility

The datasets supporting this article have been uploaded as part of the electronic supplementary material (ESM).

## Author' contributions

P.P. developed and implemented the methods, designed and performed the numerical experiments and drafted the manuscript; D.V. conceived and supervised the design and implementation of the methods and simulations; J.O.R. provided guidance on the use of risk measures and risk-averse design and helped draft the manuscript; G.E.K. conceived and coordinated the study. All authors gave final approval for publication.

## Competing interests

We declare we report no competing interests linked to this study.

## Funding

P.P., D.V. and G.E.K. gratefully acknowledge support from DARPA grants nos HR0011-14-1-0060, AFOSR grant no. FA9550-12-1-0463 and the resources at the Argonne Leadership Computing Facility (ALCF), through the DOE INCITE program. J.O.R. acknowledges support from DARPA grant no. HR001-14-1-225.

## Acknowledgements

We are thankful to Dr Lucia Parussini (Department of Mechanical Engineering, University of Trieste, Italy) for her help during the initial implementation stages of this work.

- Received January 9, 2015.
- Accepted June 2, 2015.

- © 2015 The Author(s) Published by the Royal Society. All rights reserved.