## Abstract

Complex civil infrastructure systems are typically exposed to random loadings and have a large number of possible failure modes, which often exhibit spatially and temporally variable consequences. Monte Carlo (level III) reliability methods are attractive because of their flexibility and robustness, yet computational expense may be prohibitive, in which case variance reduction methods are required. In the importance sampling methodology presented here, the joint probability distribution of the loading variables is sampled according to the contribution that a given region in the joint space makes to risk, rather than according to probability of failure, which is the conventional importance sampling criterion in structural reliability analysis. Results from simulations are used to intermittently update the importance sampling density function based on the evaluations of the (initially unknown) risk function. The methodology is demonstrated on a propped cantilever beam system and then on a real coastal dike infrastructure system in the UK. The case study demonstrates that risk can be a complex function of loadings, the resistance and interactions of system components and the spatially variable damage associated with different modes of system failure. The methodology is applicable in general to Monte Carlo risk analysis of systems, but it is likely to be most beneficial where consequences of failure are a nonlinear function of load and where system simulation requires significant computational resources.

## 1. Introduction

Risk, here defined as a function of probability and consequence, provides a rational basis for infrastructure system planning, design, operation and maintenance (Royal Society 1983, 1992; NRC 1992; Stewart & Melchers 1997; HM Treasury 2001). Engineering theory and, to some extent, practice dwell upon probability of failure as a measure of the performance of an infrastructure system, enabling engineers, in theory at least, to optimize the behaviour of the system to satisfy specified failure rate conditions. Reliability techniques (e.g. Thoft-Christensen & Baker 1982; Melchers 1999) focus upon accurately estimating the probability of system failure. Many safety-critical systems are designed to a very low failure probability, usually because the consequences of any type of failure are unacceptably catastrophic (e.g. loss of containment of a nuclear reactor), so in this case failure probability acts as an operational proxy for risk. This is, however, a special case: more generally, the failure state (the combination of one or more component failures that have caused system failure) and the loading that results in system failure will together determine the severity of the consequences (and perhaps their spatial heterogeneity). As will be demonstrated in this paper, the conditions resulting in the greatest probability of failure do not necessarily result in the greatest contribution to risk.

For a discrete system of moderate complexity, there is a combinatorially large number of possible system states. If, as is often the case, estimation of the probability of failure and consequences of any given system state involves computationally expensive simulation, the resources required to analyse all possible states may be infeasible. Our objective here, however, is different: it is to estimate risk rather than failure probability alone. *A priori* we do not know where the maximum risk is located in the space of the basic variables of the system, and so we propose an adaptive method for progressively refining the risk estimate.

Following this introductory section, the relevant principles of risk analysis for infrastructure systems are introduced. Aspects of reliability theory and importance sampling are reviewed. Next, the adaptive risk-based importance sampling methodology is introduced. Two example applications are described, beginning with a propped cantilever beam to demonstrate the adaptive nature of the sampling and verify the approach for a simple system with an analytical solution. The methodology is then applied to a substantial practical case of a coastal dike system at Towyn in the UK, before concluding with discussion of the benefits and limitations of the proposed approach.

## 2. Reliability and risk analysis of systems

Consider a system described by a vector $\mathbf{x} = (x_1, \dots, x_t)$ of *t* basic variables in $\Omega$, where $\Omega \subseteq \mathbb{R}^t$. The limit state function $g(\mathbf{x})$ separates the 'failed' region $\{\mathbf{x} : g(\mathbf{x}) \le 0\}$ from the 'not failed' region $\{\mathbf{x} : g(\mathbf{x}) > 0\}$. The probability of failure, $P_f$, is therefore the probability that the limit state function is less than or equal to zero:

$$P_f = \int_{\Omega} I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})\,\mathrm{d}\mathbf{x},\tag{2.1}$$

where $I[\cdot]$ is the indicator function (equal to one when its argument is true and zero otherwise) and $f(\mathbf{x})$ is the joint probability density function (j.p.d.f.) of $\mathbf{x}$ on $\Omega$.

Methods of reliability analysis are customarily presented at three levels (JCSS 1981). Level I methods, in which partial safety factors are imposed on the loading and resistance variables, are now widely employed in limit state design codes. For level II methods, the failure surface is approximated with a first- or higher-order Taylor series expansion around the failure point closest to the origin (the ‘design point’) after the j.p.d.f. of the basic variables has been transformed into independent standardized normally distributed variables. Level II methods are efficient and elegant, but the approximation of the failure surface can be inaccurate for highly nonlinear limit state functions and relies upon an explicit limit state function, which may not be available. In level III methods, the integral of the j.p.d.f. for the basic variables (2.1) is solved numerically. These are more flexible than level II methods, as both implicit and explicit limit state functions can be employed. However, level III methods can be computationally demanding.

If it is assumed that a fixed damage, *c*, occurs whenever the system is in a 'failed' state, then the risk, *r*, associated with the system is

$$r = c \int_{\Omega} I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})\,\mathrm{d}\mathbf{x},\tag{2.2}$$

where *c* may be expressed in terms of economic damage or some other quantified measurement of harm. In general, however, the amount of damage will depend upon the values of $\mathbf{x}$ when failure takes place, which also determine the modes of failure in systems with multiple failure modes. We therefore replace *c* with a function $D(\mathbf{x})$:

$$r = \int_{\Omega} D(\mathbf{x})\, I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})\,\mathrm{d}\mathbf{x}.\tag{2.3}$$

Since $D(\mathbf{x})$ can be defined to be zero wherever $g(\mathbf{x}) > 0$, the risk calculation is therefore

$$r = \int_{\Omega} D(\mathbf{x})\, f(\mathbf{x})\,\mathrm{d}\mathbf{x}.\tag{2.4}$$

The existence of a limit state function may be thought of as being implicit in equation (2.4).

As with the reliability problem (2.1), transformation techniques may be applied to yield analytical solutions to equation (2.4), while numerical approximation methods are more flexible. However, for many systems, the function $D(\mathbf{x})$ may be implicit and, for each point $\mathbf{x}$, computationally expensive to evaluate.

## 3. Integral estimation and variance reduction

Direct sampling, or 'Monte Carlo' analysis, can be used to estimate the integral, *ξ*, of a function $h(\mathbf{x})$ over $\Omega$:

$$\xi = \int_{\Omega} h(\mathbf{x})\,\mathrm{d}\mathbf{x}.\tag{3.1}$$

If $\Omega$ is transformed to a unit hypercube $C$, then, given *n* random input vectors, $\mathbf{x}_j$, uniformly distributed over $C$, the estimator of *ξ* is

$$\hat{\xi} = \frac{1}{n}\sum_{j=1}^{n} h(\mathbf{x}_j).\tag{3.2}$$

The sample variance of this estimator is given by

$$s^2 = \frac{1}{n(n-1)}\sum_{j=1}^{n}\left(h(\mathbf{x}_j) - \hat{\xi}\right)^2.\tag{3.3}$$
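The direct estimator (3.2) and its sample variance can be sketched numerically; in this sketch (Python with NumPy), the integrand and sample size are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Illustrative integrand over the unit square; the exact integral of
    # x0 * x1 over [0, 1]^2 is 0.25.
    return x[:, 0] * x[:, 1]

n = 100_000
x = rng.uniform(size=(n, 2))        # uniform samples over the unit hypercube C
vals = h(x)
xi_hat = vals.mean()                # direct Monte Carlo estimator (3.2)
var_hat = vals.var(ddof=1) / n      # sample variance of the estimator (3.3)
print(xi_hat, var_hat)
```

With the samples spread uniformly over the whole hypercube, the variance of the estimator falls only in proportion to $1/n$, which motivates the variance reduction techniques discussed next.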

The variance reduces as *n* increases. A number of variance reduction techniques can be employed to increase the efficiency of the convergence (Morgan 1984; Ripley 1987; Melchers 1999), e.g. antithetic variables, stratified sampling and directional sampling. The goal of importance sampling, another variance reduction technique, is to reweight the distribution of the random samples used in the Monte Carlo estimate, so that more samples are located in regions where the integrand is larger. Equation (3.1) is rewritten as

$$\xi = \int_{\Omega} \frac{h(\mathbf{x})}{w(\mathbf{x})}\, w(\mathbf{x})\,\mathrm{d}\mathbf{x},\tag{3.4}$$

where $w(\mathbf{x})$ is an importance sampling density function and $w(\mathbf{x}) > 0$ wherever $h(\mathbf{x}) \neq 0$. More efficient sampling is achieved when $w(\mathbf{x})$ resembles the shape of $h(\mathbf{x})$ in the region of interest. A more efficient estimator compared to equation (3.2) is therefore

$$\hat{\xi} = \frac{1}{n}\sum_{j=1}^{n} \frac{h(\mathbf{x}_j)}{w(\mathbf{x}_j)},\tag{3.5}$$

where $\mathbf{x}_j$ are random input vectors sampled from $w(\mathbf{x})$. The variance of the importance sampling estimator is

$$\operatorname{var}(\hat{\xi}) = \frac{1}{n}\left[\int_{\Omega} \frac{h^2(\mathbf{x})}{w(\mathbf{x})}\,\mathrm{d}\mathbf{x} - \xi^2\right].\tag{3.6}$$

In the context of the reliability problem (2.1), the importance sampling estimator is

$$\hat{P}_f = \frac{1}{n}\sum_{j=1}^{n} \frac{I[g(\mathbf{x}_j) \le 0]\, f(\mathbf{x}_j)}{w(\mathbf{x}_j)}.\tag{3.7}$$

In this case, the importance sampling function, $w(\mathbf{x})$, is most efficient when it is proportional to $I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})$.
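The gain from an estimator of the form (3.7) can be seen on a small tail-probability example; in this sketch (Python with NumPy), the standard normal target and the shifted sampling density are illustrative choices, not from the paper:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Target: P_f = P(X > 3) for X ~ N(0, 1), i.e. the integral of the indicator
# times the standard normal density, as in (3.7).
exact = 0.5 * (1.0 - erf(3.0 / sqrt(2.0)))

n = 20_000

# Crude Monte Carlo: very few samples land in the failure region x > 3.
x = rng.standard_normal(n)
p_crude = np.mean(x > 3.0)

# Importance sampling: draw from w = N(3, 1), centred on the failure region,
# and reweight each sample by the likelihood ratio f/w as in (3.5) and (3.7).
y = rng.normal(3.0, 1.0, n)
f = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)          # target density f
w = np.exp(-0.5 * (y - 3.0)**2) / np.sqrt(2.0 * np.pi)  # sampling density w
p_is = np.mean((y > 3.0) * f / w)
print(exact, p_crude, p_is)
```

Because nearly every importance sample falls in the failure region, the reweighted estimate converges with far fewer samples than the crude estimate, whose failure count is tiny at this sample size.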

## 4. Adaptive risk-based sampling methodology

The damage function $D(\mathbf{x})$, introduced earlier, is seldom straightforward and may require expensive numerical solution, making it difficult to presuppose which states of the basic variables contribute most to the risk, *r*. We therefore address the situation where it is infeasible to perform a very large number of evaluations of the function $D(\mathbf{x})$.

Algorithms of an adaptive nature have been demonstrated on reliability problems by Bucher (1988) and Melchers (1990). These methods use sample outcomes to update the location of the mean of the importance sampling function, $w(\mathbf{x})$, such that the mean is gradually relocated nearer the point of maximum likelihood within the failure region. While $I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})$ could be multi-modal, the form of the function $w(\mathbf{x})$ is not updated during the analysis, and so a successful outcome is still dependent on selecting an appropriate initial form for $w(\mathbf{x})$.

In this section, we propose a method that makes use of a non-parametric importance sampling function that seeks to approximate the risk integrand $D(\mathbf{x})\, f(\mathbf{x})$. We achieve this in the first instance with an exploratory analysis of the function and progressively refine the estimate of risk (the numerical estimate of the integral in equation (2.4)) by means of a non-parametric adaptive risk-based importance sampling function, written as $w_n(\mathbf{x})$, where *n* signifies the cumulative number of samples in the adaptive process.

Little is known about $D(\mathbf{x})$ at the outset of an analysis, so the initial exploration of $D(\mathbf{x})$ necessarily spans the whole of the region $A \subseteq \Omega$ where $f(\mathbf{x})$ is known to be non-negligible. In the absence of further information, the exploratory sample is uniform throughout $A$. This could be achieved by selecting samples on a regular grid over $A$, or by randomly sampling from a multivariate uniform distribution $U(A)$. Starting with a random sample from a known distribution is attractive, because the initial samples can subsequently contribute to the pool of random samples used in the integral estimation without biasing the pool. We would like this initial sample to be reasonably evenly spread over $A$, so in practice we try a large number of sets of random samples from $U(A)$ and select the set of samples that minimizes $S$:

$$S = \sum_{i=1}^{t}\sum_{j=1}^{n-1}\left(d_{i,j} - \frac{L_i}{n-1}\right)^2,\tag{4.1}$$

where $L_i$ is the length of the projection of $A$ onto the $x_i$-axis and $d_{i,j}$ is the distance on the $x_i$-axis between the (ordered) points $j$ and $j+1$. By doing so, we ensure that the samples from $U(A)$ are reasonably evenly spread over *A* and extend towards the boundaries of the region.
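A best-of-many selection of an evenly spread exploratory sample can be sketched as follows (Python with NumPy); the precise form of the spread measure is an assumption here, penalizing uneven per-axis spacing as described for equation (4.1), with the unit hypercube taken as the region $A$:

```python
import numpy as np

rng = np.random.default_rng(2)

def spread_score(samples, lengths=None):
    """Assumed form of S (4.1): squared deviation of the per-axis gaps
    d_{i,j} from the even spacing L_i / (n - 1)."""
    n, t = samples.shape
    if lengths is None:
        lengths = np.ones(t)        # region A taken as the unit hypercube
    score = 0.0
    for i in range(t):
        gaps = np.diff(np.sort(samples[:, i]))        # d_{i,j}
        score += np.sum((gaps - lengths[i] / (n - 1)) ** 2)
    return score

def best_exploratory_sample(n, t, trials=500):
    """Best-of-many selection of an evenly spread uniform sample."""
    candidates = [rng.uniform(size=(n, t)) for _ in range(trials)]
    return min(candidates, key=spread_score)

best = best_exploratory_sample(n=6, t=2)
print(spread_score(best))
```

Because the selected set is still one of many draws from $U(A)$, it can enter the estimation pool without biasing the integral estimate, as noted above.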

The risk integrand is evaluated at each of the *n* samples. A function, $\hat{\rho}_n(\mathbf{x})$, is obtained by fitting a surface over the *n* evaluations of risk, $D(\mathbf{x}_j)\, f(\mathbf{x}_j)$. This risk function is normalized to give the j.p.d.f. $w_n(\mathbf{x})$, henceforth referred to as the risk-based importance sampling function. The next set of points at which $D(\mathbf{x})$ is evaluated is subsequently sampled from $w_n(\mathbf{x})$.

As more samples of $\mathbf{x}$ and estimates of $D(\mathbf{x})\, f(\mathbf{x})$ are obtained, the importance sampling function can be intermittently updated. The updating should be frequent enough that the sampling routine benefits from the improved sampling function, but not so frequent that the process of updating becomes a computational burden in its own right, which may be significant for higher-dimensional problems. The update frequencies used in the following case studies are within the range of 20–100 proposed for alternative adaptive methods (Melchers 1990). The estimator of total risk after *n* samples is

$$\hat{r}_n = \frac{1}{n}\sum_{j=1}^{n} \frac{D(\mathbf{x}_j)\, f(\mathbf{x}_j)}{w_{n_j}(\mathbf{x}_j)},\tag{4.2}$$

where $w_{n_j}(\mathbf{x})$ is the risk-based importance sampling distribution from which sample $\mathbf{x}_j$ was chosen. For the initial exploratory samples, the (non-risk-based) uniform sampling function, $w_0(\mathbf{x})$, is employed. The process of estimating the total risk, $\hat{r}_n$, and updating the sampling distribution is repeated until the specified convergence criterion is satisfied.
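A one-dimensional sketch of the adaptive loop and the weighted estimator (4.2) follows (Python with NumPy); the risk integrand, the histogram-based surface fit, the batch size and the update frequency are all illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy risk integrand rho(x) = D(x) f(x) on [0, 1]: zero below a damage
# threshold, linear above it. Exact integral: 10 * 0.3**2 / 2 = 0.45.
def rho(x):
    return np.where(x > 0.7, (x - 0.7) * 10.0, 0.0)

bins = 20
edges = np.linspace(0.0, 1.0, bins + 1)

def fit_density(xs, ys):
    """Piecewise-constant pdf proportional to the observed risk values,
    with a small floor so the support covers the whole region."""
    heights = np.full(bins, 1e-3)
    idx = np.minimum((xs * bins).astype(int), bins - 1)
    for b in range(bins):
        if np.any(idx == b):
            heights[b] += ys[idx == b].mean()
    heights /= heights.sum() * (1.0 / bins)   # normalize: sum(h) / bins = 1
    return heights

def sample(heights, n):
    """Draw n points from the piecewise-constant pdf."""
    probs = heights / bins
    b = rng.choice(bins, size=n, p=probs / probs.sum())
    return edges[b] + rng.uniform(size=n) / bins

xs = rng.uniform(size=50)          # exploratory uniform batch, w0(x) = 1
ws = np.ones(50)                   # density under which each point was drawn
for _ in range(10):                # intermittent updates of w_n
    h = fit_density(xs, rho(xs))
    xn = sample(h, 50)
    xs = np.concatenate([xs, xn])
    ws = np.concatenate([ws, h[np.minimum((xn * bins).astype(int), bins - 1)]])
r_hat = np.mean(rho(xs) / ws)      # weighted estimator (4.2)
print(r_hat)
```

The floor on the histogram heights keeps the support of the sampling density equal to the whole region, so the weighted estimator remains unbiased even where the fitted risk is zero.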

Except in the special case of $D(\mathbf{x})$ being a positive constant over the failure region and zero elsewhere in $\Omega$, it is clear that the importance sampling function used in the reliability problem, $w(\mathbf{x}) \propto I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})$, and the risk-based importance sampling function, $w_n(\mathbf{x})$, will not be the same. In many cases, the former will be a poor sampling function for estimating *r*.

This same adaptive approach could also be used in the reliability problem to estimate the probability of failure after *n* samples, where the importance sampling function, $w_n(\mathbf{x})$, is updated intermittently by fitting a surface to $I[g(\mathbf{x}) \le 0]\, f(\mathbf{x})$:

$$\hat{P}_{f,n} = \frac{1}{n}\sum_{j=1}^{n} \frac{I[g(\mathbf{x}_j) \le 0]\, f(\mathbf{x}_j)}{w_{n_j}(\mathbf{x}_j)},\tag{4.3}$$

where $w_{n_j}(\mathbf{x})$ is the importance sampling function from which $\mathbf{x}_j$ is chosen.

## 5. A simple system: propped cantilever beam

The adaptive risk-based importance sampling methodology is demonstrated by considering a simple reliability problem, that of a beam with a concentrated vertical random load, *V*, imposed on its cantilever end, a distance *s* from the prop (figure 1). For this system, damage occurs only when $M > M_{\text{cr}}$, where $M$ is the maximum bending moment in the beam and $M_{\text{cr}}$ is the (random) critical moment capacity. Hence, if *s* = 5 m, the limit state function, *g*, is

$$g(V, M_{\text{cr}}) = M_{\text{cr}} - 5V.\tag{5.1}$$

The basic variables *V* and $M_{\text{cr}}$ are assumed to be uncorrelated, normally distributed variables: $V \sim N(\mu_V, \sigma_V^2)$ and $M_{\text{cr}} \sim N(\mu_M, \sigma_M^2)$.

The fragility, $P(\text{failure} \mid v)$, of the beam and the probability density of the loading, $f(v)$, are plotted in figure 2. The system probability of failure, $P_f$, follows from equation (2.1). In this case, damage occurs only when $g \le 0$ and increases with load:

$$D(v, m_{\text{cr}}) = c(v)\, I[m_{\text{cr}} - 5v \le 0],\tag{5.2}$$

where $c(v)$ is an increasing function of the load. The risk for this problem is therefore

$$r = \iint D(v, m_{\text{cr}})\, f(v, m_{\text{cr}})\,\mathrm{d}m_{\text{cr}}\,\mathrm{d}v.\tag{5.3}$$

Figure 2 plots the risk integrand against *v*, given that $g \le 0$. The analytical solution to equation (5.3) gives the total risk, *r*, against which the sampling strategies below are compared.
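A risk integral of this form can be cross-checked numerically; in the sketch below (Python with NumPy), the distribution parameters and the linear damage function are hypothetical stand-ins, since the paper's values are not reproduced here:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

# Hypothetical parameters (not the paper's values):
mu_v, sd_v = 4.0, 1.0        # load V ~ N(4, 1) kN
mu_m, sd_m = 25.0, 5.0       # capacity M_cr ~ N(25, 5) kN m
s = 5.0                      # lever arm, so g = m_cr - 5 v

def damage(v):
    # Assumed damage function, increasing with load once failure occurs.
    return 100.0 * v

# Crude Monte Carlo estimate of the risk integral (5.3).
n = 400_000
v = rng.normal(mu_v, sd_v, n)
m = rng.normal(mu_m, sd_m, n)
r_mc = np.mean(damage(v) * (m - s * v <= 0.0))

# Cross-check: integrate damage(v) * fragility(v) * f(v) over v, where the
# fragility P(M_cr <= 5 v) is a normal CDF.
phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))
vs = np.linspace(mu_v - 8.0 * sd_v, mu_v + 8.0 * sd_v, 4001)
f_v = np.exp(-0.5 * ((vs - mu_v) / sd_v) ** 2) / (sd_v * np.sqrt(2.0 * np.pi))
y = damage(vs) * phi((s * vs - mu_m) / sd_m) * f_v
r_quad = float(np.sum((y[1:] + y[:-1]) * np.diff(vs)) / 2.0)
print(r_mc, r_quad)
```

Integrating the capacity variable out analytically (the fragility curve) reduces the double integral (5.3) to a one-dimensional quadrature, which is how the fragility plotted in figure 2 is used.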

Four sampling strategies were tested (figure 3).

- (i) *Uniform Monte Carlo approach*. Each sample is drawn from uniform distributions over the ranges of *v* and $m_{\text{cr}}$. These distributions were selected to ensure coverage of the whole area for which the probability and the consequences of failure are non-negligible.
- (ii) *Crude Monte Carlo approach*. Realizations of $(v, m_{\text{cr}})$ were sampled randomly from the distributions of the basic variables *V* and $M_{\text{cr}}$.
- (iii) *Adaptive probabilistic importance sampling*. The importance sampling distribution defined in equation (3.7) (i.e. based on the probability of structural failure) was used. The starting values and update frequencies were as used in the adaptive risk-based sampling strategy. In this case, after 200 simulations, the projection corresponds well with the analytically derived version of the sampling function.
- (iv) *Adaptive risk-based importance sampling*. The initial exploratory points were chosen as the set, from a large number of candidate sets of six samples from $U(A)$, that minimized the quantity *S* in equation (4.1). Subsequently, $w_n(\mathbf{x})$ was updated every 25 realizations. Intermittent plots of the sampling distribution, showing only the projection on *v* for clarity, are shown in figure 4. As with the adaptive probabilistic sampling strategy, after 200 simulations, the projection corresponds well with the analytically derived version of $w_n(\mathbf{x})$.

An empirical convergence criterion was adopted, limiting the variation between the minimum and maximum estimates over the last 10% of samples to less than a small tolerance, $\varepsilon$, of the estimate after *n* samples (tested when *n* is divisible by 10):

$$\frac{\displaystyle\max_{0.9n \le m \le n}\hat{r}_m - \min_{0.9n \le m \le n}\hat{r}_m}{\hat{r}_n} < \varepsilon.\tag{5.4}$$

Using this criterion, sampling strategy (i) requires orders of magnitude more simulations, whereas the risk-based sampling approach, strategy (iv), needs only 500 simulations to converge upon the analytical value of *r*. Both of these compare favourably with strategies (ii) and (iii), which do not meet the convergence criterion even after many more realizations. In both of these cases, the sampling is sparse in the region of maximum risk.
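A stopping rule of the empirical kind described by equation (5.4) can be sketched as follows (Python; the numerical tolerance is an assumed placeholder, not the paper's value):

```python
def converged(history, tol=0.02, window=0.1):
    """Empirical stopping rule in the spirit of (5.4): the spread of the last
    `window` fraction of running risk estimates must fall below `tol` times
    the current estimate. `tol` is an assumed placeholder value."""
    n = len(history)
    if n < 20 or n % 10 != 0:        # only test when n is divisible by 10
        return False
    tail = history[int(n * (1.0 - window)):]
    return (max(tail) - min(tail)) < tol * abs(history[-1])

settling = [1.0 + 1.0 / k for k in range(1, 101)]   # running estimates that settle
noisy = [1.0, 2.0] * 50                              # estimates that keep oscillating
print(converged(settling), converged(noisy))
```

A relative criterion of this kind is scale-free, so the same tolerance can be reused whether the risk is expressed in pounds or any other damage unit.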

Conventionally, the structural design point refers to the point in the failure region of the limit state function with the greatest probability density. We therefore name the point that maximizes the risk integrand, $D(\mathbf{x})\, f(\mathbf{x})$, the risk-based design point of the system. In the case of the probability-based sampling strategy, (iii), the maximum sampling density is centred on the structural design point of the beam. In this situation, the probability density at the risk-based design point is typically orders of magnitude lower, so this strategy biases the sampling away from the region of interest, making it difficult for the risk estimate to converge and demonstrating the importance of selecting an appropriate sampling distribution (e.g. Schreider 1966; Melchers 1989; McKay 2003). Strategy (ii) is also less efficient than strategy (iv) but, perhaps less intuitively, it also appears to be less efficient than strategy (i). This is because, in this particular example, the sampling density is actually greater in the region of maximum risk under strategy (i) than under strategy (ii).

## 6. A real system: coastal dike system

In England and Wales, 1 million properties worth over £130 billion and 430 000 hectares of agricultural land worth approximately £2 billion (Halcrow *et al*. 2001) are protected by some 1300 km of sea dike (Madrell *et al*. 1998). The case study site is Towyn in North Wales (figure 5). The town is built on large areas of coastal lowland that were reclaimed during the eighteenth century and was inundated in 1990 when 467 m of seawall was breached. Typically, a coastal dike system is characterized by

- two (partially correlated) loading variables, wave height, $H_s$, and water level, *W*;
- a system of *q* coastal dike components that may be overtopped by high waves and/or water levels and may fail structurally by breaching; and
- spatially variable location and type of assets susceptible to damage in the event of a flood.

In order to obtain realistic damage estimates in the event of wave overtopping or breaching failure, computationally demanding inundation modelling is necessary. An efficient sampling strategy is therefore required.

The five main requirements for calculating coastal flood risk are:

1. statistical estimation of extreme loads;
2. calculation of system reliability;
3. estimation of dike breach characteristics;
4. modelling of inundation depth and extent due to wave overtopping or breaching of the dikes; and
5. estimation of damage to individual properties resulting from inundation to a given depth.

An overview of the methodology as applied to the coastal dike system is shown in figure 6.

### (a) Statistical estimation of extreme loads

A j.p.d.f. for the (non-negative) loading variables was constructed from simultaneous measurements of wave height, $H_s$, and water level, *W*, using the approach of Hawkes *et al*. (2002).

### (b) Systems reliability analysis

The dike system in the example implementation comprises 12 coastal sections (labelled A–L) and six fluvial sections (labelled 1–6), of which four are raised ground and not prone to failure (figure 5).

For a system of *q* dike sections, labelled $i = 1, \dots, q$, the breaching of section *i* is labelled as $F_i$ and non-breaching as $\bar{F}_i$. There are therefore $2^q$ possible system states, labelled $S_j$, of which $2^q - 1$ refer to states with at least one breached component. The set of states in which one or more components has breached is labelled $F$. Flooding may occur by water overtopping the dike even if the dike has not breached, but the consequences tend to be greater in the event of a breach. The consequence of failure is therefore a function, $D_j(\mathbf{v})$, of both the system state and the load $\mathbf{v}$.

The probability of each of these states occurring will vary according to the (not necessarily independent) load, $\mathbf{v} = (h_s, w)$, imposed upon the system. Structural performance is described using a fragility function that describes the probability of breaching, conditional on a specific loading (Casciati & Faravelli 1991; Dawson & Hall 2003). The fragility function of system component *i*, given load $\mathbf{v}$, is therefore the conditional probability $P(F_i \mid \mathbf{v})$. The probability, $P_i$, of breaching of a given component, *i*, can be established by integrating over the j.p.d.f. of loading, $f(\mathbf{v})$:

$$P_i = \int_{\Omega} P(F_i \mid \mathbf{v})\, f(\mathbf{v})\,\mathrm{d}\mathbf{v}.\tag{6.1}$$

The breaching failure modes considered are summarized in table 1 and four representative fragility functions are plotted in figure 7.

The main source of dependency between the components in this series system originates from the loading, while the resistance of each section tends to be independent. The resistance of some dike sections, particularly those located near each other and sharing similar failure modes, may not be completely independent, perhaps owing to shared geotechnical conditions. However, by using an appropriate length of dike section, the resistance variables may be assumed to be independent (Van Gelder & Vrijling 1998), conditional upon loading. Sufficiently dense spatial testing of the strength parameters in the dikes would enable the correlation length to be estimated and the expressions below modified accordingly (CUR and TAW 1990). Assuming independence, conditional upon loading, the conditional probability of breaching failure of one or more components in series is

$$P(F \mid \mathbf{v}) = 1 - \prod_{i=1}^{q}\left[1 - P(F_i \mid \mathbf{v})\right].\tag{6.2}$$

The conditional probability of any given state $S_j$, for example that corresponding to the event $\{F_1, \bar{F}_2, \dots, \bar{F}_q\}$, can be calculated as

$$P(S_j \mid \mathbf{v}) = P(F_1 \mid \mathbf{v})\prod_{i=2}^{q}\left[1 - P(F_i \mid \mathbf{v})\right].\tag{6.3}$$

The total flood risk is therefore

$$r = \sum_{j=1}^{2^q}\int_{\Omega} D_j(\mathbf{v})\, P(S_j \mid \mathbf{v})\, f(\mathbf{v})\,\mathrm{d}\mathbf{v}.\tag{6.4}$$

In order to reduce the number of system states analysed in evaluating equation (6.4), the *k* system states that make a non-negligible contribution towards *r* (generally $k \ll 2^q$) are selected. The definition of a non-negligible contribution depends on the required precision, and in this analysis a minimum contribution threshold was set. A lower bound on the total flood risk, expressed in terms of an expected annual economic damage for a coastal dike system, is therefore

$$r_{\text{low}} = \sum_{j=1}^{k}\int_{\Omega} D_j(\mathbf{v})\, P(S_j \mid \mathbf{v})\, f(\mathbf{v})\,\mathrm{d}\mathbf{v},\tag{6.5}$$

where the *k* retained states are those with the largest contributions and $k \le 2^q$.

An upper bound, $r_{\text{up}}$, associated with not analysing every permutation of system states can be estimated using

$$r_{\text{up}} = r_{\text{low}} + \int_{\Omega} D_{\max}(\mathbf{v})\left[1 - \sum_{j=1}^{k} P(S_j \mid \mathbf{v})\right] f(\mathbf{v})\,\mathrm{d}\mathbf{v},\tag{6.6}$$

where $D_{\max}(\mathbf{v})$ is an estimate of the maximum possible damage for a given loading.
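Under the conditional-independence assumption, equations (6.2) and (6.3) reduce to products of component fragilities; a small sketch (Python), in which the fragility values at the chosen load point are illustrative:

```python
from itertools import product

def state_probabilities(p_breach):
    """Conditional probability (6.3) of every breach/non-breach combination,
    assuming component resistances are independent given the load v."""
    q = len(p_breach)
    states = {}
    for combo in product((0, 1), repeat=q):   # 2**q system states
        p = 1.0
        for i, breached in enumerate(combo):
            p *= p_breach[i] if breached else (1.0 - p_breach[i])
        states[combo] = p
    return states

p = [0.1, 0.02, 0.3]                 # illustrative fragilities P(F_i | v)
states = state_probabilities(p)
p_any = 1.0 - states[(0, 0, 0)]      # series-system breach probability (6.2)
print(p_any)
```

Enumerating all $2^q$ states quickly becomes infeasible as $q$ grows, which is why the bounds (6.5) and (6.6) retain only the states with non-negligible contributions.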

### (c) Dike breaching

Prediction of dike breach location, geometry and growth rate is highly uncertain (Wahl 1998). A number of simplified rules for breach width have been proposed, including

$$b = \min(ah,\, B),\tag{6.7}$$

where *b* is the breach width, *B* is the dike length, *h* is the overflow depth and *a* is as little as 3 for cohesive materials (HR Wallingford 2004) and as great as 15 for non-cohesive materials (Visser 1998). Here *a* = 12.
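A minimal sketch of a breach-width rule of this kind (Python); capping the width at the dike length is a natural physical bound assumed here, not a detail taken from the cited rules:

```python
def breach_width(h, a=12.0, dike_length=None):
    """Simplified breach-width rule in the spirit of (6.7): width grows
    linearly with the overflow depth h. The coefficient a ranges from about
    3 (cohesive fill) to about 15 (non-cohesive fill); a = 12 is used in
    the case study. Capping at the dike length is an assumed bound."""
    b = a * h
    if dike_length is not None:
        b = min(b, dike_length)
    return b

print(breach_width(0.5), breach_width(10.0, dike_length=50.0))
```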

### (d) Inundation modelling

Simulation of inundation over low-gradient floodplains with dike structures requires at least a two-dimensional hydrodynamic modelling approach with relatively high spatial resolution to represent the complex geometry of the floodplain. However, full two- or three-dimensional hydrodynamic modelling remains computationally prohibitive if multiple breach or overtopping cases are to be modelled. In order to reduce the computational burden of the hydrodynamic calculations for this study, a simplified two-dimensional inundation model called LISFLOOD-FP (Bates & De Roo 2000), which runs in a matter of seconds, was selected. This model generates a spatial field of water depths in the floodplain and has been shown to perform as well as full two-dimensional codes (Horritt & Bates 2001). It has been successfully demonstrated at a number of coastal sites (Bates *et al*. 2005). The model was calibrated using data from the 1990 flood.

### (e) Damage estimation

A national database combines information on property type and location in England and Wales. For each property type, Penning-Rowsell *et al*. (2003) have defined a depth–damage relationship.

### (f) Implementation of methodology

An exploratory analysis of 49 loading points was simulated from $U(A)$, chosen as the candidate sample set that minimized the quantity *S* in equation (4.1). Subsequently, $w_n(\mathbf{v})$ was updated every 50 realizations. The risk converged (according to the criterion defined in equation (5.4)) after 2000 realizations. The function describing the location of risk, $D(\mathbf{v})\, f(\mathbf{v})$, is shown in figure 8. The total risk, expressed in terms of expected annual damage, is approximately £84 000. The maximum risk and a secondary maximum are evident in figure 8, and it is clear from inspection of the risk surface that a uniform sampling strategy over the loading space would result in over half of the samples contributing no information to the analysis.

The expected annual damages and expected annual inundation probability (defined as the probability of a water depth greater than zero in a given grid cell) can be reported spatially. The map of inundation probability (figure 9, which also shows dike failure probabilities) shows that the majority of the floodplain has a 0.005–0.01 probability of flooding. This is dominated by the potential for piping failure of the estuary embankments. The west side of the floodplain is significantly less prone to flooding. Comparing this with the map of flood risk, figure 10 (which also shows the contribution of each dike section to flood risk), demonstrates that there is not an exact correlation between risk and inundation probability.

### (g) Discussion

Figure 11 compares the conditional system failure probability, $P(F \mid \mathbf{v})$, and the risk surface, $D(\mathbf{v})\, f(\mathbf{v})$. Three points corresponding to maxima are labelled: the largest, (1); point (2), which is dominated by failure mechanisms influenced by joint loading conditions (e.g. beach erosion and revetment damage); and point (3), at *W* = 2 m, which is dominated by the piping failure of the estuarial dikes (and is influenced by *W* alone). The system design point does not therefore correspond with the location of maximum risk, because the topography is such that only a storm surge greater than approximately 4 m Above Ordnance Datum (mAOD) results in any significant inundation, while a surge of approximately 5 mAOD is required before damages exceed £1 million. Overtopping events at lower water levels do occur, but do not release sufficient water into the floodplain to contribute substantially towards flood risk. However, the fact that point (3) corresponds almost exactly with the maximum of the risk surface and point (2) lies near its secondary maximum suggests that the conditional failure probability is still, in this study, an influential function for calculating risk.

The importance of topography, and the influence that different breach failure states have upon flood risk, can be considered by referring to a few simulations. Figure 12 shows four flood outlines and their associated damages and risks for single breaches under identical loading conditions. Despite the same load being imposed on the system, the flood outlines and resultant damages are very different, and a higher damage does not necessarily correspond to a higher risk. This is caused by a number of factors: the system state and its probability of occurrence, the loading on the system, the spatial distribution of the floodplain damages and the floodplain topography. Under more severe loading conditions, multiple component failures account for an increasingly significant proportion of the risk. A comparison of the four very different flood outlines in figure 12 justifies the use of a computationally demanding inundation model for this case study site.

## 7. Conclusions

Risk assessment is a well-established technique for measuring the performance of infrastructure systems. A generic methodology for assessing the risk associated with complex infrastructure systems has been developed. The methodology is novel in that the joint space of the loading variables is sampled according to the contribution that a given sub-region of that space makes to *risk* rather than (conventionally) *probability of failure*. This methodology has been applied to a simple beam system and a real coastal dike system.

The use of sampling methods is attractive, as, unlike so-called level II linearization methods, they enable the use of implicit and nonlinear limit state functions. Application of the risk-based sampling methodology on a simple system demonstrates a gain in efficiency of several orders of magnitude relative to a uniform sampling strategy.

In the application to a real coastal dike system in the UK, risk is shown to be a complex function of a number of factors: the system failure state, its probability of occurrence and associated damages, and the loading on the system. In this particular case, these are strongly influenced by floodplain topography and the spatial distribution of vulnerable assets in the floodplain. Analysis based only upon conditions at the system design point, or upon a limited number of breaching failure combinations, would not, in the system considered here, have adequately represented important system behaviour. In particular, the location of maximum risk and the design point in the loading space do not coincide.

While the risk assessment methodology has been demonstrated here on a coastal dike system, it is much more widely applicable. It is likely to be most useful where consequences are a nonlinear function of the basic variables and where simulation of the system is computationally demanding.

## Acknowledgments

The research described in this paper formed part of the ‘RASP: Risk assessment for flood and coastal defence systems for strategic planning’ project, funded by joint DEFRA/EA Flood and Coastal Defence R&D programme. The project was managed by Paul Sayers at HR Wallingford and data used in the coastal case study were provided by the Environment Agency, Dr Mohamed Hassan of HR Wallingford and Conwy Borough County Council.

## Footnotes

- Received November 1, 2005.
- Accepted March 20, 2006.

- © 2006 The Royal Society