Small islands in the vicinity of the mainland are widely believed to offer protection from wind and waves and thus coastal communities have been developed in mainland areas behind small islands. However, whether they offer protection from tsunamis is unclear. Do islands act as natural barriers? Recent post-tsunami survey data, supported by numerical simulations, reveal that the run-up on coastal areas behind small islands was significantly higher than on neighbouring locations not affected by the presence of the islands. To study the conditions of this run-up amplification, we solve numerically the nonlinear shallow water equations. We use the simplified geometry of a conical island sitting on a flat seafloor in front of a uniform sloping beach. By doing so, the experimental set-up is defined by five physical parameters, namely the island slope, the beach slope, the water depth, the distance between the island and the plane beach and the incoming wavelength, while the wave height was kept fixed. The objective is to find the maximum run-up amplification with the least number of simulations. To achieve this goal, we build an emulator based on Gaussian Processes to guide the selection of the query points in the parameter space. We thus reduce substantially the computations required to identify the run-up amplification. Our results show that the island acts as a focusing lens for energy and amplifies the run-up along the coastline behind its lee side, instead of protecting it, as popular beliefs suggest.
In the last decade, the 26 December 2004 tsunami in Indonesia [1,2] and the 11 March 2011 event in Japan [3–6] caused death, destruction and huge economic losses at the affected sites. Both events increased public understanding of tsunamis, and raised awareness and preparedness in at-risk communities. Preparedness remains the only effective countermeasure to save lives.
The most used quantitative indicator of tsunami impact is the run-up, defined as the elevation of the maximum wave uprush on a beach or structure above the still water level. Since the 1950s, tsunami run-up on a plane beach has been extensively studied by Carrier & Greenspan, Keller & Keller, Synolakis, Tadepalli & Synolakis, Brocchini & Peregrine, Kânoğlu & Synolakis, Didenkulova & Pelinovsky, Antuono & Brocchini, and Stefanakis et al. [15,16], among others. All these studies dealt with the mathematical description of long-wave run-up on uniform sloping beaches.
The catastrophe on Babi Island from the 12 December 1992 Flores tsunami [17,18] focused attention on run-up on the lee side of islands. Laboratory experiments by Briggs et al., numerical computations by Liu et al. and analytical models showed that long waves can amplify wave run-up on the lee side of conical islands, compared with the run-up on the side of the island fronting the wave. Earlier studies [22–26] have provided insight on the behaviour of long waves around conical islands, but did not address run-up directly, having focused instead on wave trapping or scattering from the front.
Interesting recent observations by Hill et al. have shown enhanced tsunami run-up in coastal areas of the Mentawai islands off Sumatra, in locales behind small islands which were supposedly protected by them (figure 1). In fact, Satake et al. recorded one of the highest flow depths in their survey immediately behind Pulau Sibigau. The western part of Sumatra was largely shielded by the Mentawais, and there was negligible run-up in Padang, which has long been believed to be at high risk from a megathrust event (Borrero et al.).
Can small islands, widely believed to act as natural barriers against tsunamis, instead amplify wave energy in the coastal areas they shadow, which is often where coastal communities thrive? Here, we investigate whether the observation by Hill et al. for the 25 October 2010 Mentawais tsunami was caused by an unusual combination of bathymetry and tsunami characteristics, or whether it is indicative of a more general phenomenon. Without computational enhancements, we would have to vary island geometries, coastal beach slopes, offshore depths, the distance between the islands and the coastline, and tsunami wavelengths independently, performing literally thousands of computations to identify patterns and whether combinations of these parameters produce unusual amplification. While this is possible, we opt to use recent developments in machine learning to limit the number of combinations and identify run-up extremes, to help us better understand the interaction of the physical parameters and thus identify locales which may be at higher risk of inundation.
In recent years, developments in computer science and increases in computational power have further reduced the cost of numerical compared to physical experiments. However, each simulation has a computational cost, which increases with model complexity and spatio-temporal resolution. Therefore, a series of experiments with a specific objective, such as the maximization or minimization of an output, needs to be carefully designed in order to reach the desired conclusion with the least number of experiments. Finding argmax_x f(x), where f(x) is the output of the experiment, is not trivial, as the analytical expression of f(x) is not known a priori; x is an n-dimensional vector, where n is the number of parameters on which the output depends. The difficulty of the problem increases with n. The direct approach, which involves creating regular grids and testing all reasonable combinations of the n parameters, is prohibitively expensive.
For this reason, sampling techniques have been developed which aim to reduce the number of numerical experiments by finding a representative sample in the input space. These techniques are commonly referred to as 'experimental design', e.g. Sacks et al., and are static, meaning that the design (sampling) is made initially, before the execution of the experiments, and the selection of the future query points is not guided by the experimental results. In this procedure, all such designed points are queried. This has been a substantial advancement, compared with the regular grids approach, but still involves massive computations.
More recently, adaptive design and machine (or statistical) learning algorithms have been developed for 'active experimental design'. They use already computed results as a guide to select future query points. To achieve this, a statistical model of the experiment is built and is constantly updated as new results arrive. Using the predictions of this statistical model (emulator), future query points are selected according to the objective of the experiment, until the objective has been achieved.
This dynamic approach further reduces the computational costs and allows investigation of larger parameter ranges. Moreover, building such an emulator has further advantages, the most important one being the ability to use it instead of the actual simulator, as it is much less computationally demanding to evaluate, and thus can be applied very rapidly. This can be a substantial advantage in tsunami prediction, when a quick forecast is needed. Depending on the emulator, it is also possible to perform a sensitivity analysis of the model output with respect to the input parameters. A recent and likely the first example of an emulator built in the context of tsunami research is that of Sarri et al., who emulated landslide-generated tsunamis on a plane beach, based on the theoretical model of Sammarco & Renzi.
From a physical point of view, in this study we elucidate the tsunami run-up on a plane beach behind a small conical island, compared with an adjacent lateral location on the beach, not directly influenced by the presence of the island. We will focus on small solitary islands. Chains of islands such as the Mentawais will always offer some protection, as demonstrated in the 2010 event. To achieve this, we use numerical simulations of the nonlinear shallow water equations (NSWE). Moreover, we will present a newly developed method for active experimental design, which we will apply to our physical setting, and we will discuss its advantages and limitations. We will present metrics that can be used for the comparison of the performance of different learning strategies, and an empirical stopping criterion, i.e. a criterion which will signal the achievement of the optimization objective. We will end by discussing run-up amplification as a function of the wavelength to the island radius ratio.
In none of our experiments did our small islands produce amplification less than one. It appears that, contrary to popular belief and intuition, small islands can act as tsunami lenses, focusing energy behind them, and thus make the entire phenomenology of tsunami amplification even richer than envisioned by Berry and Kânoğlu et al.
2. Experimental configuration
Our simplified bathymetric profile consists of a conical island sitting on a flat seafloor fronting a plane beach (figure 2). The height of the crest of the island above still water level is always 100 m. A single wave profile η0(t) = 1.5 sech²(ωt − 2.6) is prescribed as forcing at the seaward boundary. We use this formulation to avoid the solitary wave link between the water depth and the wave amplitude, discussed in Madsen et al. and in Madsen & Schäffer. The problem is governed by five physical parameters, namely the island slope, the plane beach slope, the water depth, the distance between the island and the beach and the prescribed incident wavelength, which is controlled by ω (table 1).
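The forcing above can be sketched in a few lines; the time grid and the value of ω below are illustrative choices, not the values used in the simulations.

```python
import numpy as np

def eta0(t, omega):
    """Single-wave forcing at the seaward boundary: a sech^2 pulse
    of height 1.5 m whose duration is controlled by omega."""
    return 1.5 / np.cosh(omega * t - 2.6) ** 2

# Illustrative values: the crest (1.5 m) passes the boundary at t = 2.6/omega.
t = np.linspace(0.0, 600.0, 2001)   # seconds
wave = eta0(t, omega=0.01)
```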
The numerical simulations were performed using VOLNA, which solves the NSWE. VOLNA can simulate the whole life cycle of a tsunami from generation to run-up. It uses a Finite Volume Characteristic Flux scheme [41,42] with a MUSCL type of reconstruction for higher order terms [43–45] and a third-order Runge–Kutta time discretization. The code uses an unstructured triangular mesh, which can handle arbitrary bathymetric profiles and can also be refined in areas of interest. The mesh resolution that we used varied from 500 m at the seaward boundary to 2 m in areas where we calculated run-up (figure 3).
The run-up was measured on the plane beach exactly behind the island and on a lateral location along the beach, which was far enough from the island and thus was not directly affected by its presence (figure 3). To compute run-up, 11 equally spaced virtual wave gauges were positioned initially on dry land, along transects in each of the two locations. Gauge locations have inherent uncertainties due to spatial discretization, which we minimize using higher resolution around these locations (figure 3). The actual horizontal spacing of these wave gauges was dependent on the beach slope. The maximum run-up was defined as the maximum recorded wave height at the wave gauge with the highest elevation. When the wave did not reach the elevation of a gauge, then that gauge did not record any signal.
(b) Experimental design
In order to fill the input parameter space, we chose the input points so that maximal information was obtained with a moderate number of points. This procedure is known as 'experimental design' and it is a static approach, as we described in the introduction. This is the first step to reduce the computational cost. For this purpose, we used Latin Hypercube Sampling (LHS), with maximization of the minimum distance between points. When using LHS for a function of M variables, the range of each variable is divided into N equally probable, non-overlapping intervals. Then one value from each interval is randomly selected for every variable. Finally, a random combination of the N values across the M variables is formed. The maximization of the minimum distance between points is added as an extra constraint.
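A minimal sketch of maximin LHS on the unit hypercube, implemented as a pick-the-best-of-several-candidates search; the paper's actual sampler may differ.

```python
import numpy as np

def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube in [0, 1]^n_dims: each dimension is
    split into n_points equal intervals, one value is drawn uniformly
    from each interval, and intervals are paired randomly across dims."""
    u = (rng.random((n_points, n_dims)) + np.arange(n_points)[:, None]) / n_points
    for d in range(n_dims):
        rng.shuffle(u[:, d])   # shuffle the stratified column in place
    return u

def maximin_lhs(n_points, n_dims, n_candidates=50, seed=0):
    """Draw several Latin hypercubes and keep the one maximizing
    the minimum pairwise distance (maximin criterion)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        x = latin_hypercube(n_points, n_dims, rng)
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        score = dists[~np.eye(n_points, dtype=bool)].min()
        if score > best_score:
            best, best_score = x, score
    return best

design = maximin_lhs(n_points=200, n_dims=5)   # 200 runs, 5 parameters
```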
The LHS is found to lead to better predictions than regular grids, when used with multivariate emulators. In order to accurately cover the input space, we ran 200 simulations selected by LHS. For comparison, a regular grid approach with 10 grid points in each dimension would require 10^5 simulations. However, we can further improve the performance of this approach with the 'active experimental design', which can suggest the order in which the points will be queried, and is result driven, as we describe in the following section. We clarify that the active experimental design does not need to be initialized with LHS, and it can work on any set of points. The LHS was used for evaluation purposes, to better fill the parameter space with a limited number of points, because we do not have the luxury of using a large number of random points. Finally, we stress that, in terms of statistical learning, our strategy is not restricted to the specific tsunami problem and can be applied wherever the objective is scalar optimization with cost constraints.
3. Active experimental design
(a) Active batch optimization
Let f : X → ℝ be an unknown function, where X is a finite subset of ℝ^n, which we can evaluate at any location for a fixed cost. We address the problem of sequentially finding the maximizer of f in the lowest possible number of evaluations. The choice of formulating the optimization problem as a maximization is without loss of generality, as we can take the opposite of f if the minimum is required. We consider the special case where K evaluations of f can be acquired with no increase in cost. This occurs, for example, when f is the result of a numerical experiment which can be computed in parallel on a machine with K cores, and the 'cost' to minimize is computation time. At each iteration t, we choose a batch of K points in X called the queries {x_t^k}, 0 ≤ k < K, and then compute simultaneously the potentially noisy observations y_t^k = f(x_t^k) + ε_t^k, where ε_t^k is independent Gaussian noise N(0, σ²). The stochasticity in the measurements is due to the discretization, both spatial and temporal, and not due to the accuracy of the evaluation model, which we consider to be deterministic.
Assuming that the number of iterations allowed, hereafter called the horizon T, is unknown, we develop a strategy that is good at each and every iteration. Care must be taken to grapple with the exploration/exploitation trade-off, that is, to balance learning the function f globally with focusing around the predicted maximum. The loss incurred at iteration t is the simple regret for the batch, defined as

r_t = f(x⋆) − max_{0≤k<K} f(x_t^k),   (3.1)

where f(x⋆) is the maximum of f. We aim to minimize the cumulative regret R_T, defined by

R_T = Σ_{t=1}^{T} r_t.   (3.2)

A strategy is said to be 'no-regret' when R_T/T → 0 as T → ∞.
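The two regret measures can be written directly from their definitions; the true maximum and the batch values below are toy numbers for illustration.

```python
import numpy as np

def simple_regret(f_star, batch_values):
    """r_t: gap between the true maximum and the best query
    of the batch (equation (3.1))."""
    return f_star - np.max(batch_values)

def cumulative_regret(f_star, batches):
    """R_T: sum of the simple regrets over the iterations (equation (3.2))."""
    return sum(simple_regret(f_star, b) for b in batches)

# Toy run: true maximum 1.0, three batches of K = 2 observations.
batches = [[0.2, 0.5], [0.7, 0.6], [0.9, 1.0]]
R = cumulative_regret(1.0, batches)   # 0.5 + 0.3 + 0.0
```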
(b) Gaussian processes
In order to analyse the efficiency of a given strategy, we have to make assumptions on f. We want extreme variations of the function to have low probability. We assume that f is a Gaussian process (GP), which is a natural way to formalize the intuition that nearby locations are highly correlated. It can be seen as a continuous extension of multidimensional Gaussian distributions. We say that the random process f is Gaussian, with mean function m and non-negative definite covariance (or kernel) function k, denoted by f ∼ GP(m, k), when for any finite subset of locations x_1, …, x_n the values f(x_1), …, f(x_n) form a multivariate Gaussian random variable with mean vector μ = [m(x_i)]_i and covariance matrix C = [k(x_i, x_j)]_{i,j} given by m and k.
If we have the prior knowledge that f is drawn from a GP with known kernel function k, then, based on the observations after T iterations, the posterior distribution of f remains a GP, with mean μ_T(x) and variance σ_T²(x), which can be computed via Bayesian inference by

μ_T(x) = k_T(x)^⊤ C_T^{−1} Y_T   (3.3)

and

σ_T²(x) = k(x, x) − k_T(x)^⊤ C_T^{−1} k_T(x),   (3.4)

where X_T = {x_1, …, x_T} is the set of queried locations and Y_T the vector of noisy observations, k_T(x) is the vector of covariances between x and the queried points, and C_T = K_T + σ²I is the kernel matrix with K_T = [k(x, x′)]_{x,x′∈X_T} and I the identity matrix.
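A minimal numerical sketch of the posterior formulas for a one-dimensional, zero-mean GP with an RBF kernel; the length-scale and noise level below are illustrative assumptions.

```python
import numpy as np

def rbf(x, xp, l=0.3):
    """Squared-exponential kernel matrix between 1-D location arrays."""
    return np.exp(-np.subtract.outer(x, xp) ** 2 / (2.0 * l ** 2))

def gp_posterior(x_query, x_train, y_train, noise=1e-3, l=0.3):
    """Posterior mean and variance of a zero-mean GP:
    mu(x) = k(x)^T C^-1 Y and var(x) = k(x,x) - k(x)^T C^-1 k(x),
    with C = K + sigma^2 I."""
    C = rbf(x_train, x_train, l) + noise * np.eye(len(x_train))
    k_x = rbf(x_query, x_train, l)              # covariances to training points
    mu = k_x @ np.linalg.solve(C, y_train)
    var = 1.0 - np.einsum("ij,ji->i", k_x, np.linalg.solve(C, k_x.T))
    return mu, var

x_tr = np.array([0.1, 0.4, 0.8])
y_tr = np.sin(2 * np.pi * x_tr)
mu, var = gp_posterior(np.linspace(0, 1, 50), x_tr, y_tr)
```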
The three most common kernel functions are:
(a) the polynomial kernels of degree d, k(x, x′) = (x^⊤ x′ + c)^d,
(b) the (Gaussian) Radial Basis Function kernel (RBF or Squared Exponential) with length-scale l > 0, k(x, x′) = exp(−‖x − x′‖² / 2l²),   (3.5)
(c) the Matérn kernel, of length-scale l and parameter ν, k(x, x′) = (2^{1−ν} / Γ(ν)) (√(2ν) ‖x − x′‖ / l)^ν K_ν(√(2ν) ‖x − x′‖ / l),   (3.6)
where Kν is the modified Bessel function of the second kind and order ν.
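Both kernels can be evaluated as functions of the distance r = ‖x − x′‖; the Matérn implementation below guards the r = 0 limit, which equals 1, and relies on the modified Bessel function from SciPy.

```python
import numpy as np
from scipy.special import kv, gamma   # kv is K_nu, the Bessel function above

def rbf_kernel(r, l=1.0):
    """Squared-exponential kernel as a function of distance r (eq. (3.5))."""
    return np.exp(-np.asarray(r, dtype=float) ** 2 / (2.0 * l ** 2))

def matern_kernel(r, l=1.0, nu=1.5):
    """Matern kernel (eq. (3.6)); the r = 0 limit is set to 1 explicitly
    because kv(nu, 0) diverges numerically."""
    r = np.asarray(r, dtype=float)
    s = np.sqrt(2.0 * nu) * np.maximum(r, 1e-12) / l
    k = (2.0 ** (1.0 - nu) / gamma(nu)) * s ** nu * kv(nu, s)
    return np.where(r < 1e-12, 1.0, k)
```

For ν = 1/2 the Matérn kernel reduces to the exponential kernel exp(−r/l), which gives a convenient sanity check.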
The Bayesian inference is illustrated in figure 4 on a sample one-dimensional problem. The posteriors are based on four observations of a GP. The vertical height of the grey area is proportional to the posterior deviation at each point.
4. Parallel optimization procedure
We can now describe the learning strategy we used, namely the Gaussian Process Upper Confidence Bound with Pure Exploration algorithm (GP-UCB-PE).
(a) Confidence region
A key property of the GP framework is that the posterior distribution at a location x is normal, N(μ_t(x), σ_t²(x)). We can then define an upper confidence bound f_t^+ and a lower confidence bound f_t^−, such that f is included between them with high probability:

f_t^+(x) = μ_t(x) + √β_t σ_t(x)   (4.1)

and

f_t^−(x) = μ_t(x) − √β_t σ_t(x),   (4.2)

with β_t defined in Contal et al. The factor β_t regulates the width of the confidence region.
The bounds f_t^+ and f_t^− are illustrated in figure 4 by the upper and lower envelopes of the grey area, respectively. The region delimited in this way (the high-confidence region) contains the unknown f with high probability.
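Given a posterior mean and variance on a set of candidate locations, the two envelopes are immediate; β = 4 below is an arbitrary illustrative value.

```python
import numpy as np

def confidence_bounds(mu, var, beta):
    """Upper and lower confidence envelopes (equations (4.1)-(4.2)):
    f+(x) = mu + sqrt(beta)*sigma and f-(x) = mu - sqrt(beta)*sigma."""
    half = np.sqrt(beta * np.maximum(var, 0.0))
    return mu + half, mu - half

mu = np.array([0.0, 0.5, 0.2])
var = np.array([0.04, 0.01, 0.09])
ucb, lcb = confidence_bounds(mu, var, beta=4.0)   # half-widths 0.4, 0.2, 0.6
```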
(b) Relevant region
We define the relevant region R_t as the region which contains x⋆ with high probability; as an illustration of the concept, refer to the green region in figure 5. Let y_t^• = max_x f_t^−(x) be the lower confidence bound on the maximum, represented by the horizontal dotted green line in figure 5. The relevant region is defined by

R_t = {x ∈ X : f_t^+(x) ≥ y_t^•}.   (4.3)

R_t discards the locations where x⋆ does not belong with high probability. It is represented in green in figure 5.
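The definition of the relevant region amounts to a one-line filter on candidate locations; the bound arrays below are toy values.

```python
import numpy as np

def relevant_region(ucb, lcb):
    """Indices of candidates whose upper bound exceeds the best lower
    bound, i.e. those that may still contain the maximizer."""
    return np.flatnonzero(ucb >= lcb.max())

ucb = np.array([0.4, 0.7, 0.8, 0.25])
lcb = np.array([-0.4, 0.3, -0.4, 0.1])
region = relevant_region(ucb, lcb)   # keeps candidates with UCB >= 0.3
```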
(c) Gaussian process upper confidence bound with pure exploration algorithm
We present here the GP-UCB-PE of Contal et al., combining two strategies to determine the queries {x_t^k}, 0 ≤ k < K, for batches of size K. The first location is chosen using the UCB rule,

x_t^0 = argmax_{x∈X} f_t^+(x).   (4.4)

This single rule is enough to address the exploration/exploitation trade-off. The value of β_t balances exploring uncertain regions (high posterior variance σ_t²) and focusing on the supposed location of the maximum (high posterior mean μ_t). This strategy is illustrated with the point x0 in figure 5.
The remaining locations are chosen to maximize the information gained, which is easily computed by maximizing the posterior variance. This method, usually referred to as a 'greedy' strategy, selects for each 1 ≤ k < K, one by one, the following points,

x_t^k = argmax_{x∈R_t} σ_t^{(k)}(x),   (4.5)

where σ_t^{(k)} is the updated posterior deviation after choosing x_t^0, …, x_t^{k−1}. We use here the fact that the posterior variance does not depend on the values of the observations, but only on their positions. One such point is illustrated with x1 in figure 5.
These K − 1 locations reduce the uncertainty about f, improving the guesses of the UCB rule at the next iteration. The overall procedure is shown in algorithm 1.
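A compact sketch of one batch selection, assuming a one-dimensional grid of candidates, a zero-mean RBF GP, and illustrative values for β, K and the length-scale; it follows the two rules above (UCB for the first point, greedy variance maximization inside the relevant region for the rest).

```python
import numpy as np

def rbf(a, b, l=0.2):
    """RBF kernel matrix between 1-D location arrays."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2.0 * l ** 2))

def posterior(grid, x_tr, y_tr, noise=1e-4, l=0.2):
    """Zero-mean GP posterior mean and variance on the grid."""
    C = rbf(x_tr, x_tr, l) + noise * np.eye(len(x_tr))
    k = rbf(grid, x_tr, l)
    mu = k @ np.linalg.solve(C, y_tr)
    var = 1.0 - np.einsum("ij,ji->i", k, np.linalg.solve(C, k.T))
    return mu, np.maximum(var, 0.0)

def ucb_pe_batch(grid, x_tr, y_tr, K=3, beta=9.0, l=0.2):
    """One GP-UCB-PE batch: first point by the UCB rule (eq. (4.4)),
    remaining K-1 points by greedy posterior-variance maximization
    restricted to the relevant region (eq. (4.5))."""
    mu, var = posterior(grid, x_tr, y_tr, l=l)
    ucb = mu + np.sqrt(beta * var)
    lcb = mu - np.sqrt(beta * var)
    relevant = ucb >= lcb.max()            # eq. (4.3)
    batch = [grid[np.argmax(ucb)]]         # eq. (4.4)
    x_aug = np.append(x_tr, batch[0])
    for _ in range(K - 1):
        # The posterior variance depends only on the query locations,
        # not on the observed values, so dummy zeros suffice here.
        _, var_aug = posterior(grid, x_aug, np.zeros_like(x_aug), l=l)
        nxt = grid[np.argmax(np.where(relevant, var_aug, -np.inf))]
        batch.append(nxt)
        x_aug = np.append(x_aug, nxt)
    return np.array(batch)

grid = np.linspace(0.0, 1.0, 101)
x_tr = np.array([0.1, 0.5, 0.9])
y_tr = np.sin(3.0 * x_tr)
batch = ucb_pe_batch(grid, x_tr, y_tr, K=3)
```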
With the Gaussian process assumption on f, we adjust the parameter β_t, which regulates the width of the confidence region, so that f is contained in the high-confidence region with high probability. Under this condition, Contal et al. prove a general bound on the regret achieved by GP-UCB-PE, making this strategy a 'no-regret' algorithm; the cumulative regret is sublinear in T for both the linear and the RBF kernels, up to polylog factors. When K ≪ T, these probabilistic bounds with parallel queries are better than the ones incurred by the sequential procedure by a factor of order √K.
(d) Stopping criterion
One challenging problem faced in practice is deciding when to stop the iterative strategy. The theoretical analysis only gives general expected bounds when T tends to infinity; it does not provide estimates of the constants and short-term factors.
One trivial solution is to fix the number of iterations (or the computation time) allowed by a predefined limit. This is not suitable for the general case, as one does not know precisely the amount of exploration needed to be confident about the maximum of f. Other rules, like criteria based on the local changes of the queries in the target space (improvement-based criteria) or in the input space (movement-based criteria), come from the convex optimization literature. These are not suitable in our global, non-convex setting, as they stop after a local maximum is found.
Our approach is to stop when the procedure ceases to learn relevant information on f. We attempt to measure the global changes in the estimator between two successive iterations, with more focus on the highest values. The algorithm then stops when these changes become insignificant for a short period. The change between the posterior means μ̂_t and μ̂_{t+1} is measured by the rank correlation between their respective values on a finite validation dataset X_v ⊂ X.
We denote by n_v the size of the validation dataset and by S_{n_v} the set of all permutations of [1…n_v]. Let π_t (resp. π_{t+1}) be the ranking function associated with μ̂_t (resp. μ̂_{t+1}), such that π_t(i) < π_t(j) when μ̂_t(x_i) > μ̂_t(x_j). We then define the discounted rank dissimilarity d(π, π′) = Σ_{i≤n_v} (π(i) − π′(i))² / π′(i) and the normalized rank correlation ρ_t = 1 − d(π_t, π_{t+1}) / d(π⁺, π⁻). The denominator in the definition of ρ_t represents the discounted rank dissimilarity between two reversed ranks π⁺ and π⁻. The normalized rank correlation for two such ranks will therefore be equal to 0, whereas for any rank π, this correlation with itself will be 1. The correlation ρ_t can be seen as a modified Spearman's rank correlation coefficient, where the squared distances are weighted by their position in the new rank π_{t+1}. If we denote by τ_{nm}(π) the inversion of the nth and mth ranks in π, for any rank π, we find that d(π, τ_{nm}(π)) = (n − m)²(1/n + 1/m).
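A sketch of the rank-based criterion, assuming the discounted dissimilarity is the squared rank distance weighted by the new rank, as described above.

```python
import numpy as np

def rank_of(values):
    """Ranking function pi: rank 1 goes to the highest predicted value."""
    order = np.argsort(-np.asarray(values))
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def rank_dissimilarity(pi, pi_new):
    """Discounted rank dissimilarity: squared rank distances weighted
    by the position in the new rank pi_new."""
    return np.sum((pi - pi_new) ** 2 / pi_new)

def rank_correlation(mu_old, mu_new):
    """Normalized rank correlation between consecutive emulator means:
    1 for identical rankings, 0 for two fully reversed rankings."""
    pi_old, pi_new = rank_of(mu_old), rank_of(mu_new)
    pi_plus = np.arange(1.0, len(pi_new) + 1)
    worst = rank_dissimilarity(pi_plus, pi_plus[::-1])
    return 1.0 - rank_dissimilarity(pi_old, pi_new) / worst

mu_t  = np.array([0.3, 1.2, 0.7, 0.1])     # emulator mean at iteration t
mu_t1 = np.array([0.2, 1.1, 0.9, 0.0])     # same ordering at t + 1
```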
In figure 6, we present the observed relationship between the regret incurred at iteration t and the rank correlation ρ_t. The algorithm stops when 1 − ρ_t stays below a given threshold ρ_0 for ℓ iterations in a row. The value of this threshold is fixed empirically. In figure 6, with ρ_0 = 10^{−4} and ℓ = 4, the algorithm stopped at iteration 12.
In figure 7, we show the distribution of the final number of iterations T (lower is better) together with the final gap (lower is better) for four different thresholds ρ_0. The value ρ_0 = 10^{−4} appears to be an adequate threshold, since the final regret is always 0 and the number of iterations remains low. Further reduction of this threshold would again ensure zero regret, but at higher computational cost. On the other hand, greater values of ρ_0 reduce the computational cost, but with an increased risk of missing the maximum.
5. Testing of the algorithm
(a) Synthetic datasets
Apart from the tsunami experiment, which is five-dimensional and whose response surface is not known a priori, we use two synthetic datasets to test the performance of the described active learning algorithm: Himmelblau's function and a Gaussian mixture. Such datasets, being two-dimensional, can be easily visualized, and we can give them desired properties, such as several local maxima or background noise, so that we can test the effectiveness of our methodology.
(i) Himmelblau’s function
Himmelblau’s function is a non-convex function in a two-dimensional space (figure 8a). We compute a slightly tilted version of Himmelblau’s function and take its opposite to match the challenge of finding its maximum. This function has four crests, but only one global maximum, near (−3.8,−3.3). It gives a practical way to test the ability of an active learning strategy to manage exploration/exploitation trade-offs.
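A sketch of this test function; the tilt direction and magnitude are assumptions chosen so that the single global maximum sits near (−3.8, −3.3).

```python
def neg_himmelblau(x, y, tilt=0.1):
    """Negative of a slightly tilted Himmelblau function, turning its
    four minima into crests; the tilt breaks the tie between the crests
    so that only the one near (-3.8, -3.3) is the global maximum."""
    himmelblau = (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2
    return -himmelblau - tilt * (x + y)
```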
(ii) Gaussian mixture
This synthetic function is the sum of three two-dimensional Gaussian functions, centred at (0.2,0.5), (0.9,0.9) and (0.6,0.1), the last of which is the global maximum. We then perturb these Gaussian functions with smooth variations, generated from a Gaussian process with Matérn kernel (equation (3.6)) and very little noise. The resulting function is shown in figure 8b. Given how narrow the crest of the global maximum is, the sequential search for the maximum is quite challenging.
We verify empirically the performance of GP-UCB-PE by measuring the decay of the regret obtained on the two synthetic functions. For brevity, we do not report the cumulative regret (equation (3.2)) in the figures, but the gap between the maximum discovered so far and the true maximum, defined as the minimum simple regret, r̃_t = min_{t′≤t} r_{t′}.
First, we show in figure 9 the impact of the batch size K on the minimum regret r̃_t. The sequential strategy (red curve, K = 1) performs better than the others. This suggests that in a situation where the final number of queries is the restrictive factor, it is advantageous to choose small batch sizes. On the other hand, if the total time of the optimization is the restrictive factor, then one could choose larger batch sizes at little extra cost in the final regret.
We then compare our algorithm to three other strategies,
— a random strategy, which chooses the next queries at random,
— an exploration strategy, which attempts to maximize the information gain on f at each iteration,
— and an exploitation strategy, which only focuses on the predicted maximum.
For all datasets and algorithms, the batch size K was set to 10, and the four strategies were initialized with a random subset of tinit=20 observations (xi,yi). The curves in figure 10 show the evolution of the gap in terms of the iteration t. We report the average value with the confidence interval over 64 experiments (random initializations). The kernel function used was always an RBF kernel (equation (3.5)). The parameters of the algorithm, like the length-scale of k (represented by l), were chosen as the best parameters through validation on a random subsample of the data.
Our GP-UCB-PE learning algorithm is shown to outperform the other three strategies on the synthetic datasets. Exploitation in these datasets loses time, because it gets trapped in a local maximum, while the exploration and random strategies asymptotically find the global maximum, on average. On the conical island experiments dataset, the exploitation strategy performs slightly better than GP-UCB-PE, probably owing to the simplicity of the run-up function, which, even though five-dimensional, does not seem to pose any serious challenge, as it appears to have a single maximum.
6. The effect of the island
After running 200 simulations, we have found that in none of the situations considered did the island offer protection to the coastal area behind it. On the contrary, we have measured amplified run-up on the beach behind it, compared with a lateral location on the beach not directly affected by the presence of the island (figure 11). This finding shows that small islands in the vicinity of the mainland act as amplifiers of long waves in the region directly behind them and not as natural barriers, as was commonly believed.
The maximum amplification achieved was approximately 70% more than if the island were absent, and the median amplification factor is 1.3. The island focuses the wave on its lee side, while far from it the wave propagates unaffected (figure 12). The amplified wave propagates towards the beach and causes higher run-up in the region directly behind the island.
One of the key questions is which parameters control the run-up amplification (RA), and how. To answer this, we can use the mean function of the Gaussian process (equation (3.3)). We perform a local sensitivity analysis around the maximum RA by fixing all parameters but one at the values which correspond to the maximum RA, and varying the remaining parameter across the whole range of its input space (figure 13). We observe that the water depth, the beach slope and ω are the most important.
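The one-at-a-time sweep can be sketched generically; the emulator mean below is a hypothetical stand-in for the fitted GP mean, and the unit-cube bounds are illustrative.

```python
import numpy as np

def one_at_a_time(emulator_mean, x_star, bounds, n=51):
    """Local sensitivity sweep: hold every parameter at the maximizing
    point x_star except one, which is varied across its full range;
    returns one (grid, mean values) pair per parameter."""
    sweeps = []
    for j, (lo, hi) in enumerate(bounds):
        grid = np.linspace(lo, hi, n)
        X = np.tile(x_star, (n, 1))
        X[:, j] = grid
        sweeps.append((grid, emulator_mean(X)))
    return sweeps

# Hypothetical emulator mean, peaked at the centre of the unit cube.
mean = lambda X: np.exp(-np.sum((X - 0.5) ** 2, axis=1))
sweeps = one_at_a_time(mean, x_star=np.full(5, 0.5),
                       bounds=[(0.0, 1.0)] * 5)
```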
To better understand dependencies between the input parameters, we recombine them in non-dimensional but physically interpretable measures. In figure 14, we express the RA as a function of the ratio of the wavelength over the island radius at its base λ0/r0 and the Iribarren number computed using the beach slope and normalized with the relative wave amplitude H0/h0. We find that the RA strongly depends on the ratio λ0/r0 and that the highest values are attained when the wavelength is almost equal to the island radius. The normalized Iribarren number gives a satisfactory classification, with smaller values qualitatively leading to higher RA. However, the complexity of the phenomenon cannot be completely explained by these two measures. Nevertheless, figure 14 shows that a comprehensive knowledge of the system provides better insight than statistics and implies that interdisciplinary problems need close and careful collaborations to ensure useful results.
Real coastlines are of course seldom straight, having promontories, bays and bends, and they feature composite beaches. Islands are seldom, if ever, circular. Considering these added complexities was beyond the scope of our work, but the telling published observations of Fritz et al. for the 27 February 2010 Chilean tsunami on Mocha island off Tirúa, and unpublished observations by Costas Synolakis for the 12 December 1992 Flores and the 3 June 1994 East Javan tsunamis, suggest that our finding for idealized geometries may be of wider applicability. At the very least, it suggests that numerical models, which sometimes eliminate small islands of complex shapes to stabilize inherently unstable run-up computations close to shore, need to consider the effect of the islands carefully. Furthermore, our analysis provides one example of what may be possible in the future for tsunami forecasts. Instead of relying on vast databases of pre-computed scenarios, it may well be possible to vary uncertainties in the parameters of the initial estimates of fault characteristics available shortly after an event and obtain faster-than-real-time estimates of maximal inundation.
We examined the effect a small conical island has on the long-wave run-up on a plane beach behind it. Using a simplified geometry dependent on five physical parameters, we set out to explore the combination of parameters which gives the maximum run-up amplification at a minimal computational cost. To achieve that, we employed an active experimental design strategy based on a Gaussian process surrogate model. The strategy, which is parallelizable, can efficiently handle the trade-off between exploration of the input space and focusing on the region where argmax_x f(x) is believed to reside.
Even though our algorithm is asymptotically convergent, we are interested in its behaviour for a finite-time horizon T. Comparing our strategy to others commonly used, we showed that it performs better in most cases. Overall, the active experimental design approach can reduce the computational cost by more than 60% compared with a classic experimental design (LHS), and potentially much more (figure 10c). In addition, the computational cost is orders of magnitude smaller than that of a regular grid approach: three orders of magnitude for a five-dimensional problem.
Moreover, a stopping criterion was presented, which can signal the achievement of the optimization objective and thus the end of the experiments. The development of such a criterion is essential in real applications, where only a small number of experiments is allowed owing to cost constraints, and thus the theoretical asymptotic convergence is of little practical use. Our stopping criterion is based on the difference in the ranking of the predictions of the surrogate model between two consecutive iterations. Even though it is shown to correlate well with the regret (figure 6), it depends on an empirically set threshold.
Our results show that, for the given ranges of input parameters, the island, instead of protecting the beach behind it as was widely believed, acts as a lens focusing wave energy on its lee side. This finding is important for the appropriate education of coastal communities and their emergency preparedness.
This work was funded by EDSP of ENS-Cachan (T.S.S and E.C), the Cultural Service of the French Embassy in Dublin (T.S.S), the ERC under the research project ERC-2011-AdG 290562-MULTIWAVE (F.D), SFI under the programme ERC Starter Grant - Top Up, grant no. 12/ERC/E2227 (F.D), the Strategic and Major Initiatives scheme of University College Dublin (T.S.S and F.D) and the EU FP7 program ASTARTE (F.D and C.E.S).
C.E.S acknowledges the Hellenic Centre for Marine Research for travel support.
- Received July 28, 2014.
- Accepted October 8, 2014.
- © 2014 The Author(s) Published by the Royal Society. All rights reserved.