## Abstract

Permafrost consists of soil and rocks that remain at 0^{°}C or below for at least two consecutive years. In mountains, permafrost ground ice acts like cement, stabilizing rock walls. Its degradation, following climate warming, may lead to slope instability in high mountains and damage to infrastructure, so knowledge about its evolution is essential for risk analysis. In pure solids, heat is transferred by conduction, but permafrost ground is also subject to non-conductive fluxes, and heat transfers are influenced by factors such as air temperature and snow cover, so a deterministic scheme cannot fully describe heat propagation. Current approaches to modelling use numerical models involving heat conduction schemes and energy balance models, requiring data on quantities such as relative humidity and radiation. We describe a stochastic treatment of the heat equation, which adapts to space–time changes in heat transfers driven by factors such as air temperature and snow cover, without requiring corresponding data, as part of a statistical model. The flexibility and performance of our approach are illustrated using data from two boreholes in the Swiss Alps, which show the strong influence of snow cover on ground temperature and the long-term degradation of permafrost produced by the 2003 heat wave.

## 1. Introduction

Permafrost (or perennially frozen ground) is defined as lithospheric material that remains at 0^{°}C or below for at least two consecutive years (Brown & Péwé 1973; Washburn 1979). It is a common and an important thermal phenomenon in high mountain areas and high latitudes. The active layer, the top layer of material that thaws during summer and freezes again during winter, is largely controlled by climatic conditions and reacts very sensitively to their changes (Slaymaker & Kelly 2007). The active layer and permafrost thermal state are regarded as important cryospheric indicators of climate change (Burgess *et al.* 2000; Harris *et al.* 2001), and the degradation of ice-bearing permafrost may lead to adverse effects such as rock wall instability in mountains (Gruber *et al.* 2004*a*; Ravanel *et al.* 2010), mass movements and thermokarst formation in less steep terrain or instability of mountain infrastructure (Phillips *et al.* 2007).

In view of growing human activity in cold mountain areas, owing to tourism or hydropower production, better knowledge about the occurrence of permafrost and its evolution over time is essential. Unlike the retreat of glaciers, permafrost degradation is largely invisible because it is purely based on ground temperatures and ground ice is not visible at the surface. A major part of mountain permafrost research therefore deals with assessing and modelling ground temperature distributions (Riseborough *et al.* 2008). Despite much research on near-surface characteristics of permafrost (Lehning *et al.* 2002; Stocker-Mittaz *et al.* 2002; Juliussen & Humlum 2007), less is known about its two-dimensional distribution at depth inside mountains. Although ground temperatures are mostly governed by heat conduction, complex combinations of convection processes varying in depth and time take place in permafrost (Kane *et al.* 2001), and other variables, such as accumulation and ablation of snow cover and surface temperatures, profoundly influence the subsurface thermal field. This makes direct application of basic heat conduction theory inadequate for ground temperature modelling in real-world conditions (Riseborough *et al.* 2008). Current approaches to this attempt to understand the influence of external variables on the heat conduction scheme. First approaches of this type took into account surface temperatures and two-dimensional heat fluxes in case studies in the Swiss Alps (Wegmann *et al.* 1998; Gruber *et al.* 2004*b*). Systematic modelling combining a surface energy balance model with a three-dimensional ground heat conduction scheme was investigated by Noetzli *et al.* (2007). Despite its performance, a drawback of this approach is that additional data such as air temperature, air pressure or relative humidity are needed to feed the surface energy balance model.

In this study, we propose a new approach, in which the heat equation remains the guiding thread of heat propagation into the ground, but considered in a stochastic rather than a deterministic framework. Its physical parameters and its boundary conditions, including surface temperature, are unobserved stochastic variables to be estimated. A major advantage of our approach over existing ones is that only ground temperature data are required. The stochastic treatment allows the model to flexibly reproduce the influence of external variables, such as air temperature or snow cover, on ground temperatures, without requiring such data. Heat flow is part of a hierarchical Bayesian model used by Brynjarsdóttir & Berliner (2011) in the context of palaeoclimate reconstruction; see also the advection–diffusion approach of Wikle (2003) to an ecological problem. Observed temperatures are assumed to be a noisy version of the temperature reconstructed below ground. This modelled temperature is assumed to be governed by the heat equation, whose physical parameters and boundary conditions are stochastic. This hierarchical framework accounts for the various sources of uncertainty in the data, the theory and the model specification, but still uses a physically based model.

Our approach is illustrated by the reconstruction of ground temperature in permafrost at two adjacent boreholes in the Swiss Alps, which, despite their proximity, have different thermal properties owing to differences in snow cover. The data are presented and their main features are discussed in §2. The hierarchical model is detailed in §3, with technical details given in appendix A and in the electronic supplementary material. Results for the two boreholes are given in §4.

## 2. Data and their properties

### (a) Data

The data presented here are obtained from the WSL Institute for Snow and Avalanche Research (SLF), and are linked to the Swiss permafrost monitoring network PERMOS. The two adjacent boreholes used in this paper (figure 1) are at the site Muot da Barba Peider, located near the village of Pontresina in the Eastern Swiss Alps. The site is equipped with experimental snow-supporting structures and instrumented boreholes, and has been part of the SLF permafrost-monitoring network since 1996. Site-specific information was obtained from M. Phillips (personal communication, 2011), and from Phillips (2006) and Zenklusen-Mutter *et al.* (2010). The boreholes are located at the same elevation (2960 m a.s.l.), only 50 m apart and approximately 30 m below the top of a northwest-oriented flank ridge. Borehole B1 is situated between snow-retaining avalanche defence structures and borehole B2 is in undisturbed terrain. The drilling stratigraphy in the uppermost 6 m at a location between B1 and B2 shows ground ice inside the talus that reaches a depth of about 4 m, with frozen bedrock below (figure 2).

Temperatures are measured hourly in the boreholes at 10 different depths between 0.5 and 17.5 m using YSI 44008 thermistors (Yellow Spring Instruments) with a calibrated precision of 0.02^{°}C. Daily mean temperatures are registered using Campbell CR10X data loggers. The measurements analysed here are daily mean temperatures from November 1996 to July 2010 (figure 3). Data from 27 March to 9 July 1999 are missing at all depths in B2 owing to battery failure. These data, which are among the longest available time series of ground temperature in the Swiss Alps, were previously analysed by Phillips (2006) and Zenklusen-Mutter *et al.* (2010).

### (b) Qualitative discussion

We begin with a preliminary qualitative analysis of the data (based on figure 3) in order to highlight the main specificities of temperature in permafrost.

Near-surface ground temperatures show large yearly fluctuations, mostly driven by ambient air temperature and solar radiation, whose amplitude decreases with depth. In deep bedrock layers, temperatures are almost stable throughout the year, because heat penetration is delayed and attenuated (Zenklusen-Mutter & Phillips in press). In purely conductive soils, such as bedrock, this delay depends on their properties and is measured by the thermal diffusivity (see equation (3.4)). In near-surface layers, non-conductive processes such as convection owing to the infiltration of water or air may also occur (Kane *et al.* 2001).

Snow cover strongly influences ground temperatures at high altitudes. Thick snow cover is a good insulator and decouples ground temperature from conditions in the atmosphere. Winters with a long delay before thick snow cover is established have a pronounced cooling effect on the ground, whereas heat is trapped in the ground in winters with early thick snow cover (Zenklusen-Mutter *et al.* 2010). The insulating effect of snow cover explains why temperatures in B1 and B2 differ so much, although they are located only 50 m apart. The defence structures located close to B1 delay the snow melt there (Phillips 2006), and thus modify its ground thermal regime (figure 1). Thicker snow cover in winter at B1 insulates near-surface temperatures better from colder air (e.g. winter 2001–2002). Similarly, the longer persistence of the snow cover in spring/summer at B1 better insulates near-surface ground layers from warmer air temperatures.

Despite some differences between B1 and B2, a previous study (Zenklusen-Mutter *et al.* 2010) revealed consistent temperature trends for the two boreholes. Temperatures in near-surface coarse-blocky ground layers show a slight increase in amplitude; so, at the surface, the maximum temperature tends to increase in summer and the minimum temperature tends to decrease in winter. This may be due to the slow decrease in winter snow cover (Zenklusen-Mutter *et al.* 2010), consistent with Marty (2008) and Marty & Blanchet (in press). An overall warming trend has been found in deeper bedrock layers, where the annual cycles are less pronounced. This could indicate changes in the long-term thermal regime, perhaps as a result of climate warming, or may be the result of heat transfer within complex topography (Kohl 1999).

This discussion reveals the complexity of ground temperature in permafrost and motivates development of a flexible model that can adapt to the data, without being site-specific. The approach we propose is presented in §3. We develop a model that can reconstruct daily ground temperatures on a regular grid downwards from the surface, when data are available only at certain depths. The modelling framework is stochastic and is based on the heat flow equation, which is deterministic in its traditional use but is here applied in a stochastic way, bringing extra flexibility to the underlying physical model of heat flow.

## 3. Hierarchical modelling of ground temperatures

### (a) Hierarchical approach

Hierarchical models are very useful for complex and/or high-dimensional statistical problems. The essential idea is to approach complex problems by breaking them into a series of simpler parts, corresponding to conditional probability distributions. The strategy is typically based on the formulation of three primary statistical models or stages, namely:

— stage 1. Data model: [data|process,

*θ*_{1}];— stage 2. Process model: [process|

*θ*_{2}]; and— stage 3. Parameter model: [

*θ*_{1},*θ*_{2}],

where the square brackets indicate probability densities and *θ*_{1} and *θ*_{2} denote parameters introduced in the modelling. The stage 1 model specifies the distribution of the data, in our case, measured ground temperatures, given the process of interest, i.e. the modelled ground temperatures at all depths, and parameters that describe the data model. The stage 2 model specifies a model for the process, conditional on further parameters. Finally, the stage 3 model assigns prior distributions to all the parameters introduced at other levels. These distributions can themselves depend on further fixed or random parameters, called hyperparameters.

The definition of these primary stages allows us to break the initial problem into easier subproblems. Indeed, we are ultimately interested in learning about the unknown quantities of interest, i.e. in updating the distributions of the unknown variables (the process and parameters) given the data. This posterior distribution is obtained by Bayes' theorem, expressed as
3.1
The fundamental idea is that the posterior density [process,*θ*_{1},*θ*_{2}|data], which is usually very complex, is decomposed into a product of simpler conditional distributions defined in the stages above. However, in practice, the posterior densities are usually too complex to be computed analytically, and Markov chain Monte Carlo (MCMC) simulation (Gilks *et al.* 1996; Robert & Casella 2004) is used to sample from (3.1), which is then estimated using standard techniques.

### (b) Notation

We write the observed temperatures as *z*_{j,t}, for depth levels *j*=1,…,*n* and times *t*=1,…,*T*. In the application of §4, we have *n*=10 and *T*=5027 days. Our goal is to reconstruct the unobserved temperature profile, at times and depths for which no data are available. We let denote the *N*+2 equidistant depths of interest, at a spacing of Δ_{x} metres and let denote the *T* times, at a spacing of Δ_{t}=1 day. In the application of §4, we will take *x*_{0}=0 m, Δ_{x}=0.5 m and *N*=39. We write *u*(*x*,*t*) for the modelled temperature at depths and time . The process *u*(*x*,*t*) is unobserved; it will be termed a hidden process. Our goal is to reconstruct it based on the measurements *z*_{j,t}. The top- and bottom-level values, *u*(*x*_{0},*t*) and *u*(*x*_{N+1},*t*), are used as boundary conditions and require special treatment. For simplicity, we sometimes write *u*_{top,t}≡*u*(*x*_{0},*t*) for the top-level temperature, *u*_{bot,t}≡*u*(*x*_{N+1},*t*) for the bottom-level temperature and *u*_{i,t}=*u*(*x*_{i},*t*) (*i*=1,…,*N*) for the temperatures between them.

### (c) Stage 1: data model

Let *z*_{t} denote the *n*×1 vector of observations [*z*_{1,t},…,*z*_{n,t}]′ and *u*_{t} the *N*×1 vector of hidden temperatures inside the bulk [*u*_{1,t},…,*u*_{N,t}]′; this does not contain the boundary layers *x*_{0} and *x*_{N+1}. The observed temperature is supposed to be a noisy version of the modelled one, due, for example, to measurement error. We assume independent Gaussian errors, that is
3.2
where *i*(*j*)∈{1,…,*N*} is the modelled depth nearest to the observed depth *j*∈{1,…,*n*}. The variance *σ*_{z,j} is expected to be larger for *j* small (near-surface) than for *j* large (deeper in the ground), and is modelled as random (see §3*e*). We write the observations at time *t* in vector form as
3.3
where is the *n*×1 vector , is an *n*×*n* diagonal matrix with elements and *D* is a known *n*×*N* matrix that maps the observed depths, *j*∈{1,…,*n*} to the gridded depths, *i*∈{1,…,*N*}. In the application of §4, we choose to contain all observed depths, so *D* is an incidence matrix.

### (d) Stage 2: process model

To set up the process level, recall that in the continuous case and under transient conduction, the general equation for heat flow is (Carslaw & Jaeger 1986)
3.4
with appropriate boundary conditions, where *δ*(*x*,*t*) is a spatio-temporal diffusivity parameter that determines how rapidly the soil at depth *x* and time *t* adjusts its temperature to that of its surroundings: the higher the diffusivity, the faster the heat is conducted, and thus the more rapidly the temperature adjusts. Heat flow equations are the basis of all geothermal models (Riseborough *et al.* 2008), but are typically used with two important restrictions that make them less flexible than (3.4). First, a constant diffusion parameter *δ*(*x*,*t*)≡*δ* is usually used (Riseborough *et al.* 2008), so the ground is implicitly assumed to be homogeneous in space and time, in contradiction both with the borehole stratigraphy of figure 2 and with the fact that infiltration of water during the snow melt season, and freezing and thawing of the active layer, change the near-surface thermal properties (Hinkel 1997). Second, the heat flow equation is almost always used in a deterministic way. As in Wikle (2003) and Brynjarsdóttir & Berliner (2011), we will use (3.4) as the basis of a stochastic model in which both *u*(*x*,*t*) and *δ*(*x*,*t*) are unobserved processes.

We use an implicit finite difference scheme to discretize (3.4), with a backward difference for the derivative in time and a second-order central difference evaluated at time *t*+1 for the derivative in space. Writing *δ*_{i,t}≡*δ*(*x*_{i},*t*) and with the notation of §3*b*, this gives for *i*∈{1,…,*N*} and *t*∈{1,…,*T*−1} the equation
3.5
We use the implicit scheme rather than the explicit scheme of Wikle (2003) because the former is always numerically stable and does not require an upper bound for *δ*; this choice has very little influence on the computation of *u*(*x*,*t*) for the values of *δ* for which the explicit scheme is well-defined. Equation (3.5) can be rearranged as
which can be rewritten in vector form as
yielding
3.6
where *G*_{t+1}(*δ*) is an asymmetric tridiagonal *N*×*N* matrix with tridiagonal elements depending on *δ*_{0,t+1},…,*δ*_{N+1,t+1}. The *N*×2 matrix consists of zeros except in the top-left and bottom-right elements, which involve *δ*_{0,t+1},*δ*_{1,t+1},*δ*_{N,t+1},*δ*_{N+1,t+1}, and is the 2×1 vector [*u*_{top,t+1},*u*_{bot,t+1}]′. Thus, only the non-zero elements in the vector are the first and last, which, respectively, involve *u*_{top,t+1} and *u*_{bot,t+1} and specify the edge effects.

Some differences with the process model of Wikle (2003) are noteworthy. First, unlike in the explicit scheme of Wikle (2003), updating of *u*_{t+1} in (3.6) requires the inversion of a tridiagonal *N*×*N* matrix. However, such inversion has been extensively studied and many formulae exist (Meurant 1992); we used the R function Solve.tridiag from package limSolve. Second, the matrices *G*_{t}(*δ*) and are time-dependent. In the most general case, *T*−1 such matrices must be computed. However, given the model we use for *δ* (see §3*e*), many fewer such matrices are needed. Finally and most importantly, *u*_{t} (*t*=2,…,*T*) is here fully determined given the diffusivity *δ*, the upper and lower boundary conditions *u*_{top,t}, *u*_{bot,t} (*t*=2,…,*T*) and the *u*_{i,1} (*i*=1,…,*N*), as we can write
3.7
where *u*_{1} is the *N*×1 vector [*u*_{1,1},…,*u*_{N,1}]′. This is not the case in Wikle (2003), where an additional error term appears in (3.6). Thus, here the deterministic heat equation receives a stochastic treatment, because its solution (3.7) involves stochastic terms, whereas in Wikle (2003) the heat equation is itself treated as stochastic. Our assumption greatly reduces the number of variables that must be simulated but may lead to a loss of flexibility. However, here the boundary conditions (*t*=1,…,*T*) and *u*_{1} are random processes, whereas they are fixed by Wikle (2003); this re-injects flexibility into our modelling. Stochastic boundary conditions in deterministic boundary value problems are also used in Wikle *et al.* (2003). As will be shown in §4, our overall hierarchical model turns out to be reasonable. We return to the model of Wikle (2003) in §5.

It follows from (3.7) that *u*_{t} (*t*=2,…,*T*) can be written as , where the function *f* is deterministic. Thus, given the initial condition, the upper and lower boundary conditions and the estimated diffusivity, the unobserved temperatures *u*_{t} can be computed for *t*=2,…,*T*; this enables us to ‘fill in’ the missing data for B2, which of course have an unconditional probability distribution owing to the randomness of *u*_{1}, etc. The data model (3.3) is given by
and

For the boundary conditions, we assume that the upper temperature, *u*_{top,t}, is driven by a seasonal cycle owing to its dependence on the ambient air, whereas the bottom temperature (*u*_{bot,t}) is decorrelated from the ambient air and is modelled as white noise:
3.8
and
3.9
Zenklusen-Mutter *et al.* (2010) used similar equations in a frequentist framework. Both (3.8) and (3.9) allow trends, but in different ways. Consistent with Zenklusen-Mutter *et al.* (2010) (see also §2*b*), we assume a trend in the amplitude and periodicity of the cycles at the upper boundary, and a possible overall trend at the bottom boundary. The parameters *m*_{top}, *c*_{top,t}, *d*_{top,t}, *σ*_{utop}, *m*_{bot,t} and *σ*_{ubot} are stochastic variables defined in §3*e*.

The first-day temperature in the bulk, *u*_{i,1} (*i*=1,…,*N*), is also a boundary condition. We take
3.10
where *m*_{ui,1} and *σ*_{ui,1} are stochastic parameters defined in the parameter model.

### (e) Stage 3: parameter model

The data parameters are the parameters of (3.2). For computational convenience and to ensure positivity, we let
where denotes the inverse gamma distribution with shape parameter *a*>0 and scale parameter *b*>0. We use a tilde to denote hyperparameters, such as and , for which we do not specify probability distributions; their values and those of other hyperparameters are fixed to the values in table 1.

The process parameters are *δ*_{i,t}, *m*_{top}, *c*_{top,t}, *d*_{top,t}, *σ*_{utop}, *m*_{bot,t} and *σ*_{ubot}. The first parameter (*δ*_{i,t}) represents the diffusivity of the layer at depth *x*_{i} metres and time *t* days. It measures how quickly heat diffuses. As the diffusivity is quite difficult to estimate, we do not define *δ*_{i,t} as *N*×*T* parameters, but rather use a simpler model. The stratigraphy of the boreholes shown in figure 2 shows roughly four ground layers, at increasing depths: coarse scree from 0 to 1.6 m, silt and moisture from 1.6 to 2.2 m, clay and silt with ice lenses from 2.2 to 3.5 m, and bedrock below. It is thus natural to define different diffusivities for these layers. Without additional assumptions, this would give us 4×*T* diffusivity parameters, which is still very large, but as the whole borehole core is frozen except in summer in the active layer, we make the crude assumption that all frozen layers have same diffusivity, irrespective of the depth, as if they were all made of the same material. The active layer is unfrozen in summer and the different ground materials therein may have different thermal properties, and thus correspond to different diffusivity parameters. Ideally, therefore, *δ*_{i,t} should take different values for positive and negative ** u**. However, this would greatly complicate the simulation procedure because the full conditional distributions (see the electronic supplementary material) could not be computed analytically for the boundary conditions. Complex Metropolis–Hastings steps would be necessary in the algorithm, making the simulation slow and potentially unstable. We thus make the simplifying assumption that only the

*K*first layers of the ground in summer can be driven by different diffusivities. This leads us to set where denotes the summer days, taken in §4 to be the months July–October. The term

*l*(

*x*) represents the ground type at depth

*x*:

*l*(

*x*)=1 if 0≤

*x*≤1.5 m,

*l*(

*x*)=2 if 1.5<

*x*≤2.5 m,

*l*(

*x*)=3 if 2.5<

*x*≤3.5 m and

*l*(

*x*)=4 if

*x*>3.5 m. Hence,

*δ*

_{0}represents the diffusivity in frozen ground whatever the depth, and

*δ*

_{1},

*δ*

_{2},

*δ*

_{3}are the diffusivities in the upper three ground types of the active layer in summer. As the active layer thickness does not exceed 3 m,

*K*=2 or 3 seems to be a natural choice. In §4, we will consider the choices

*K*=0,1,2,3, leading to four different models. The coefficients (

*δ*

_{0},…,

*δ*

_{K}) are positive random variables, and for computational convenience, we let where

*m*

_{δk}and (

*k*=0,…,

*K*) are random hyperparameters, defined as with hyperparameters , , and .

The parameters of *u*_{top,t} in (3.8) are specified as
and
Thus, *c*_{top,t} and *d*_{top,t} are random walks, allowing a large flexibility in the amplitude and periodicity of the cycles near the surface, as shown in figure 3. The parameters of *u*_{bot,t} in (3.9) are given by
and
Hence, *m*_{bot,t} is a random walk, which provides flexibility in the estimation of the temporal trend in the bottom boundary, if any. The parameters of *u*_{i,1} (*i*=1,…,*N*) in (3.10) are defined as

### (f) Hyperparameters

The values of the hyperparameters (given in table 1) are all relatively uninformative. Those related to *δ* are based on knowledge about the geophysics of boreholes and are somewhat more informative. A limited sensitivity analysis to this choice of values, provided at the end of §4, nevertheless suggests that the output is not heavily dependent on these choices.

The choice of parameters and (*j*=1,…,*n*) is inspired from Brynjarsdóttir & Berliner (2011). These parameters model the standard error, *σ*_{zj}, between observed and modelled temperature. Owing to the smaller variability of ground temperatures at depth, we expect this error to decrease deeper in the boreholes. For this reason, we held fixed and let be a decreasing function of depth (table 1). The standard deviation of the measurement error *σ*_{zj} then lies between 0.15^{°}C and 0.70^{°}C at 0.5 m and between 0.05^{°}C and 0.25^{°}C at 17.5 m with probability 95 per cent.

The parameters and (*k*=1,…,*K*) are set equal at all depths *k* (table 1). With 95 per cent probability, the median *e*^{mδk} of the diffusion parameter *δ*_{k} lies between 0.05 and 0.13 m^{2} per day, i.e. between 1.4×10^{−5} and 3.6×10^{−5} m^{2} s^{−1}. For comparison, the thermal diffusivities of copper, iron, quartz, air and water are, respectively, 0.4,0.08,0.005,0.06 and 0.0005 m^{2} per day. The chosen values of and are such that when the s.d. of *δ*_{k}, , lies between 0.01 and 0.07 m^{2} per day with probability 95 per cent.

The parameters , and are obtained by fitting a linear model of the form (3.8) to all observed depths *j* and extrapolating each coefficient to depth *x*_{0}=0 m using a smoothing spline. Similarly, is obtained by computing the temperature means at all observed depths *j* and extrapolating them to depth *x*_{N+1}=20 m using a smoothing spline. Parameters and describe the model standard error *σ*_{utop} in the top layer. With the chosen values, *σ*_{utop} lies between 0.15^{°}C and 0.70^{°}C with probability 95 per cent. As we expect the model standard error to be smaller in the bottom layer, we choose and so that the model standard error in the bottom layer is between 0.04^{°}C and 0.22^{°}C with probability 95 per cent.

The parameters (*i*=1,…,*N*) are obtained by fitting a spline to the first-day temperatures for all observed depths *j* and interpolating them to all modelled depths *i*. With the chosen values of and , the first-day standard error decreases at depth. With probability 95 per cent, it lies between 0.15^{°}C and 0.70^{°}C at *x*_{1}=0.5 m and between 0.07^{°}C and 0.33^{°}C at *x*_{N}=19.5 m.

### (g) Inference

As explained in §3*a*, we wish to make inference about the hidden processes using their posterior distribution, but this is too complex to be computed analytically (see appendix A). Fortunately, MCMC methods can be used to simulate indirectly from it (Gilks *et al.* 1996; Robert & Casella 2004). The approach is described in appendix A and in the electronic supplementary material. The correctness of our algorithm was validated using simulated artificial data.

## 4. Results

We applied our hierarchical model separately to the data for the boreholes B1 and B2 described in §2. For both boreholes, we used *x*_{0}=0 m, *x*_{N+1}=20 m and Δ_{x}=0.5 m, so ground temperatures can be predicted on a 0.5 m grid between the surface and a depth of 20 m. This rather fine grid barely affects the number of stochastic variables in our model, because it affects only the *N*×1 vector *u*_{1}. The value Δ_{x}=0.5 m was chosen based on a bias–variance tradeoff: increasing Δ_{x} would give a rougher reconstruction of temperature profiles, and decreasing Δ_{x} would give a finer reconstruction but could lead to numerical problems when inverting the *N*×*N* matrices *G*_{t}(*δ*). We consider four variants of the model of §2, depending on the value of *K* in the definition of the diffusivity *δ*_{i,t} of (A1):

M0: the diffusivity does not change in unfrozen ground:

*δ*_{i,t}=*δ*_{0}for all*i*=1,…,*N*and*t*=1,…,*T*; equivalently*K*=0.M1, M2 and M3: the diffusivity is different in the

*K*=1,2,3 upper layers of the active layer in summer.

We compare the results obtained for the four models with the two boreholes. We also compare the predicted parameters in the two boreholes to assess the influence of the avalanche structure near B1.

In total, our hierarchical models have about 10 000 stochastic variables, each of which is generated at each iteration of the MCMC algorithm. For each model and each borehole, the MCMC simulation was run for 30 000 iterations, taking about 30 h. Assessing convergence of MCMC algorithms is awkward, particularly in high-dimensional models. We used standard graphical tools (e.g. trace plots, running mean plots and autocorrelation plots) and the Geweke diagnostic to assess convergence of certain of the model parameters (Geweke 1992). These assessments suggested convergence of the chains after a burn-in period of 10 000 iterations. This rather slow convergence is mainly owing to the diffusivity parameters *δ*_{1}, *δ*_{2}, *δ*_{3} in the unfrozen active layer, which are quite difficult to estimate. We based inference on the final 20 000 iterations.

Selection among the four models can be performed using the deviance information criterion (Spiegelhalter *et al.* 2002). If ** w** denotes the model parameters, then the deviance

*D*(

**)=−2ℓ(**

*w***) is defined as twice the negative log-likelihood and the deviance information criterion (DIC) is where is the posterior mean of the deviance over the retained MCMC steps, is the deviance evaluated at the posterior mean of**

*w***, and is considered to be the effective number of parameters. DIC is a penalized likelihood criterion: the posterior mean deviance decreases when the model complexity increases, but is offset by an increase in**

*w**p*

_{D}. Models with smaller DIC are typically preferred to models with larger DIC, although Spiegelhalter

*et al.*(2002) emphasize that DIC should not be used as a strict criterion for model selection, but rather to screen candidate models for further consideration.

The values of DIC for models M0–M3 are given in table 2. The improvement between M0 and the other models is clear: DIC is at least 677 units larger under M0 at B1 and 834 at B2. Differences between models M1, M2 and M3 are smaller, but still non-negligible. The lowest DIC is found for model M2 at B1 and model M3 at B2, and the second lowest is M3 for B1 and M1 for B2. The difference with between the lowest and second lowest DIC is 199 units at B1 and 66 units at B2. Figure 3 suggests that the maximum active layer thickness is between 1 and 2 m at B1 and between 2 and 3 m at B2. This corresponds, respectively, to the second and third upper layers of the ground, so it is not surprising that *K*=2 (i.e. model M2) at B1 and *K*=3 (i.e. model M3) at B2 give the lowest DIC.

As a complement to DIC, we also computed the L criterion introduced in Laud & Ibrahim (1995) as a posterior predictive check for model selection. This measures model performance by accounting for both how the predictions match the data and for their variability. Better models should have lower values of L. The computed values of L for models M0–M3 (given in table 3) confirm the poorer fit of model M0, particularly for borehole B2. Both DIC and L suggest that model M1 is best for borehole B1 and model M3 is best for B2.

A better understanding of the differences among the estimated models, and hence of the differences in DIC and L, is provided by the estimated diffusivity coefficients *δ*_{k}, whose posterior means and standard deviations are given in table 3. The diffusivity *δ*_{0} is fairly constant in both boreholes, irrespective of the model. This was expected because the boreholes are only 50 m apart and the frozen ground at them has similar thermal properties. Defining *δ*_{0} with (model M0) or without (models M1–M3) the summer days does not seem to influence it much. Whatever the model, *δ*_{1} in the 0–1.5 m layer in summer is much lower than the diffusivity elsewhere, consistent with figure 2. This layer is composed of coarse talus and contains air-filled voids, providing good thermal insulation (Gruber & Hoelzel 2008). However, *δ*_{1} seems to be slightly larger at B1 than at B2, perhaps because snow persists longer at B1 in spring and early summer. This makes the near-surface coarse talus at B1 wetter in summer, leading to higher diffusivity. For both M2 and M3, the highest diffusivity in both boreholes is found at depths of 1.5–2.5 m (*δ*_{2}), consistent with figure 2. This layer is composed of finer particles of silt and clay through which heat can propagate faster. However, estimation of *δ*_{2} at B2 is quite uncertain, perhaps because the 1.5–2.5 m layer at B2 is in the transition zone, i.e. is sometimes frozen and sometimes unfrozen in summer (see also figure 7), which might make the estimation of a single *δ*_{2} more difficult. The larger uncertainty of *δ*_{2} under M2 at B2 might explain why, according to table 2, M1 is preferred to M2 at B2. Finally, the diffusivity at 2.5–3.5 m depth (*δ*_{3} in M3) is very similar to that of the frozen ground (*δ*_{0}) at B1, because this layer is also frozen in summer. This explains why, according to table 2, *δ*_{3}=*δ*_{0} is selected (i.e. model M2).

Although the DIC and L values in table 2 seem to eliminate model M0 for both boreholes, the choice between models M1, M2 and M3 remains open. A closer look at the other variables shows rather similar results for the three models, so for conciseness we report the results only under the models that DIC would select, i.e. M2 for B1 and M3 for B2. This also corresponds to the models we would a priori have chosen given our knowledge about the stratigraphy (figure 2).

Figure 4 shows summaries of some boundary parameters at B1 and B2 under these models. Our modelling nicely reproduces the different scenarios of heat penetration into the top ground, *u*_{top}, without requiring the use of snow cover or air temperature data. For example, the quite warm winter 2000–2001 and the very cold winter 2001–2002 both appear clearly, because the time-varying stochastic variables *c*_{top,t} and *d*_{top,t} in (3.8) allow us to model *u*_{top} with great flexibility. The time-varying wave amplitude in the top is and the phase is modulo 2*π*. The ‘zero-curtain effect’ (Outcalt *et al.* 1990), when temperatures remain close to 0^{°}C for a while, is also visible. This happens in autumn because the freezing of moisture in the active layer releases latent energy, and again in spring when air temperature increases, because large amounts of energy are needed to thaw ground ice. The autumn and spring zero-curtains can last from several days to several weeks. This variability is reproduced in the posterior mean of *u*_{top} in figure 4, although a simple model of conductive heat was used. In particular, the long zero-curtain of autumn 2001 is nicely found. It is produced by the variables *c*_{top,t} and *d*_{top,t} which are simulated by the MCMC in such a way they produce an almost constant amplitude and a large phase (figure 4*e*,*f*). Once the ground is frozen, it starts to cool more or less quickly, depending on air temperature and snow cover. This is also well represented by our model, through the variables *c*_{top,t} and *d*_{top,t}. For example, the fast cooling in December 2001 and the slow cooling in December 2003 are both found. Finally, the large differences in near-surface temperatures in summer at B1 and B2 appear clearly. Surface temperatures stay close to 0^{°}C in summer at B1 owing to the longer persistence of snow.

Figure 4 shows that temperature at 20 m depth (*u*_{bot}) is much more stable than at the surface, because the warming and cooling produced by ambient air are attenuated with depth. Hence, low amplitude waves are found, driven by the time-varying variables *m*_{top,t} modelled as a random walk (see (3.9)). An almost linear increasing trend seems to affect the bottom temperatures up to year 2003 at B1 and year 2005 at B2. This was also noticed by Zenklusen-Mutter *et al.* (2010), although the source of this heating affecting deep depths (*u*_{bot}) rather than the near-surface (*u*_{top}) remains unclear. Furthermore, despite large differences in summer at the surface, temperatures at depth in B1 and B2 were very similar before 2003, but since 2005 a shift of 0.15^{°}C has appeared owing to the warming that occurred at B2 during 2003 and 2004.

Figures 5 and 6 show the posterior mean of the modelled temperature ** u** at the four first depths of B1 and B2 where data are available. This can serve as model validation. For legibility, we show only the estimated temperature for years 2000–2007, which include the coldest winter 2001–2002 and the warmest summer 2003. The predicted temperature

**fits the negative ground temperatures very well but fails to capture the largest positive peaks in summer, in particular at B2 (figure 6) where summer temperatures are much higher. For comparison, figures 5 and 6 also show the ground temperature predicted by the simplest model M0, in which the active layer is treated like the rest of the permafrost body. The less good fit of M0 is clearly visible for the positive temperatures, particularly at B2, whereas negative temperatures are similarly well fitted. The poorer overall performance of M0 is consistent with table 2. Using different diffusivities in the unfrozen, active layer seems to help in modelling the positive peaks, although it remains unsatisfactory for the highest ones. We return to this in §5. Temperatures at deeper depths are very well fitted at both boreholes, as confirmed by the error parameters (**

*u**j*=1,…,

*n*) of (3.2), given in table 4. As expected, decreases with depth. In the active layer (

*j*=1,2,3), it is higher at B2 than at B1, owing to the worse fits to the positive temperatures (figures 5 and 6).

A closer look at the differences of the modelled temperature ** u** between B1 and B2 reveals that large differences are experienced in early spring. The spring surface temperature at B2, although negative, is about 2

^{°}C warmer than at B1, with a maximum difference of 3

^{°}C in spring 2005. This is also true in deeper ground: in spring, B2 is about 0.5

^{°}C warmer than B1 at 5 m depth, 0.2

^{°}C warmer at 1 m and 0.15

^{°}C warmer below 15 m. This is due to the shorter persistence of snow cover at B2. Similarly, B2 is colder than B1 in mid-winter because its thinner snow cover makes the outflow of heat easier.

However, this affects only the top 5 m: in mid-winter B2 is about 0.5^{°}C colder than B1 at the surface (rising to 1^{°}C in January 2005); but at 5 m, B1 and B2 have same temperature in mid-winter (again, except in January 2005). This nicely illustrates the insulating effect of snow cover on ground temperature that influences the temporal development of the active layer. The depth of penetration of the 0^{°}C isotherm (i.e. the active layer depth) at B1 and B2 is shown in figure 7. This can be seen as the temporal thermal border of the active layer: the upper part is unfrozen, whereas the lower part is frozen. This figure was obtained by interpolating the modelled temperature ** u** linearly between the two depths closest to 0

^{°}C. As expected, the 0

^{°}C isotherm penetrates more deeply at B2 than at B1. The active layer thickness, namely the maximum seasonal depth of penetration of the 0

^{°}C isotherm into the ground (Burn 1998), is given in table 5. It was about 0.7–1.0 m deep at B1 and about 1.7–2.2 m deep at B2 before the heat wave of summer 2003, which produced a deeper penetration of the 0

^{°}C isotherm: during that summer, the active layer was about 1.7 m deep at B1 and 3 m deep at B2. Summer 2003 was the hottest summer on record in Europe since at least 1540 (Beniston 2004); the heat wave began in June and continued until mid-August, raising summer temperatures 20–30% higher than the seasonal average. In Switzerland, previous records for summer maximum temperatures, observed in the late 1940s and early 1950s, were broken in many locations. Figure 7 shows that this produced not only a deepening of the active layer in summer 2003, but it also modified the ground durably. Seven years later, the active layer is still about 40 cm deeper at B1 and B2 than before summer 2003. This long-term degradation illustrates that how permafrost reacts sensitively to changes in air temperature, and in particular how a few weeks of extreme heat can have long-term effects. Given that such events are likely to occur more frequently in the future (Beniston 2004), these results can help to project how permafrost is likely to be affected by climate change.

The previous results were obtained using the hyperparameter values in table 1. We carried out a limited analysis of the sensitivity of our results to these values. Because the estimation is computationally demanding, we focus on the most difficult borehole, B2, and the most complex model, M3. As shown in table 1, the hyperparameters form five groups, corresponding to the model noise, the diffusion parameters, the top layer boundary conditions, the bottom layer boundary conditions and the first-day temperature. The noise model is expected to be insensitive to the prior, so we restricted our sensitivity analysis to the four other groups, performing four further MCMC runs for model M3 at borehole B2. In each of these, all the parameters of just one of the four groups were changed by multiplying their standard deviations by a factor 10, with the other hyperparameters unchanged. In order to speed up the convergence of the MCMC runs, we initialized all chains with the final iteration of the chain corresponding to the results described earlier. The three chains were run for 20 000 iterations, and the last 10 000 of these were used for inference. In all cases, the results showed very good agreement of the posterior densities with those obtained above, confirming that the results are not sensitive to the choice of hyperparameters.

## 5. Conclusion and discussion

We have proposed a physically based hierarchical model for ground temperature in permafrost, driven by a stochastic treatment of the heat equation, and designed to flexibly accommodate the local characteristics of ground temperatures, resulting for example from snow accumulation. However, ground temperatures are the only data needed for the model; no additional information about air temperature or snow cover is needed. The model can reconstruct ground temperatures at depth in boreholes with different thermal properties. The results obtained turned out to be very satisfactory in frozen ground, but the highest positive temperatures in the near-surface layers tend to be underestimated.

A reason for this underestimation might be that non-conductive processes, such as water and vapour flux, occur in unfrozen ground (Kane *et al.* 2001), but the heat equation (3.4) used in our model assumes pure conduction. Although our stochastic model adds flexibility and efficiency compared with a deterministic treatment, it is probably too simple to account for the large variability of temperature when vapour and water flux become predominant. In principle, their influence can be accommodated using a more general version of (3.4) including the freezing–thawing of water and the convective effect of water flow (Johnsson & Lundin 1991; Scherler *et al.* 2010); but to apply it, in practice, we would need measurements of water and vapour flux, at least in the near-surface layers.

A second improvement of the model, which would not require any additional data, might be to account for the incorrectness of the purely conductive heat equation (3.4) directly in the model. This could be performed by adding an error term corresponding to model misspecification in the discretization (3.7), as in Wikle (2003). The heat ** u** in the bulk would thus not be fully determined by the boundary conditions as in (3.7) but would still be stochastic conditional on the boundary conditions, unlike in the present model. This might provide the flexibility needed to accommodate non-conductive heat transfer. However,

**would then have to be simulated in the MCMC procedure alongside the other existing variables, drastically increasing the number of parameters and slowing down the estimation procedure, which is already quite intensive. First attempts in this direction raised some identifiability issues, but we intend to pursue it further.**

*u*## Acknowledgements

The authors thank Noel Cressie, Marcia Phillips, Jacques Rappaz and referees for helpful comments. This research was funded by the Swiss National Science Foundation and by the CCES project EXTREMES (http://www.cces.ethz.ch/projects/hazri/EXTREMES). The data belong to the SLF permafrost measurement network and PERMOS. The site at Muot da Barba Peider is instrumented by Canton Graubünden.

## Appendix A. Markov chain Monte Carlo simulation inference

Below we detail the procedure used to estimate the unknown processes of our hierarchical model. The target is the posterior density of the hidden processes given the data, which we write as
A1
A2
A3
A4
with straightforward notation for the *n*×1 vectors *m*_{u1}, , , the *T*×1 vectors *c*_{top}, *d*_{top} and the *K*×1 vectors ** δ**,

**, . Line (A1) is the data model, (A2) is the process model, and (A3) and (A4) are the parameter models.**

*m*_{δ}Computing the posterior requires the normalizing constant for the posterior density. This is too complex for analytical solution, but MCMC sampling can be used to approximate the posterior. The idea is to construct a Markov chain that has the posterior distribution as its equilibrium distribution. More precisely, let *w*_{1},…,*w*_{r} be the *r* unknown variables in the hierarchical model. Starting from initial samples , MCMC approaches are based on iteratively simulating new samples using the previous samples . After *q*_{0} burn-in iterations, realizations of the Markov chain , are viewed as simulations from the posterior distribution. Monte Carlo estimation techniques can be applied to these samples to estimate the unknown densities of *w*_{1},…,*w*_{r}. The Gibbs sampler algorithm produces a Markov chain with the posterior distribution at equilibrium (Robert & Casella 2004). At each iteration, new simulations are drawn from the full conditional distributions: at iteration (*q*), it generates the new sample according to
The Gibbs sampler requires that all full conditional distributions can be sampled exactly. Variables for which this is not possible can be sampled using a Metropolis–Hastings step. The detailed algorithm is given in the electronic supplementary material.

- Received October 10, 2011.
- Accepted December 12, 2011.

- This journal is © 2012 The Royal Society

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.