## Abstract

Despite the extreme complexity that characterizes the mechanism of the earthquake generation process, simple empirical scaling relations apply to the collective properties of earthquakes and faults in a variety of tectonic environments and scales. The physical characterization of those properties and the scaling relations that describe them attract a wide scientific interest and are incorporated in the probabilistic forecasting of seismicity in local, regional and planetary scales. Considerable progress has been made in the analysis of the statistical mechanics of earthquakes, which, based on the principle of entropy, can provide a physical rationale to the macroscopic properties frequently observed. The scale-invariant properties, the (multi) fractal structures and the long-range interactions that have been found to characterize fault and earthquake populations have recently led to the consideration of non-extensive statistical mechanics (NESM) as a consistent statistical mechanics framework for the description of seismicity. The consistency between NESM and observations has been demonstrated in a series of publications on seismicity, faulting, rock physics and other fields of geosciences. The aim of this review is to present in a concise manner the fundamental macroscopic properties of earthquakes and faulting and how these can be derived by using the notions of statistical mechanics and NESM, providing further insights into earthquake physics and fault growth processes.

## 1. Introduction

Fracturing processes giving rise to subsequent earthquakes is a complex phenomenon, such that the definition of the exact physics and the forecasting of upcoming events still represent an outstanding challenge. The complexity of the earthquake activity is manifested in the wide range of spatial and temporal scales that are incorporated in the process. Earthquake ruptures occur on a complex network of fractures and faults that vary from microcracks, to major fault zones and plate boundaries, whereas the temporal scales that are incorporated in the process vary from seconds (during dynamic rupture), to the hundreds, thousands and millions of years that characterize the repeat times of characteristic earthquakes and the evolution of fault zones and tectonic plate boundaries, respectively [1,2]. Although, given a large set of seismographic records, the history of fault rupture can be reconstructed *a posteriori*, the physical processes that lead to the initiation and propagation of a rupture through a fault system, giving rise to an earthquake, is really limited.

However, despite the extreme complexity that characterizes the mechanism of the earthquake generation process, simple general laws seem to describe the collective properties of seismicity and faulting. The well-established Gutenberg–Richter (GR) scaling relation indicates scale invariance in the frequency of the dissipated seismic energies, while the Omori–Utsu scaling relation implies that the aftershock production rate decays as power law with time [3]. Furthermore, the earthquake frequency–magnitude relationship is found to be applicable over a wide range of earthquake sizes both globally and locally and even down to the laboratory scale [4–6]. Short- and long-term clustering effects and scale invariance have also been exhibited in the temporal evolution of seismicity [7]. In addition, earthquakes exhibit fractal spatial distribution of epicentres and they occur on fractal-like structure of faults [8]. All these properties are strong observational evidences implying nonlinear dynamics in the earthquake generation process [9].

Although the collective properties of earthquakes and faulting can be reproduced by statistical models, the fundamental challenge in earthquake physics is to understand the transition from the microscopic scale and the laws that govern friction, rupture propagation, chemical reactions, fluid migration and so on, to the macroscopic scale of large earthquakes, fault networks and tectonic plate boundaries [10]. Towards this direction, considerable progress has been made in the statistical mechanics approach, which, based on the entropy principle, can be used to estimate the macroscopic properties of earthquakes and faulting from the specification of the relevant microscopic constituents and their interactions. In many studies, the classic statistical mechanics approach due to Boltzmann–Gibbs–Shannon has been used to describe the collective properties of earthquake populations (e.g. [2,11]).

However, typical complex systems whose elements are strongly correlated, i.e. the probability of a certain microstate occurring depends strongly on the occurrence of another microstate, violate Boltzmann–Gibbs (BG) statistical mechanics. Such systems, instead of the Boltzmann distribution, typically present power-law behaviour, enhanced by (multi-) fractal geometries, long-range interactions or large fluctuations between the various possible states; properties that seem to correspond well to the phenomenology of earthquakes. For the statistical mechanics description of such systems, a generalized framework has been introduced by Tsallis [12], known as non-extensive statistical mechanics (NESM). NESM generalizes BG statistical mechanics and its main advantage is that it considers all-length scale correlations among the elements of the system, leading to broad distributions with power-law asymptotic behaviour. The NESM concept has found so far many applications in nonlinear dynamic systems, including earthquakes (Tsallis [13] and references therein). In a series of publications during the last decade, it has been demonstrated that NESM can successfully reproduce the macroscopic properties of earthquake-related phenomena from the laboratory scale (e.g. [14,15]), to regional (e.g. [16–18]) and global scale (e.g. [19]).

The aim of this study is to review the fundamental properties of fault and earthquake populations and to demonstrate in a concise manner how these properties can be derived by using aspects of statistical mechanics and NESM. We focus on a variety of tectonic environments and scales and discuss the insights that are gained in the analysis of fault and earthquake populations, with specific interest in earthquake and fracture physics and seismic hazard assessment.

## 2. Statistical mechanics and the phenomenology of earthquakes and faulting

The collective properties of fault and earthquake populations exhibit some universal characteristics, in the sense that these are appearing in various tectonic environments and scales that may vary from the laboratory microscale to major fault zones and tectonic plate boundaries. It is exactly these properties that motivate the statistical mechanics approach to earthquake physics. In the following, the main properties of earthquakes and faulting and the classic statistical mechanics approaches that have been developed to address them are briefly reviewed.

### (a) Phenomenology of fault and earthquake populations

#### (i) Scaling properties of seismicity

One of the most well-known empirical scaling relations in geophysics is the GR scaling relation [20], which describes the frequency of earthquake magnitudes in a given region and takes the form:
*M* represents the earthquake magnitude, *N*(>*M*) is the number of earthquakes with magnitude greater than *M*, log is the logarithm to the base of 10 and *a* and *b* are positive constants. The constant *a* represents the regional level of seismicity and *b*, commonly known as the seismic *b*-value, is the slope of the distribution that estimates the proportion of small to large earthquakes (figure 1). The G-R scaling relation has been found to describe the frequency-size distribution of earthquakes in various seismic regions, tectonic environments and scales, with *b*-values that are generally close to one [22,23], although regional variations may appear (e.g. [24]).

The G-R scaling relation can alternatively be written as
*N* and the magnitude *M*. Furthermore, the earthquake energy *E* (in terms of the seismic moment *M _{o}*) is related to the earthquake magnitude

*M*as

*c*= 1.5 [25,26]. Equation (2.2) can then be written in terms of the seismic energies

*E*as

*E*, in terms of the seismic moment

*M*, is related to the surface area of a fault with the relationship

_{o}*M*=

_{o }*μ*Δ

*A*, where

*μ*is the rigidity of the material,

*A*the surface area of the fault break and

*Δ*the average slip during the earthquake [25]. According to the previous definition of

*M*and equation (2.4), the number of earthquakes

_{o}*N*(>

*A*) with rupture areas greater than

*A*has a power-law dependence on the area

*A*(e.g. [8]). The latter implies that the empirical G-R scaling relation is equivalent to a fractal distribution [27].

In the distribution of larger earthquake magnitudes, an upper bound or a taper may appear that is associated with the finite energy release rates [28]. This upper bound limits the number of the maximum expected earthquake magnitudes and has been modelled by modifying the G-R scaling relation to include an exponential tail or a taper [29]. The modified G-R (MGR) scaling relation then takes the form of a gamma distribution:
*E*_{c}.

Regarding the spatio-temporal properties of seismicity, the most prominent feature is the presence of clustering, where correlated earthquake sequences exhibit scale-free structures [30]. Spatial clustering is exemplified by the concentration of seismicity along the tectonic plate boundaries and regional fault networks (e.g. [1]). Temporal clustering is best observed in aftershock sequences, where a significant increase in the regional seismicity rate occurs immediately after the occurrence of a strong earthquake. The aftershock production rate *n*(*t*) = d*N*(*t*)/d*t* (where *N*(*t*) is the number of earthquakes in time *t* after the major event) has empirically been found to decay as a power law with time *t* according to the modified Omori formula [3]:
*K* and *c* are constants and *p* is the power-law exponent. The modified Omori formula expresses a short-term clustering effect that can be seen in almost every seismic catalogue [31]. However, a truncation of the power-law regime may appear at short time-scales following the main shock that can be attributed either to missing recorded events immediately after the main shock or to a real non power-law regime (e.g. [32]).

The background activity of main earthquakes in a seismic region is often considered as uncorrelated and statistically independent in time and models that express randomness, like the general Poisson model, have been used to describe its temporal properties [33]. A Poissonian background activity combined with triggered aftershocks that scale according to the modified Omori formula can be realized by the distribution of inter-event times *τ*, i.e. the time intervals between the successive earthquakes in a seismic region, which frequently takes the form of a gamma distribution (e.g. [34,35]):
*C*, *γ* and *τ*_{0} are positive constants. The latter expression exhibits two regions, where short inter-event times, associated with short-term clustering effects and aftershock sequences, scale as a power law with exponent 1 − *γ*, while long inter-event times that are associated with the background activity exhibit Poissonian behaviour and exponential scaling. This type of scaling has been found in stationary earthquake time series [35–37] and is approximately the one predicted by the epidemic-type aftershock sequence model [35,38,39]. However, in non-stationary earthquake time series long-term clustering effects and power-law scaling have been found to characterize long inter-event times [7,34,40,41]. The latter indicates clustering in both short and long inter-event times and implies memory effects in the earthquake generation process [42,43].

In addition, properties like non-stationarity and intermittency in earthquake time series are indicative of heterogeneous clustering effects that are quantitatively consistent with multifractal geometries (e.g. [44]). The multifractal structure of earthquake time series is evident in various earthquake catalogues [45–47] and multifractal analysis is used as a second-order approximation to enlighten the correlations of seismicity and the local clustering effects [44,48].

#### (ii) Scaling in fault populations

Faults are key components in the dynamic evolution of the Earth's lithosphere and surface and are the sites of large earthquakes. Crustal strain is accommodated in systems of faults and fractures that exhibit complex geometries and sizes that vary from a few millimetres to tens or hundreds of kilometres [1]. Fault systems appear self-similar in a wide range of scales, indicating scale invariance and the absence of a characteristic length scale in the fault growth process [11,49,50]. Scale invariance in fracturing processes is also supported by the frequency-size distribution of earthquakes that scales according to the G-R relation (equation (2.1)), which represents a power-law relationship between the number of earthquakes and the rupture area (see equations (2.1)–(2.4) and the previous section). However, even if scale invariance does hold, it is restricted to a finite range of scales that are associated with the size of the fractured medium.

In support to scale-invariant fault growth is the power-law scaling of fault lengths *L* found in numerous studies and in various tectonic environments and scales (e.g. [51–53]). In terms of the cumulative distribution of fault trace-lengths, the power-law relationship is expressed as
*N*(>*L*) is the number of faults with lengths greater than *L*, *A* is a scaling constant and *D* the power-law exponent.

In other case studies though, the exponential function has been found to best describe the cumulative number of fault lengths *N*(>*L*). In this case, the exponential function is expressed as
*A* is a scaling constant and *L*_{0} a characteristic length scale that may reflect some physical parameter in the fault system, such as the thickness of the brittle layer [50]. This type of scaling in fault-length distributions has been observed in mid-ocean ridges [54], in the Turcana Rift (Northern Kenya) [55] and in high-strain zones in the Corinth Rift [56] and the Afar Rift [57].

The power-law frequency-size distribution and the self-similar structure across the wide range of scales have been considered as strong indications of fractal geometries in fracture systems (e.g. [8]). Fractal geometry has been used as a first-order approximation to describe the complex patterns of fault systems and physical models on fractal fault growth have been developed [49,58,59]. Cowie *et al*. [58] used a two-dimensional numerical model to show how fractal fault patterns emerge as the result of correlations in the fault network, induced from both short- and long-range elastic interactions in a heterogeneous material (random heterogeneity) that is deformed under constant velocity (see also [59]). Cowie *et al*. [60] used a similar model to show that as deformation progresses and strain localizes on a few large faults, a multifractal structure emerges as the superposition of a fractal distribution of fault displacements onto a fractal fault pattern.

In the previous models fault interactions were an intrinsic characteristic of fault growth (see also [61,62]). Fault interactions are created as the fault system grows in size and faults start to interact through their stress fields. According to Sornette *et al*. [63], brittle failure along a fault causes the redistribution of strain perturbations in the medium that decay algebraically with distance according to the equations of elasticity, producing interactions in the far field. In the near field, stress interactions around larger faults, where strain localizes, produce a combination of stress screening and enhancement effects [61,62]. In earthquake triggering, this mechanism is quite important as it may accelerate or retard upcoming events on neighbouring faults [64]. In fault growth, the stress perturbations around larger faults favour deformation in some areas, while other areas remain relatively undeformed. Hence, optimally oriented faults in stress increase zones continue to grow, while other faults in stress drop zones may cease activity.

Other geometrical or physical factors that may control fault growth and, respectively, the scaling properties of fault systems are the finite thickness of the brittle layer [65,66] and the rheological properties of the lithosphere [67,68]. These factors may lead to non-universal scaling exponents and frequency-size distributions that deviate from the power law (e.g. exponential). In addition, simulations on numerical models [60,69,70] and on analogue laboratory experiments [65,71] showed that the frequency-size distribution depends on the total strain that the fault system has been imposed on and the competition between the different stages of fault growth, i.e. fault nucleation, growth and coalescence. The models indicate that at the very early stages of deformation, where few new faults have nucleated, their spatial organization is random and the trace-length distribution is best described by the exponential function [65,69]. As strain accumulates, more faults are nucleated and grow in size and number and start to coalesce forming larger faults. At this stage, where growth and coalescence start to dominate, faults display a power-law size distribution with decreasing exponents as strain increases, indicating the increased importance of large faults in accommodating strain [60,65]. At later stages of deformation, strain starts to localize along few large faults that span the mechanical layer and the number of active faults decreases. With increasing strain, the fault population reaches saturation, fault interactions are suppressed and the size distribution turns to exponential [65,69–71]. The transition from power-law scaling in lower strain regimes to exponential scaling in higher strain regimes has been observed in the Afar Rift [57] and the Corinth Rift [56], supporting the combination of crustal processes in single tectonic settings as a function of crustal strain.

### (b) The classical statistical mechanics approach

Despite the extreme complexity that characterizes the generation process of individual earthquakes, the relatively simple phenomenology (e.g. scale invariance) that seems to apply in the collective properties of earthquakes and faulting has motivated the statistical mechanics approach. By using the mathematical tools of probability theory and statistics, statistical mechanics can be used to estimate the macroscopic properties of fault and earthquake populations from the specification of the relevant microscopic constituents and their interactions [10].

Originally, statistical mechanics and the associated concept of entropy (*S*) were used in thermodynamics and the kinetic theory of gases. The concept of entropy was later incorporated into information theory as a measure of uncertainty [72], and since the mid-1950s, it has been extended to other fields beyond classic thermodynamics in order to provide a general principle for inferring the least biased probability distribution from limited information [73]. The latter is commonly known as the maximum entropy principle (MaxEnt). During recent decades, innovative insights into the macroscopic states of many physical, chemical and biological systems have been accomplished by applying the MaxEnt hypothesis, since only the constraints of the large-scale system need to be known, regardless of the complexity that may characterize the different microscopic configurations (e.g. [74]).

According to the principles of statistical mechanics, the relative probability that the system possesses a given state can be estimated by optimizing the entropy, subject to the limited knowledge about the system that is expressed through the constraints. For a continuous variable *x* that takes values in the space [0, ∞], the BG entropy *S*_{BG} is expressed as
*p*(*x*) is the probability distribution of *x* and *k*_{B} is Boltzmann's constant. The most probable state of *p*(*x*) is estimated by maximizing the entropy *S*_{BG}, subject to the constraints of normalization of *p*(*x*)
*x*

By using the Lagrange multipliers method, the probability distribution *p*(*x*) that maximizes *S*_{BG} is the well-known *Boltzmann distribution*:

The term e^{−βx} is often called the Boltzmann factor, where *β* is the Lagrange multiplier, and the denominator of equation (2.13) is a normalization factor that is referred to as the partition function

BG statistical mechanics has been applied to earthquake physics in a series of works, which address the theory as a variational principle for deriving the large-scale properties of earthquake populations [11,75] or to investigate the thermodynamic state of the crust [76–78]. One of the first studies that applied statistical mechanics to earthquakes was that of Berrill & Davis [79]. The authors maximized entropy to derive a probability density function of earthquake magnitudes *p*(*M*) that has the form of a truncated exponential distribution:
*m*_{1} is the minimum magnitude in the dataset and *β* the Lagrange multiplier. Furthermore, Main & Burton [75] maximized entropy using the constraints of the average magnitude 〈*m*〉 and the average seismic energy release 〈*E*〉 to derive a gamma distribution of earthquake energies *p*(*E*), consisting of a power-law distribution at small magnitudes that corresponds to the G-R scaling relation (equation (2.4)) and an exponential (Boltzmann) tail at larger magnitudes:
*β* is a scaling exponent and *θ* a characteristic energy that reflects the probability of occupancy of the different energy states *E*. Main & Burton [75] used earthquake data from the Mediterranean region and from Southern California to show that this type of distribution is more consistent with real observations. Moreover, Main & Al-Kindy [77] used equation (2.16) to investigate the proximity of global seismicity to criticality, characterized by the dissipated seismic energy *E* and the entropy *S*. The authors defined a subcritical state for positive *θ* and a supercritical state for negative *θ* and concluded that global seismicity is in a near-critical state, in which large fluctuations are dominated by fluctuations in *θ* rather than *β* and large seismic energy fluctuations can occur for smaller entropy fluctuations [77].

The hypothesis that the Earth's lithosphere is in a state of thermodynamically driven maximum entropy production and SOC was tested by Main & Naylor [78,80]. The authors explored this hypothesis using the Olami–Feder–Christensen (OFC) model [81] and real seismicity data and concluded that their observations were consistent with the hypothesis of entropy production as the driving mechanism for self-organized subcriticality in natural and model seismicity [80].

In other studies, Shannon entropy (*H*) was used as a measure of disorder in earthquake sequences [82,83]. Telesca *et al*. [82] estimated *H* with time for earthquake magnitudes and inter-event time series in Umbria–Marche region (central Italy) and found significant variations that correspond to the largest event in the series. De Santis *et al*. [83] related *H* to the G-R scaling relation and derived a relationship between *H* and the *b*-value that reads as
*b*_{max} = e log e ≈ 1.2. De Santis *et al*. [83] studied the variability of the *b*-value and *H* during two earthquake sequences in Italy and identified three dynamic regimes with respect to *H* variations: (i) a preparatory phase, where *H* increases slowly with time, (ii) the phase of occurrence of a strong earthquake, where *H* exhibits an abrupt increase after the occurrence of the main shock, and (iii) a final diffuse phase, where *H* recovers ‘normal’ values and seismicity spreads all over the region.

## 3. A non-extensive statistical mechanics approach

### (a) The Tsallis entropy *S*_{q}

_{q}

BG statistical mechanics seems the correct one to be used in a large and important class of physical systems that present strongly chaotic dynamics (positive maximal Lyapunov exponent). However, there is an important class of weakly chaotic systems (where the maximal Lyapunov exponent vanishes) that violate some of the essential properties of BG statistical mechanics [13,84–86]. The macroscopic behaviour of such systems, instead of the Boltzmann distribution, typically presents power-law distributions with heavy tails, enhanced by (multi-) fractal geometries, long-range interactions, intermittency or large fluctuations between the various possible states; properties that seem to correspond well to the phenomenology of earthquakes. For such systems, where BG statistical mechanics has limited applicability, a generalized framework has been introduced by Tsallis [12], termed as NESM. NESM generalizes BG statistical mechanics and its main advantage is that it considers all-length scale correlations among the elements of the system, leading to broad distributions with power-law asymptotic behaviour.

Central to NESM is the non-additive (Tsallis) entropy *S _{q}* that for the discrete case is expressed as [12]

*k*is some conventional positive constant taken to be Boltzmann's constant in thermo-statistics,

*p*is a set of probabilities,

_{i}*W*is the total number of microscopic configurations, and

*q*is the entropic index that represents a measure of non-extensivity. Tsallis entropy (

*S*) recovers the BG entropy (

_{q}*S*

_{BG}) in the limit

*q*→ 1. Although

*S*shares a lot of common properties with

_{q}*S*

_{BG}(e.g. non-negativity, expansibility, concavity; see Tsallis [13] for the full list of these properties),

*S*

_{BG}is additive whereas

*S*(

_{q}*q*≠ 1) is non-additive. According to this property and for two probabilistically independent systems

*A*and

*B*(i.e.

*S*

_{BG}satisfies

*S*satisfies:

_{q}The last term on the right-hand side of equation (3.4), which introduces the concept of non-additivity, is the fundamental principle of NESM [13]. Moreover, the cases *q* < 1, *q* = 1 and *q* > 1 correspond to super-additivity (super-extensivity), additivity (extensivity) and sub-additivity (sub-extensivity), respectively. However, if the systems *A* and *B* are correlated, then a *q*-value might exist such that *S _{q}*(

*A*+

*B*) =

*S*(

_{q}*A*) +

*S*(

_{q}*B*). In this case,

*S*is extensive for

_{q}*q*≠ 1 [13].

Let us now consider a continuous variable *X* with probability distribution *p*(*X*) (0 ≤ *p*(*X*) ≤ 1). The *X* variable in earthquake physics may represent the seismic moment (*M _{o}*), the inter-event times (

*τ*) or distances (

*r*) between successive earthquakes or the fault trace-lengths (

*L*) in a given region. In this case, the non-additive entropy

*S*is expressed through the integral formulation:

_{q}To obtain the distribution *p*(*X*) that optimizes *S _{q}*, subject to constraints, the Lagrange multipliers method is applied. The first constraint refers to the normalization condition of

*p*(

*X*):

The second constraint is the condition regarding the generalized expectation value (*q*-expectation value) *X _{q}*, which is defined as

*P*(

_{q}*X*) is the escort probability distribution [13,87] that is defined as

In the limit *q* → 1, *X _{q}* recovers the standard average value of

*X*, 〈

*X*〉. The concept of escort distributions [88] refers to a set of probability distributions which scan the structure of an original probability distribution. The later one is related to the (multi-) fractal features of nonlinear dynamic systems [87]. Using the Lagrange multipliers method and the previous constraints, the following functional is optimized:

*α*and

*β** represent the Lagrange multipliers. Optimization of equation (3.9) yields the optimal physical probability

The denominator of equation (3.10) is the *q*-partition function defined as

The term *β _{q}* is associated with the Lagrange multiplier

*β** and the second constraint (equation (3.7)) as

The numerator of equation (3.10) is the *q*-exponential function defined as [13]
*q*-logarithmic function:

According to Abe & Suzuki [16], the *q*-exponential distribution is a generalization of the Zipf–Mandelbrot distribution [89], which is obtained for *q *> 1. If *q *> 1, equation (3.10) exhibits an asymptotic power-law behaviour, whereas for 0 < *q *< 1 a cut-off appears at *X*_{c}* *=* *1/(1* *−* q*)*β _{q}* [16,17]. In the limit

*q*→

*1, the*

*q*-exponential and

*q*-logarithmic functions lead to the ordinary exponential and logarithmic functions, respectively. The

*q*-exponential function for various values of

*q*is shown in figure 2.

In the NESM framework, it has been proposed that the cumulative distribution function (CDF) *P*(>*X*) should be obtained upon integration of the escort probability *P _{q}*(

*X*) rather than

*p*(

*X*) [16,17]. Following this approach and by using equation (3.8),

*P*(>

*X*) is derived as

*X*

_{0}is always positive, has the units of

*X*and is defined as

*q*-logarithmic function (equation (3.14)). The latter equation implies that after the estimation of the appropriate

*q*-value that describes the distribution of

*X*, the

*q*-logarithmic function (equation (3.14)) is linear with

*X*, with slope −1/

*X*

_{0}[91].

The other type of distributions that is deeply related to statistical mechanics is that of the squared variable *X*^{2}. Optimization of *S _{q}* in terms of

*X*

^{2}using the appropriate constraints leads to the

*q*-Gaussian distribution that has the form [13]:

The *q*-Gaussian distribution generalizes the standard Gaussian distribution, which is recovered in the limit of *q *→* *1. For *q *> 1, the *q*-Gaussian distribution displays power-law tails with slope −2/(*q *−* *1), thus enhancing the probabilities of rare events.

### (b) Fundamental ideas of non-extensive statistical mechanics applied to earthquakes and tectonics

Given the nonlinear dynamics that control fragmentation processes and the phenomenology that fault and earthquake populations exhibit, the application of NESM to earthquake physics arose naturally. The NESM theory addresses properties such as (multi-) fractal geometries, long-range interactions and criticality, properties that seem quite important for understanding earthquake physics [85]. In a series of recent publications, it has been demonstrated that NESM is a powerful tool for describing the macroscopic properties of earthquake-related phenomena and other geophysical problems as diverse as plate tectonics, natural hazards, geomagnetic reversals and rock physics (see [92] and references therein). The current review focuses on the application of NESM to earthquakes and tectonics and the insights that can be obtained in earthquake physics using this approach.

#### (i) Non-extensive statistical mechanics and the earthquake-size distribution

A longstanding distribution in geophysics that is widely used in earthquake hazard assessments is the frequency-size distribution of earthquakes. The earthquake generation process is primarily a mechanical phenomenon, where stick–slip frictional instability inside fault zones has a dominant role (e.g. [1]). Consistent with this idea Sotolongo-Costa & Posadas [93] introduced the fragment-asperity interaction model for earthquake dynamics. This model considers the interaction of two rough profiles (fault blocks) and the fragments filling the space in between them. Stress accumulates in the crust until a fragment is displaced or an asperity is broken, resulting in fault plane slip and energy release. The relative displacement of the fault blocks and consequently the earthquake energy release (see the related discussion in \S2a(i)) are proportional to the size of the hindering fragments. Sotolongo-Costa & Posadas [93] used the NESM formalism to establish the seismic energy distribution function based on the size distribution of the fragments. The non-additive entropy *S _{q}*

_{,}in terms of the probability

*p*(

*σ*) of finding a fragment of area

*σ*, is expressed as

Sotolongo-Costa & Posadas [93] obtained the probability *p*(*σ*) by maximizing *S _{q}* under the constraints of normalization of

*p*(

*σ*) (equation (3.6)) and the

*q*-mean value of

*p*(

^{q}*σ*), whereas Silva

*et al*. [94] introduced the condition about the

*q*-expectation value in the second constraint (equation (3.7)) and maximized

*S*using the Lagrange multipliers method to derive the fragment size distribution function

_{q}*p*(

*σ*). Considering the proportionality between the released seismic energy

*E*and the size of the fragments

*r*(

*E*∼

*r*

^{3}) (e.g. [95]), the following expression for the seismic energy distribution function is derived [94]:

*p*(

*E*) =

*n*(

*E*)/

*N*, where

*n*(

*E*) corresponds to the number of earthquakes with energy

*E*and

*N*is the total number of earthquakes (figure 3

*b*). A more viable expression can now be obtained by introducing the normalized cumulative number of earthquakes, given by the integral of equation (3.18):

*N*(

*E*>

*E*

_{th}) is the number of earthquakes with energy

*E*greater than the threshold energy

*E*

_{th},

*N*the total number of earthquakes,

*q*the entropic index and

_{M}*a*a positive constant that expresses the proportionality between the released energy

_{M}*E*and the fragments of size

*r*. If we consider the relationship between

*E*and

*M*(equation (2.3)), the latter expression can be written in terms of the earthquake magnitude

*M*. By taking into account the minimum magnitude

*M*

_{0}of the earthquake catalogue, the cumulative distribution of earthquake magnitudes

*N*(>

*M*) takes the form [96]:

The fragment–asperity interaction model describes from the first principles of statistical mechanics the cumulative distribution of the number of earthquakes *N* greater than a threshold magnitude *M*, normalized by the total number of earthquakes (figure 3). This model, in the form of equation (3.20) or in the previous forms of Sotolongo-Costa & Posadas [93] and Silva *et al*. [94], has been applied to various regional earthquake catalogues [18,41,94,97–105]. The results of these studies indicate that the model can successfully reproduce the frequency–magnitude distribution of earthquakes in diverse tectonic environments (e.g. figure 3*a*) and in volcanic regions [100,106]. In comparison to the G-R scaling relation (equation (2.1)), the fragment-asperity model provides a good description of the observed earthquake magnitudes over a wider range of scales, while for values above some threshold magnitude, the G-R relation can be derived as a particular case, for the value of *b *= (2 − *q _{M}*)/(

*q*− 1) [96]. Moreover, the

_{M}*b*-value estimation according to the maximum-likelihood technique (e.g. [107]) is quite sensitive to the initial selection of the minimum earthquake magnitude

*M*

_{0}in the catalogue under study, while the

*q*-value estimation is quite stable irrespective of the selection of

_{M}*M*

_{0}[108]. The stability of

*q*with

_{M}*M*

_{0}is quite important in earthquake hazard assessments, as reliable estimates of

*q*can be made even in cases where the catalogue is incomplete or presents uncertainties due to spatio-temporal variations in the magnitude of completeness.

_{M}In addition, the *q _{M}* variations with time have been used as an index of tectonic stability in a seismic region and its proximity towards a larger event [18,100,101,104,109–111]. Papadakis

*et al*. [18] estimated

*q*and

_{M}*a*along distinct seismic zones of the Hellenic Subduction Zone (HSZ) and found that the

_{M}*q*variations are strongly related with the seismic energy release in each zone. Telesca [101] studied the

_{M}*q*temporal variations prior to the 6 April 2009 (

_{M}*M*

_{L }= 5.8) L'Aquila earthquake and found a significant increase some days before the occurrence of the strong event. Significant

*q*increases were also observed prior to the 12 October 2013 (

_{M}*M*

_{L }= 6.2) earthquake that occurred in the southwest segment of the Hellenic arc [112] and prior to the 1995 (

*M*= 7.2) Kobe earthquake [110] (figure 4). In these case studies, the

*q*increases were associated with the occurrence of moderate-size events prior to the main shock, indicating the initiation of a preparatory phase leading to a strong earthquake (figure 4).

_{M}The probability distribution of incremental earthquake energies (i.e. the energy differences between successive earthquakes) in real earthquake data and in numerical models has been found to follow the *q*-Gaussian distribution (equation (3.16)) [92,106,113]. The *q*-Gaussian distribution exhibits power-law tails, which enhance the probabilities (for *q *> 1) of rare events and in the case of seismicity the occurrence of a large earthquake immediately after the occurrence of a smaller one. In particular, Caruso *et al*. [113] found that in the critical regime of the dissipative OFC model [81] (small-world lattice), the incremental earthquake energies exhibit a *q*-Gaussian probability distribution, while in the non-critical regime (regular lattice) the probability distribution is close to Gaussian (figure 5*a*). Caruso *et al*. [113] repeated their analysis on real earthquake data and found that the probability distribution of incremental earthquake energies in global seismicity and Northern California follows the *q*-Gaussian distribution (figure 5*b*), providing further evidence for SOC, intermittency and long-range interactions in seismicity. These results also imply that although long-range temporal and spatial correlations exist in seismicity, together with a certain degree of statistical predictability, it is not possible to predict the magnitude of the events [113]. The probability distribution of incremental earthquake energies in volcano seismicity was further studied by Vallianatos *et al*. [106]. In the latter study, the analysis showed that the probability distribution of incremental earthquake energies during the 2011–2012 unrest at the Santorini volcanic complex follows the *q*-Gaussian distribution for the *q*-value of *q *= 2.24, a value that is in agreement with the Ehrenfest dog-flea SOC model [114].

In addition, *q*-exponential functions have been found to characterize the acoustic emission energy release produced during the triaxial deformation of an Etna basalt in the laboratory [14] and the global earthquake frequency–magnitude distribution [19]. Using the global CMT catalogue, Vallianatos & Sammonds [19] used NESM to study the effect of the Sumatra and Honshu mega-earthquakes (*Mw *≥ 9) on the global frequency–magnitude distribution (figure 6). The authors used a cross-over formulation of NESM [115] to interpret thermodynamically the deviation of greater magnitudes from a pure power law (see also the discussion in §2a(i)) and found that the distribution of greater earthquakes (*Mw *≥ 7.6) changes from an exponential to another power-law prior to the two mega-events, implying a global organization of seismicity as the two mega-events were approached (figure 6).

#### (ii) Spatio-temporal properties of seismicity and non-extensive statistical mechanics

The NESM approach to the spatio-temporal properties of seismicity was first introduced by Abe & Suzuki [16,17]. These authors studied the spatio-temporal distribution of seismicity in California and Japan and showed that the cumulative distributions of inter-event distances *P*(>*r*) and times *P*(>*τ*) between successive earthquakes are well described by the *q*-exponential distribution (equation (3.15)) for *q*-values of *q _{r}* < 1 and

*q*> 1, respectively. In particular, Abe & Suzuki [16,17] found

_{τ}*q*-values of

*q*= 0.77,

_{r}*q*= 1.13 for California and

_{τ}*q*= 0.747,

_{r}*q*= 1.05 for Japan and suggested the spatio-temporal duality of earthquakes, where

_{τ}*q*+

_{τ}*q*≈ 2.

_{r}The results of Abe and Suzuki were further verified in laboratory experiments [14], in numerical models and in regional and global seismicity (e.g. [18,19,41,105,116–118]), implying a universal character and the existence of long-term clustering effects in the spatio-temporal evolution of seismicity. The temporal scaling properties of volcanic seismicity and in particular during the 2011–2012 unrest at the Santorini volcanic complex were studied by Vallianatos *et al*. [106]. In the latter study, the authors showed that when the volcano-related seismicity takes swarm-like character, complex correlations of seismicity emerge that are characterized by a *q*-exponential inter-event times distribution.

Papadakis *et al*. [18] examined the inter-event times and inter-event distances distribution along the seismic zones of the HSZ (figure 7) and proposed that the observed variations of the entropic indexes signify different degrees of earthquake clustering. The latter result may imply asymmetric patterns within the clustered events in each seismic zone (e.g. [119]). In addition, Antonopoulos *et al*. [105] studied the probability distribution of inter-event times between successive earthquakes in Greece and showed that for both the entire dataset and the declustered one, where the aftershocks have been removed, the probability distributions are better described by the *q*-exponential distribution (equation (3.10)), for the *q*-values of *q _{τ}* = 1.24 ± 0.054 and

*q*= 1.14 ± 0.057, respectively. For the declustered dataset, the corresponding

_{τ}*q*-value that best describes the observed distribution approaches unity, implying the loss of temporal correlations and close proximity to Poissonian (random) behaviour once the aftershocks are removed from the dataset. Moreover, Antonopoulos

*et al*. [105] estimated the hazard function

*W*(

_{M}*T*, Δ

*T*), defined as the probability of at least one earthquake with magnitude greater than

*M*will occur in the next time interval Δ

*T*if the previous one occurred before time

*T*. The hazard function is related to

*p*(

*T*) by:

If *p*(*T*) scales according to the *q*-exponential function (equation (3.13)), then by simple integration one obtains:

The hazard function *W _{M}*(

*T*,Δ

*T*)for the earthquake activity in Greece according to equation (3.22) and for various time intervals Δ

*T*is shown in figure 8. The graph shows that for a fixed time interval Δ

*T*, the probability for at least one earthquake with magnitude greater than

*M*to occur in the next time intervalΔ

*T*, decreases with increasing

*T*(figure 8) [105]. The latter implies that the longer it has been since the last earthquake, the longer it will take for the next one to occur (see also Sornette & Knopoff [120]).

In addition, Michas *et al*. [41] studied the PDF for the inter-event times of earthquakes in the West Corinth Rift, Greece. In this case, the inter-event times *τ* were normalized according to the mean seismic rate as *τ*′ = *R *× *τ*, or equivalently to the mean inter-event time *τ* with *R*, the normalized probability distributions *p*(*τ*′) for various threshold magnitudes fall approximately into a unique curve (figure 9; see also Corral [36]). For stationary periods, where the mean seismic rate does not fluctuate, this unique curve can well be approximated by the gamma distribution (equation (2.7)). However, if the entire non-stationary earthquake time series are considered, an additional power-law regime at long inter-event times appears. This type of scaling can be described by a *q*-gamma distribution (figure 9), where the last exponential term has been substituted by the *q*-exponential function (equation (3.13)) [41]. The *q*-gamma distribution then takes the form of [121]:

In the limit of *q* → 1, the *q*-gamma distribution (equation (3.23)) recovers the ordinary gamma (equation (2.7)). According to equation (3.23), for short inter-event times, *p*(*τ*) scales as a power law, *p*(*τ*) ∼ *τ ^{γ}*

^{−1}, whereas for longer inter-event times,

*p*(

*τ*) scales as another power law,

#### (iii) Fault-size distribution and Tsallis entropy

The NESM formalism, as it applies to a fault system with various fault sizes (lengths) *L* was introduced by Vallianatos *et al*. [122] and Vallianatos & Sammonds [91]. In this case, the Tsallis entropy *S _{q}* is expressed as

*σ*is a positive scaling factor,

*k*a positive constant,

*q*the entropic index and

*p*(

*L*)d

*L*the probability of finding

*L*in the interval [

*L*,

*L*+ d

*L*]. Optimizing

*S*using the appropriate constraints and the Lagrange multipliers method, as described in the previous sections, yields a

_{q}*q*-exponential type distribution, which takes the form (see also [123]):

*P*(>

*L*) is the cumulative PDF of

*L*,

*L*

_{0}is a positive scaling parameter (

*q*> 1) and

*L*

_{min}is the minimum fault-length in the dataset.

Vallianatos *et al*. [122] used the NESM approach to study the scaling properties of the fault network in Crete, in the front of the Hellenic Arc, and found that this scales according to the *q*-exponential distribution for the *q*-value of *q *= 1.16. Vallianatos & Sammonds [91] used a similar approach to study the scaling properties of linked and independent faults in an extraterrestrial fault system; the Valles Marineris extensional province, Mars, and showed that the fault-length distributions follow a *q*-exponential, with *q *= 1.75 for linked faults and *q *= 1.10 for the independent ones (figure 10). These results indicate the strong mechanical correlations of linked faults and the weaker ones for the independent faults, which exhibit a *q*-value close to unity and BG statistical mechanics. Furthermore, Vallianatos [123] studied the scaling properties of thrust (compressional) and normal (extensional) faults in Mars and found that the former are described by the *q*-exponential distribution for the value of *q *= 1.114 and the latter for *q *= 1.277.

In addition, Michas *et al*. [56] studied the scaling properties of the fault network in the Corinth Rift (Greece) that is one of the most tectonically active continental rifts on the Earth. By using the NESM approach, the analysis indicated the transition from *q*-exponential scaling and asymptotic power-law behaviour in the lower strain eastern zone, to exponential scaling and Poissonian behaviour in the higher strain central and western zones. When the current strain rates were considered, the analysis showed a similar transition from *q*-exponential scaling in the lower strain rate zone to exponential scaling in the higher strain rate zone, indicating the maturity of the fault network and fracture saturation in the currently active rift zone. The observed properties in the various strain regimes are shown in figure 11 and are described in detail by Michas *et al*. [56]. These properties provide evidence for a combination of crustal processes in natural fault systems and further suggest that fault growth processes in the upper crust control the fault network evolution and the localization of strain in the Corinth Rift. Furthermore, Michas *et al*. [56] studied the sensitivity of the observed distributions to missing faults from the dataset by generating synthetic fault datasets that scale according to the *q*-exponential-type distribution of equation (3.20). The analysis showed that even in the case of more than 90% missing faults from the dataset, the observed scaling remains quite stable, with *q*-value variations that increase as more data are removed from the original dataset.

## 4. Concluding remarks

Although significant research has been conducted on the complex properties of seismicity, including scaling relations, temporal and spatial correlations, critical phenomena and nucleation, many questions regarding earthquake physics remain to be answered in the future. Most of the current research approaches rely on empirical laws rather than on a solid underlying theoretical concept. However, statistical seismology should evolve into a genuine physically based statistical physics of earthquakes. This review paper presents the concept of NESM, which offers a consistent theoretical framework for the macroscopic description of fault and earthquake populations. The NESM approach, based on the first principles of statistical mechanics, provides a unified framework that produces a range of asymptotic power law to exponential-like distributions that are both ubiquitous in nature.

The first section of this study reviews the scaling properties of seismicity and fault populations, and the classical statistical mechanics approaches that have been developed to describe them. Following, the theoretical framework of NESM and its application to earthquakes and tectonics is extensively presented. More particularly, we present various applications on the earthquake frequency–magnitude distribution, the spatio-temporal properties of seismicity and the fault-size distribution.

The various published studies and the cases presented here illustrate that in terms of probabilities of the different microstates and their interactions, the large-scale properties of fault and earthquake populations can be deduced by following the principles of NESM. The NESM-based models can be used to model the evolution of seismicity and can be used to further contribute to seismic hazard assessments. Moreover, it seems that the examination of the non-extensive parameters derived from the earthquake frequency–magnitude distribution can be used to describe the organization of seismicity and to reveal its hidden dynamics. In addition, examination of the spatio-temporal distributions reveals the degree of correlation among the elements of the studied system as well as the probabilistic content of earthquake sequences. Various studies of the scaling properties of fault networks signify mechanical correlations and reveal strong relation with the strain regime.

Although such results provide a step forward to the understanding of earthquake-related phenomena, many questions regarding the earthquake generation process remain wide open. Using the principles of NESM in a unified approach with the other known laws in fracture mechanics may lead to significant discoveries and may enhance our understanding regarding the physical mechanisms that drive the evolution of seismicity in local, regional and global scale.

## Competing interests

We declare we have no competing interests.

## Funding

No funding has been received for this article.

- Received June 20, 2016.
- Accepted November 7, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.