## Abstract

We introduce a novel measure, Fisher transfer entropy (FTE), which quantifies the gain in sensitivity of a state transition to a control parameter, in the context of another observable source. The new measure captures both transient and contextual qualities of transfer entropy and the sensitivity characteristics of Fisher information. FTE is exemplified for a ferromagnetic two-dimensional lattice Ising model with Glauber dynamics and is shown to diverge at the critical point.

## 1. Introduction

Interactions within a complex dynamical system often induce intricate statistical regularities and rich information flows. Formalizing these regularities and flows in a dynamic distributed setting is a subject of Information Thermodynamics—an emerging field combining approaches from Information Theory, Statistical Estimation Theory, Complex Dynamical Systems and Statistical Mechanics in an attempt to systematically and information-theoretically quantify spatio-temporal patterns on both a global and a local scale. This in turn enables a comprehensive comparative analysis of system dynamics across diverse physical, computational, biological and technological domains.

Furthermore, discovering patterns of information thermodynamics within the system is crucial for identifying critical regimes and phase transitions, and for providing efficient means for accurate forecasting and precise control of system behaviour.

One of the key challenges of Information Thermodynamics [1] is a lack of rigorous characterization of a dynamic balance between various information flows in the vicinity of phase transitions. An adequate information-theoretic framework for critical, edge-of-chaos, phenomena is yet to be developed. On the one hand, it is conjectured that at the edge of chaos the distributed computation, intrinsic to complex dynamics, maintains a balance between high information storage, information transfer and synergistic information (or novelty generation). For example, transfer entropy [2], characterizing the communication aspect of computation, is known to peak near critical regimes [3,4], while Fisher information [5] is known to peak at phase transitions [6]. On the other hand, it remains unclear how such dynamic balance is related to physical fluxes which are observed and studied during phase transitions. This work is motivated by the need to develop a new measure which shares properties of both transfer entropy and Fisher information, and apply it to a well-understood physical model.

Computation-theoretically, transfer entropy was shown to capture one of the three elements of distributed computation: communication from system *Y* to system *X* [7,8]. Transfer entropy was observed to be locally maximized in coherent propagating spatio-temporal structures within cellular automata (i.e. gliders) [7] and self-organizing swarms (cascading waves of motions) [9]. In another context, transfer entropy was found to be high while a system of coupled oscillators was beginning to synchronize, followed by a decline from the global maximum as the system was moving towards a synchronized state [10].

Thermodynamically, transfer entropy was found to be related to the external entropy production by the system *X* in the context of *Y* , due to irreversibility (e.g. heat flux) [11,12]. In addition, maxima of transfer entropy were observed to be related to critical behaviour, e.g. average transfer entropy was observed to maximize on the chaotic side of the critical regime within random Boolean networks [3]. Furthermore, in a ferromagnetic two-dimensional lattice Ising model with Glauber dynamics, (collective) transfer entropy was analytically shown to peak on the disordered side of the phase transition [4].

Elements of the Fisher information matrix were explicitly related to gradients of the corresponding order parameters [6], providing another important connection between information-theoretic and thermodynamic interpretations of critical behaviour. It is obvious, however, that transfer entropy and Fisher information reflect quite different aspects of the dynamics. Information-theoretically, transfer entropy is centred on information dynamics during state *transitions* in the *context* of another source, while Fisher information quantifies the amount of information in an observable variable about a parameter, thus estimating *sensitivity* to changes in that parameter. Thermodynamically, transfer entropy is related to the external entropy produced by a system during a transition, while Fisher information is proportional to the gradient of an order parameter, diverging when the system approaches a critical point. Under certain conditions, these two measures can be explicitly related [13]: in isothermal systems near thermodynamic equilibrium, the gradient of a suitably defined transfer entropy is shown to be dynamically related to Fisher information and the curvature of the system's entropy. In other words, 'predictability' of computation (transfer entropy) is explicitly connected to its 'sensitivity' (Fisher information) and 'uncertainty' (thermodynamic entropy).

There are other results relating Fisher information to various entropy measures (e.g. [14–21]), showing that Fisher information provides a variational principle from which it is possible to derive, under suitable constraints, several fundamental physical laws in equilibrium and non-equilibrium thermodynamics (e.g. [16,22–25]), and relating Fisher information to synaptic plasticity and complexity of neural networks [26].

However, the main motivation of this study is a desire to meaningfully *combine* Fisher information and transfer entropy, in order to capture both *transient* and *contextual* qualities of transfer entropy and *sensitivity* characteristics of Fisher information. This may ultimately reveal fundamental connections between elements of information dynamics at critical points. Such a measure may also be useful in characterizing regimes near various tipping points occurring when a small change triggers a strong or even catastrophic response [27,28], e.g. tipping points in climate, ecosystem or financial market dynamics. Some tipping points are extremely difficult to model and anticipate, and so a new measure which would combine the sensitivity of Fisher information and the interaction context captured by transfer entropy could extend the set of forecasting tools available for analysis of challenging nonlinear dynamical systems.

We introduce here a novel measure, Fisher transfer entropy (FTE), which aims to quantify a gain in sensitivity to a control parameter, obtained during a (state) transition of an observable random variable, in the context of another observable random variable. The approach is then applied to a kinetic Ising model where we initially derive Fisher information, showing analytically its divergence at the critical point, followed by a derivation and analysis of FTE.

## 2. Technical preliminaries

*Transfer entropy* is a Shannon information-theoretic quantity [2] which measures a directed relationship between two, possibly coupled, time-series processes *Y* and *X*, by detecting asymmetry in their interactions. Specifically, the transfer entropy measures the average information that the state **y**_{n} at time *n* of the *source* time-series process *Y* provides about the next value *x*_{n+1} of the *destination* time-series process *X*, in the context of the previous state **x**_{n} of the destination process:

$$T_{Y\to X} = \sum_{x_{n+1},\,\mathbf{x}_n,\,\mathbf{y}_n} p(x_{n+1},\mathbf{x}_n,\mathbf{y}_n)\,\log_2\frac{p(x_{n+1}\mid\mathbf{x}_n,\mathbf{y}_n)}{p(x_{n+1}\mid\mathbf{x}_n)}. \qquad (2.1)$$
Here, *state* refers to the underlying dynamical state of a process. For a time-series process *X*, this is generally represented by Takens' *embedding vectors* [29] $\mathbf{x}_n^{(k)} = (x_{n-(k-1)\tau},\ldots,x_{n-\tau},x_n)$, with *embedding dimension* *k* and embedding delay *τ*. In a thermodynamic setting, a set of thermodynamic state variables fulfils the same role.

The transfer entropy is thus the conditional mutual information between **Y**_{n} and **X**_{n+1} given **X**_{n}. Informally, it helps to answer the question '*if I know the state of the source, how much does that help to predict the state transition of the destination?*'.
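As an illustrative aside, the quantity above can be estimated from data by plug-in frequency counts. The sketch below is our own minimal Python illustration, with history length *k*=1 and a binary alphabet (the function and variable names are ours, not from the cited literature): it counts joint occurrences and applies the sum directly.

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, dest, base=2.0):
    """Plug-in estimate of T_{Y->X} with history length k = 1:
    T = sum_{x+,x,y} p(x+, x, y) * log[ p(x+ | x, y) / p(x+ | x) ]."""
    triples = Counter(zip(dest[1:], dest[:-1], source[:-1]))   # (x_{n+1}, x_n, y_n)
    pairs = Counter(zip(dest[1:], dest[:-1]))                  # (x_{n+1}, x_n)
    context = Counter(zip(dest[:-1], source[:-1]))             # (x_n, y_n)
    singles = Counter(dest[:-1])                               # x_n
    n = len(dest) - 1
    te = 0.0
    for (xp, x, y), c in triples.items():
        p_cond_full = c / context[(x, y)]          # p(x_{n+1} | x_n, y_n)
        p_cond = pairs[(xp, x)] / singles[x]       # p(x_{n+1} | x_n)
        te += (c / n) * np.log(p_cond_full / p_cond) / np.log(base)
    return te
```

For a series where the destination simply copies the source with a one-step delay, the estimate approaches 1 bit; for independent series it approaches 0.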

*Fisher information* and the Fisher information matrix are well known in statistical estimation theory. Fisher information [5] is a measure for the amount of information that an observable random variable *X* carries about an unknown parameter *θ*, upon which the likelihood function of *θ* depends. Let *p*(*x*|*θ*) be the likelihood function of *θ* given the observations *x*. Then, Fisher information can be written as

$$F(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\ln p(x\mid\theta)\right)^{2}\,\middle|\;\theta\right], \qquad (2.2)$$

where *E*[…|*θ*] denotes the conditional expectation over values of *x* distributed according to *p*(*x*|*θ*) given *θ*. Thus, Fisher information is not a function of a particular observation, since the random variable *X* has been averaged out.

The discrete form of Fisher information is

$$F(\theta) = \sum_{i=1}^{D} p(x_i\mid\theta)\left(\frac{\partial \ln p(x_i\mid\theta)}{\partial\theta}\right)^{2}, \qquad (2.3)$$

where *p*(*x*|*θ*) is a discrete probability distribution function, such that *x*∈{*x*_{1},…,*x*_{D}}, where *D* is the alphabet size or number of potential values for the variable *X*.
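As a minimal numerical sketch (our own illustration, not part of the formalism above), the discrete form can be evaluated by replacing the *θ*-derivative with a central finite difference:

```python
import numpy as np

def discrete_fisher(p_of_theta, theta, d_theta=1e-5):
    """Discrete Fisher information F(theta) = sum_x p(x|theta) (d ln p/d theta)^2,
    with the derivative approximated by a central finite difference.
    p_of_theta(theta) must return the full probability vector over the alphabet."""
    p = np.asarray(p_of_theta(theta), dtype=float)
    p_plus = np.asarray(p_of_theta(theta + d_theta), dtype=float)
    p_minus = np.asarray(p_of_theta(theta - d_theta), dtype=float)
    dlnp = (np.log(p_plus) - np.log(p_minus)) / (2.0 * d_theta)
    return float(np.sum(p * dlnp ** 2))
```

For a Bernoulli distribution with *p*(*x*=1)=*θ*, this recovers the familiar value 1/(*θ*(1−*θ*)).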

Furthermore, the *n*×*n* Fisher information matrix is defined for several parameters *θ*=[*θ*_{1},*θ*_{2},…,*θ*_{n}]^{T}, as follows:

$$F_{ij}(\boldsymbol{\theta}) = E\!\left[\frac{\partial \ln p(x\mid\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(x\mid\boldsymbol{\theta})}{\partial\theta_j}\,\middle|\;\boldsymbol{\theta}\right]. \qquad (2.4)$$

*Statistical-mechanics models* typically deal with Gibbs measures, defined for a physical system in equilibrium with a large thermal reservoir, as follows:

$$p(x\mid\boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})}\,\exp\!\Big(-\sum_i \theta^i X_i(x)\Big), \qquad (2.5)$$

where *x* varies over the configuration space; the set {*θ*^{i}} includes time-dependent thermodynamic variables (e.g. inverse temperature, pressure, magnetic field, chemical potential, etc.); the time-independent functions *X*_{i}(*x*) are collective variables which determine the form of action; and *Z*(**θ**) is the partition function. The system's Hamiltonian *H*(*x*) captures the total energy at *x*, so that $\beta H(x) = \sum_i \theta^i X_i(x)$, with *β*=1/*k*_{B}*T* being the inverse temperature (*T*) of the environment in natural units, and *k*_{B} denoting the Boltzmann constant.

Several previous studies [31–34] reported that the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher–Rao metric) for the manifold of thermodynamic states. For instance, it was suggested that the scalar curvature of the metric *g*_{ij}(*θ*)=*F*_{ij}(*θ*) measures the complexity of the system [33].

Fisher information is also explicitly related to the gradient of the corresponding order parameter(s) [6]:

$$F_{ii}(\boldsymbol{\theta}) = -\frac{\partial \phi^i}{\partial \theta^i},$$

where *ϕ*^{i} is a negative derivative of the thermodynamic potential *G* with respect to *θ*^{i}, i.e. *ϕ*^{i}=−∂*G*/∂*θ*^{i}. The order parameter is known to be related to the mean value of the corresponding collective variable *X*_{i} [6,35].
As argued in [6], not only does this avoid the issue of identifying order parameters, but it also provides a natural interpretation: the critical point is localized where the observed variable is most sensitive to the control parameter(s)/thermodynamic variable(s) (an interpretation applicable in both infinite and finite systems).

The following general relationship [6,38]:

$$F_{ij}(\boldsymbol{\theta}) = \langle X_i X_j\rangle - \langle X_i\rangle\langle X_j\rangle$$

expresses Fisher information as the covariance of the collective variables *X*_{i} and *X*_{j}. Thus, the Fisher information *F*_{ij} can be seen to measure 'the size of fluctuations about equilibrium' of the collective variables *X*_{i} and *X*_{j} [38].

Using this general expression, one may consider a generic case of a *d*-dimensional Ising-type magnetic model with a probability density expressible in the form of equation (2.5) [31]. For this model, Brody & Rivier [31] have shown that critical behaviour of thermodynamic quantities can be analysed in terms of the reduced temperature *t*=*T*/*T*_{c}−1, leading to a general expression in which the divergence of the Fisher matrix elements at criticality is governed by the dimensionality *d* and the corresponding values of critical exponents (e.g. for the three-dimensional Ising model all matrix elements diverge).

## 3. Fisher information in a kinetic Ising model

We consider an isotropic ferromagnetic two-dimensional lattice Ising model of size *N*=*L*×*L*, with no external field. If the system is in state **s**=*s*_{1},…,*s*_{N}, *s*_{i}∈{+1,−1}, then the Hamiltonian is given by

$$H(\mathbf{s}) = -J \sum_{\langle i,j\rangle} s_i s_j, \qquad (3.1)$$

where the sum is taken over the unique pairs of lattice neighbours ⟨*i*,*j*⟩, bold type **s** denotes a state vector of spins, and lower case italic type *s*_{i} denotes the individual ±1 spin of site *i*. In the following, we also use capitals **S**, *S*_{i} to denote random variables, and *σ* to denote a specific spin value ±1.

For the kinetic model, we consider discrete-time Glauber spin-flip dynamics [39]: at each time step, a site *i* is selected uniformly at random and its spin flipped with probability

$$P_i(\mathbf{s}) = \frac{1}{1 + e^{\beta\,\Delta H_i(\mathbf{s})}}, \qquad (3.2)$$

where Δ*H*_{i}(**s**)=*H*(**s**^{i})−*H*(**s**) is the energy difference between the spin-flipped and original state, and units are considered normalized so that the Boltzmann constant *k*_{B} is 1. Here, a superscript *i* denotes flipping the *i*th spin.
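A single update of these dynamics can be sketched as follows (a minimal illustration of our own, assuming a periodic square lattice and the flip probability 1/(1+e^{βΔH}); function and variable names are ours):

```python
import numpy as np

def glauber_step(spins, beta, J=1.0, rng=None):
    """One discrete-time Glauber update: pick a site uniformly at random and
    flip it with probability 1/(1 + exp(beta * dH)), where dH is the energy
    change of the proposed flip (periodic boundaries assumed)."""
    if rng is None:
        rng = np.random.default_rng()
    L = spins.shape[0]
    i, j = rng.integers(0, L, size=2)
    # Energy change of flipping spin (i, j): dH = 2 * J * s_ij * (sum of neighbours)
    nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
          + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    d_h = 2.0 * J * spins[i, j] * nn
    if rng.random() < 1.0 / (1.0 + np.exp(beta * d_h)):
        spins[i, j] = -spins[i, j]
    return spins
```

At low temperature an aligned lattice remains essentially frozen, while at *β*=0 each selected spin flips with probability 1/2, randomizing the lattice.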

The following analytical expression, obtained by Onsager, is well known for the magnetization as a function of temperature [40]:

$$M = \left(1 - \sinh^{-4}(2\beta J)\right)^{1/8} \qquad (3.3)$$

for *T*<*T*_{c}, while *M*=0 for *T*≥*T*_{c}.
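The expression is straightforward to evaluate numerically; the sketch below (our own illustration) assumes the standard critical temperature $T_c = 2J/\ln(1+\sqrt{2}) \approx 2.269\,J$ in the normalized units used here:

```python
import numpy as np

def onsager_magnetization(T, J=1.0):
    """Spontaneous magnetization of the 2D Ising model:
    M = (1 - sinh(2*beta*J)**-4)**(1/8) for T < T_c, and M = 0 otherwise."""
    t_c = 2.0 * J / np.log(1.0 + np.sqrt(2.0))   # critical temperature, ~2.269 J
    if T >= t_c:
        return 0.0
    beta = 1.0 / T
    return (1.0 - np.sinh(2.0 * beta * J) ** -4) ** 0.125
```

The magnetization is close to 1 deep in the ordered phase, decreases towards the critical temperature, and vanishes identically above it.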

Fisher information of the spin at a specific site *i* can then be analytically derived as a function of inverse temperature *β* (see appendix A for details):

$$F_i(\beta) = \frac{1}{1 - M^2}\left(\frac{\mathrm{d}M}{\mathrm{d}\beta}\right)^{2}. \qquad (3.4)$$

As shown in appendix A, Fisher information of each spin diverges at the critical temperature, as expected. In other words, *F*_{i}(*β*)→∞ as *β*→*β*_{c} from above, i.e. as *T*→*T*_{c} from below.

For the disordered case *T*≥*T*_{c}, the magnetization *M*=0 and the probabilities *p*(*S*_{i}=*σ*)=1/2 do not depend on *β*, so that *F*_{i}(*β*)=0.
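As an illustrative numerical check on the ordered side, one can differentiate the Onsager magnetization by finite differences, assuming the closed form *F*_{i}(*β*) = (d*M*/d*β*)²/(1−*M*²) that follows from *p*(*S*_{i}=*σ*)=(1+*σM*)/2 (see appendix A); the growth of the estimates towards *β*_{c} illustrates the divergence (a sketch of our own, with illustrative step sizes):

```python
import numpy as np

def onsager_m(beta, J=1.0):
    """Onsager magnetization as a function of inverse temperature (0 for beta <= beta_c)."""
    x = np.sinh(2.0 * beta * J) ** -4
    return (1.0 - x) ** 0.125 if x < 1.0 else 0.0

def spin_fisher(beta, d_beta=1e-6):
    """F_i(beta) = (dM/dbeta)^2 / (1 - M^2), with dM/dbeta estimated by a
    central finite difference (valid on the ordered side, beta > beta_c)."""
    m = onsager_m(beta)
    dm = (onsager_m(beta + d_beta) - onsager_m(beta - d_beta)) / (2.0 * d_beta)
    return dm ** 2 / (1.0 - m ** 2)
```

The estimates grow rapidly as *β* decreases towards *β*_{c} ≈ 0.4407 (for *J*=1), consistent with the divergence at criticality.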

## 4. Fisher transfer entropy

We introduce here a novel measure, FTE, which quantifies a gain in sensitivity of an observable random variable *X* to an unknown parameter *θ*, due to incorporating the context of another observable random variable *Y*, during a (state) transition in *X*:

$$F_{Y\to X}(\theta) = F_{X_{n+1}\mid X_n, Y_n}(\theta) - F_{X_{n+1}\mid X_n}(\theta). \qquad (4.1)$$
The term *F*_{Xn+1|Xn}(*θ*), specified in equations (4.2) and (4.3), quantifies the transient (or dynamic) sensitivity of a state transition from *X*_{n} to *X*_{n+1} of the observable random variable *X*, to parameter *θ*. Specifically, this measures the sensitivity of the transition probability *p*(*x*_{n+1}|*x*_{n},*θ*) to *θ*:

$$F_{X_{n+1}\mid X_n}(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\ln p(x_{n+1}\mid x_n,\theta)\right)^{2}\,\middle|\;\theta\right]. \qquad (4.2)$$

The term *F*_{Xn+1|Xn,Yn}(*θ*), given by equations (4.4) and (4.5), accounts for the transient sensitivity to *θ* of a state transition from *X*_{n} to *X*_{n+1} given *Y*_{n}: that is, the transition of the variable *X* in the context of variable *Y*. This measures the sensitivity of the transition probability *p*(*x*_{n+1}|*x*_{n},*y*_{n},*θ*) to *θ*:

$$F_{X_{n+1}\mid X_n, Y_n}(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\ln p(x_{n+1}\mid x_n, y_n,\theta)\right)^{2}\,\middle|\;\theta\right]. \qquad (4.4)$$

Hence, the resulting difference between the terms, *F*_{Y→X}(*θ*), quantifies the *gain* in transient sensitivity when variable *Y* is accounted for in the transition from *X*_{n} to *X*_{n+1}. That is, if variables *X* and *Y* are independent, then *F*_{Xn+1|Xn}(*θ*)=*F*_{Xn+1|Xn,Yn}(*θ*), and there is no transient sensitivity gain:

$$F_{Y\to X}(\theta) = 0.$$

The terms *F*_{Xn+1|Xn}(*θ*) and *F*_{Xn+1|Xn,Yn}(*θ*) can be represented using the Chain Rule for Fisher information [41], *F*_{X,Y}(*θ*) = *F*_{X}(*θ*) + *F*_{Y|X}(*θ*), so that

$$F_{Y\to X}(\theta) = F_{X_n, Y_n, X_{n+1}}(\theta) - F_{X_n, Y_n}(\theta) - F_{X_n, X_{n+1}}(\theta) + F_{X_n}(\theta). \qquad (4.7)$$

To obtain a discrete form for FTE, one simply applies the discrete form for Fisher information to each term in equation (4.7) (see specific examples in appendix B).
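To make the discrete computation concrete, consider a toy binary model (entirely illustrative, and unrelated to the Ising model of the following sections) in which *X*_{n+1} copies *Y*_{n} with probability *θ* and otherwise retains *X*_{n}, with *X*_{n} and *Y*_{n} independent fair coins. Both conditional Fisher terms are evaluated by finite differences, and their difference gives the FTE:

```python
import numpy as np

def cond_fisher(weights, cond_p, theta, d=1e-6):
    """Conditional Fisher information:
    sum_c p(c) * sum_x p(x|c,theta) * (d ln p(x|c,theta) / d theta)^2,
    with the theta-derivative taken by central finite difference."""
    total = 0.0
    for c, w in weights.items():
        for x in (0, 1):
            p = cond_p(x, c, theta)
            if p < 1e-12:               # zero-probability outcomes contribute nothing
                continue
            dlnp = (np.log(cond_p(x, c, theta + d))
                    - np.log(cond_p(x, c, theta - d))) / (2.0 * d)
            total += w * p * dlnp ** 2
    return total

# Toy model: X_{n+1} copies Y_n with probability theta, otherwise keeps X_n;
# X_n and Y_n are independent fair coins.
def p_full(xp, ctx, theta):             # p(x_{n+1} | x_n, y_n, theta)
    x, y = ctx
    return theta * (xp == y) + (1.0 - theta) * (xp == x)

def p_marg(xp, x, theta):               # p(x_{n+1} | x_n, theta), Y_n averaged out
    return 0.5 * (p_full(xp, (x, 0), theta) + p_full(xp, (x, 1), theta))

def fte(theta):
    """Discrete FTE: contextual minus non-contextual conditional Fisher information."""
    ctx_full = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
    ctx_marg = {x: 0.5 for x in (0, 1)}
    return (cond_fisher(ctx_full, p_full, theta)
            - cond_fisher(ctx_marg, p_marg, theta))
```

In this toy model the gain works out analytically to 1/(2(1−*θ*)(2−*θ*)), e.g. 2/3 at *θ*=1/2, so the context of *Y* strictly increases transient sensitivity.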

## 5. Fisher transfer entropy in a kinetic Ising Model

In this section, we derive FTE for a kinetic Ising model, focusing only on the disordered phase, i.e. the simpler case of *T*≥*T*_{c}, with the critical point approached as *β*→*β*_{c} *from below*. In this phase, Fisher information is trivially zero, *F*_{i}(*β*)=0, as established in the previous section. Henceforth, *X* is the random variable associated with a given lattice site, and *Y* is the random variable representing one of its lattice neighbours.

The first term in the FTE is given as follows (see appendix B):

$$N F_{X_{n+1}\mid X_n}(\beta) = \frac{1}{q}\left(\frac{\partial q}{\partial\beta}\right)^{2}, \qquad (5.1)$$

where, in the thermodynamic limit,

$$q = \langle P_i(\mathbf{S})\rangle = \sum_{\mathbf{s}} p(\mathbf{s})\,P_i(\mathbf{s}), \qquad (5.2)$$

with *i* being the site index, **S** the stochastic variable representing the system (lattice) vector, and a state vector **s** representing an instance of **S**; *P*_{i}(**S**) is the flipping probability of spin *S*_{i} in a given spin configuration **S** (n.b. equation (3.2)). In general, *q* is a function of *β* or *T*. Results of a numerical estimation of *q* are shown in figure 1*a*.

We numerically estimated ∂*q*/∂*T* by simulating a kinetic Ising model of size *N*=*L*×*L* for *L*=512, using Glauber dynamics as implemented in [4]. The simulations show (cf. figure 1*b*) that ∂*q*/∂*T* is positive at the critical temperature. Furthermore, the numerical estimates increase as the discrete differences Δ*T* decrease, suggesting that ∂*q*/∂*T* diverges at the critical point. Since ∂/∂*β*=−*T*^{2}(∂/∂*T*), the derivative ∂*q*/∂*β* diverges there as well, and given *q*>0, we can conclude that the total transient sensitivity of a state transition from *X*_{n} to *X*_{n+1} diverges at the critical point.
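A small-scale version of this estimation can be sketched as follows (our own illustration, reading *q* as the equilibrium average of the single-site Glauber flip probability per (5.2); the lattice size, sweep count and burn-in below are arbitrary choices, far smaller than the *L*=512 simulations reported above):

```python
import numpy as np

def flip_prob(spins, beta, J=1.0):
    """Glauber flip probability 1/(1 + exp(beta * dH)) for every site at once."""
    nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
          + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
    return 1.0 / (1.0 + np.exp(beta * 2.0 * J * spins * nn))

def estimate_q(beta, L=32, sweeps=200, rng=None):
    """Monte Carlo estimate of q = <P_i(S)>: relax with Glauber sweeps,
    then average the single-site flip probability over sites and samples."""
    if rng is None:
        rng = np.random.default_rng()
    spins = np.ones((L, L), dtype=int)
    samples = []
    for sweep in range(sweeps):
        for _ in range(L * L):           # one sweep = L*L single-site updates
            i, j = rng.integers(0, L, size=2)
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            d_h = 2.0 * spins[i, j] * nn       # energy change with J = 1
            if rng.random() < 1.0 / (1.0 + np.exp(beta * d_h)):
                spins[i, j] = -spins[i, j]
        if sweep >= sweeps // 2:         # discard first half as burn-in
            samples.append(flip_prob(spins, beta).mean())
    return float(np.mean(samples))
```

At *β*=0 the estimate is exactly 1/2, since every flip probability is 1/2 regardless of the configuration; deep in the ordered phase it is close to 0.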

The second term in the FTE, *NF*_{Xn+1|Xn,Yn}(*β*), is given by expression (5.3), obtained in appendix B, in which *U* is the internal energy (as a function of *β*, see equation (B 12)). Given the divergence of ∂*U*/∂*β* at the critical point, the transient sensitivity of a state transition from *X*_{n} to *X*_{n+1} in the context of *Y*_{n} diverges at the critical point as well:

$$N F_{X_{n+1}\mid X_n, Y_n}(\beta) \to \infty \quad \text{as } \beta \to \beta_c^{-}.$$

The difference between (5.3) and (5.1) yields the desired FTE defined by (4.1), as the gain in transient sensitivity. In the thermodynamic limit, we obtain expression (5.4), in which the coefficients *A*_{q,U}, *B*_{U} and *C*_{q,U} are positive at the critical point (see appendix B).

Given the logarithmic divergence of ∂*U*/∂*β* at the critical point, with the product (∂*q*/∂*β*)(∂*U*/∂*β*) always positive, we obtain that the total FTE diverges:

$$N F_{Y\to X}(\beta) \to \infty \quad \text{as } \beta \to \beta_c^{-}.$$

Since ∂/∂*β*=−*T*^{2}(∂/∂*T*), we also obtain the divergence in terms of temperature:

$$N F_{Y\to X}(T) \to \infty \quad \text{as } T \to T_c^{+}.$$

In other words, at the critical regime, the divergence of the transient sensitivity in the context of the neighbours, i.e. *NF*_{Xn+1|Xn,Yn}(*β*), is faster than the divergence of the transient sensitivity *per se*, *NF*_{Xn+1|Xn}(*β*), and so the gain in transient sensitivity diverges overall. Interestingly, this can be contrasted with the zero Fisher information, *F*_{i}(*β*)=0, on the disordered side, highlighting that FTE reveals changes in dynamic sensitivity that Fisher information does not.

## 6. Discussion and conclusion

This study introduced *FTE*, a measure which quantifies a gain in sensitivity to a control parameter. This gain is obtained during a state transition of an observable random variable *X* ('destination'), in the context of another observable random variable *Y* ('source'). The new measure combines several characteristics of two well-known measures: transfer entropy and Fisher information. It captures the transient and contextual qualities of transfer entropy, as well as the sensitivity characteristics of Fisher information. The 'destination' variable *per se* may be insensitive to changes in some control parameter *θ*, resulting in zero Fisher information *F*_{X}(*θ*). Moreover, even a transition between the states of the 'destination' variable may gain no sensitivity to the control parameter changes, with *F*_{Xn+1|Xn}(*θ*)=0. However, when such a transition occurs in the context of some external influence, e.g. provided by 'source' *Y*, the transient dynamics may become sensitive to changes in *θ*, with non-zero transient contextual sensitivity: *F*_{Xn+1|Xn,Yn}(*θ*)≠0. The gain in transient sensitivity is brought about by the source–destination interaction, which may be due to either direct influence or some indirect contextual contribution from the source.

It is well known that non-zero transfer entropy does not necessarily mean that the source has a causal effect on the destination [43], and so the introduced FTE is not intended to capture any sensitivity or gain in causal interactions between the variables. It does, nevertheless, capture the gain in transient sensitivity of the destination variable in the presence of the source variable. Informally, FTE refers to the amount of informational sensitivity about the state transition of a destination variable that is gained when a source variable is incorporated; i.e. addressing the question 'if I know the state of the source, how much does that help in gaining sensitivity of the state transition of the destination to changes in some control parameter?'.

One may then pose the question, 'In which situations would FTE reveal interesting dynamics?' As pointed out in the preceding paragraphs, the proposed measure is focused on the sensitivity of transient dynamics, in the context of some external source *interacting* with the dynamics under consideration. We can expect that such interactions exhibit some non-trivial dynamics in the vicinity of phase transitions, when variables are characterized by critical behaviour and so may be particularly sensitive to changes in the underlying control parameter. Moreover, the gain in sensitivity within an interacting system may further characterize the strength and/or complexity of the interaction.

We applied the approach to a kinetic Ising model. Fisher information was analytically shown to diverge at the critical point when approached from one side (cf. empirical results of another study confirming this derivation [44]), while remaining zero on the other side. We followed with a detailed analysis of FTE and demonstrated its divergence at the critical point, approached from the same side on which Fisher information itself is actually zero. The reason for zero Fisher information is that the opposite spin states are in balance on that side and remain insensitive to the temperature, whereas the interactions in the transient dynamics are sensitive to it. Furthermore, the results show that the sensitivity of transient dynamics diverges faster in the presence of interactions with the lattice neighbours of the Ising model, yielding non-zero FTE.

In this study, we used Glauber dynamics in simulating the Ising model. There are other simulation techniques, such as the loop algorithm, which do not suffer from critical slowing down and employ global updates [45,46]. It remains to be seen whether the different update dynamics of the Ising model give rise to different FTE profiles at the transition.

Pairwise transfer entropy was previously shown to reach a finite maximum at the phase transition [4], while our analysis demonstrated divergence of FTE. In addition, collective transfer entropy was shown to peak on the disordered side of the phase transition [4]. It remains an intriguing question whether collective FTE, generalized to account for influences of all lattice neighbours, may also have a (post-critical) peak, or diverge, on the disordered side. In addition, it is interesting to investigate other complex systems, where pairwise transfer entropy and FTE behave differently at the critical point(s).

There are several other avenues for future research and applications. Formally derived mathematical properties of FTE would enhance the emerging theoretical framework of Information Thermodynamics. On the computational implementation side, numerical estimations are known to be very difficult for quantities involved in the Fisher information and related measures [47]. Future effort can be directed to the development of efficient algorithms for numerical estimation of FTE and the analysis of the ensuing computational complexity. We also believe that measuring FTE may be particularly useful in systems with strong coupling and interacting components. For example, interactions within bipartite systems may undergo critical changes near or at phase transitions, and estimating the gain in transient sensitivity may reveal and/or characterize specific phase transitions driven by external influences. Similarly, many real-world complex networks are interdependent, and recent theoretical work on ‘networks formed from interdependent networks’ suggests that when interdependencies are introduced, some well-known properties no longer hold (e.g. scale-free networks coupled with other networks lose their robustness to random failures [48]). Again, FTE measured within such networks may identify salient pathways for critical information dynamics.

## Data accessibility

The manuscript describes a theoretical mathematical development. The derivations and the plot shown in figure 1 are reproducible by following the steps described in the paper.

## Authors' contributions

M.P. is responsible for the initial idea, which was later refined over many discussions with J.T.L. and X.R.W. L.B., J.T.L. and M.H. developed the theoretical framework for specifying probability distributions used by M.P. in deriving Fisher information in a Kinetic Ising Model (§3 and appendix A). J.T.L. led the formalization of FTE via suitable conditional distributions, with M.P. and X.R.W. further developing the concept (§4). M.P., M.H. and J.T.L. produced analytical results characterizing FTE in a Kinetic Ising Model (§5 and appendix B). L.B. provided the software code used to perform Ising model simulations, which were carried out and interpreted by O.O. (§5). M.P. coordinated the effort and drafted the manuscript, with multiple revisions suggested by all authors, who gave final approval for publication.

## Competing interests

The authors have no competing interests.

## Funding

L.B. was supported by the Dr Mortimer and Theresa Sackler Foundation.

## Acknowledgements

The authors thank the anonymous reviewers for their comments and suggestions, which have helped in improving the quality of the manuscript.

## Appendix A. Fisher information

We follow [4] in specifying the distribution

$$p(S_i = \sigma) = \frac{1 + \sigma M}{2}.$$

Using the exact solution (3.3) for the magnetization at *T*<*T*_{c}, we can rewrite the probability distribution as

$$p(S_i = \sigma) = \frac{1 + \sigma\left(1 - \sinh^{-4}(2\beta J)\right)^{1/8}}{2}.$$

We can then derive Fisher information of the spin at a specific site *i* as a function of inverse temperature *β*, by substituting *p*(*S*_{i}=*σ*) into (2.3), and setting *θ*=*β* and *x*=*S*_{i}:

$$F_i(\beta) = \sum_{\sigma=\pm 1} \frac{1+\sigma M}{2}\left(\frac{\sigma}{1+\sigma M}\,\frac{\mathrm{d}M}{\mathrm{d}\beta}\right)^{2} = \frac{1}{1-M^2}\left(\frac{\mathrm{d}M}{\mathrm{d}\beta}\right)^{2}.$$

We can evaluate this expression in the vicinity of the critical point by following the expansions in *ϵ* given in the corresponding eqns (15) and (18) of [42], which confirms the divergence of *F*_{i}(*β*) as *T*→*T*_{c} from below.

## Appendix B. Fisher transfer entropy

The analysis presented here is limited to the simpler case of *T*≥*T*_{c}, with the critical point approached as *β*→*β*_{c} from below.

The relevant probability functions are given as follows [42]:

$$p(x_{n+1}\mid x_n) = \begin{cases} q/N, & x_{n+1} = -x_n,\\ 1 - q/N, & x_{n+1} = x_n,\end{cases}$$

where *q* is defined, in the thermodynamic limit, according to (5.2), with *i* being the site index, **S** the stochastic variable representing the system (lattice) vector and a state vector **s** representing an instance of **S**.

Next, we have [42] the corresponding transition probabilities *p*(*x*_{n+1}|*x*_{n},*y*_{n}) in the context of a neighbour, expressed via *q*_{y}, where *q*_{y}=*q*_{y_n} is defined according to (B 5).

Then, according to the discrete form of (4.3), we obtain the first term of the FTE. When symmetry is unbroken, in the thermodynamic limit, the probability *p*(*x*_{n})=1/2 for either spin value, and the first term reduces to (B 9), recovering expression (5.1).

Now we consider the discrete form of the more complicated term (4.5), i.e. expression (B 10). Setting *M*=0 for *T*≥*T*_{c}, and noting that 〈*S*_{j}*P*_{i}(**S**)〉=0 [4], simplifies the resulting expressions; however, *p*(*x*_{n+1}|*x*_{n},*y*_{n}), specified by (B 6), does in general depend on *β*, creating several possibilities, dependent on the spins of *x*_{n+1}, *x*_{n} and *y*_{n}. According to Barnett *et al*. [42], for infinite lattices, these probabilities can be expressed via the internal energy *U*, which is dependent on *β* and is given by Barnett *et al*. [4,42] as the Onsager solution:

$$U = -J\coth(2\beta J)\left[1 + \frac{2}{\pi}\bigl(2\tanh^{2}(2\beta J)-1\bigr)K(\kappa)\right],\qquad \kappa = \frac{2\sinh(2\beta J)}{\cosh^{2}(2\beta J)}, \qquad (B\,12)$$

where *K*(·) denotes the complete elliptic integral of the first kind.
The probability *p*(*x*_{n},*y*_{n}), which appears in denominators of (B 6), depends on *β*, as follows (we again use *M*=0 for *T*≥*T*_{c}):

$$p(x_n, y_n) = \frac{1 + x_n y_n\,\langle S_i S_j\rangle}{4},$$

where the nearest-neighbour correlation 〈*S*_{i}*S*_{j}〉 is expressible in terms of *U*.

Taking the derivative over *β* in (B 6), and considering the cases *x*_{n+1}=±*x*_{n}, yields (B 20).

Substituting (B 20) and (B 6) into (B 10) and considering the eight possible spin permutations of *x*_{n+1}, *x*_{n} and *y*_{n} yields (B 23).

The difference between (B 23) and (B 9) yields the desired FTE defined by (4.1). In the thermodynamic limit, we obtain expression (5.4) with coefficients *A*_{q,U}, *B*_{U} and *C*_{q,U}. These coefficients are positive at the critical point, since *q*>0 and, at criticality, *A*_{c}=8/*q* and *C*_{c}=40*q*.

- Received August 28, 2015.
- Accepted November 3, 2015.

- © 2015 The Author(s)