The random search problem has long attracted continuous interest owing to its broad interdisciplinary range of applications, including animal foraging, facilitated target location in biological systems and human motion. In this paper, we address the issue of statistical inference for ordinary Gaussian, Pareto, tempered Pareto and fractional Gaussian random walk models, which are among the most studied random walk models proposed as the best strategy in the random search problem. Based on rigorous analysis of the local asymptotic normality property and the Fisher information, we discuss some issues in unbiased joint estimation of the model parameters, in particular, the maximum-likelihood estimation. We present that there exist both theoretical and practical difficulties in more realistic tempered Pareto and fractional Gaussian random walk models from a statistical standpoint. We discuss our findings in the context of individual animal movement and show how our results may be used to facilitate the analysis of movement data and to improve the understanding of the underlying stochastic process.
Peculiarities of individual animal movement have been attracting considerable attention over the last three decades (Mandelbrot 1977; Kareiva & Shigesada 1983; Viswanathan et al. 1996, 2008; Bartumeus et al. 2003; Reynolds et al. 2007; Codling et al. 2008) as they are thought to hold the key to the understanding of the animal dispersal, and hence to better understanding of the spatio-temporal phenomena in population dynamics such as biological invasions, pattern formation, and so on (Turchin 1988; Okubo & Levin 2001). Moreover, because the observed patterns in animal movement (in particular, in foraging behaviour) are thought to optimize the search success (Bartumeus & Catalan 2009), it may help to better understand some crucially important phenomena such as population outbreaks or species extinction as well as the evolutionary aspects of the population dynamics.
An animal path is typically continuous and that provides a considerable challenge as a consistent theoretical framework allowing for analysis of continuous paths is yet largely missing (Nathan 2001; Nathan et al. 2008), although some progress has recently been made (Patterson et al. 2008; Pedersen et al. 2011). In contemporary practice of animal movement studies, the path is usually mapped into a broken line where the line's nodes correspond to animal position at certain observation times. Generally speaking, the time step Δt between two subsequent observations can vary along the path; here, for the sake of simplicity, we assume it to be constant (as is indeed the case with many field and laboratory studies). The movement along the broken line can then be quantified by the probability distribution for the step size or ‘jump’ between two subsequent positions and by the probability distribution for the turning angle (Turchin 1988). Therefore, researchers have to work with finite amount of data. Moreover, in order to reach a good understanding of the underlying stochastic process, it often appears necessary to work with rare events, e.g. long jumps, where the data are scarce by definition. Because scarce data are subject to a large statistical fluctuation, analysis of rare events is a challenging issue which requires application of advanced statistical tools.
Application of statistical techniques to animal movement data usually consists of two steps. The first one is the choice of the model, e.g. the type of the step size distribution. Examples of widely used models are given by the Brownian motion (also known as Gaussian random walk), fractional Brownian motion and Lévy walk or flight (aka Pareto random walk). Note that animal movement usually consists of more than one mode (Morales et al. 2004), so that it can be a combination of different movement types or behavioural states (Benhamou 2007). Once the distribution is chosen using an appropriate model selection procedure (Burnham & Anderson 2002; e.g. the Akaike information criteria), the next step is to estimate the model parameters such as the variance, the value of cutoff at small steps (where applies), etc. This is normally performed by the maximum-likelihood method.
Standard approaches, however, tell little about the possible error of the parameter estimate. Meanwhile, a large error may reduce the practical value of the estimate to zero. Consider a hypothetical example when experimental data, for instance, on the jump size distribution, are nicely fitted by a theoretical curve. In case the error of the parameter estimation is small, that can be regarded as a successful interpretation of the data and hence can make it possible to identify the process behind the observed pattern. However, a large error would make this interpretation very questionable. Even more importantly, although the second step of model fitting (i.e. parameter estimation) tends to be regarded as a technical stage rather than a conceptual one, there can be a logical loop when results from parameter estimation may overturn the hypothesis made at the stage of model selection. An example of such situation is considered below.
Note that mapping a continuous curve onto a discrete set (or a broken line) is of course not a one-to-one correspondence, and hence some information is lost. A question arises as to how important this information loss can be. In particular, whether the pattern of movement may change depending on the observations frequency has been a controversial issue. Using data simulated from a mathematical model, Reynolds (2008) showed that, if for a certain Δt the movement pattern along the broken line is a Lévy walk, then a ‘sub-sampling’ with a larger Δt will also result in a Lévy walk and even with a heavier tail. Similar results were obtained by Plank & Codling (2009) who also emphasized that standard statistical tools may sometimes fail to distinguish between the power law and the exponential rate of decay in the step size distribution and hence fail to distinguish between the two qualitatively different movement patterns (see also Codling & Plank 2011). A question remains largely open as to what may happen in case of a small/decreasing Δt. Note that this is not only important but also a timely issue as the developments in relevant laboratory and field equipment makes it possible to obtain high-resolution data on animal position.
Assuming that the observation frequency is properly established, another question arises as to how large should be the total time scale of observations in order to provide robust information about the movement pattern. That revokes a recent discussion in the literature that, in some cases, the Lévy flight may be difficult to identify in the movement data as it results in the pattern similar to that of the correlated random walk (Austin et al. 2004; Reynolds 2010). This issue was first addressed by Viswanathan et al. (2005) who, having considered the tangling impact of the turning angles (Bartumeus et al. 2008), showed that distinguishing between different types of movement (e.g. correlated random walk versus Lévy flight) is possible only when the data span over a sufficiently large time scale, i.e. when the time series of animal positions is long enough. Here, we consider this problem from a different angle. Having chosen parameter estimators for the step size distribution, we reveal their convergence rate(s) for a few most commonly used random walk models. This, in principle, makes it possible to estimate the amount of data required to provide an estimate of parameters with a given accuracy.
In this paper, we address the issue of statistical analysis of animal random walk data using a mathematically rigorous approach borrowed from the general theory of stochastic processes. We revisit a few commonly used random walk types and reveal the convergence rate for different parameters. Analysis of the power law and the truncated power law distributions leads to a conclusion that the best-fitting hypothesis may indeed depend on the time scale of observations. Moreover, we show that an increase in the observation frequency may change the whole pattern of movement unless some a posteriori biological information is used. We thus conclude that the random walk is not robust to the time scale of the observation. Finally, we discuss how our findings may help researchers working on individual animal movement to reveal the movement pattern and to identify the movement type.
(a) Outline of the method
The local asymptotic normality (LAN) property is a vital concept in asymptotically optimal statistical analysis. In short, the LAN property for a differentiable statistical model for the parameter to be estimated is defined through the weak convergence of the likelihood ratio to the Gaussian shift experiment; for each , under , where is a probability measure associated with θ restricted to the filtration , is a sequence of diagonal matrices in whose diagonal entries tend to zero, is a non-negative definite deterministic matrix in , called the Fisher information matrix and under . If the above weak convergence holds, then we say that the LAN property holds at point θ with the rate Rn(θ) and the Fisher information matrix . The convergence is equivalent to the following locally asymptotically quadratic structure of log-likelihood function 1.1 where the vector Rn(θ)∇θ(ℓn(θ)) converges in law to under and, where the matrix Rn(θ)Hessθ(ℓn(θ))Rn(θ) converges in -probability to . Moreover, if the LAN property holds with non-singular , then a unbiased estimator of θ is said to be asymptotically efficient in a neighbourhood of θ if under , that is, such estimators achieve asymptotically the Cramer–Rao lower bound for the estimation variance. (We refer the reader to Cramér 1946; Le Cam 1960; Rao 1973; Le Cam & Yang 1990 for thorough details.) Let us remark that the concept of asymptotic efficiency resides within the situation where asymptotic normality of the estimator holds valid. The efficiency here is not necessarily identical to the optimality in a general sense, which we will demonstrate with concrete examples in §2b,c.
Let us close this introductory section with some notation which will be used throughout the text. We denote by the d-dimensional Euclidean space with the norm ∥⋅∥ and the inner product 〈⋅,⋅〉. We use the incomplete gamma function for and x>0, where the case s<0 is well defined as long as x>0. Let us denote by the identity matrix in . We denote by the Gaussian law with mean γ and covariance matrix Σ, that is, its probability density function is given by and by Exp (λ) the exponential distribution with rate λ>0, that is, its probability density function is given by
2. Random walk models
Let us begin with a general modelling base, which applies to all the discrete time random walks models of this paper. Let be a discrete time stochastic process in . In this paper, we call it a (discrete time) random walk if
the increments forms a sequence of identically distributed (not necessarily independent) random vectors in ,
the standardized increments form a sequence of independent and identically distributed uniform random vectors on the unit sphere in .
The condition (ii) means that each step has no preference in direction, which corresponds to an isotropic random walk (Codling et al. 2008).
(a) Ordinary Gaussian random walks
We mean by ordinary Gaussian random walk (also known in applications as the Brownian motion) a random walk with being iid with Gaussian distributed length and with no directional preference. To define this precisely and uniquely, we first transform, without any essential loss of information, each step Zn in into a random variable Wn (i.e. in ) and an angle ηn∈(0,π] in the polar decomposition 2.1 By restricting the angles to the half circle (0,π], we let the random variables possess a sign. In the case of the ordinary Gaussian random walk model, is a sequence of iid random variables in (0,π] and is a sequence of iid centered Gaussian, that is, , where σ>0 is the parameter to be estimated. It is well known that the maximum-likelihood estimator is asymptotically efficient and satisfies under that , a.s., and We omit the proof as those are very well-known results (Le Cam 1960; Ferguson 1996).
(b) Pareto random walks
We call a random walk Pareto random walk if forms a sequence of iid Pareto random variables whose probability density has the form 2.2 where α>0 and τ>0 are parameters, α>0 quantifying the rate of decay of the probability density at large y. In applications to animal movement, Pareto random walk as given by (2.2) is usually referred to as a ‘power law distribution’, for instance, of the jump size y (Viswanathan et al. 1996; Bartumeus et al. 2003; Sims et al. 2008). The corresponding pattern of individual animal movement is usually referred to as a Lévy flight. Obviously, it is valid only if the value of the ‘cutoff’ τ at small jumps is positive.1 In the following text, we will show that this simple observation sometimes may have crucial consequences for data interpretation.
Generation of Pareto random variables for modelling purposes is straight-forward as the probability density function (2.2) has an inverse of its tail integral in closed form, that is, The algorithm is as simple as;
step 1. Generate U as uniform (0,1).
step 2. Exit with τU−1/α.
In figure 1, we draw typical sample paths of the rotation invariant Lévy flight in .
The unknown parameter to be estimated in our hypothesis is [α,τ]⊤. To state results, let us prepare some notation. Let Θ1 and Θ2 be bounded convex domains satisfying and We denote by θ0:=[α0,τ0]⊤∈Θ1 the true value of the unknown parameter. We denote by the probability measure associated with θ∈Θ1, or with θ∈Θ2 and τ0 being known.
The minimum is the maximum-likelihood estimator of τ0. In particular, it holds under -a.s. that as , 2.3
Suppose that τ0 is known. The LAN property holds at the point α∈Θ2 with In particular, letting it holds under that as , a.s., and 2.4
Note that for each , . This indicates that the Cramér–Rao bound for τ is infinite. In the meantime, owing to (i) of theorem 2.1, the simple minimum of the sample converges to the true value at the much faster rate of n and is not asymptotically normal. It is noteworthy that this faster convergence in (2.3) is realized without knowing the true value of α. Therefore, in practice, given a certain number of sample steps, the parameter τ is estimated via (2.3), then the parameter (c,α) is estimated jointly via (2.4) with τ0 being the estimate from the first procedure. Note that the maximum-likelihood estimator of α is rather than earlier. For each , however, it is neither an unbiased estimator nor attains the Cramér–Rao bound (see Rytgaard 1990 for more details). Also, once τ0 is known, the Pareto sample is nothing but a disguised exponential sample, as . Hence, the Pareto sample with known τ0 can be analysed with various known techniques for exponential distributions.
(c) Tempered Pareto random walks
As before, we call a random walk tempered Pareto random walk if forms a sequence of iid-tempered Pareto random variables, while there could exist two definitions.
(i) Tempering density function
The first definition is given in the form of probability density function 2.5 where , and . For convenience, we interpret (2.5) as the distribution of the jump size y of animal movement along the path. Correspondingly, the distribution defined by (2.5) is usually known as a ‘truncated power law’ or, in a more precise manner, as a power law with exponential cutoff at large distances (Newman 2005; Edwards et al. 2007; Mashanova et al. 2010). In terms of the underlying stochastic process, truncated power law may indicate a transition from superdiffusive Lévy flight to a more usual, diffusive Brownian motion, e.g. owing to the boundedness of the available space (Edwards et al. 2007; Mashanova et al. 2010).
In view of the limiting relation 2.6 the Pareto random walk can also be thought of as a limit of this tempered Pareto random walk. This definition provides a simpler acceptance–rejection method.
step 1. Generate U as uniform (0,1).
step 2. Generate V through algorithm 1, independent of U.
step 3. If U≤e−κV, then exit with V . Otherwise, return to step 1.
The acceptance rate at step 3 is which tends up to 1 as κ↓0, while down to 0 as . Hence, this acceptance–rejection method tends to terminate more quickly with smaller κ. In figure 2, we draw typical sample paths of the rotation invariant truncated Lévy flight in . Evidently, sample paths look more like non-tempered Pareto random walk for smaller κ.
The unknown parameter to be estimated in our hypothesis is [α,κ,τ]⊤. In a similar manner to before, let Θ3 and Θ4 be bounded convex domains satisfying and We denote by θ0:=[α0,κ0,τ0]⊤∈Θ3 the true value of the unknown parameter and by the probability measure associated with θ∈Θ3, or with θ∈Θ4 and τ0 being known.
The minimum is the maximum-likelihood estimator of τ0. In particular, it holds under that as , 2.7
Suppose that τ0 is known. The LAN property holds at the point θ:=[α,κ]⊤∈Θ4 with where 2.8 In particular, the matrix is not singular.
For each , define 2.9 where , p=1,2,3, and define 2.10 Then, it holds under that as , , a.s., and
As before, estimation of the parameter τ is possible at the much faster rate n without knowing the true value of (α,κ). Efficient estimation of (α,κ) here is not a trivial problem, because the likelihood equation is highly involved and requires numerical approximation methods to solve. The condition (iii) of the theorem 2.2 claims that the method of moments with the so-called method of scoring achieves the asymptotic efficiency. (It is not clear whether the method of moments alone achieves the asymptotic efficiency.) Let us, however, note two possible issues in this approach. First, we might be careful in estimation accuracy of xl,ns, in particular of higher orders, which may require an extremely great number of samples when the distribution has a heavy tail, that is, when κ0 is close to zero. Also, it is a little cumbersome to compute the matrix for every n, which may be addressed by somehow avoiding to update the matrix at each iteration.
In principle, the model based on the tempered Pareto distribution is not appropriate either when κ is very large or when it is very close to zero. If κ is very large, on the one hand, the tempered Pareto distribution is nearly exponential at infinity. A simpler model based on the exponential distribution is then enough to describe sample paths. Note that the issue of higher moments in the above method of moments is likely to disappear, while estimates of α0 tend to be very unstable in return. On the other hand, if κ is very close to 0, then the tempered Pareto distribution is nearly non-tempered. In this case, the non-tempered Pareto model should work sufficiently well. The user can, therefore, select the models with a posteriori knowledge.
(ii) Tempering tail probability
There exists another definition of the tempered Pareto distribution based on the tail probability 2.11 where , , and where the probability density function f(y;α,κ,τ) is also available in closed form 2.12 with f(y;α,0,τ) being the Pareto density function (2.2). Clearly, it reduces to the Pareto density as soon as κ=0. Rather than a numerical inversion of the tail probability (2.11), the density function suggests the following acceptance-rejection method.
step 1. Generate U as uniform (0,1).
step 2. Generate V through algorithm 2.1, independent of U.
step 3. If then exit with V . Otherwise, return to step 1.
The acceptance rate at step 3 is which tends up to as κ↓0, while down to 0 as .
Recently, the estimation problem of the tempered Pareto distribution of this form is addressed in Meerscheart et al. (in press), based on Hill-type estimators. We simply describe the LAN property and refer the reader to Meerscheart et al. (in press) for more details on estimation. The sets Θ3 and Θ4 are the same, respectively, as the ones in theorem 2.2.
The minimum is the maximum-likelihood estimator of τ0. In particular, it holds under that as ,
Suppose that τ0 is known. The LAN property holds at the point θ:=[α,κ]⊤∈Θ4 with In particular, the matrix is not singular.
(d) Fractional Gaussian random walks
We mean by fractional Gaussian random walk a random walk with being iid with a common Gaussian distribution, while not necessarily independently distributed. We again use the polar decomposition (2.1) to transform in into a sequence of random variables in with , , and a sequence of iid uniform random variables over (0,π]. In addition, we hypothesize that steps are not necessarily independent and have the autocovariance structure where H∈(0,1) is often called the Hurst parameter. In this sense, the fractional Gaussian random walk lies in the class of correlated random walks. However, contrary to the standard CRW (Kareiva & Shigesada 1983), here the correlation is between the step sizes rather than between the turning angles. Therefore, the path does not necessarily show any directional preference. In figure 3, we draw typical sample paths of the fractional Gaussian random walk in . Note that when H=1/2, this model reduces to the ordinary Gaussian random walk discussed in §2a.
We want to mention it here that, as well as the models considered earlier, the fractional Gaussian random walk was suggested as a possible stochastic process underlying animal movement (Gardner et al. 1989; Reynolds 2009).
The unknown model parameters to be estimated are . Let Θ5 be bounded convex domain satisfying We denote by θ0:=[σ0,H0]⊤∈Θ5 the true value of the unknown parameter and by the probability measure associated with θ∈Θ5. The random vector Y n:=[W1,…,Wn] in has the probability density function where Tn(H) is the symmetric Toeplitz matrix which indicates the autocorrelation structure for k1,k2=1,…,n.
The LAN property holds at the point θ∈Θ5 with In particular, the matrix is singular for every θ∈Θ5.
Owing to the singularity of the Fisher information matrix, the conventional asymptotic efficiency theory is not applicable to the full joint estimation of the two parameters. Nevertheless, as soon as either σ or H is fixed, the matrix reduces to with a strictly positive entry. It is noteworthy that the singularity across σ and H occurs also under high-frequency sampling, as discussed in Kawai (in press; preprint). We omit a practical estimation procedure which attains the above efficiency, as it is essentially very involved. For details on estimation based on Whittle maximum-likelihood estimators and a plug-in version of maximum-likelihood estimators, we refer the reader to, for example, Dahlhaus (1989) and Fox & Taqqu (1986). From a parameter estimation point of view, thus, this model is not necessarily the best candidate among the random walk models discussed in this paper.
3. Discussion and conclusions
Analysis and interpretation of animal movement data is one of the most controversial issues in contemporary ecology. Importance of rare events such as long-distance jumps and, on the other hand, availability of large amounts of high-resolution data owing to recent developments in the methodology and equipment for field and laboratory experiments (Nathan et al. 2008; Mashanova et al. 2010) pose a considerable challenge for data analysis. Commonly used statistical approaches do not always allow distinguishing unambiguously between qualitatively different processes such as, for instance, correlated random walk and the Lévy flight, and that has caused a heated debate (Turchin 1996; Benhamou 2007; Edwards et al. 2007; Plank & Codling 2009; Reynolds 2010). Moreover, it remains a largely open question to what extent the inherently continuous animal movement can be adequately described by discrete random walk models as it is intuitively clear (cf. Turchin 1988) that the results may depend on the time-step of the observation (Reynolds 2008; Plank & Codling 2009; Codling & Plank 2011).
In this paper, we addressed this issue by considering the problem of parameter estimation for random walk data. Having applied methods of the stochastic processes theory (the Fisher information for unbiased joint estimation of the model parameters) we provided ready-to-use formulas to estimate parameter values for a few widely used random walk models such as ordinary and fractional Gaussian random walk, Pareto random walk (also known as power law) and tempered Pareto random walk (aka truncated power law). We have shown that there exist both theoretical and computational difficulties in the ‘more realistic’ models. In particular, in the tempered Pareto model, a root of its likelihood equation is unavailable in closed form. In the fractional Gaussian random walk model, the conventional asymptotic optimality theory is not applicable to the full joint estimation of its two model parameters as the Fisher information matrix is singular. This fact suggests that much care should be taken in the choice of the underlying model. Moreover, a posteriori knowledge may be important; see the comments at the end of §2b(i).
Note that here we focused on Gaussian, Pareto and tempered Pareto random walk models as they are widely regarded as paradigms of a few qualitatively different types of individual animal movement. However, our approach is not reduced to these cases only, and its extension onto some other cases may be possible. For instance, a relatively straightforward yet meaningful extension of the proposed framework can be performed by introducing some preference in directions for Gaussian and fractional Gaussian random walks, instead of them being equiprobable. It is readily seen that, if the distribution of the turning angles is symmetric with respect to zero (e.g. like in the standard correlated random walk), then our results remain essentially the same.
(a) Theoretical implications
One of the main problems in individual animal movement studies is to identify the pattern of movement. Now we are going to discuss how our findings may help to solve this problem. However, before we move to a further discussion, a general remark seems to be necessary in order to clarify the issue. In the literature on individual animal movement, the power law distribution f(y)∼y−(α+1) (cf. equation (2.2)) is often called ‘scale-free’ for the reason that, if α∈(0,1], then the average step size 〈y〉 does not exist as the corresponding integral is diverging. However, we believe that this terminology is confusing and misleading. We want to emphasize here that, whatever the type of the individual movement (e.g. Brownian motion or a Lévy flight or walk), the movement is never really scale-free.2 There always exists a characteristic step size, although it may be defined somewhat differently. Below, we mention just a few possibilities:
Median of the distribution. Consider y* such as 3.1 Obviously, y* gives a characteristic scale of the random walk with a very clear meaning.
Dimensions analysis 1. Another argument resulting in a characteristic scale is based on the observation that, if we consider the distribution f(y) of the step size, then y is not an abstract number but has the dimension of length, which we denote as [y]=L. Correspondingly, the probability density distribution f is measured not in abstract units but has the dimension of inverse length, i.e. [f]=L−1. It means that, apart from a senseless case of α=0, the step size probability distribution function must always include a parameter with the dimension of length, which then can be regarded as a characteristic scale. In particular, we immediately observe that equation (2.2) contains the cutoff parameter [τ]=L.
Dimensions analysis 2. Note that the step size probability distribution with an asymptotic power law behaviour at large y can be written in a different form (known as a Cauchy-type distribution): 3.2 where p>0 is a parameter and Cp,α is the normalizing constant. Equation (3.2) does not contain a cutoff but it has got another parameter p instead. It is readily seen that [p]=L, so it can be regarded as a characteristic scale of the movement.
Similar arguments apply to truncated power law as well. Note that here we talk not about an external spatial scale which can arise owing to the boundedness of the space available for the random walk (consider, for instance, dispersal of terrestrial birds in a small island). Although its importance for the movement pattern has been widely recognized (Viswanathan et al. 2011) so that there is even an opinion that it may turn the power law into a truncated power law (Edwards et al. 2007), the existence of an external spatial scale is in many cases obvious. Instead, here we talk about something much less obvious, i.e. about an inherent spatial scale that originates in the power law itself and hence would have been present even in an idealistic case of movement in unbounded space.
The actual issue with the power law distribution is not the absence of a characteristic scale—which, as we have shown it above, always exists—but whether this scale is representative of the process(es) resulting in a heavy-tailed distribution. It is this issue that our findings seem to make a significant contribution to the understanding of. For instance, in case of Pareto distribution (2.2) with α close to zero, the frequency of large jumps can be high but the scale parameter τ presumably remains small being estimated as the minimum value observed in the data (see §2c). We have, therefore, shown that, in fact, parameter τ controls the type of the distribution. Indeed, the power law and truncated power law distributions are only formally valid when the minimum jump size is positive, whatever good may be the fitting of the data at large and intermediate jumps. This leads to a counterintuitive conclusion that, for an individual movement described by Pareto distribution, we cannot separate the apparently different scales of small and large jumps and the result of the analysis of long-distance dispersal essentially depend on animal movements on small scale. We refer to this situation as a genuine Lévy flight. It is not scale-free, as it was suggested by earlier studies, but its specifics are that the small and large scales appear to be equally important and cannot be separated.
Applying this result to movement data, we note that animal movement is never uninterrupted, usually alternating periods of fast movement with periods of slow movement or rest (Kramer & McLaughlin 2001; Newlands et al. 2004; Mashanova et al. 2010; Knell & Codling in press) Therefore, having the time step chosen sufficiently small, which seems to be prescribed by a natural desire to have the data as detailed as possible, the dataset inevitably includes periods of rest when the increment in the animal position is zero (up to the measurement error). In this case, a power law-type distribution is not valid because the estimation procedure gives τ=0 (see equations (2.3) and (2.7)). However, by means of coarsening the time-grid, the zero-valued increments can be eliminated, which restore the validity of the power law (or truncated power law) distribution.
A question then arises as to what should be the course of action if the data fitting at large and intermediate jumps do suggest a power law but the estimate (2.3) results in τ=0. We hypothesize that, in this situation, the movement pattern is not actually a Lévy flight and fitting the data with a power law is descriptive but not insightful. Such an ‘artificial’ power law may appear as a result of heterogeneity of the movement data, e.g. when movement tracks of different animals are pooled together (Petrovskii et al. 2011) or when a few different movement modes are mixed (De Jager et al. 2011), even when each of the tracks/modes may be a Brownian motion (Jansen et al. in press). We refer to this situation as a superficial Lévy flight. We want to mention here that, in this situation, a power law may appear to be the best fit simply because the range of hypotheses from which the choice is made was not sufficiently broad (De Jager et al. 2011; Jansen et al. in press). It will then be outperformed by the true model (e.g. a composite Brownian motion) once this true probability distribution is added into the range (Jansen et al. in press). Our conclusions are in good agreement with an earlier study by Benhamou (2007) who showed that one should distinguish between the Lévy flight with the Lévy pattern.
For similar reasons, we believe that the Cauchy-type distribution is not an appropriate model for a genuine Lévy flight. Unlike (2.2), distribution (3.2) predicts only a power law at very large jump size, but the deviation from the power law becomes significant already at the intermediate scale y∼p. If (3.2) appears to be the best fit to explain movement data, then that may indicate the data are a mixture of different movement modes.
Therefore, we demonstrated that the properties of a given random walk can be essentially different on a different time scale, i.e. for a different frequency of observations. Note that the zero-valued increments or jumps can be removed from the dataset by distinguishing between different movement modes, e.g. by separating the periods of actual movement (also known as bouts) from the periods of slow movement or rest (Mashanova et al. 2010). However, this procedure is purely heuristic and usually based on a conventional value of a ‘threshold’ jump value separating the movement and the rest. Although it is often used in practice, such a separation is lacking a solid theoretical basis, and the question about the possible impact of the threshold value on the type of the distribution as well as on the corresponding parameter values is largely open. Our findings suggest that not only parameter estimation, in some cases, but also the model itself can be quite sensitive to the choice of this threshold.
(b) Implications for data analysis
On a more technical side, we have shown that, even within the same model, the convergence rate for different parameters can be different. Especially, the latter seems to bring a practically important message that, given a required accuracy of parameter estimation, the constraint on the minimum number of the data-points originates in the parameter with the slowest convergence rate. Taking into account the recent growth in availability of high-resolution data (cf. Nathan et al. 2008), this seems to be a useful and timely result.
In conclusion, we show how our results on the convergence rate and the residual error distribution may help to evaluate the amount of data that is required to estimate parameter values with a given accuracy or tolerance. Consider Pareto distribution (2.2) as an important example. Let τ(n) is an estimate of τ obtain from a dataset containing n data points, i.e. see (2.3). Suppose we wish to ensure with the (100σ)%-confidence that the difference between the estimate τ(n) and the actual value τ is less than the prescribed tolerance ϵ>0. Then, we need at least the following number of observations: 3.3 where the function ceil(A) rounds A to the nearest integer greater than or equal to A. Indeed, from which we immediately obtain (3.3).
If, by way of example, we now set ϵ=τ/10 and σ=0.99 (i.e. to estimate τ with 10% tolerance and 99% confidence), then we obtain: 3.4 Note that the above estimate is obtained without any initial guess or a priori information about the true value of τ. For a few hypothetical values such as α=0.5, α=1.0 and α=1.75, we obtain, respectively: 3.5 Interestingly, the smaller is α (i.e. the heavier is the tail of the distribution) the larger is the required amount of data.
Now we recall that the rate of convergence for τ is n−1/2 while the rate of convergence for parameter α is n−1. Omitting calculations details, roughly speaking it means that, in order to evaluate α, the number of data points required to obtain an estimate for τ (as given by equations (3.4) and (3.5)) should be squared. Therefore, for Pareto distribution (2.2) with a relatively fast rate of decay (α=1.75), a dataset of about 103 data points should be sufficient to estimate both parameters with a good accuracy, while a heavier tail (α=0.5) may require a much larger dataset of about 104.
S.P. gratefully acknowledges funding from The Leverhulme Trust through grant no. F/00 568/X.
Appendix A. Proofs
This section is devoted to proofs of the results. The complete proof entails rather lengthy arguments of somewhat routine nature. To avoid overloading the paper, we omit non-essential details in some instances.
Proof of theorems 2.1 and 2.2.
We prove the results under the setting of theorem 2.2, as those of theorem 2.1 can be easily recovered with the help of the asymptotics (2.6).
The result follows from the straightforward algebra where the first equality holds because is a sequence of iid random variables. The limiting function (in x) indicates the exponential distribution with the desired rate.
It holds that for each , where for θ∈Θ4, We can show with the pathwise Taylor expansion and the mean value theorem that where and −RnHessθ(ℓn(θ))Rn is independent of n and is no longer a random matrix. First, it holds by the central limit theorem and by the continuity of the summands in (α,κ) that for each ϵ∈(0,1), as , Note that owing to the compactness of Θ4 and the continuity of the Hessian matrix in θ, where denotes the operator norm of a linear transform, Hence, it holds that
Next, we can deduce that for each , where we denote by ∇θ(ℓn,1(θ)) the independent summand in ∇θ(ℓn(θ)), that is, . (See lemma 2.5 of Meerscheart et al. (in press).) This shows the desired LAN property. Note that ∇θ(ℓn,1(θ)), which we have defined just above, is independent of n. Hence, we can show that is not singular, with the help of the Cauchy–Schwarz inequality.
The rest of theorem 2.1 is obvious as is not singular. For the part (iii) of theorem 2.2, observe that Using the recurrence relation Γ(s,z)=(s−1)Γ(s−1,z)+zs−1 e−z, we get By solving the above for α and κ, we get the equations (2.9). The desired results follows from the method of scoring. The proof is complete. ■
Proof of theorem 2.4.
We have where In what follows, we will use the notation where We denote by Tn(H)−1/2 the matrix in satisfying (Tn(H)−1/2)⊤Tn(H)−1/2=Tn(H)−1 and write Tn(H)1/2 for the matrix in satisfying . We denote by Zn be a standard normal random vector in under . With those notations, it holds under that for each , , that is, Based on the above expressions, we can show that where we have used the fact that is chi-square with n degrees of freedom, the identity A1 and the well-known property tr[ABC]=tr[CAB]. Next, again because is chi-square with n degrees of freedom, we get where we let indicate the (k1,k2)th entry of the matrix , with a slight notational abuse for simplicity. Next, by letting Bn:=(Tn(H)1/2)⊤∂H(Tn(H)−1)Tn(H)1/2 and by using and ,
we can show that where we have used theorem 5.1 of Dahlhaus (1989) and A2 By using the well-known identity , we can derive that where we have again used theorem 5.1 of Dahlhaus (1989) and the identities (A1) and (A2).
Next, we show the convergence of the Hessian matrix. It is straightforward that for each , where we have used the law of large numbers of the chi-square distribution. As before, we can show that and where the asymptotics hold in -probability as and where we have again used theorem 5.1 of Dahlhaus (1989), the identities (A1) and (A2), and We can also show that and that , as , in a manner similar to the derivation of the matrix . We get the desired result by applying this to the pathwise Taylor expansion with the mean value theorem The singularity of the matrix is evident. The proof is complete. ■
↵1 Note that observation data on animal movement are usually distributed on the whole semiline, i.e. f>0 for any y≥0. It means that the events with small jump size should either be excluded for some a posteriori reasons (see §3) or otherwise the probability distribution function f should be defined additionally for 0≤y≤τ.
- Received November 7, 2011.
- Accepted January 12, 2012.
- This journal is © 2012 The Royal Society