## Abstract

Lévy processes, which have stationary independent increments, are ideal for modelling the various types of noise that can arise in communication channels. If a Lévy process admits exponential moments, then there exists a parametric family of measure changes called Esscher transformations. If the parameter is replaced with an independent random variable, the true value of which represents a ‘message’, then under the transformed measure the original Lévy process takes on the character of an ‘information process’. In this paper we develop a theory of such Lévy information processes. The underlying Lévy process, which we call the fiducial process, represents the ‘noise type’. Each such noise type is capable of carrying a message of a certain specification. A number of examples are worked out in detail, including information processes of the Brownian, Poisson, gamma, variance gamma, negative binomial, inverse Gaussian and normal inverse Gaussian type. Although in general there is no additive decomposition of information into signal and noise, one is led nevertheless for each noise type to a well-defined scheme for signal detection and enhancement relevant to a variety of practical situations.

## 1. Introduction

The idea of filtering the noise out of a noisy message as a way of increasing its information content is illustrated by Norbert Wiener in his book *Cybernetics* [1] by means of the following example. The true message is represented by a variable *X* which has a known probability distribution. An agent wishes to determine as best as possible the value of *X*, but owing to the presence of noise the agent can only observe a noisy version of the message of the form *ξ*=*X*+*ϵ*, where *ϵ* is independent of *X*. Wiener shows how, given the observed value of the noisy message *ξ*, the original distribution of *X* can be transformed into an improved *a posteriori* distribution that has a higher information content. The *a posteriori* distribution can then be used to determine a best estimate for the value of *X*.

The theory of filtering was developed in the 1940s when the inefficiency of anti-aircraft fire made it imperative to introduce effective filtering-based devices [2,3]. A breakthrough came with the work of Kalman, who reformulated the theory in a manner more well-suited for dynamical state-estimation problems [4,5]. This period coincided with the emergence of the modern control theory of Bellman & Pontryagin [6,7]. Owing to the importance of its applications, much work has been carried out since then. According to an estimate of Kalman [8], over 200 000 articles and monographs have been published on applications of the Kalman filter alone. The theory of stochastic filtering, in its modern form, is not much different conceptually from the elementary example described by Wiener in the 1940s. The message, instead of being represented by a single variable, in the general setup can take the form of a time series (the ‘signal’ or ‘message’ process). The information made available to the agent also takes the form of a time series (the ‘observation’ or ‘information’ process), typically given by the sum of two terms, the first being a functional of the signal process, and the second being a noise process. The nature of the signal process can be rather general, but in most applications the noise is chosen to be a Wiener process [9–11]. There is no reason, however, why an information process should necessarily be ‘additive’, or even why it should be given as a functional of a signal process and a noise process. From a mathematical perspective, it seems that the often proposed ansatz of an additive decomposition of the observation process is well adapted to the situation where the noise is Gaussian, but is not so natural when the noise is discontinuous. 
Thus, while a good deal of recent research has been carried out on the problem of filtering noisy information containing jumps [12–17], such work has usually been pursued under the assumption of an additive relation between signal and noise. It is not unreasonable to ask whether a more systematic treatment of the problem might be available that involves no presumption of additivity and that is more naturally adapted to the mathematics of the situation.

The purpose of the present paper is to introduce a broad class of information processes suitable for modelling situations involving discontinuous signals, discontinuous noise and discontinuous information. No assumption is made to the effect that information can be expressed as a function of signal and noise. Instead, information processes are classified according to their ‘noise type’. Information processes of the same noise type are then distinguished from one another by the messages that they carry. Each noise type is associated to a Lévy process, which we call the *fiducial process*. The fiducial process is the information process that results for a given noise type in the case of a null message, and can be thought of as a ‘pure noise’ process of that noise type. Information processes can then be classified by the characteristics of the associated fiducial processes. To keep the discussion elementary, we consider the case of a one-dimensional fiducial process and examine the situation where the message is represented by a single random variable. The goal is to construct the optimal filter for the class of information processes that we consider in the form of a map that takes the *a priori* distribution of the message to an *a posteriori* distribution that depends on the information that has been made available. A number of examples will be presented. The results vary remarkably in detail and character for the different types of filters considered, and yet there is an overriding unity in the general scheme, which allows for the construction of a multitude of examples and applications.

A synopsis of the main ideas, which we develop more fully in the remainder of the paper, can be presented as follows. We recall the idea of the Esscher transform as a change of probability measure on a probability space $(\Omega,\mathcal{F},\mathbb{P}_0)$ that supports a Lévy process {*ξ*_{t}}_{t≥0} that possesses $\mathbb{P}_0$-exponential moments. The space of admissible moments is the set $A=\{w\in\mathbb{R}:\mathbb{E}^{\mathbb{P}_0}[\exp(w\xi_t)]<\infty\}$. The associated Lévy exponent $\psi_0(\alpha)=t^{-1}\ln\mathbb{E}^{\mathbb{P}_0}[\exp(\alpha\xi_t)]$ then exists for all *α*∈*A*_{C} :={*w*∈C:Re *w*∈*A*}, and does not depend on *t*. A parametric family of measure changes $\mathbb{P}_0\to\mathbb{P}_\lambda$ commonly called Esscher transformations can be constructed by use of the exponential martingale family $\{\rho_t^{\lambda}\}_{t\ge 0}$, defined for each *λ*∈*A* by $\rho_t^{\lambda}=\exp(\lambda\xi_t-\psi_0(\lambda)t)$. If {*ξ*_{t}} is a $\mathbb{P}_0$-Brownian motion, then {*ξ*_{t}} is $\mathbb{P}_\lambda$-Brownian with drift *λ*; if {*ξ*_{t}} is a $\mathbb{P}_0$-Poisson process with intensity *m*, then {*ξ*_{t}} is $\mathbb{P}_\lambda$-Poisson with intensity e^{λ}*m*; if {*ξ*_{t}} is a $\mathbb{P}_0$-gamma process with rate parameter *m* and scale parameter *κ*, then {*ξ*_{t}} is $\mathbb{P}_\lambda$-gamma with rate parameter *m* and scale parameter *κ*/(1−*κλ*). Each case is different in character. A natural generalization of the Esscher transform results when the parameter *λ* in the measure change is replaced by a random variable *X*. From the perspective of the new measure $\mathbb{P}_X$, the process {*ξ*_{t}} retains the ‘noisy’ character of its $\mathbb{P}_0$-Lévy origin, but also carries information about *X*. In particular, if one assumes that *X* and {*ξ*_{t}} are $\mathbb{P}_0$-independent, and that the support of *X* lies in *A*, then we say that {*ξ*_{t}} defines a *Lévy information process* under $\mathbb{P}_X$ carrying the message *X*. Thus, the change of measure inextricably intertwines signal and noise. More abstractly, we say that on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ a random process {*ξ*_{t}} is a Lévy information process with message (or ‘signal’) *X* and noise type (or ‘fiducial exponent’) *ψ*_{0}(*α*) if {*ξ*_{t}} is *conditionally* $\mathbb{P}$-Lévy given *X*, with Lévy exponent *ψ*_{0}(*α*+*X*)−*ψ*_{0}(*X*) for *α*∈C^{I}:={*w*∈C:Re *w*=0}.
We are thus able to classify Lévy information processes by their noise type, and for each noise type we can specify the class of random variables that are admissible as signals that can be carried in the environment of such noise. We consider a number of different noise types, and construct explicit representations of the associated information processes. We also derive an expression for the optimal filter in the general situation, which transforms the *a priori* distribution of the signal to the improved *a posteriori* distribution that can be inferred on the basis of received information.

The plan of the paper is as follows. In §2, after recalling some facts about processes with stationary and independent increments, we define Lévy information, and in proposition 2.2 we show that the signal carried by a Lévy information process is effectively ‘revealed’ after the passage of sufficient time. In §3, we present in proposition 3.1 an explicit construction using a change of measure technique that ensures the existence of Lévy information processes, and in proposition 3.2 we prove a converse to the effect that any Lévy information process can be obtained in this way. In proposition 3.3 we construct the optimal filter for general Lévy information processes, and in proposition 3.4 we show that such processes have the Markov property. In proposition 3.5 we establish a result that indicates in more detail how the information content of the signal is coded into the structure of an information process. Then in proposition 3.6 we present a general construction of the so-called innovations process associated with Lévy information. Finally, in §4, we proceed to examine a number of specific examples of Lévy information processes, for which explicit representations are constructed in propositions 4.1–4.8.

## 2. Lévy information

We assume that the reader is familiar with the theory of Lévy processes [18–23]. For an overview of some of the specific Lévy processes considered later in this paper, we refer the reader to [24]. A real-valued process {*ξ*_{t}}_{t≥0} on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ is a Lévy process if: (i) $\mathbb{P}(\xi_0=0)=1$, (ii) {*ξ*_{t}} has stationary and independent increments, (iii) $\lim_{t\to s}\mathbb{P}(|\xi_t-\xi_s|>\epsilon)=0$ for every $\epsilon>0$, and (iv) {*ξ*_{t}} is almost surely càdlàg. For a Lévy process {*ξ*_{t}} to give rise to a class of information processes, we require that it should possess exponential moments. Let us consider the set defined for some (equivalently for all) *t*>0 by

$$A=\left\{w\in\mathbb{R}:\mathbb{E}^{\mathbb{P}}[\exp(w\xi_t)]<\infty\right\}. \tag{2.1}$$

If *A* contains points other than *w*=0, then we say that {*ξ*_{t}} possesses exponential moments. We define a function *ψ*:*A*→R called the Lévy exponent (or cumulant function), such that

$$\mathbb{E}^{\mathbb{P}}[\exp(\alpha\xi_t)]=\exp(\psi(\alpha)t) \tag{2.2}$$

for *α*∈*A*. If a Lévy process possesses exponential moments, then an exercise shows that *ψ*(*α*) is convex on *A*, that the mean and variance of *ξ*_{t} are given, respectively, by *ψ*′(0) *t* and *ψ*′′(0) *t*, and that as a consequence of the convexity of *ψ*(*α*) the marginal exponent *ψ*′(*α*) possesses a unique inverse *I*(*y*) such that *I*(*ψ*′(*α*))=*α* for *α*∈*A*. The Lévy exponent extends to a function *ψ*:*A*_{C} →C where *A*_{C} ={*w*∈C:Re *w*∈*A*}, and it can be shown ([19], Theorem 25.17) that *ψ*(*α*) admits a Lévy–Khintchine representation of the form

$$\psi(\alpha)=p\alpha+\tfrac{1}{2}q\alpha^{2}+\int_{\mathbb{R}\setminus\{0\}}\left(\mathrm{e}^{\alpha z}-1-\alpha z\,\mathbb{1}\{|z|<1\}\right)\nu(\mathrm{d}z), \tag{2.3}$$

with the property that (2.2) holds for all *α*∈*A*_{C} . Here, 1{⋅} denotes the indicator function, *p*∈R and *q*≥0 are constants, and the so-called Lévy measure *ν*(d*z*) is a positive measure defined on R\{0} satisfying

$$\int_{\mathbb{R}\setminus\{0\}}\min(1,z^{2})\,\nu(\mathrm{d}z)<\infty. \tag{2.4}$$

If the Lévy process possesses exponential moments, then for *α*∈*A* we also have

$$\int_{\mathbb{R}\setminus\{0\}}\mathrm{e}^{\alpha z}\,\mathbb{1}\{|z|\ge 1\}\,\nu(\mathrm{d}z)<\infty. \tag{2.5}$$

The Lévy measure has the following interpretation: if *B* is a measurable subset of R\{0}, then *ν*(*B*) is the rate at which jumps arrive for which the jump size lies in *B*. Consider the sets defined for *n*∈N by *B*_{n}={*z*∈R:1/*n*≤|*z*|≤1}. If *ν*(*B*_{n}) tends to infinity for large *n*, we say that {*ξ*_{t}} is a process of infinite activity, meaning that the rate of arrival of small jumps is unbounded. If $\nu(\mathbb{R}\setminus\{0\})<\infty$, one says that {*ξ*_{t}} has finite activity. We refer to the data *K*=(*p*,*q*,*ν*) as the characteristic triplet (or ‘characteristic’) of the associated Lévy process. Thus, we can classify a Lévy process abstractly by its characteristic *K*, or equivalently its exponent *ψ*(*α*). This means that one can speak of a ‘type’ of Lévy noise by reference to the associated characteristic or exponent.

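Now suppose we fix a measure $\mathbb{P}_0$ on a measurable space $(\Omega,\mathcal{F})$, and let {*ξ*_{t}} be $\mathbb{P}_0$-Lévy, with exponent *ψ*_{0}(*α*). There exists a parametric family of probability measures $\{\mathbb{P}_\lambda\}_{\lambda\in A}$ on $(\Omega,\mathcal{F})$ such that for each choice of *λ* the process {*ξ*_{t}} is $\mathbb{P}_\lambda$-Lévy. The changes of measure arising in this way are called Esscher transformations [25–29]. Under an Esscher transformation, the characteristics of a Lévy process are transformed from one type to another, and one can speak of a family of Lévy processes interrelated by Esscher transformations. The relevant change of measure can be specified by use of the process $\{\rho_t^{\lambda}\}$ defined for *λ*∈*A* by

$$\rho_t^{\lambda}:=\left.\frac{\mathrm{d}\mathbb{P}_\lambda}{\mathrm{d}\mathbb{P}_0}\right|_{\mathcal{F}_t}=\exp\left(\lambda\xi_t-\psi_0(\lambda)t\right), \tag{2.6}$$

where $\mathcal{F}_t=\sigma[\{\xi_s\}_{0\le s\le t}]$. One can check that $\{\rho_t^{\lambda}\}$ is an $(\{\mathcal{F}_t\},\mathbb{P}_0)$-martingale: indeed, as a consequence of the fact that {*ξ*_{t}} has stationary and independent increments, we have

$$\mathbb{E}_s^{\mathbb{P}_0}[\rho_t^{\lambda}]=\mathbb{E}_s^{\mathbb{P}_0}\left[\mathrm{e}^{\lambda(\xi_t-\xi_s)}\right]\mathrm{e}^{\lambda\xi_s-\psi_0(\lambda)t}=\mathrm{e}^{\psi_0(\lambda)(t-s)}\,\mathrm{e}^{\lambda\xi_s-\psi_0(\lambda)t}=\rho_s^{\lambda} \tag{2.7}$$

for *s*≤*t*, where $\mathbb{E}_t^{\mathbb{P}_0}[\,\cdot\,]$ denotes conditional expectation under $\mathbb{P}_0$ with respect to $\mathcal{F}_t$. It is straightforward to show that {*ξ*_{t}} has $\mathbb{P}_\lambda$-stationary and independent increments, and that the $\mathbb{P}_\lambda$-exponent of {*ξ*_{t}}, which is defined on the set *A*^{λ}_{C} :={*w*∈C:Re *w*+*λ*∈*A*}, is given by

$$\psi_\lambda(\alpha):=\frac{1}{t}\ln\mathbb{E}^{\mathbb{P}_\lambda}[\exp(\alpha\xi_t)]=\psi_0(\alpha+\lambda)-\psi_0(\lambda), \tag{2.8}$$

from which by use of the Lévy–Khintchine representation (2.3) one can work out the characteristic triplet *K*_{λ} of {*ξ*_{t}} under $\mathbb{P}_\lambda$. We observe that if the Esscher martingale (2.6) is expanded as a power series in *λ*, then the resulting coefficients, which are given by polynomials in *ξ*_{t} and *t*, form a so-called Sheffer set [30], each element of which defines an $(\{\mathcal{F}_t\},\mathbb{P}_0)$-martingale. The first three of these polynomials take the form *Q*^{1}(*x*,*t*)=*x*−*ψ*′*t*, $Q^{2}(x,t)=\tfrac{1}{2}[(x-\psi' t)^{2}-\psi'' t]$, and $Q^{3}(x,t)=\tfrac{1}{6}[(x-\psi' t)^{3}-3\psi'' t\,(x-\psi' t)-\psi''' t]$, where *ψ*′=*ψ*_{0}′(0), *ψ*′′=*ψ*_{0}′′(0), and *ψ*′′′=*ψ*_{0}′′′(0). The corresponding polynomial Lévy–Sheffer martingales are given by $Q^{1}_t=Q^{1}(\xi_t,t)$, $Q^{2}_t=Q^{2}(\xi_t,t)$, and $Q^{3}_t=Q^{3}(\xi_t,t)$.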
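As a minimal numerical sketch of the Esscher rule, assuming the transformed exponent has the form *ψ*_{λ}(*α*)=*ψ*_{0}(*α*+*λ*)−*ψ*_{0}(*λ*), the following Python fragment checks the Brownian and Poisson cases mentioned in the introduction and confirms that the Esscher density has unit expectation under the fiducial Poisson law. The function names and the parameter values (`lam`, `alpha`, `m`, `t`) are illustrative assumptions and are not taken from the paper.

```python
import math

def psi_brownian(alpha):
    """Fiducial exponent of standard Brownian motion."""
    return 0.5 * alpha ** 2

def psi_poisson(alpha, m=2.0):
    """Fiducial exponent of a Poisson process with intensity m."""
    return m * (math.exp(alpha) - 1.0)

def esscher_exponent(psi0, lam):
    """Return the function alpha -> psi0(alpha + lam) - psi0(lam)."""
    return lambda alpha: psi0(alpha + lam) - psi0(lam)

def expected_rho_poisson(lam, t, m=2.0, terms=200):
    """E[exp(lam*N_t - psi0(lam)*t)] under the fiducial Poisson law (truncated sum)."""
    total, log_pmf = 0.0, -m * t          # log P(N_t = 0)
    for k in range(terms):
        total += math.exp(log_pmf + lam * k - psi_poisson(lam, m) * t)
        log_pmf += math.log(m * t) - math.log(k + 1)   # advance to log P(N_t = k + 1)
    return total

lam, alpha = 0.7, 0.3
psi_lam_bm = esscher_exponent(psi_brownian, lam)
psi_lam_po = esscher_exponent(psi_poisson, lam)

# Brownian motion with drift lam has exponent lam*alpha + alpha^2/2.
bm_target = lam * alpha + 0.5 * alpha ** 2
# A Poisson process with intensity exp(lam)*m has exponent exp(lam)*m*(exp(alpha) - 1).
po_target = math.exp(lam) * 2.0 * (math.exp(alpha) - 1.0)
rho_mean = expected_rho_poisson(lam, t=1.5)
```

The same `esscher_exponent` helper can be applied to any other fiducial exponent, provided *λ* and *α*+*λ* stay inside the set of admissible moments.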

In what follows, we use the terms ‘signal’ and ‘message’ interchangeably. We write C^{I}={*w*∈C:Re *w*=0}. For any random variable *Z* on $(\Omega,\mathcal{F},\mathbb{P})$ we write $\mathcal{F}^{Z}=\sigma[Z]$, and when it is convenient we write $\mathbb{E}[\,\cdot\mid Z]$ for $\mathbb{E}[\,\cdot\mid\mathcal{F}^{Z}]$. For processes, we use both of the notations {*Z*_{t}} and {*Z*(*t*)}, depending on the context.

With these background remarks in mind, we are in a position to define a *Lévy information process*. We confine the discussion to the case of a ‘simple’ message, represented by a random variable *X*. In the situation when the noise is Brownian motion, the information admits a linear decomposition into signal and noise. In the general situation, the relation between signal and noise is more subtle, and has the character of a fibre space, where one thinks of the points of the base space as representing the different noise types, and the points of the fibres as corresponding to the different information processes that one can construct in association with a given noise type. Alternatively, one can think of the base as being the convex space of Lévy characteristics, and the fibre over a given point of the base as the convex space of messages that are compatible with the associated noise type.

We fix a probability space $(\Omega,\mathcal{F},\mathbb{P})$, and an Esscher family of Lévy characteristics *K*_{λ}, *λ*∈*A*, with associated Lévy exponents *ψ*_{λ}(*α*), *α*∈*A*^{λ}_{C} . We refer to *K*_{0} as the fiducial characteristic, and *ψ*_{0}(*α*) as the fiducial exponent. The intuition here is that the abstract Lévy process of characteristic *K*_{0} and exponent *ψ*_{0}(*α*), which we call the ‘fiducial’ process, represents the noise type of the associated information process. Thus, we can use *K*_{0}, or equivalently *ψ*_{0}(*α*), to label the noise type.

### Definition 2.1

By a Lévy information process with fiducial characteristic *K*_{0}, carrying the message *X*, we mean a random process {*ξ*_{t}}, together with a random variable *X*, such that {*ξ*_{t}} is conditionally *K*_{X}-Lévy given $\mathcal{F}^{X}$.

Thus, given $\mathcal{F}^{X}$ we require {*ξ*_{t}} to have conditionally independent and stationary increments under $\mathbb{P}$, and to possess a conditional exponent of the form

$$\psi_X(\alpha):=\frac{1}{t}\ln\mathbb{E}^{\mathbb{P}}\left[\exp(\alpha\xi_t)\mid\mathcal{F}^{X}\right]=\psi_0(\alpha+X)-\psi_0(X) \tag{2.9}$$

for *α*∈C^{I}, where *ψ*_{0}(*α*) is the fiducial exponent of the specified noise type. It is implicit in the statement of definition 2.1 that a certain compatibility condition holds between the message and the noise type. For any random variable *X*, we define its support *S*_{X} to be the smallest closed set *F* with the property that $\mathbb{P}(X\in F)=1$. Then we say that *X* is compatible with the fiducial exponent *ψ*_{0}(*α*) if *S*_{X}⊂*A*. Intuitively speaking, the compatibility condition ensures that we can use *X* to make a random Esscher transformation. In the theory of signal processing, it is advantageous to require that the variables to be estimated should be square integrable. This condition ensures that the conditional expectation exists and admits the interpretation as a best estimate in the sense of least squares. For our purpose, it will suffice to assume throughout the paper that the information process is square integrable under $\mathbb{P}$. This in turn implies that *ψ*′_{0}(*X*) is square integrable, and that *ψ*′′_{0}(*X*) is integrable. Note that we do not require that the Lévy information process should possess exponential moments under $\mathbb{P}$, but a sufficient condition for this to be the case is that there should exist a nonvanishing real number *ϵ* such that *λ*+*ϵ*∈*A* for all *λ*∈*S*_{X}.

To gain a better understanding of the sense in which the information process {*ξ*_{t}} actually ‘carries’ the message *X*, it will be useful to investigate its asymptotic behaviour. We write *I*_{0}(*y*) for the inverse marginal fiducial exponent.

### Proposition 2.2

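*Let* {*ξ*_{t}} *be a Lévy information process with fiducial exponent* *ψ*_{0}(*α*) *and message* *X*. *Then for every* *ϵ*>0 *we have*

$$\lim_{t\to\infty}\mathbb{P}\left[\,\left|I_0(t^{-1}\xi_t)-X\right|\ge\epsilon\,\right]=0. \tag{2.10}$$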

### Proof.

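It follows from (2.9) that *ψ*′_{X}(0)=*ψ*′_{0}(*X*), and hence that at any time *t* the conditional mean of the random variable *t*^{−1}*ξ*_{t} is given by

$$\mathbb{E}^{\mathbb{P}}\left[t^{-1}\xi_t\mid\mathcal{F}^{X}\right]=\psi_0'(X). \tag{2.11}$$

A calculation then shows that the conditional variance of *t*^{−1}*ξ*_{t} takes the form

$$\mathrm{Var}^{\mathbb{P}}\left[t^{-1}\xi_t\mid\mathcal{F}^{X}\right]=\mathbb{E}^{\mathbb{P}}\left[\left(t^{-1}\xi_t-\psi_0'(X)\right)^{2}\mid\mathcal{F}^{X}\right]=\frac{1}{t}\,\psi_0''(X), \tag{2.12}$$

which allows us to conclude that

$$\mathbb{E}^{\mathbb{P}}\left[\left(t^{-1}\xi_t-\psi_0'(X)\right)^{2}\right]=\frac{1}{t}\,\mathbb{E}^{\mathbb{P}}\left[\psi_0''(X)\right], \tag{2.13}$$

and hence, since *ψ*′′_{0}(*X*) is integrable, that

$$\lim_{t\to\infty}\mathbb{E}^{\mathbb{P}}\left[\left(t^{-1}\xi_t-\psi_0'(X)\right)^{2}\right]=0. \tag{2.14}$$

On the other hand, for all *ϵ*>0 we have

$$\mathbb{P}\left[\,\left|t^{-1}\xi_t-\psi_0'(X)\right|\ge\epsilon\,\right]\le\frac{1}{\epsilon^{2}}\,\mathbb{E}^{\mathbb{P}}\left[\left(t^{-1}\xi_t-\psi_0'(X)\right)^{2}\right] \tag{2.15}$$

by Chebychev's inequality, from which we deduce that

$$\lim_{t\to\infty}\mathbb{P}\left[\,\left|t^{-1}\xi_t-\psi_0'(X)\right|\ge\epsilon\,\right]=0, \tag{2.16}$$

and it follows, by the continuity of *I*_{0}, that *I*_{0}(*t*^{−1}*ξ*_{t}) converges to *X* in probability.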

Thus, we see that the information process does indeed carry information about the message, and in the long run ‘reveals’ it. The intuition here is that as more information is gained, we improve our estimate of *X* to the point that the value of *X* eventually becomes known with near certainty.
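As a rough numerical illustration of proposition 2.2, the sketch below takes Poisson noise with fiducial exponent *ψ*_{0}(*α*)=*m*(e^{α}−1), for which *ψ*′_{0}(*α*)=*m*e^{α} and *I*_{0}(*y*)=ln(*y*/*m*). Conditionally on *X*=*x* the information process is then a Poisson process with intensity *m*e^{x}, so the estimate *I*_{0}(*t*^{−1}*ξ*_{t}) should settle near *x* as *t* grows. The values of `m`, `x_true`, the horizons and the random seed are illustrative assumptions, and the helper names are hypothetical.

```python
import math
import random

def simulate_poisson_count(rate, horizon, rng):
    """Count arrivals on [0, horizon] of a Poisson process with the given rate."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(rate)   # exponential inter-arrival times
    return count

def inverse_marginal_exponent(y, m):
    """I_0(y) for Poisson noise: the inverse of psi_0'(alpha) = m * exp(alpha)."""
    return math.log(y / m)

rng = random.Random(12345)
m, x_true = 2.0, 0.5
estimates = {}
for horizon in (10.0, 100.0, 2000.0):
    # Conditionally on X = x_true the observed process has intensity m * exp(x_true).
    xi = simulate_poisson_count(m * math.exp(x_true), horizon, rng)
    estimates[horizon] = inverse_marginal_exponent(xi / horizon, m)
```

With these settings the standard deviation of *t*^{−1}*ξ*_{t} at the longest horizon is about 0.04, so the final estimate is expected to lie well within 0.1 of `x_true`.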

## 3. Properties of Lévy information

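It will be useful if we present a construction that ensures the existence of Lévy information processes. First, we select a noise type by specification of a fiducial characteristic *K*_{0}. Next, we introduce a probability space $(\Omega,\mathcal{F},\mathbb{P}_0)$ that supports the existence of a $\mathbb{P}_0$-Lévy process {*ξ*_{t}} with the given fiducial characteristic, together with an independent random variable *X* that is compatible with *K*_{0}.

Write $\{\mathcal{F}_t\}$ for the filtration generated by {*ξ*_{t}}, and $\{\mathcal{G}_t\}$ for the filtration generated by {*ξ*_{t}} and *X* jointly: $\mathcal{G}_t=\sigma[\{\xi_s\}_{0\le s\le t},X]$. Let *ψ*_{0}(*α*) be the fiducial exponent associated with *K*_{0}. One can check that the process $\{\rho_t^{X}\}$ defined by

$$\rho_t^{X}=\exp\left(X\xi_t-\psi_0(X)t\right) \tag{3.1}$$

is a $(\{\mathcal{G}_t\},\mathbb{P}_0)$-martingale. We are thus able to introduce a change of measure $\mathbb{P}_0\to\mathbb{P}_X$ on $(\Omega,\mathcal{F},\{\mathcal{G}_t\})$ by setting

$$\left.\frac{\mathrm{d}\mathbb{P}_X}{\mathrm{d}\mathbb{P}_0}\right|_{\mathcal{G}_t}=\rho_t^{X}. \tag{3.2}$$

It should be evident that {*ξ*_{t}} is conditionally $\mathbb{P}_X$-Lévy given $\mathcal{F}^{X}$, since for fixed *X* the measure change is an Esscher transformation. In particular, a calculation shows that the conditional exponent of *ξ*_{t} under $\mathbb{P}_X$ is given by

$$\psi_X(\alpha):=\frac{1}{t}\ln\mathbb{E}^{\mathbb{P}_X}\left[\exp(\alpha\xi_t)\mid\mathcal{F}^{X}\right]=\psi_0(\alpha+X)-\psi_0(X) \tag{3.3}$$

for *α*∈C^{I}, which shows that the conditions of definition 2.1 are satisfied, allowing us to conclude the following: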

### Proposition 3.1

*The* $\mathbb{P}_0$-*Lévy process* {*ξ*_{t}} *is a* $\mathbb{P}_X$-*Lévy information process, with message* *X* *and noise type* *ψ*_{0}(*α*).

In fact, the converse also holds: if we are given a Lévy information process, then by a change of measure we can find a Lévy process and an independent ‘message’ variable. Here follows a more precise statement.

### Proposition 3.2

*Let* {*ξ*_{t}} *be a Lévy information process on a probability space* $(\Omega,\mathcal{F},\mathbb{P})$ *with message* *X* *and noise type* *ψ*_{0}(*α*). *Then there exists a change of measure* $\mathbb{P}\to\mathbb{P}_0$ *such that* {*ξ*_{t}} *and* *X* *are* $\mathbb{P}_0$-*independent*, {*ξ*_{t}} *is* $\mathbb{P}_0$-*Lévy with exponent* *ψ*_{0}(*α*), *and the probability law of* *X* *under* $\mathbb{P}_0$ *is the same as the probability law of* *X* *under* $\mathbb{P}$.

### Proof.

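First, we establish that the process $\{\tilde\rho_t^{X}\}$ defined by $\tilde\rho_t^{X}=\exp(-X\xi_t+\psi_0(X)t)$ is a $(\{\mathcal{G}_t\},\mathbb{P})$-martingale. We have

$$\mathbb{E}^{\mathbb{P}}\left[\tilde\rho_t^{X}\mid\mathcal{G}_s\right]=\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{-X(\xi_t-\xi_s)}\mid\mathcal{G}_s\right]\mathrm{e}^{-X\xi_s+\psi_0(X)t}=\mathrm{e}^{\psi_X(-X)(t-s)}\,\mathrm{e}^{-X\xi_s+\psi_0(X)t} \tag{3.4}$$

by virtue of the fact that {*ξ*_{t}} is $\mathcal{F}^{X}$-conditionally Lévy under $\mathbb{P}$. By use of (2.9), we deduce that *ψ*_{X}(−*X*)=−*ψ*_{0}(*X*), and hence that $\mathbb{E}^{\mathbb{P}}[\tilde\rho_t^{X}\mid\mathcal{G}_s]=\tilde\rho_s^{X}$, as required. Then we use $\{\tilde\rho_t^{X}\}$ to define a change of measure $\mathbb{P}\to\mathbb{P}_0$ on $(\Omega,\mathcal{F},\{\mathcal{G}_t\})$ by setting

$$\left.\frac{\mathrm{d}\mathbb{P}_0}{\mathrm{d}\mathbb{P}}\right|_{\mathcal{G}_t}=\tilde\rho_t^{X}. \tag{3.5}$$

To show that *ξ*_{t} and *X* are $\mathbb{P}_0$-independent for all *t*, it suffices to show that their joint characteristic function under $\mathbb{P}_0$ factorizes. Letting *α*,*β*∈C^{I}, we have

$$\begin{aligned}\mathbb{E}^{\mathbb{P}_0}\left[\mathrm{e}^{\alpha\xi_t+\beta X}\right]&=\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{-X\xi_t+\psi_0(X)t}\,\mathrm{e}^{\alpha\xi_t+\beta X}\right]\\&=\mathbb{E}^{\mathbb{P}}\left[\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{(\alpha-X)\xi_t}\mid\mathcal{F}^{X}\right]\mathrm{e}^{\psi_0(X)t+\beta X}\right]\\&=\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{\psi_X(\alpha-X)t+\psi_0(X)t+\beta X}\right]\\&=\mathrm{e}^{\psi_0(\alpha)t}\,\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{\beta X}\right],\end{aligned} \tag{3.6}$$

where the last step follows from (2.9), which gives *ψ*_{X}(*α*−*X*)=*ψ*_{0}(*α*)−*ψ*_{0}(*X*). This argument can be extended to show that {*ξ*_{t}} and *X* are $\mathbb{P}_0$-independent. Next we observe that

$$\begin{aligned}\mathbb{E}^{\mathbb{P}_0}\left[\mathrm{e}^{\alpha(\xi_u-\xi_t)+\beta\xi_t}\right]&=\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{-X\xi_u+\psi_0(X)u}\,\mathrm{e}^{\alpha(\xi_u-\xi_t)+\beta\xi_t}\right]\\&=\mathbb{E}^{\mathbb{P}}\left[\mathrm{e}^{\psi_0(X)u+\psi_X(\alpha-X)(u-t)+\psi_X(\beta-X)t}\right]\\&=\mathrm{e}^{\psi_0(\alpha)(u-t)}\,\mathrm{e}^{\psi_0(\beta)t}\end{aligned} \tag{3.7}$$

for *u*≥*t*≥0, and it follows that *ξ*_{u}−*ξ*_{t} and *ξ*_{t} are independent. This argument can be extended to show that {*ξ*_{t}} has $\mathbb{P}_0$-independent increments. Finally, if we set *α*=0 in (3.6), it follows that the probability laws of *X* under $\mathbb{P}_0$ and $\mathbb{P}$ are identical; if we set *β*=0 in (3.6), it follows that the exponent of {*ξ*_{t}} is *ψ*_{0}(*α*); and if we set *β*=0 in (3.7), it follows that {*ξ*_{t}} is $\mathbb{P}_0$-stationary.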

Going forward, we adopt the convention that $\mathbb{P}$ always denotes the ‘physical’ measure in relation to which an information process with message *X* is defined, and that $\mathbb{P}_0$ denotes the transformed measure with respect to which the information process and the message decouple. Therefore, henceforth we write $\mathbb{P}$ rather than $\mathbb{P}_X$. In addition to establishing the existence of Lévy information processes, the results of proposition 3.2 provide useful tools for calculations, allowing us to work out properties of information processes by referring the calculations back to $\mathbb{P}_0$. We consider as an example the problem of working out the $\mathcal{F}_t$-conditional expectation under $\mathbb{P}$ of a $\mathcal{G}_t$-measurable integrable random variable *Z*. The $\mathbb{P}$-expectation of *Z* can be written in terms of $\mathbb{P}_0$-expectations, and is given by a ‘generalized Bayes formula’ [31] of the form

$$\mathbb{E}^{\mathbb{P}}\left[Z\mid\mathcal{F}_t\right]=\frac{\mathbb{E}^{\mathbb{P}_0}\left[\rho_t^{X}Z\mid\mathcal{F}_t\right]}{\mathbb{E}^{\mathbb{P}_0}\left[\rho_t^{X}\mid\mathcal{F}_t\right]}. \tag{3.8}$$

This formula can be used to obtain the $\mathcal{F}_t$-conditional probability distribution function for *X*, defined for *y*∈R by

$$F_t^{X}(y)=\mathbb{P}\left(X\le y\mid\mathcal{F}_t\right). \tag{3.9}$$

In the Bayes formula, we set *Z*=1{*X*≤*y*}, and the result is

$$F_t^{X}(y)=\frac{\int\mathbb{1}\{x\le y\}\exp\left(x\xi_t-\psi_0(x)t\right)\mathrm{d}F^{X}(x)}{\int\exp\left(x\xi_t-\psi_0(x)t\right)\mathrm{d}F^{X}(x)}, \tag{3.10}$$

where $F^{X}(y)=\mathbb{P}(X\le y)$ is the *a priori* distribution function. It is useful for some purposes to work directly with the conditional probability measure *π*_{t}(d*x*) induced on R defined by $\mathrm{d}F_t^{X}(x)=\pi_t(\mathrm{d}x)$. In particular, when *X* is a continuous random variable with a density function *p*(*x*), one can write *π*_{t}(d*x*)=*p*_{t}(*x*)d*x*, where *p*_{t}(*x*) is the conditional density function.

### Proposition 3.3

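*Let* {*ξ*_{t}} *be a Lévy information process under* $\mathbb{P}$ *with noise type* *ψ*_{0}(*α*), *and let the a priori distribution of the associated message* *X* *be* *π*(d*x*). *Then the* $\mathcal{F}_t$-*conditional a posteriori distribution of* *X* *is*

$$\pi_t(\mathrm{d}x)=\frac{\exp\left(x\xi_t-\psi_0(x)t\right)}{\int\exp\left(x\xi_t-\psi_0(x)t\right)\pi(\mathrm{d}x)}\,\pi(\mathrm{d}x). \tag{3.11}$$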

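It is straightforward to establish by use of a variational argument that for any function *f*:R→R such that the random variable *Y* =*f*(*X*) is square integrable, the best estimate for *Y* conditional on the information $\mathcal{F}_t$ is given by

$$\hat{Y}_t:=\mathbb{E}^{\mathbb{P}}\left[Y\mid\mathcal{F}_t\right]=\int f(x)\,\pi_t(\mathrm{d}x). \tag{3.12}$$

By the ‘best estimate’ for *Y*, we mean the $\mathcal{F}_t$-measurable random variable $\hat{Y}_t$ that minimizes the quadratic error $\mathbb{E}^{\mathbb{P}}[(Y-\hat{Y}_t)^{2}\mid\mathcal{F}_t]$.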
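A minimal sketch of the filter of proposition 3.3 for a message with finitely many values may help to fix ideas. It assumes that the *a posteriori* weights are proportional to exp(*xξ*_{t}−*ψ*_{0}(*x*)*t*) times the *a priori* weights, and that the best estimate is the corresponding weighted mean. The names `posterior`, `best_estimate` and `psi0_poisson`, the two-point prior and the observed count are all hypothetical choices made for illustration.

```python
import math

def posterior(prior, xi_t, t, psi0):
    """Return pi_t(x) proportional to exp(x*xi_t - psi0(x)*t) * pi(x) for a discrete prior."""
    log_w = {x: x * xi_t - psi0(x) * t + math.log(p) for x, p in prior.items() if p > 0}
    shift = max(log_w.values())          # subtract the maximum to avoid overflow
    w = {x: math.exp(v - shift) for x, v in log_w.items()}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def best_estimate(post, f=lambda x: x):
    """Best estimate of f(X): the mean of f under the a posteriori distribution."""
    return sum(f(x) * p for x, p in post.items())

def psi0_poisson(alpha, m=1.0):
    """Fiducial exponent of a Poisson process with intensity m."""
    return m * (math.exp(alpha) - 1.0)

# Two candidate messages, corresponding to observed intensities m and 2m.
prior = {0.0: 0.5, math.log(2.0): 0.5}
post = posterior(prior, xi_t=14, t=10.0, psi0=psi0_poisson)
x_hat = best_estimate(post)
```

Only the current value `xi_t` and the elapsed time `t` enter the update, so the same function can be called repeatedly as new observations arrive.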

It will be observed that at any given time *t* the best estimate can be expressed as a function of *ξ*_{t} and *t*, and does not involve values of the information process at times earlier than *t*. That this should be the case can be seen as a consequence of the following:

### Proposition 3.4

*The Lévy information process* {*ξ*_{t}} *has the Markov property*.

### Proof.

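For the Markov property, it suffices to establish that for *a*∈R and *s*≤*t* we have

$$\mathbb{P}\left(\xi_t\le a\mid\mathcal{F}_s\right)=\mathbb{P}\left(\xi_t\le a\mid\mathcal{F}^{\xi_s}\right), \tag{3.13}$$

where $\mathcal{F}_s=\sigma[\{\xi_u\}_{0\le u\le s}]$ and $\mathcal{F}^{\xi_s}=\sigma[\xi_s]$. We write

$$\Phi_t:=\mathbb{E}^{\mathbb{P}_0}\left[\rho_t^{X}\mid\mathcal{F}_t\right]=\int\exp\left(x\xi_t-\psi_0(x)t\right)\pi(\mathrm{d}x), \tag{3.14}$$

where $\rho_t^{X}$ is defined as in equation (3.1). It follows that

$$\mathbb{P}\left(\xi_t\le a\mid\mathcal{F}_s\right)=\frac{\mathbb{E}^{\mathbb{P}_0}\left[\Phi_t\,\mathbb{1}\{\xi_t\le a\}\mid\mathcal{F}_s\right]}{\Phi_s}=\frac{\mathbb{E}^{\mathbb{P}_0}\left[\Phi_t\,\mathbb{1}\{\xi_t\le a\}\mid\mathcal{F}^{\xi_s}\right]}{\Phi_s}=\mathbb{P}\left(\xi_t\le a\mid\mathcal{F}^{\xi_s}\right), \tag{3.15}$$

since {*ξ*_{t}} has the Markov property under the transformed measure $\mathbb{P}_0$.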

We note that since *X* is $\mathcal{F}_\infty$-measurable, which follows from proposition 2.2, the Markov property implies that if *Y* =*f*(*X*) is integrable, we have

$$\mathbb{E}^{\mathbb{P}}\left[Y\mid\mathcal{F}_t\right]=\mathbb{E}^{\mathbb{P}}\left[Y\mid\mathcal{F}^{\xi_t}\right]. \tag{3.16}$$

This identity allows one to work out the optimal filter for a Lévy information process by direct use of the Bayes formula. It should be apparent that simulation of the dynamics of the filter is readily approachable on account of this property.

We remark briefly on what might appropriately be called a ‘time consistency’ property satisfied by Lévy information processes. It follows from (3.11) that, given the conditional distribution *π*_{s}(d*x*) at time *s*≤*t*, we can express *π*_{t}(d*x*) in the form

$$\pi_t(\mathrm{d}x)=\frac{\exp\left(x(\xi_t-\xi_s)-\psi_0(x)(t-s)\right)}{\int\exp\left(x(\xi_t-\xi_s)-\psi_0(x)(t-s)\right)\pi_s(\mathrm{d}x)}\,\pi_s(\mathrm{d}x). \tag{3.17}$$

Then, if for fixed *s*≥0, we introduce a new time variable *u*:=*t*−*s*, and define *η*_{u}=*ξ*_{u+s}−*ξ*_{s}, we find that {*η*_{u}}_{u≥0} is an information process with fiducial exponent *ψ*_{0}(*α*) and message *X* with *a priori* distribution *π*_{s}(d*x*). Thus given up-to-date information, we can ‘re-start’ the information process at that time to produce a new information process of the same type, with an adjusted message distribution.
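The time-consistency property can be checked numerically. The sketch below assumes gamma noise with unit scale, for which *ψ*_{0}(*α*)=−*m* ln(1−*α*) for *α*<1, together with a three-point prior. It compares the distribution obtained directly at time *t* with the one obtained by first updating to time *s* and then re-starting with the increment *ξ*_{t}−*ξ*_{s} over the elapsed time *t*−*s*. The observed values `xi_s` and `xi_t` are hypothetical, as are the function names.

```python
import math

def psi0_gamma(alpha, m=1.5):
    """Fiducial exponent of a gamma process with unit scale, valid for alpha < 1."""
    return -m * math.log(1.0 - alpha)

def update(dist, increment, elapsed, psi0):
    """Tilt a discrete distribution {x: weight} by exp(x*increment - psi0(x)*elapsed)."""
    w = {x: p * math.exp(x * increment - psi0(x) * elapsed) for x, p in dist.items()}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

prior = {-1.0: 0.3, 0.0: 0.4, 0.5: 0.3}   # support lies in the admissible set alpha < 1
s, t = 1.0, 2.5
xi_s, xi_t = 1.2, 4.0                      # hypothetical observed values

direct = update(prior, xi_t, t, psi0_gamma)                    # prior -> pi_t in one step
pi_s = update(prior, xi_s, s, psi0_gamma)                      # prior -> pi_s
restarted = update(pi_s, xi_t - xi_s, t - s, psi0_gamma)       # pi_s -> pi_t via the increment
```

The two routes agree because the exponential factors multiply, so the filter never needs to store the path of the information process.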

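Further insight into the nature of Lévy information can be gained by examination of expression (2.9) for the conditional exponent of an information process. In particular, as a consequence of the Lévy–Khintchine representation (2.3) we are able to deduce that

$$\psi_X(\alpha)=\left(p+qX+\int_{\mathbb{R}\setminus\{0\}}z\left(\mathrm{e}^{Xz}-1\right)\mathbb{1}\{|z|<1\}\,\nu(\mathrm{d}z)\right)\alpha+\tfrac{1}{2}q\alpha^{2}+\int_{\mathbb{R}\setminus\{0\}}\left(\mathrm{e}^{\alpha z}-1-\alpha z\,\mathbb{1}\{|z|<1\}\right)\mathrm{e}^{Xz}\,\nu(\mathrm{d}z) \tag{3.18}$$

for *α*∈C^{I}, which leads to the following: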

### Proposition 3.5

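*The randomization of the* $\mathbb{P}_0$-*Lévy process* {*ξ*_{t}} *achieved through the change of measure generated by the randomized Esscher martingale* $\rho_t^{X}=\exp(X\xi_t-\psi_0(X)t)$ *induces two effects on the characteristics of the process: (i) a random shift in the drift term, given by*

$$p\to p+qX+\int_{\mathbb{R}\setminus\{0\}}z\left(\mathrm{e}^{Xz}-1\right)\mathbb{1}\{|z|<1\}\,\nu(\mathrm{d}z), \tag{3.19}$$

*and (ii) a random rescaling of the Lévy measure, given by* *ν*(d*z*)→e^{Xz}*ν*(d*z*).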

The integral appearing in the shift in the drift term is well defined because the term *z*(e^{Xz}−1) vanishes to second order at the origin. It follows from proposition 3.5 that in sampling an information process an agent is in effect trying to detect a random shift in the drift term, and a random ‘tilt’ and change of scale in the Lévy measure, altering the overall rate as well as the relative rates at which jumps of various sizes occur. It is from these data, within which the message is encoded, that the agent attempts to estimate the value of *X*. It is interesting to note that randomized Esscher martingales arise in the construction of pricing kernels in the theory of finance [32,33].
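As a brief worked illustration of proposition 3.5, consider the two simplest noise types. The computation below assumes the Lévy–Khintchine convention with truncation function 1{|*z*|<1}, so that the drift shift is *p*→*p*+*qX*+∫*z*(e^{Xz}−1)1{|*z*|<1}*ν*(d*z*); the symbol *δ*_{1} for a unit point mass at *z*=1 is notation introduced here for the example.

$$\begin{aligned}
&\text{Brownian noise: } K_0=(0,1,0) &&\Longrightarrow\quad p\to 0+1\cdot X+0=X,\qquad \nu\to 0, \qquad K_X=(X,1,0);\\
&\text{Poisson noise: } K_0=(0,0,m\,\delta_1) &&\Longrightarrow\quad p\to 0+0+\int z\left(\mathrm{e}^{Xz}-1\right)\mathbb{1}\{|z|<1\}\,m\,\delta_1(\mathrm{d}z)=0,\qquad \nu\to m\,\mathrm{e}^{X}\delta_1 .
\end{aligned}$$

In the Brownian case the message appears purely as a drift, whereas in the Poisson case the drift is unchanged (the unit jump lies outside the truncation region) and the message appears purely as a rescaling of the jump intensity from *m* to *m*e^{X}.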

We turn to examine the properties of certain martingales associated with Lévy information. We establish the existence of a so-called innovations representation for Lévy information. In the case of the Brownian filter, the ideas involved are rather well understood [9], and the matter has also been investigated in the case of Poisson information [34]. These examples arise as special cases in the general theory of Lévy information. Throughout the discussion that follows, we fix a probability space $(\Omega,\mathcal{F},\mathbb{P})$.

### Proposition 3.6

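*Let* {*ξ*_{t}} *be a Lévy information process with fiducial exponent* *ψ*_{0}(*α*) *and message* *X*, *let* $\{\mathcal{F}_t\}$ *denote the filtration generated by* {*ξ*_{t}}, *let* *Y* =*ψ*′_{0}(*X*), *where* *ψ*_{0}′(*α*) *is the marginal fiducial exponent, and set* $\hat{Y}_t=\mathbb{E}^{\mathbb{P}}[Y\mid\mathcal{F}_t]$. *Then the process* {*M*_{t}} *defined by*

$$M_t=\xi_t-\int_0^{t}\hat{Y}_u\,\mathrm{d}u \tag{3.20}$$

*is an* $(\{\mathcal{F}_t\},\mathbb{P})$-*martingale*.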

### Proof.

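We recall that {*ξ*_{t}} is by definition $\mathcal{F}^{X}$-conditionally $\mathbb{P}$-Lévy. It follows therefore from (2.11) that $\mathbb{E}^{\mathbb{P}}[\xi_t\mid\mathcal{F}^{X}]=Yt$, where *Y* =*ψ*′_{0}(*X*). As before, we let $\{\mathcal{G}_t\}$ denote the filtration generated jointly by {*ξ*_{t}} and *X*. First, we observe that the process $\{m_t\}$ defined for *t*≥0 by *m*_{t}=*ξ*_{t}−*Y* *t* is a $(\{\mathcal{G}_t\},\mathbb{P})$-martingale. This assertion can be checked by consideration of the one-parameter family of $(\{\mathcal{G}_t\},\mathbb{P}_0)$-martingales defined by

$$\rho_t^{X+\epsilon}=\exp\left((X+\epsilon)\xi_t-\psi_0(X+\epsilon)t\right) \tag{3.21}$$

for *ϵ*∈C^{I}. Expanding this expression to first order in *ϵ*, we deduce that the process defined for *t*≥0 by $\rho_t^{X}(\xi_t-\psi_0'(X)t)$ is a $(\{\mathcal{G}_t\},\mathbb{P}_0)$-martingale. Thus, we have

$$\mathbb{E}^{\mathbb{P}_0}\left[\rho_t^{X}\left(\xi_t-\psi_0'(X)t\right)\mid\mathcal{G}_s\right]=\rho_s^{X}\left(\xi_s-\psi_0'(X)s\right). \tag{3.22}$$

Then using $\{\rho_t^{X}\}$ to make a change of measure from $\mathbb{P}_0$ to $\mathbb{P}$ we obtain

$$\mathbb{E}^{\mathbb{P}}\left[\xi_t-\psi_0'(X)t\mid\mathcal{G}_s\right]=\xi_s-\psi_0'(X)s, \tag{3.23}$$

and the result follows if we set *Y* =*ψ*_{0}′(*X*). Next, we introduce the ‘projected’ process $\{\hat{m}_t\}$ defined by $\hat{m}_t=\mathbb{E}^{\mathbb{P}}[m_t\mid\mathcal{F}_t]$. We note that since {*m*_{t}} is a $(\{\mathcal{G}_t\},\mathbb{P})$-martingale we have

$$\mathbb{E}^{\mathbb{P}}\left[\hat{m}_t\mid\mathcal{F}_s\right]=\mathbb{E}^{\mathbb{P}}\left[m_t\mid\mathcal{F}_s\right]=\mathbb{E}^{\mathbb{P}}\left[\mathbb{E}^{\mathbb{P}}\left[m_t\mid\mathcal{G}_s\right]\mid\mathcal{F}_s\right]=\mathbb{E}^{\mathbb{P}}\left[m_s\mid\mathcal{F}_s\right]=\hat{m}_s, \tag{3.24}$$

and thus $\{\hat{m}_t\}$ is an $(\{\mathcal{F}_t\},\mathbb{P})$-martingale. Finally, we observe that

$$\mathbb{E}^{\mathbb{P}}\left[M_t\mid\mathcal{F}_s\right]=\mathbb{E}^{\mathbb{P}}\left[\xi_t\mid\mathcal{F}_s\right]-\mathbb{E}^{\mathbb{P}}\left[\int_s^{t}\hat{Y}_u\,\mathrm{d}u\,\Big|\,\mathcal{F}_s\right]-\int_0^{s}\hat{Y}_u\,\mathrm{d}u, \tag{3.25}$$

where we have made use of the fact that the final term is $\mathcal{F}_s$-measurable. The fact that $\{\hat{m}_t\}$ and $\{\hat{Y}_t\}$ are both $(\{\mathcal{F}_t\},\mathbb{P})$-martingales, together with the relation $\hat{m}_t=\xi_t-t\hat{Y}_t$, implies that

$$\mathbb{E}^{\mathbb{P}}\left[\xi_t\mid\mathcal{F}_s\right]-\xi_s=(t-s)\hat{Y}_s=\mathbb{E}^{\mathbb{P}}\left[\int_s^{t}\hat{Y}_u\,\mathrm{d}u\,\Big|\,\mathcal{F}_s\right], \tag{3.26}$$

from which it follows that $\mathbb{E}^{\mathbb{P}}[M_t\mid\mathcal{F}_s]=M_s$, which is what we set out to prove.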

Although the general information process does not admit an additive decomposition into signal and noise, it does admit a linear decomposition into terms representing (i) information already received and (ii) new information. The random variable *Y* entering via its conditional expectation into the first of these terms is itself in general a nonlinear function of the message variable *X*. It follows on account of the convexity of the fiducial exponent that the marginal fiducial exponent is invertible, which ensures that *X* can be expressed in terms of *Y* by the relation *X*=*I*_{0}(*Y*), which is linear if and only if the information process is Brownian. Thus, signal and noise are deeply intertwined in the case of general Lévy information. Vestiges of linearity remain, and these suffice to provide an overall element of tractability.

## 4. Examples of Lévy information processes

In a number of situations one can construct explicit examples of information processes, categorized by noise type. The Brownian and Poisson constructions, which are familiar in other contexts, can be seen as belonging to a unified scheme that brings out their differences and similarities. We then proceed to construct information processes of the gamma, the variance gamma, the negative binomial, the inverse Gaussian, and the normal inverse Gaussian type. It is interesting to take note of the diverse nature of noise, and to observe the many different ways in which messages can be conveyed in a noisy environment.

### Example 1. Brownian information

On a probability space $(\Omega,\mathcal{F},\mathbb{P})$, let {*B*_{t}} be a Brownian motion, let *X* be an independent random variable, and set

$$\xi_t=Xt+B_t. \tag{4.1}$$

The random process {*ξ*_{t}} thereby defined, which we call the Brownian information process, is $\mathcal{F}^{X}$-conditionally *K*_{X}-Lévy, with conditional characteristic *K*_{X}=(*X*,1,0) and conditional exponent $\psi_X(\alpha)=X\alpha+\tfrac{1}{2}\alpha^{2}$. The fiducial characteristic is *K*_{0} = (0,1,0), the fiducial exponent is $\psi_0(\alpha)=\tfrac{1}{2}\alpha^{2}$, and the associated fiducial process or ‘noise type’ is standard Brownian motion. In the case of Brownian information, there is a linear separation of the process into signal and noise. This model, considered by Wonham [35], is perhaps the simplest continuous-time generalization of the example described by Wiener [1]. The message is given by the value of *X*, but *X* can only be observed indirectly, through {*ξ*_{t}}. The observations of *X* are obscured by the noise represented by the Brownian motion {*B*_{t}}. Because the signal term grows linearly in time, whereas $|B_t|\sim\sqrt{t}$, it is intuitively plausible that observations of {*ξ*_{t}} will asymptotically reveal the value of *X*, and a direct calculation using properties of the normal distribution function confirms that *t*^{−1}*ξ*_{t} converges in probability to *X*; this is consistent with proposition 2.2 if we note that *ψ*′_{0}(*α*)=*α* and *I*_{0}(*y*)=*y* in the Brownian case.

The best estimate for *X* conditional on $\mathcal{F}_t$ can be derived by use of the generalized Bayes formula (3.8). In the Brownian case, there is also an elementary method that leads to the same result, which we mention briefly. First, we present an alternative proof of proposition 3.4 in the Brownian case that uses a Brownian bridge argument.

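We recall that if *s*>*s*_{1}>0, then *B*_{s} and $s^{-1}B_s - s_1^{-1}B_{s_1}$ are independent. More generally, we observe that if *s*>*s*_{1}>*s*_{2}, then *B*_{s}, $s^{-1}B_s - s_1^{-1}B_{s_1}$, and $s_1^{-1}B_{s_1} - s_2^{-1}B_{s_2}$ are independent, and that $s^{-1}\xi_s - s_1^{-1}\xi_{s_1} = s^{-1}B_s - s_1^{-1}B_{s_1}$. Extending this line of reasoning, we see that for any *a*∈ℝ and any times *t*≥*s*≥*s*_{1}≥⋯≥*s*_{k}>0 we have

$$\begin{aligned}
\mathbb{P}\left(\xi_t\le a \mid \xi_s,\xi_{s_1},\ldots,\xi_{s_k}\right)
&= \mathbb{P}\left(\xi_t\le a \,\Big|\, \xi_s,\ \frac{\xi_s}{s}-\frac{\xi_{s_1}}{s_1},\ \ldots,\ \frac{\xi_{s_{k-1}}}{s_{k-1}}-\frac{\xi_{s_k}}{s_k}\right) \\
&= \mathbb{P}\left(\xi_t\le a \,\Big|\, \xi_s,\ \frac{B_s}{s}-\frac{B_{s_1}}{s_1},\ \ldots,\ \frac{B_{s_{k-1}}}{s_{k-1}}-\frac{B_{s_k}}{s_k}\right) \\
&= \mathbb{P}\left(\xi_t\le a \mid \xi_s\right),
\end{aligned} \tag{4.2}$$

since *ξ*_{t} and *ξ*_{s} are independent of $s^{-1}B_s - s_1^{-1}B_{s_1},\ \ldots,\ s_{k-1}^{-1}B_{s_{k-1}} - s_k^{-1}B_{s_k}$, and that gives us the Markov property (3.13). Since we have established that *X* is ℱ_{∞}-measurable, it follows that (3.16) holds. As a consequence, the *a posteriori* distribution of *X* can be worked out by use of the standard Bayes formula, and for the best estimate of *X*, we obtain

$$\hat{X}_t = \frac{\int x\,\exp\left(x\xi_t-\tfrac{1}{2}x^2t\right)\pi(\mathrm{d}x)}{\int \exp\left(x\xi_t-\tfrac{1}{2}x^2t\right)\pi(\mathrm{d}x)}. \tag{4.3}$$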
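As an illustration of the Brownian filter, the following minimal sketch evaluates the best estimate for a message with a finite prior. It assumes that the posterior weights take the exponential form exp(*xξ*_{t}−½*x*^{2}*t*)*π*(*x*) implied by the fiducial exponent *ψ*_{0}(*α*)=½*α*^{2}; the function name `brownian_best_estimate` and the dictionary representation of the prior are hypothetical choices made for this example only.

```python
import math


def brownian_best_estimate(xi_t, t, prior):
    """Posterior mean of X given xi_t = X*t + B_t.

    `prior` maps each possible value x of the message to its
    a priori probability (an illustrative finite-prior assumption).
    """
    # Unnormalised posterior weights exp(x*xi_t - x^2*t/2) * pi(x).
    weights = {
        x: p * math.exp(x * xi_t - 0.5 * x * x * t)
        for x, p in prior.items()
    }
    total = sum(weights.values())
    # Normalise and average the message values.
    return sum(x * w for x, w in weights.items()) / total
```

For a symmetric binary message taking the values ±1 with equal probability, the time-dependent factors cancel and the estimate reduces to tanh(*ξ*_{t}), which offers a quick sanity check on the formula.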

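The innovations representation (3.20) in the case of a Brownian information process can be derived by the following argument. We observe that the ℙ_{0}-martingale {*Φ*_{t}} defined in (3.14) is a ‘space–time’ function of the form

$$\Phi_t = \int \exp\left(x\xi_t-\tfrac{1}{2}x^2t\right)\pi(\mathrm{d}x). \tag{4.4}$$

By use of the Ito calculus together with (4.3), we deduce that $\mathrm{d}\Phi_t = \hat{X}_t\,\Phi_t\,\mathrm{d}\xi_t$, where $\hat{X}_t$ denotes the best estimate (4.3), and thus by integration we obtain

$$\Phi_t = \exp\left(\int_0^t \hat{X}_s\,\mathrm{d}\xi_s - \tfrac{1}{2}\int_0^t \hat{X}_s^2\,\mathrm{d}s\right). \tag{4.5}$$

Since {*ξ*_{t}} is an ({ℱ_{t}},ℙ_{0})-Brownian motion, it follows from (4.5) by the Girsanov theorem that the process {*M*_{t}} defined by

$$M_t = \xi_t - \int_0^t \hat{X}_s\,\mathrm{d}s \tag{4.6}$$

is an ({ℱ_{t}},ℙ)-Brownian motion, which we call the innovations process (see [36]). The increments of {*M*_{t}} represent the arrival of new information.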

We conclude our discussion of Brownian information with the following remarks. In problems involving prediction and valuation, it is not uncommon that the message is revealed after the passage of a finite amount of time. This is often the case in applications to finance, where the message takes the form of a random cash flow at some future date, or, more generally, a random factor that affects such a cash flow. There are also numerous examples coming from the physical sciences, economics, and operations research, where the goal of an agent is to form a view concerning the outcome of a future event by monitoring the flow of information relating to it. How does one handle problems involving the revelation of information over finite time horizons?

One way of modelling finite time horizon scenarios in the present context is by use of a time change. If {*ξ*_{t}} is a Lévy information process with message *X* and a specified fiducial exponent, then a generalization of proposition 2.2 shows that the process {*ξ*_{tT}} defined over the time interval 0≤*t*<*T* by

$$\xi_{tT} = \frac{T-t}{T}\,\xi\!\left(\frac{tT}{T-t}\right) \tag{4.7}$$

reveals the value of *X* in the limit as *t*→*T*, and one can check that
$$\mathrm{Cov}\left[\xi_{sT},\xi_{tT}\mid\mathcal{F}^X\right] = \frac{s(T-t)}{T}\,\psi_0''(X) \qquad (0\le s\le t<T). \tag{4.8}$$

In the case where {*ξ*_{t}} is a Brownian information process represented as above in the form *ξ*_{t}=*Xt*+*B*_{t}, the time-changed process (4.7) takes the form *ξ*_{tT}=*Xt*+*β*_{tT}, where {*β*_{tT}} is a Brownian bridge over the interval [0,*T*]. Such processes have had applications in physics [37–40] and in finance [41–45]. It seems reasonable to conjecture that time-changed Lévy information processes of the more general type proposed above may be similarly applicable.
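The Brownian bridge statement can be checked with a short calculation. The sketch below assumes that the time change takes the form *ξ*_{tT}=((*T*−*t*)/*T*) *ξ*(*tT*/(*T*−*t*)), which is the choice that reproduces the stated decomposition *ξ*_{tT}=*Xt*+*β*_{tT}; the function names are hypothetical and serve only this illustration.

```python
def time_changed_value(xi, t, T):
    """Evaluate xi_{tT} = ((T - t)/T) * xi(t*T/(T - t)).

    `xi` is a callable returning the value of the underlying
    path at a given time (an assumption of this sketch).
    """
    if not 0 <= t < T:
        raise ValueError("need 0 <= t < T")
    return (T - t) / T * xi(t * T / (T - t))


def bridge_variance(t, T):
    """Variance of ((T - t)/T) * B(t*T/(T - t)) for standard Brownian motion B."""
    if not 0 <= t < T:
        raise ValueError("need 0 <= t < T")
    # Var[B(u)] = u, and scaling by a constant c multiplies the variance by c^2.
    return ((T - t) / T) ** 2 * (t * T / (T - t))
```

Applied to a pure signal path *u*↦*Xu*, the time change returns *Xt*, so the signal term is unaffected, while the noise variance equals *t*(*T*−*t*)/*T*, which is the variance of a Brownian bridge over [0,*T*].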

### Example 2 Poisson information

Consider a situation in which an agent observes a series of events taking place at a random rate, and the agent wishes to determine the rate as best as possible because its value conveys an important piece of information. One can model the information flow in this situation by a modulated Poisson process for which the jump rate is an independent random variable. Such a scenario arises in many real-world situations, and has been investigated in the literature [34,46–49]. The resulting scheme can be seen to emerge naturally as an example of our general model for Lévy information.

As in the Brownian case, one can construct the relevant information process directly. On a probability space (Ω,ℱ,ℙ), let {*N*(*t*)}_{t≥0} be a standard Poisson process with jump rate *m*>0, let *X* be an independent random variable, and set

$$\xi_t = N\left(\mathrm{e}^{X}t\right). \tag{4.9}$$

Thus, {*ξ*_{t}} is a time-changed Poisson process, and the effect of the signal is to randomly modulate the rate at which the process jumps. It is evident that {*ξ*_{t}} is ℱ^{X}-conditionally Lévy and satisfies the conditions of definition 2.1. In particular,

$$\mathbb{E}\left[\exp(\alpha\xi_t)\mid\mathcal{F}^X\right] = \exp\left(m\,\mathrm{e}^{X}(\mathrm{e}^{\alpha}-1)\,t\right), \tag{4.10}$$

and for fixed *X* one obtains a Poisson process with rate *m*e^{X}. It follows that (4.9) is an information process. The fiducial characteristic is given by *K*_{0} = (0,0,*mδ*_{1}(d*z*)), that of a Poisson process with unit jumps at the rate *m*, where *δ*_{1}(d*z*) is the Dirac measure with unit mass at *z*=1, and the fiducial exponent is *ψ*_{0}(*α*)=*m*(e^{α}−1). A calculation using (2.9) shows that *K*_{X}=(0,0,*m*e^{X}*δ*_{1}(d*z*)), and that *ψ*_{X}(*α*)=*m*e^{X}(e^{α}−1). The relation between signal and noise in the case of Poisson information is rather subtle. The noise is associated with the random fluctuations of the inter-arrival times of the jumps, whereas the message determines the average rate at which the jumps occur.

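It will be instructive in this example to work out the conditional distribution of *X* by elementary methods. Since *X* is ℱ_{∞}-measurable and {*ξ*_{t}} has the Markov property, we have

$$\mathbb{P}\left(X\le y\mid\mathcal{F}_t\right) = \mathbb{P}\left(X\le y\mid\xi_t\right) \tag{4.11}$$

for *y*∈ℝ. It follows then from the Bayes law for an information process taking values in ℕ_{0} that

$$\mathbb{P}\left(X\le y\mid\xi_t=n\right) = \frac{\int \mathbb{1}\{x\le y\}\,\mathbb{P}(\xi_t=n\mid X=x)\,\pi(\mathrm{d}x)}{\int \mathbb{P}(\xi_t=n\mid X=x)\,\pi(\mathrm{d}x)}. \tag{4.12}$$

In the case of Poisson information, the relevant conditional distribution is

$$\mathbb{P}\left(\xi_t=n\mid X=x\right) = \exp\left(-mt\,\mathrm{e}^{x}\right)\frac{(mt\,\mathrm{e}^{x})^{n}}{n!}. \tag{4.13}$$

After some cancellation, we deduce that

$$\mathbb{P}\left(X\le y\mid\xi_t=n\right) = \frac{\int \mathbb{1}\{x\le y\}\exp\left(xn-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}{\int \exp\left(xn-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}, \tag{4.14}$$

and hence

$$\mathbb{P}\left(X\le y\mid\mathcal{F}_t\right) = \frac{\int \mathbb{1}\{x\le y\}\exp\left(x\xi_t-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}{\int \exp\left(x\xi_t-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}, \tag{4.15}$$

and thus

$$\pi_t(\mathrm{d}x) = \frac{\exp\left(x\xi_t-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}{\int \exp\left(x\xi_t-m(\mathrm{e}^{x}-1)t\right)\pi(\mathrm{d}x)}, \tag{4.16}$$

which we can see is consistent with (3.11) if we recall that in the case of noise of the Poisson type the fiducial exponent is given by *ψ*_{0}(*α*)=*m*(e^{α}−1).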
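The cancellation that leads from the Poisson likelihood to the exponential-weight form of the filter can be checked numerically. The sketch below assumes a message with a finite prior and compares two routes: weights of the form exp(*xn*−*m*(e^{x}−1)*t*), built from the fiducial exponent *ψ*_{0}(*α*)=*m*(e^{α}−1), and a direct application of Bayes' law using the Poisson probabilities with mean *mt*e^{x}. Both function names and the dictionary prior are hypothetical.

```python
import math


def poisson_posterior(n, t, m, prior):
    """Posterior of X given xi_t = n, using weights exp(x*n - m*(e^x - 1)*t)."""
    weights = {
        x: p * math.exp(x * n - m * (math.exp(x) - 1.0) * t)
        for x, p in prior.items()
    }
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def poisson_posterior_direct(n, t, m, prior):
    """Same posterior from Bayes' law with the Poisson(m*t*e^x) likelihood."""
    def likelihood(x):
        mean = m * t * math.exp(x)
        return math.exp(-mean) * mean ** n / math.factorial(n)

    weights = {x: p * likelihood(x) for x, p in prior.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}
```

The two functions agree because the factors (*mt*)^{n}/*n*! and e^{mt} do not depend on *x* and therefore drop out on normalisation.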

If a Geiger counter is monitored continuously in time, the sound that it produces provides a nice example of a Poisson information process. The crucial message (proximity to radioactivity) carried by the noisy sputter of the instrument is represented by the rate at which the clicks occur.

### Example 3 Gamma information

It will be convenient first to recall a few definitions and conventions [50–52]. Let *m* and *κ* be positive numbers. By a gamma process with rate *m* and scale *κ* on a probability space (Ω,ℱ,ℙ) we mean a Lévy process {*γ*_{t}}_{t≥0} with exponent

$$\frac{1}{t}\ln\mathbb{E}\left[\exp(\alpha\gamma_t)\right] = -m\ln(1-\kappa\alpha) \tag{4.17}$$

for *α*∈*A*_{C} ={*w*∈ℂ:Re *w*<*κ*^{−1}}. The probability density for *γ*_{t} is

$$\mathbb{P}(\gamma_t\in\mathrm{d}x) = \mathbb{1}\{x>0\}\,\frac{\kappa^{-mt}x^{mt-1}\exp(-x/\kappa)}{\Gamma[mt]}\,\mathrm{d}x, \tag{4.18}$$

where *Γ*[*a*] is the gamma function. A short calculation making use of the functional equation *Γ*[*a*+1]=*aΓ*[*a*] shows that 𝔼[*γ*_{t}]=*mκt* and Var[*γ*_{t}]=*mκ*^{2}*t*. Clearly, the mean and variance determine the rate and scale. If *κ*=1, we say that {*γ*_{t}} is a *standard* gamma process with rate *m*. If *κ*≠1, we say that {*γ*_{t}} is a scaled gamma process. The Lévy measure associated with the gamma process is

$$\nu(\mathrm{d}z) = \mathbb{1}\{z>0\}\,m\,z^{-1}\exp(-\kappa^{-1}z)\,\mathrm{d}z. \tag{4.19}$$

It follows that *ν*(ℝ∖{0})=∞ and hence that the gamma process has infinite activity. Now let {*γ*_{t}} be a standard gamma process with rate *m* on a probability space (Ω,ℱ,ℙ_{0}), and let *λ*∈ℝ satisfy *λ*<1. Then the process {*ρ*^{λ}_{t}} defined by

$$\rho^{\lambda}_t = (1-\lambda)^{mt}\,\mathrm{e}^{\lambda\gamma_t} \tag{4.20}$$

is an ({ℱ_{t}},ℙ_{0})-martingale. If we let {*ρ*^{λ}_{t}} act as a change of measure density for the transformation ℙ_{0}→ℙ_{λ}, then we find that {*γ*_{t}} is a *scaled* gamma process under ℙ_{λ}, with rate *m* and scale 1/(1−*λ*). Thus we see that the effect of an Esscher transformation on a gamma process is to alter its scale. With these facts in mind, one can establish the following:

### Proposition 4.1

*Let* {*γ*_{t}} *be a standard gamma process with rate* *m* *on a probability space* (Ω,ℱ,ℙ), *and let the independent random variable* *X* *satisfy* *X*<1 *almost surely. Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = \frac{1}{1-X}\,\gamma_t \tag{4.21}$$

*is a Lévy information process with message* *X* *and gamma noise, with fiducial exponent* *ψ*_{0}(*α*)=−*m* ln(1−*α*) *for* *α*∈{*w*∈ℂ:Re *w*<1}.

### Proof.

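It is evident that {*ξ*_{t}} is ℱ^{X}-conditionally a scaled gamma process. As a consequence of (4.17), we have

$$\frac{1}{t}\ln\mathbb{E}\left[\exp(\alpha\xi_t)\mid X\right] = \frac{1}{t}\ln\mathbb{E}\left[\exp\left(\frac{\alpha\gamma_t}{1-X}\right)\Big|\, X\right] = -m\ln\left(1-\frac{\alpha}{1-X}\right) \tag{4.22}$$

for *α*∈ℂ^{I}. Then we note that

$$-m\ln\left(1-\frac{\alpha}{1-X}\right) = -m\ln\left(1-(X+\alpha)\right) + m\ln(1-X). \tag{4.23}$$

It follows that the ℱ^{X}-conditional ℙ-exponent of {*ξ*_{t}} is *ψ*_{0}(*X*+*α*)−*ψ*_{0}(*X*). □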
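The construction in proposition 4.1 lends itself to a simple numerical illustration. The sketch below assumes that the information process is obtained by dividing a standard gamma process by 1−*X*, consistent with the overall growth rate *m*/(1−*X*) of the process and the fiducial exponent *ψ*_{0}(*α*)=−*m* ln(1−*α*). It samples the value of the process at a fixed time for a given value of the message and evaluates both sides of the exponent identity. The function names are hypothetical; `random.gammavariate` takes a shape and a scale parameter.

```python
import math
import random


def fiducial_exponent(alpha, m):
    """Exponent -m*ln(1 - alpha) of a standard gamma process with rate m."""
    return -m * math.log(1.0 - alpha)


def conditional_exponent(alpha, x, m):
    """Exponent of gamma_t/(1 - x): a gamma process with rate m and scale 1/(1 - x)."""
    return -m * math.log(1.0 - alpha / (1.0 - x))


def sample_gamma_information(x, m, t, rng):
    """Draw xi_t = gamma_t/(1 - x), where gamma_t has shape m*t and unit scale."""
    if x >= 1.0:
        raise ValueError("the message must satisfy x < 1")
    return rng.gammavariate(m * t, 1.0) / (1.0 - x)
```

For example, with *m*=2, *t*=1.5 and *X*=0.5, the sample mean of many draws should be close to *mt*/(1−*X*)=6.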

The gamma filter arises as follows. An agent observes a process of accumulation. Typically, there are many small increments, but now and then there are large increments. The unknown factor *X* appearing in the overall rate *m*/(1−*X*) at which the process is growing is the figure that the agent wishes to estimate as accurately as possible. The accumulation can be modelled by gamma information, and the associated filter can be used to estimate *X*. It has long been recognized that the gamma process is useful in describing phenomena such as the water level of a dam or the totality of the claims made in a large portfolio of insurance contracts [53–55]. Use of the gamma information process and related bridge processes, with applications in finance and insurance, is pursued in Brody *et al.* [51], Hoyle [56] and Hoyle *et al.* [57].

We draw the reader's attention to Yor [50] and references cited therein, where it is shown how certain additive properties of Brownian motion have multiplicative analogues in the case of the gamma process. One notes in particular the remarkable property that *γ*_{t} and *γ*_{s}/*γ*_{t} are independent for *t*≥*s*≥0. Making use of this relation, it is instructive to present an alternative derivation of the optimal filter for gamma noise. We begin by establishing that the process defined by (4.21) has the Markov property. We observe first that for any times *t*≥*s*≥*s*_{1}≥*s*_{2}≥⋯≥*s*_{k} the variables $\gamma_{s_1}/\gamma_{s}$, $\gamma_{s_2}/\gamma_{s_1}$, and so on, are independent of one another and are independent of *γ*_{s} and *γ*_{t}. It follows that

$$\begin{aligned}
\mathbb{P}\left(\xi_t\le a\mid\xi_s,\xi_{s_1},\ldots,\xi_{s_k}\right)
&= \mathbb{P}\left(\xi_t\le a\,\Big|\,\xi_s,\ \frac{\xi_{s_1}}{\xi_s},\ \ldots,\ \frac{\xi_{s_k}}{\xi_{s_{k-1}}}\right) \\
&= \mathbb{P}\left(\xi_t\le a\,\Big|\,\xi_s,\ \frac{\gamma_{s_1}}{\gamma_s},\ \ldots,\ \frac{\gamma_{s_k}}{\gamma_{s_{k-1}}}\right) \\
&= \mathbb{P}\left(\xi_t\le a\mid\xi_s\right),
\end{aligned} \tag{4.24}$$

since {*γ*_{t}} and *X* are independent, and this gives us (3.13). In working out the distribution of *X* given ℱ_{t} it suffices therefore to work out the distribution of *X* given *ξ*_{t}. We note that the Bayes formula implies that

$$\mathbb{P}\left(X\le y\mid\xi_t=\xi\right) = \frac{\int \mathbb{1}\{x\le y\}\,\rho(\xi\mid X=x)\,\pi(\mathrm{d}x)}{\int \rho(\xi\mid X=x)\,\pi(\mathrm{d}x)}, \tag{4.25}$$

where *π*(d*x*) is the unconditional distribution of *X*, and *ρ*(*ξ*|*X*=*x*) is the conditional density for the random variable *ξ*_{t}, which can be calculated as follows:

$$\rho(\xi\mid X=x) = \frac{\mathrm{d}}{\mathrm{d}\xi}\,\mathbb{P}\left(\xi_t\le\xi\mid X=x\right) = \frac{\mathrm{d}}{\mathrm{d}\xi}\,\mathbb{P}\left(\gamma_t\le(1-x)\xi\right) = \frac{\xi^{mt-1}(1-x)^{mt}\,\mathrm{e}^{-(1-x)\xi}}{\Gamma[mt]}. \tag{4.26}$$

It follows that the optimal filter in the case of gamma noise is given by

$$\pi_t(\mathrm{d}x) = \frac{(1-x)^{mt}\exp(x\xi_t)\,\pi(\mathrm{d}x)}{\int_{-\infty}^{1}(1-x)^{mt}\exp(x\xi_t)\,\pi(\mathrm{d}x)}. \tag{4.27}$$

We conclude with the following observation. In the case of Brownian information, it is well known (and implicit in Wiener's example [1]) that if the signal is Gaussian, then the optimal filter is a linear function of the observation *ξ*_{t}. One might therefore ask in the case of a gamma information process if some special choice of the signal distribution gives rise to a linear filter. The answer is affirmative. Let *U* be a gamma-distributed random variable with the distribution

$$\mathbb{P}(U\in\mathrm{d}u) = \mathbb{1}\{u>0\}\,\frac{\theta^{r}u^{r-1}\exp(-\theta u)}{\Gamma[r]}\,\mathrm{d}u, \tag{4.28}$$

where *r*>1 and *θ*>0 are parameters, and set *X*=1−*U*. Let {*ξ*_{t}} be a gamma information process carrying message *X*, let *Y* =*ψ*_{0}′(*X*)=*m*/(1−*X*), and set *τ*=(*r*−1)/*m*. Then the optimal filter for *Y* is given by

$$\hat{Y}_t = \mathbb{E}\left[Y\mid\xi_t\right] = \frac{\xi_t+\theta}{t+\tau}. \tag{4.29}$$
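The linearity of the filter for *Y* can be checked by direct quadrature. With *U*=1−*X* gamma-distributed with parameters *r* and *θ*, the posterior weight of *U* given *ξ*_{t}=*ξ* is proportional to *u*^{mt+r−1}e^{−(θ+ξ)u}, so that the conditional mean of *Y*=*m*/*U* works out to (*ξ*+*θ*)/(*t*+*τ*) with *τ*=(*r*−1)/*m*. The sketch below is a numerical check under these assumptions rather than a general-purpose filter; the Simpson rule helper, the truncation point `u_max` and the function names are hypothetical choices.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def gamma_rate_estimate(xi, t, m, r, theta, u_max=60.0):
    """Conditional mean of Y = m/U given xi_t = xi, computed by quadrature.

    Assumes m*t + r > 2 so that both integrands vanish at u = 0.
    """
    k = m * t + r
    beta = theta + xi
    # Posterior weight of U is proportional to u^(k-1) * exp(-beta*u).
    numerator = simpson(lambda u: m * u ** (k - 2.0) * math.exp(-beta * u), 0.0, u_max)
    denominator = simpson(lambda u: u ** (k - 1.0) * math.exp(-beta * u), 0.0, u_max)
    return numerator / denominator


def linear_filter(xi, t, m, r, theta):
    """Closed form (xi + theta)/(t + tau) with tau = (r - 1)/m."""
    return (xi + theta) / (t + (r - 1.0) / m)
```

With *ξ*=4, *t*=1.5, *m*=2, *r*=3 and *θ*=1, both routes give the value 2.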

### Example 4 Variance-gamma information

The so-called variance-gamma or VG process [58–60] was introduced in the theory of finance. The relevant definitions and conventions are as follows. By a VG process with drift *μ*∈ℝ, volatility *σ*≥0, and rate *m*>0, we mean a Lévy process with exponent

$$\psi(\alpha) = -m\ln\left(1-\frac{\mu}{m}\,\alpha-\frac{\sigma^{2}}{2m}\,\alpha^{2}\right). \tag{4.30}$$

The VG process admits representations in terms of simpler Lévy processes. Let {*γ*_{t}} be a standard gamma process on (Ω,ℱ,ℙ), with rate *m*, as defined in the previous example, and let {*B*_{t}} be a standard Brownian motion, independent of {*γ*_{t}}. We call the scaled process {*Γ*_{t}} defined by *Γ*_{t}=*m*^{−1}*γ*_{t} a standard gamma subordinator with rate *m*. Note that *Γ*_{t} has dimensions of time and that 𝔼[*Γ*_{t}]=*t*. A calculation shows that the Lévy process {*V* _{t}} defined by

$$V_t = \mu\,\Gamma_t + \sigma B_{\Gamma_t} \tag{4.31}$$

has the exponent (4.30). The VG process thus takes the form of a Brownian motion with drift, time-changed by a gamma subordinator. If *μ*=0 and *σ*=1, we say that {*V* _{t}} is a ‘standard’ VG process, with rate parameter *m*. If *μ*≠0, we say that {*V* _{t}} is a ‘drifted’ VG process. One can always choose units of time such that *m*=1, but for applications it is better to choose conventional units of time (seconds for physics, years for economics), and treat *m* as a model parameter. In the limit *σ*→0 we obtain a gamma process with rate *m* and scale *μ*/*m*. In the limit *m*→∞ we obtain a Brownian motion with drift *μ* and volatility *σ*.

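An alternative representation of the VG process results if we let {*γ*^{1}_{t}} and {*γ*^{2}_{t}} be independent standard gamma processes on (Ω,ℱ,ℙ), with rate *m*, and set

$$V_t = \kappa_1\gamma^{1}_t - \kappa_2\gamma^{2}_t, \tag{4.32}$$

where *κ*_{1} and *κ*_{2} are nonnegative constants. A calculation shows that the exponent is of the form (4.30). In particular, we have

$$\frac{1}{t}\ln\mathbb{E}\left[\exp(\alpha V_t)\right] = -m\ln\left(1-(\kappa_1-\kappa_2)\,\alpha-\kappa_1\kappa_2\,\alpha^{2}\right), \tag{4.33}$$

where *μ*=*m*(*κ*_{1}−*κ*_{2}) and *σ*^{2}=2*mκ*_{1}*κ*_{2}, or equivalently

$$\kappa_1 = \frac{1}{2m}\left(\mu+\sqrt{\mu^{2}+2m\sigma^{2}}\right), \qquad \kappa_2 = \frac{1}{2m}\left(-\mu+\sqrt{\mu^{2}+2m\sigma^{2}}\right), \tag{4.34}$$

with *α*∈{*w*∈ℂ:−1/*κ*_{2}<Re *w*<1/*κ*_{1}} in (4.33). Now let {*ξ*_{t}} be a standard VG process on (Ω,ℱ,ℙ_{0}), with exponent *ψ*_{0}(*α*)=−*m* ln(1−(2*m*)^{−1}*α*^{2}) for *α*∈{*w*∈ℂ:|Re *w*|<√(2*m*)}. Under the transformed measure ℙ_{λ} defined by the change-of-measure martingale (2.6), one finds that {*ξ*_{t}} is a drifted VG process, with

$$\mu = \lambda\left(1-\frac{\lambda^{2}}{2m}\right)^{-1}, \qquad \sigma = \left(1-\frac{\lambda^{2}}{2m}\right)^{-1/2}, \tag{4.35}$$

for |*λ*|<√(2*m*). Thus, in the case of the VG process, an Esscher transformation affects both the drift and the volatility. Note that for large *m* the effect on the volatility is insignificant, whereas the effect on the drift reduces to that of an ordinary Girsanov transformation.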
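The effect of the Esscher transformation on the drift and volatility of a VG process can be verified numerically. The sketch below assumes the VG exponent has the form −*m* ln(1−*μα*/*m*−*σ*^{2}*α*^{2}/(2*m*)), which is what one obtains for a Brownian motion with drift time-changed by a gamma subordinator with unit mean rate, and that the transformed parameters are *μ*=*λ*/(1−*λ*^{2}/(2*m*)) and *σ*=(1−*λ*^{2}/(2*m*))^{−1/2}. The function names are hypothetical, and only real arguments are considered.

```python
import math


def vg_exponent(alpha, mu, sigma, m):
    """Exponent of a VG process with drift mu, volatility sigma and rate m."""
    return -m * math.log(1.0 - mu * alpha / m - sigma ** 2 * alpha ** 2 / (2.0 * m))


def esscher_vg_parameters(lam, m):
    """Drift and volatility of a standard VG process after an Esscher transform."""
    c = 1.0 - lam ** 2 / (2.0 * m)
    if c <= 0.0:
        raise ValueError("need |lam| < sqrt(2*m)")
    return lam / c, 1.0 / math.sqrt(c)


def transformed_exponent(alpha, lam, m):
    """psi_0(alpha + lam) - psi_0(lam) for the standard VG exponent psi_0."""
    return vg_exponent(alpha + lam, 0.0, 1.0, m) - vg_exponent(lam, 0.0, 1.0, m)
```

As *m* grows with *λ* held fixed, the returned drift tends to *λ* and the volatility tends to 1, in line with the Girsanov limit noted above.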

With these facts in hand, we are now in a position to construct the VG information process. We fix a probability space (Ω,ℱ,ℙ) and a number *m*>0.

### Proposition 4.2

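*Let* {*Γ*_{t}} *be a standard gamma subordinator with rate* *m*, *let* {*B*_{t}} *be an independent Brownian motion, and let the independent random* *variable* *X* *satisfy* |*X*|<√(2*m*) *almost surely. Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = X\left(1-\frac{X^{2}}{2m}\right)^{-1}\Gamma_t + \left(1-\frac{X^{2}}{2m}\right)^{-1/2}B(\Gamma_t) \tag{4.36}$$

*is a Lévy information process with message* *X* *and VG noise, with fiducial exponent*

$$\psi_0(\alpha) = -m\ln\left(1-\frac{\alpha^{2}}{2m}\right) \tag{4.37}$$

*for* *α*∈{*w*∈ℂ:|Re *w*|<√(2*m*)}.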

### Proof.

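Observe that {*ξ*_{t}} is ℱ^{X}-conditionally a drifted VG process of the form

$$\xi_t = \mu_X\,\Gamma_t + \sigma_X B(\Gamma_t), \tag{4.38}$$

where the drift and volatility coefficients are

$$\mu_X = X\left(1-\frac{X^{2}}{2m}\right)^{-1}, \qquad \sigma_X = \left(1-\frac{X^{2}}{2m}\right)^{-1/2}. \tag{4.39}$$

The ℱ^{X}-conditional ℙ-exponent of {*ξ*_{t}} is by (4.30) thus given for *α*∈ℂ^{I} by

$$\psi_X(\alpha) = -m\ln\left(1-\frac{\mu_X}{m}\,\alpha-\frac{\sigma_X^{2}}{2m}\,\alpha^{2}\right) = -m\ln\left(1-\frac{(X+\alpha)^{2}}{2m}\right) + m\ln\left(1-\frac{X^{2}}{2m}\right), \tag{4.40}$$

which is evidently by (4.37) of the form *ψ*_{0}(*X*+*α*)−*ψ*_{0}(*X*), as required. □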

An alternative representation for the VG information process can be established by the same method if one randomly rescales the gamma subordinator appearing in the time-changed Brownian motion. The result is as follows.

### Proposition 4.3

*Let* {*Γ*_{t}} *be a gamma subordinator with rate* *m*, *let* {*B*_{t}} *be an independent standard Brownian motion and let the independent random variable* *X* *satisfy* |*X*|<√(2*m*) *almost surely. Write* {*Γ*^{X}_{t}} *for the subordinator:*

$$\Gamma^{X}_t = \left(1-\frac{X^{2}}{2m}\right)^{-1}\Gamma_t. \tag{4.41}$$

*Then the process* {*ξ*_{t}} *defined by* $\xi_t = X\,\Gamma^{X}_t + B(\Gamma^{X}_t)$ *is a VG information process with message* *X*.

A further representation of the VG information process arises as a consequence of the representation of the VG process as the asymmetric difference between two independent standard gamma processes. In particular, we have:

### Proposition 4.4

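*Let* {*γ*^{1}_{t}} *and* {*γ*^{2}_{t}} *be independent standard gamma processes, each with rate* *m*, *and let the independent random variable* *X* *satisfy* |*X*|<√(2*m*) *almost surely*. *Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = \frac{1}{\sqrt{2m}-X}\,\gamma^{1}_t - \frac{1}{\sqrt{2m}+X}\,\gamma^{2}_t \tag{4.42}$$

*is a VG information process with message* *X*.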

### Example 5 Negative-binomial information

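By a negative binomial process with rate parameter *m* and probability parameter *q*, where *m*>0 and 0<*q*<1, we mean a Lévy process with exponent

$$\psi_0(\alpha) = m\ln\left(\frac{1-q}{1-q\,\mathrm{e}^{\alpha}}\right) \tag{4.43}$$

for *α*∈{*w*∈ℂ:Re *w*<−ln *q*}. There are two representations for the negative binomial process [61,52]. The first of these is a compound Poisson process for which the jump size *J*∈ℕ has a logarithmic distribution

$$\mathbb{P}(J=n) = -\frac{1}{\ln(1-q)}\,\frac{q^{n}}{n}, \tag{4.44}$$

and the intensity of the Poisson process determining the timing of the jumps is given by *λ*=−*m* ln(1−*q*). One finds that the characteristic function of *J* is

$$\phi_J(\alpha) = \mathbb{E}\left[\exp(\alpha J)\right] = \frac{\ln(1-q\,\mathrm{e}^{\alpha})}{\ln(1-q)} \tag{4.45}$$

for *α*∈{*w*∈ℂ:Re *w*<−ln *q*}. Then if we set

$$n_t = \sum_{k=1}^{N_t} J_k, \tag{4.46}$$

where {*N*_{t}} is a Poisson process with rate *λ*, and {*J*_{k}}_{k∈ℕ} denotes a collection of independent identical copies of *J*, representing the jumps, one deduces that

$$\mathbb{E}\left[\exp(\alpha n_t)\right] = \exp\left(\lambda t\,(\phi_J(\alpha)-1)\right) = \left(\frac{1-q}{1-q\,\mathrm{e}^{\alpha}}\right)^{mt}, \tag{4.47}$$

and that the resulting exponent is given by (4.43). The second representation of the negative binomial process makes use of the method of subordination. We take a Poisson process with rate *Λ*=*mq*/(1−*q*), and time-change it using a gamma subordinator {*Γ*_{t}} with rate parameter *m*. The moment-generating function thus obtained, in agreement with (4.43), is

$$\mathbb{E}\left[\exp\left(\alpha N(\Gamma_t)\right)\right] = \mathbb{E}\left[\exp\left(\Lambda(\mathrm{e}^{\alpha}-1)\Gamma_t\right)\right] = \left(1-\frac{q\,(\mathrm{e}^{\alpha}-1)}{1-q}\right)^{-mt} = \left(\frac{1-q}{1-q\,\mathrm{e}^{\alpha}}\right)^{mt}. \tag{4.48}$$

With these results in mind, we fix a probability space (Ω,ℱ,ℙ) and find: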
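That the two representations of the negative binomial process lead to the same exponent can be confirmed with a few lines of arithmetic. The sketch below assumes the exponent *m* ln((1−*q*)/(1−*q*e^{α})), a logarithmic jump distribution with jump intensity −*m* ln(1−*q*) for the compound Poisson representation, and a gamma subordinator with unit mean rate for the subordinated representation. The function names are hypothetical, and only real arguments with *q*e^{α}<1 are considered.

```python
import math


def nb_exponent(alpha, m, q):
    """Negative binomial exponent m*ln((1 - q)/(1 - q*e^alpha))."""
    return m * math.log((1.0 - q) / (1.0 - q * math.exp(alpha)))


def log_series_pmf(n, q):
    """Logarithmic distribution for the jump size J on {1, 2, ...}."""
    return -(q ** n) / (n * math.log(1.0 - q))


def compound_poisson_exponent(alpha, m, q):
    """Exponent lam*(phi_J(alpha) - 1) with lam = -m*ln(1 - q)."""
    lam = -m * math.log(1.0 - q)
    phi = math.log(1.0 - q * math.exp(alpha)) / math.log(1.0 - q)
    return lam * (phi - 1.0)


def subordinated_poisson_exponent(alpha, m, q):
    """Exponent of a Poisson process with rate m*q/(1 - q), time-changed by
    a gamma subordinator whose exponent is -m*ln(1 - beta/m)."""
    big_lambda = m * q / (1.0 - q)
    return -m * math.log(1.0 - big_lambda * (math.exp(alpha) - 1.0) / m)
```

Evaluating the three exponent functions at the same argument gives the same number up to rounding, and the logarithmic probabilities sum to one.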

### Proposition 4.5

*Let* {*Γ*_{t}} *be a gamma subordinator with rate* *m*, *let* {*N*_{t}} *be an independent Poisson process with rate* *m*, *let the independent random variable* *X* *satisfy* *X*<−ln *q* *almost surely, and set*

$$\Gamma^{X}_t = \frac{q\,\mathrm{e}^{X}}{1-q\,\mathrm{e}^{X}}\,\Gamma_t. \tag{4.49}$$

*Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = N\left(\Gamma^{X}_t\right) \tag{4.50}$$

*is a Lévy information process with message* *X* *and negative binomial noise, with fiducial exponent* (4.43).

### Proof.

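This can be verified by direct calculation. For *α*∈ℂ^{I} we have

$$\mathbb{E}\left[\exp(\alpha\xi_t)\mid X\right] = \mathbb{E}\left[\exp\left(m(\mathrm{e}^{\alpha}-1)\,\Gamma^{X}_t\right)\Big|\, X\right] = \left(1-\frac{q\,\mathrm{e}^{X}(\mathrm{e}^{\alpha}-1)}{1-q\,\mathrm{e}^{X}}\right)^{-mt} = \left(\frac{1-q\,\mathrm{e}^{X}}{1-q\,\mathrm{e}^{X+\alpha}}\right)^{mt}, \tag{4.51}$$

which by (4.43) shows that the conditional exponent is *ψ*_{0}(*X*+*α*)−*ψ*_{0}(*X*). □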

There is also a representation for negative binomial information based on the compound Poisson process. This can be obtained by an application of proposition 3.5, which shows how the Lévy measure transforms under a random Esscher transformation. In the case of a negative binomial process with parameters *m* and *q*, the Lévy measure is given by

$$\nu(\mathrm{d}z) = m\sum_{n=1}^{\infty}\frac{q^{n}}{n}\,\delta_n(\mathrm{d}z), \tag{4.52}$$

where *δ*_{n}(d*z*) denotes the Dirac measure with unit mass at the point *z*=*n*. The Lévy measure is finite in this case, and we have *ν*(ℕ)=−*m* ln(1−*q*), which is the overall rate at which the compound Poisson process jumps. If one normalizes the Lévy measure with the overall jump rate, one obtains the probability measure (4.44) for the jump size. With these facts in mind, we fix a probability space (Ω,ℱ,ℙ) and specify the constants *m* and *q*, where *m*>1 and 0<*q*<1. Then as a consequence of proposition 3.5 we have the following:

### Proposition 4.6

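*Let the random variable* *X* *satisfy* *X*<−ln *q* *almost surely, let the random variable* *J*^{X} *have the conditional distribution*

$$\mathbb{P}\left(J^{X}=n\mid X\right) = -\frac{1}{\ln(1-q\,\mathrm{e}^{X})}\,\frac{(q\,\mathrm{e}^{X})^{n}}{n}, \tag{4.53}$$

*let* {*J*^{X}_{k}}_{k∈ℕ} *be a collection of conditionally independent identical copies of* *J*^{X}, *and let* {*N*_{t}} *be an independent Poisson process with rate* *m*. *Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = \sum_{k=1}^{N\left(-\ln(1-q\,\mathrm{e}^{X})\,t\right)} J^{X}_k \tag{4.54}$$

*is a Lévy information process with message* *X* *and negative binomial noise, with fiducial exponent* (4.43).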

### Example 6 Inverse Gaussian information

The inverse Gaussian (IG) distribution appears in the study of the first exit time of Brownian motion with drift [62]. The name ‘inverse Gaussian’ was introduced by Tweedie [63], and a Lévy process whose increments have the IG distribution was introduced by Wasan [64]. By an IG process with parameters *a*>0 and *b*>0, we mean a Lévy process with exponent

$$\psi_0(\alpha) = a\left(b-\sqrt{b^{2}-2\alpha}\right) \tag{4.55}$$

for *α*∈{*w*∈ℂ:Re *w*<½*b*^{2}}. Let us write {*G*_{t}} for the IG process. The probability density function for *G*_{t} is

$$\mathbb{P}(G_t\in\mathrm{d}x) = \mathbb{1}\{x>0\}\,\frac{at}{\sqrt{2\pi x^{3}}}\,\exp\left(-\frac{(bx-at)^{2}}{2x}\right)\mathrm{d}x, \tag{4.56}$$

and we find that 𝔼[*G*_{t}]=*at*/*b* and that Var[*G*_{t}]=*at*/*b*^{3}. It is straightforward to check that under the Esscher transformation ℙ_{0}→ℙ_{λ} induced by (2.6), where 0<*λ*<½*b*^{2}, the parameter *a* is left unchanged, whereas *b*→(*b*^{2}−2*λ*)^{1/2}. With these facts in mind, we are in a position to introduce the associated information process. We fix a probability space (Ω,ℱ,ℙ) and find the following:

### Proposition 4.7

*Let* {*G*(*t*)} *be an inverse Gaussian process with parameters* *a* *and* *b*, *let* *X* *be an independent random variable satisfying* 0<*X*<½*b*^{2} *almost surely, and set* *Z*=*b*^{−1}(*b*^{2}−2*X*)^{1/2}. *Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = Z^{-2}\,G(Zt) \tag{4.57}$$

*is a Lévy information process with message* *X* *and inverse Gaussian noise, with fiducial exponent* (4.55).

### Proof.

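It should be evident by inspection that {*ξ*_{t}} is ℱ^{X}-conditionally Lévy. Let us therefore work out the conditional exponent. For *α*∈ℂ^{I} we have:

$$\begin{aligned}
\frac{1}{t}\ln\mathbb{E}\left[\exp(\alpha\xi_t)\mid\mathcal{F}^X\right]
&= \frac{1}{t}\ln\mathbb{E}\left[\exp\left(\alpha Z^{-2}G(Zt)\right)\mid\mathcal{F}^X\right] \\
&= aZ\left(b-\sqrt{b^{2}-2\alpha Z^{-2}}\right) \\
&= a\left(\sqrt{b^{2}-2X}-\sqrt{b^{2}-2(\alpha+X)}\right),
\end{aligned} \tag{4.58}$$

which shows that the conditional exponent is of the form *ψ*_{0}(*α*+*X*)−*ψ*_{0}(*X*). □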
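The scaling argument behind the inverse Gaussian information process can be checked numerically. The sketch below assumes the IG exponent *a*(*b*−(*b*^{2}−2*α*)^{1/2}) and a construction in which the IG process is run at the rescaled time *Zt* and multiplied by *Z*^{−2}, with *Z*=*b*^{−1}(*b*^{2}−2*X*)^{1/2}; under this assumption the time scaling multiplies the exponent by *Z* and the space scaling divides its argument by *Z*^{2}. The function names are hypothetical, and only real arguments are considered.

```python
import math


def ig_exponent(alpha, a, b):
    """Exponent a*(b - sqrt(b^2 - 2*alpha)) of an IG process, for alpha < b^2/2."""
    return a * (b - math.sqrt(b * b - 2.0 * alpha))


def scaled_ig_exponent(alpha, x, a, b):
    """Exponent of Z^-2 * G(Z*t) for a fixed message value x, with
    Z = sqrt(b^2 - 2*x)/b."""
    z = math.sqrt(b * b - 2.0 * x) / b
    # Time change t -> Z*t scales the exponent by Z; scaling the process
    # by Z^-2 replaces alpha by alpha / Z^2.
    return z * ig_exponent(alpha / z ** 2, a, b)


def esscher_ig_exponent(alpha, x, a, b):
    """Target exponent psi_0(alpha + x) - psi_0(x)."""
    return ig_exponent(alpha + x, a, b) - ig_exponent(x, a, b)
```

The two exponents coincide, which is the content of the identity used in the proof; the parameter *a* is untouched while *b* is effectively replaced by (*b*^{2}−2*X*)^{1/2}.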

### Example 7 Normal inverse Gaussian information

By a normal inverse Gaussian (NIG) process [65,66] with parameters *a*, *b* and *m*, such that *a*>0, |*b*|<*a* and *m*>0, we mean a Lévy process with an exponent of the form

$$\psi_0(\alpha) = m\left(\sqrt{a^{2}-b^{2}}-\sqrt{a^{2}-(b+\alpha)^{2}}\right) \tag{4.59}$$

for *α*∈{*w*∈ℂ:−*a*−*b*<Re *w*<*a*−*b*}. Let us write {*I*_{t}} for the NIG process. The probability density for its value at time *t* is given by

$$\mathbb{P}(I_t\in\mathrm{d}x) = \frac{a\,mt\,K_1\!\left(a\sqrt{m^{2}t^{2}+x^{2}}\right)}{\pi\sqrt{m^{2}t^{2}+x^{2}}}\,\exp\left(mt\sqrt{a^{2}-b^{2}}+bx\right)\mathrm{d}x, \tag{4.60}$$

where *K*_{ν} is the modified Bessel function of third kind [67]. The NIG process can be represented as a Brownian motion subordinated by an IG process. In particular, let {*B*_{t}} be a standard Brownian motion, let {*G*_{t}} be an independent IG process with parameters *a*′ and *b*′, and set *a*′=1 and *b*′=*m*(*a*^{2}−*b*^{2})^{1/2}. Then the characteristic function of the process {*I*_{t}} defined by

$$I_t = b\,m^{2}G_t + m\,B(G_t) \tag{4.61}$$

is given by (4.59). The associated information process is constructed as follows. We fix a probability space (Ω,ℱ,ℙ) and the parameters *a*, *b* and *m*.

### Proposition 4.8

*Let the random variable* *X* *satisfy* −*a*−*b*<*X*<*a*−*b* *almost surely, let* {*G*^{X}_{t}} *be* ℱ^{X}-*conditionally IG, with parameters* *a*′=1 *and* *b*′=*m*(*a*^{2}−(*b*+*X*)^{2})^{1/2}, *and let* *F*_{t}=*m*^{2}*G*^{X}_{t}. *Then the process* {*ξ*_{t}} *defined by*

$$\xi_t = (b+X)\,F_t + B(F_t) \tag{4.62}$$

*is a Lévy information process with message* *X* *and NIG noise, with fiducial exponent* (4.59).

### Proof.

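We observe that the condition on {*G*^{X}_{t}} is that

$$\frac{1}{t}\ln\mathbb{E}\left[\exp\left(\alpha G^{X}_t\right)\mid\mathcal{F}^X\right] = m\sqrt{a^{2}-(b+X)^{2}}-\sqrt{m^{2}\left(a^{2}-(b+X)^{2}\right)-2\alpha} \tag{4.63}$$

for *α*∈ℂ^{I}. Thus, if we set $\psi_X(\alpha) = t^{-1}\ln\mathbb{E}\left[\exp(\alpha\xi_t)\mid\mathcal{F}^X\right]$ for *α*∈ℂ^{I} it follows that

$$\psi_X(\alpha) = \frac{1}{t}\ln\mathbb{E}\left[\exp\left(\left((b+X)\alpha+\tfrac{1}{2}\alpha^{2}\right)m^{2}G^{X}_t\right)\Big|\,\mathcal{F}^X\right] = m\sqrt{a^{2}-(b+X)^{2}}-m\sqrt{a^{2}-(b+X+\alpha)^{2}}, \tag{4.64}$$

which shows that the conditional exponent is of the required form. □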

Similar arguments lead to the construction of information processes based on various other Lévy processes related to the IG distribution, including for example the generalized hyperbolic process [68], for which the information process can be shown to take the form

$$\xi_t = (b+X)\,\tilde{G}_t + B(\tilde{G}_t). \tag{4.65}$$

Here, the random variable *X* is taken to be ℙ-independent of the standard Brownian motion {*B*(*t*)}, and $\{\tilde{G}_t\}$ is ℱ^{X}-conditionally a generalized inverse Gaussian process with parameters (*δ*,(*a*^{2}−(*b*+*X*)^{2})^{1/2},*ν*). It would be of interest to determine whether models can be found for information processes based on the Meixner process [30] and the CGMY process [69,70].

We conclude this study of Lévy information with the following remarks. Recent developments in the phenomenological representation of physical [38] and economic [42] time series have highlighted the idea that signal-processing techniques may have far-reaching applications to the identification, characterization and categorization of phenomena, both in the natural and in the social sciences, and that beyond the conventional remits of *prediction*, *filtering*, and *smoothing* there is a fourth and important new domain of applicability: the *description* of phenomena in science and in society. It is our hope therefore that the theory of signal processing with Lévy information herein outlined will find a variety of interesting and exciting applications.

## Acknowledgements

The research reported in this paper has been supported in part by Shell Treasury Centre Limited, London, and by the Fields Institute, University of Toronto. The authors are grateful to N. Bingham, M. Davis, E. Hoyle, M. Grasselli, T. Hurd, S. Jaimungal, E. Mackie, A. Macrina, P. Parbhoo and M. Pistorius for helpful comments and discussions.

- Received July 19, 2012.
- Accepted September 20, 2012.

- © 2012 The Author(s) Published by the Royal Society. All rights reserved.