## Abstract

In this paper, we introduce a flexible approach for the time-frequency analysis of multicomponent signals involving the use of analytic vectors and demodulation. The demodulated analytic signal is projected onto the time-frequency plane so that, as closely as possible, each component contributes exclusively to a different ‘tile’ in a wavelet packet tiling of the time-frequency plane, and at each time instant, the contribution to each tile definitely comes from no more than one component. A single reverse demodulation is then applied to all projected components. The resulting instantaneous frequency of each component in each tile is not constrained to a set polynomial form in time, and is readily calculated, as is the corresponding Hilbert energy spectrum. Two examples illustrate the method. In order better to understand the effect of additive noise, the approximate variance of the estimated instantaneous frequency in any tile has been formulated by starting with pure noise and studying its evolving covariance structure through each step of the algorithm. The validity and practical utility of the resulting expression for the variance of the estimated instantaneous frequency is demonstrated via a simulation experiment.

## 1. Introduction

The analysis of signals with time-evolving oscillatory structure is an area of active interest. Basic representations of such signals are bilinear distributions (e.g. Cohen 1995), which are calculated by filtering the signal and its complex conjugate with a kernel function. This generally entails problems for signals with multiple components, unless the kernel decays very rapidly in time-frequency, thus avoiding damaging interference effects. Our aim in this work is to introduce a simple approach to the problem of tracking the time-dependent frequency content of each component of a multicomponent signal via the undecimated discrete wavelet packet transform. It is well known that wavelet transforms ‘tile’ the time-frequency plane in a proportional bandwidth or octave band manner. The extension to wavelet packets allows more irregular, but still rectangular, partitioning of the time-frequency plane. Such rectangular tilings of the time-frequency plane are not best suited to representing some classes of signals (such as FM signals), which may have energy distributions which are slanted, nonlinear, etc., in the time-frequency plane.

Non-rectangular tilings can be designed. Baraniuk & Jones (1993) designed scale-shear fan bases corresponding to tiling with truncated wedges, well matched to representing linear chirp signals with few expansion coefficients. Chirplet transforms (Mann & Haykin 1992, 1995; Baraniuk & Jones 1995, 1996) provide a systematic framework for designing new signal representations, and can be extended to deal with not just slant but also curvature in the time-frequency plane. None the less, a general approach to discrete chirplet transforms is not forthcoming. Gaussian chirplets do not form an orthogonal basis, and adaptive matching pursuit Gaussian chirplet decompositions have thus been advocated (Yin *et al.* 2002). Philosophically related is research by Hlawatsch *et al*. (1999), Papandreou-Suppappola *et al*. (2001) and Papandreou-Suppappola & Suppappola (2002), which explore time and frequency warping.

The fractional Fourier transform (FrFT), probably more widely used in optical research, is also closely related to wavelet and chirp transforms (Ozaktas *et al.* 1994). The Fourier transform of fractional order *a* is defined in such a way that the standard Fourier transform is a special case with order *a*=1, while successively transforming with fractional orders *a*_{1} and *a*_{2} is the same as transforming with order *a*_{1}+*a*_{2} (so that the standard Fourier transform corresponds to two successive half-transforms). Ozaktas *et al*. (1994) pointed out that the effect of the FrFT is to rotate the Wigner distribution of a signal, and showed, that consequently, linear chirp functions are simply the time domain representation of signals that appear as delta functions or harmonics in other fractional domains. Similarly, Capus *et al*. (2000) noted that the FrFT is composed of a multiplication by a linear chirp in the time domain, followed by Fourier transformation, then by multiplication by a linear chirp in the transformed domain, and, finally, by a complex scaling. Mendlovic *et al*. (1997), see also Huang & Suter (1996), suggested a ‘fractional wavelet transform’ algorithm, which consists of an FrFT of order *a* followed by continuous wavelet transformation; for reconstruction, use the inverse wavelet transform, followed by an FrFT of order −*a*. If the signal was dominated by a linear chirp, then the value of *a* should match the slant in the time-frequency plane. In general, the order *a* is chosen by optimization, and it is suggested that the high computational burden can be reduced by use of optical processing.

Other recent work includes that by Stevenson *et al*. (2001) and Capus & Brown (2003). Our approach can be viewed as a Fourier/wavelet packet transform hybrid. Instead of using undecimated wavelet packet coefficient sequences, it uses undecimated wavelet packet detail sequences, each of which can be viewed as arising by Fourier transforming the signal, multiplying it by the modulus squared of the Fourier transform of a band-pass filter, followed by inverse Fourier transformation. Our approach to generalizing this approach is to replace the standard Fourier transformation by the generalized Fourier transformation (GFT) of Detka & El-Jaroudi (1996). For a signal *x*(*t*), the GFT is given by , where *s*_{0}(*t*) is a real-valued function depending on time only, and specifies the evolutionary phase behaviour of the signal. Note this is the same as applying the standard Fourier transform to . With the inverse GFT, we apply the usual inverse Fourier transform to *X*_{G}(*f*), followed by multiplication of the result by because(1.1)Hence, if *X*_{G}(*f*)≡δ(*f*−*f*_{0}), then , that is, a signal with instantaneous frequency *ν*(*t*)=*f*_{0}+*s*′_{0}(*t*) will be mapped to the point *f*=*f*_{0}, where . Therefore, if we wish to map a signal with, for example, a curved path in the time-frequency plane specified by *f*_{0}+*s*′_{0}(*t*) into a wavelet packet passband (rather than a single point), we simply need to specify a function *s*(*t*) *approximating* *s*_{0}(*t*) such that the demodulated function will be mapped into the passband. Of course, this result is very flexible in that *s*′_{0}(*t*) is not constrained to be linear or even quadratic.

Many audio-, acoustic- and speech-type signals can be characterized as multicomponent signals, with time-frequency distribution consisting of several parallel (or similarly oriented) linear or curved energy paths with low instantaneous bandwidth. Examples abound: Flandrin (1988) examined bat sonar, Cohen (1989, 1992) considers seismic signals and whale sounds, Sucic & Boashash (2002) look at bird song, and Gribonval & Bacry (2003) consider musical recordings. Our approach is particularly well suited to (and most easily implemented for) such signals because with the demodulation, the components should generally be shiftable into different wavelet packet frequency bands.

In order to track the time-dependent frequency content of each component of a multicomponent signal, we shall first demodulate the complete signal, then project it onto the time-frequency plane in a manner such that (i) as closely as possible, each component contributes exclusively to a different ‘tile’ in the chosen (wavelet packet) tiling of the time-frequency plane, and (ii) at each time instant, the contribution to each tile definitely comes from no more than one component. The projected components are ‘analytic’ vectors. The single reverse demodulation is then applied to all projected components. The instantaneous frequency of each component in each tile is then well defined, and can be calculated and weighted by the energy to yield the ‘Hilbert energy spectrum’ for that projection. Agglomeration over projections yields the complete Hilbert spectrum (Huang *et al.* 1998; Olhede & Walden 2004*a*).

## 2. The generalized demodulation algorithm

We give details of our algorithm using the more practically useful discrete-time notation. We assume we have sampled a continuous-time (multicomponent) signal *x*(*t*) with a sample interval of unity to get an even-length vector of observations * x*=[

*x*

_{0},…,

*x*

_{N−1}]

^{T}and have done likewise to

*s*(

*t*) to get

*=[*

**s***s*

_{0},…,

*s*

_{N−1}]

^{T}. For a sample interval of unity, the Nyquist frequency is

*f*=1/2.

### (a) Analytic vectors and forward demodulation

We firstly create an ‘analytic’ vector * y*=

*+i*

**x***, where is the discrete Hilbert transform (DHT) matrix (Marple 1999), and then apply the forward demodulation to obtain ,*

**x***l*=0,…,

*N*−1. We can write this in matrix form as

*=(−i)*

**d***, where , and , where*

**y***δ*

_{lm}is the Kronecker delta, and

*m*=0,…,

*N*−1.

The discrete Fourier transform of an analytic vector has zero coefficients for negative frequencies. Hence, by working with the analytic vector, any energy in * x* in the negative frequencies is not inadvertently moved into the positive frequencies by the demodulation. The new complex-valued vector

*is not analytic, and so we now create a new analytic vector*

**d***=*

**z***+i*

**d***.*

**d**We assume * x* to be band-limited to for discrete computation of the analytic vector

*. The effect of the demodulation is to change frequencies by −*

**y***s*′

_{l}, where

*s*′

_{l}denotes the discrete version of

*s*′(

*t*). When the new analytic vector

*is created, we must assume that*

**z***is band-limited to , or, equivalently, that*

**d****is band-limited to , for**

*y**l*=0,…,

*N*−1. This is the same as saying that the demodulation must not move signal component energy out of the frequency range and this is simple to check graphically, as shown by the examples of §3.

### (b) Wavelet packet projections

The particular wavelet-type transform we will apply to * z* is the so-called maximal overlap discrete wavelet packet transform (MODWPT; Walden & Contreras Cristán 1998). This is an undecimated transform. Each MODWPT is associated with a transform level

*j*, (

*j*=1,…,

*J*

_{0}), and the

*j*th level decomposes the frequency interval [0,1/2] into 2

^{j}equal passband intervals , where

*n*, (

*n*=0,…,2

^{j}−1), denotes the frequency band index within level

*j*. For any

*j*, the partitions (tile) the time-frequency plane into equal width rectangles. Of course, a larger value of

*j*corresponds to a narrower passband, and our interest will be in the final level

*J*

_{0}MODWPT, which decomposes the frequency interval [0,1/2] into equal passband intervals; the choice of

*J*

_{0}is considered in §2

*f*.

For the MODWPT coefficients indexed by (*j*, *n*), we write , where is an *N*-length column vector, and is an *N*×*N* transformation matrix. The wavelet packet coefficients for *t*=0,…,*N*−1 can be written as , where is a *j* th level and *n*th band, MODWPT wavelet packet filter of length , derived from the basic DWT scaling and wavelet filters {*g*_{l}, *l*=0,…,*L*−1} and {*h*_{l}, *l*=0,…,*L*−1}, as described in Percival & Walden (2000, §6.6). The tilde notation is used to distinguish the MODWPT filters and coefficients from the decimated (DWPT) equivalents, as in Percival & Walden (2000).

Let be the filter obtained by periodizing to length *N*; that is, . Then, an equivalent representation is(2.1)

We will actually make use of sequences with *j*=*J*_{0}, *n*=1,…,2^{j}−1. For *n*>0, the sequence {, *l*=0,…,*N*−1} consists of level *j*, frequency band *n*, MODWPT *detail* coefficients. (The sequence {, *l*=0,…, *N*−1} involves only low-pass filters and covers low frequencies and trends.)

For a level *j*, the analytic vector can be recovered via so that if we define projection matrices (Olhede & Walden 2004*a*), then ; that is, we can decompose * z* into a linear combination of the contributions to

*in projection subspaces that tile the time-frequency plane via a set of equal bandwidth rectangles. We note that at any time*

**z***l*,

*l*=0,…,

*N*−1, energy in

*in the frequency band , is mapped to*

**x**_{j,n}.

Any MODWPT detail sequence , created as above, is an analytic vector. To see this, note that because * z*=

*+i*

**d***, we find that . However, the matrices and commute (see appendix A) so that and hence is an analytic vector.*

**d**### (c) Reversed demodulation

Any details vector produced as above can be viewed as having been created by (discrete) GFT, followed by multiplication with the modulus squared of the Fourier transform of a band-pass filter, and then by inverse Fourier transformation. To turn this last step of inverse Fourier transformation into inverse GFT, we carry out a reverse demodulation (see equation (1.1)) via , *l*=0,…,*N*−1.

### (d) The Hilbert spectrum and instantaneous frequency

For any projection indexed by (*j*, *n*), the Hilbert spectrum (Huang *et al.* 1998; Olhede & Walden 2004*a*) can be formed by firstly calculating the amplitude and phase sequences given by(2.2)

The derivative of the phase at discrete time *l* may be calculated via a fourth-order generalized phase-difference estimator (Boashash 1992, p. 542) so that the instantaneous frequency is given by(2.3)with , .

Now, for a large *M*, define *M* frequencies *f*_{k}=*k*Δ*f*, *k*=0,…,*M*−1, where Δ*f*=1/[2(*M*−1)], covering the interval [0,1/2]. Then, the Hilbert energy spectrum of {_{j,n,l}} may be defined as(2.4)Here, *δ*_{k,m} is the Kronecker delta, 〈*x*〉 denotes the integer closest to *x*; we used *M*=512.

### (e) The algorithm summarized

Create the analytic vector

=**y**+i**x**;**x**Forward demodulate using

=(−i)**d**;**y**Create new analytic vector

=**z**+i**d**;**d**Compute MODWPT projections ;

Carry out reverse demodulation: ;

Compute the instantaneous frequency sequences , according to equation (2.3). Compute and plot the Hilbert energy spectrum as detailed in equation (2.4).

### (f ) The choice of *J*_{0}

In our algorithm, we will compute MODWPT projections for level *j*=*J*_{0}, and so we need to choose *J*_{0}. Using, as in Olhede & Walden (2004*a*), the optimal asymptotic frequency resolution filters {*g*_{l}, *l*=0,…,*L*−1}, known as Fejér-Korovkin filters, the band-pass concentration of the MODWPT filters is very good, having the property that the only significant leakage from band *n* is into band *n*−1 and/or *n*+1, but no further. This is true at least for *J*_{0}=2, 3 with *L*=18 and 4 with *L*=22. (See fig. 4 of Olhede & Walden (2004*a*) for *J*_{0}=3.) *J*_{0}=2, 3 and 4 corresponds to dividing [0,1/2] into 4, 8 or 16 frequency bands. Our ‘rule’ is to choose *J*_{0} as small as possible, consistent with having at least one empty projection band between bands containing signal from different components in step 4 of the algorithm of §2*e*. In the examples that follow, it is seen that this strategy can be achieved by taking *J*_{0}=3. Since the leakage properties hold at least to the 16 band case of *J*_{0}=4, the minimum frequency separation between components suitable for our algorithm is one-sixteenth of the Nyquist frequency.

## 3. Examples

### (a) Application to real data

Our first example is that of the echolocation pulse emitted by the large brown bat, *Eptesicus fuscus*, digitized with a sample interval of Δ*t*=7 μs, with *N*=400. Figure 1*a* shows the standard Wigner-Ville distribution for the analytic signal vector ** y**. The

*p*=3 significant components generate

*p*(

*p*−1)/2=3 ‘outer’ interferences (i.e. interferences due to interaction of components; Flandrin 1999, p. 232), and there is also visible ‘inner’ interference where the time-frequency energy distributions are noticeably nonlinear. Figure 1

*b*shows the ordinary Hilbert energy spectrum via MODWPT projections,

*excluding any demodulation*; that is, steps 2, 3 and 5 are omitted in the algorithm of §2

*e*. Here,

*j*=

*J*

_{0}=3 so that , for

*n*=0,…,7. These eight frequency passbands are delineated by the horizontal dotted lines; the band between zero and the first horizontal dotted line corresponds to

*n*=0, with increasing

*n*until the top interval between the uppermost horizontal line and the top of the plot which covers the band

*n*=7. Although, as required in multicomponent analysis,

*at each time instant*, the different frequency chirps have been successfully separated into different frequency sub-bands; each component spans at least two sub-bands and is noticeably affected by the band-edge imperfections in the (albeit extremely good) band-pass filters generated using Fejér–Korovkin filter coefficients (see Olhede & Walden 2004

*a*).

Turning now to demodulation, the form of *s*_{l} used here was *s*_{l}=*b*_{2}*l*^{2}+*b*_{1}*l*, *l*=0,…,*N*−1, where *b*_{2}=−0.000 3 and *b*_{1}=0.175; hence, . The value of *b*_{2} is easily ‘guesstimated’ from the Hilbert spectrum results of figure 1*b* by following the middle component, and *b*_{1} can then be chosen suitably to align the signal components within the passbands as shown via the diagonal dashed lines in figure 1*b*. The bottom dashed line shows , *l*=0,…,*N*−1 for *n*=1; that is, , *l*=0,…,*N*−1. The next to bottom line shows , *l*=0,…,*N*−1, for *n*=1, i.e. , *l*=0,…,*N*−1, Thus, the interval delineated by the two bottom lines is , *l*=0,…,*N*−1, for *n*=1 and *j*=*J*_{0}=3, and as pointed out above, energy in such an interval will be mapped to following the demodulation in step 2 of the algorithm of §2*e*. This interval is marked with ‘*n*=1’ on the right of the plot. The corresponding intervals for *n*=3 and *n*=5 are also marked on the right of figure 1*b* and, following the demodulation, energy in these intervals will be mapped to _{3,3} and _{3,5}.

Figure 1*c* shows the Hilbert spectrum via MODWPT projections *incorporating forward demodulation only*; that is, only step 5 is omitted from the algorithm in §2*e*. We note as predicted from figure 1*b*, the energy in the signal has been mapped to the passbands _{3,1}, _{3,3} and _{3,5}. The demodulation has succeeded in realigning the time-frequency energy to lie (almost perfectly) within the frequency sub-bands. Figure 1*d* shows the extremely good results obtained using the full algorithm *incorporating both the forward and reverse demodulation steps*. The curvature of the lower frequency component is well preserved—there is no assumption of linear chirp components.

Our chosen demodulation has kept signal component energy in the frequency range , as required (see §2*a*). We note that a poor choice of *s*_{l} could cause energy loss by not satisfying this requirement, or result in one or more demodulated components crossing band boundaries and suffering band edge-effects as seen in the basic Hilbert spectrum without demodulation in figure 1*b*.

### (b) Application to synthetic data

Here, we consider a synthetic signal that closely resembles a whale vocalization in shape. It consists of two quadratic chirps, one having a shorter duration than the other:with *a*_{1}=−1.256 6×10^{−6}, *a*_{2}=0.001 9, *a*_{3}=1.198 3, *a*_{4}=−1.278 3×10^{−6}, *a*_{5}=1.382 3. Figure 2*a* shows the standard Wigner–Ville distribution for the analytic signal vector * y*. The two components generate a single case of outer interference (between the components), while the notable curvature of the components generates severe inner interference. Figure 2

*b*shows the Hilbert spectrum via MODWPT projections, excluding any demodulation. Here,

*j*=

*J*

_{0}=3 again so that, with Δ

*t*=1, , for

*n*=0,…,7; these 8 frequency passbands are delineated by the horizontal dotted lines. Although, as with the bat signal, at any time the different frequency chirps have been successfully separated into different frequency sub-bands, here both components cross three sub-bands leading to band-edge imperfections. To eliminate these artefacts, we can apply the demodulation method. To find {

*s*

_{l}} we defined the quadratic and found the parameter values

*b*

_{3},

*b*

_{2},

*c*by matching of its values at

*l*=0, 500 and 1000 to

*f*=0.125, 0.281 25 and 0.14, read off the plot by eye; this choice of points corresponds roughly to an arc halfway between the two components. Then, we let so that , where

*b*

_{1}is chosen to align the signal components sensibly within the passbands, as shown via the curved dashed lines in figure 2

*b*. The values used were

*b*

_{1}=−8.561 1×10

^{−2},

*b*

_{2}=3.059 2×10

^{−4}and

*b*

_{3}=−1.987 4×10

^{−7}. We have not used the known simulation parameters

*a*

_{1},…,

*a*

_{5}to obtain

*s*

_{l}, but worked entirely from the Hilbert spectrum of figure 2

*b*, as would be the case in practice.

Figure 2*c* shows the Hilbert spectrum via MODWPT projections incorporating forward demodulation only. The energy in the signal has been mapped to the passbands _{3,2} and _{3,4}. The time–frequency energy lies comfortably within the frequency sub-bands. Also shown by the dash–dotted lines in figure 2*c* are the instantaneous frequencies calculated directly from the two separate components of the synthetic whale vocalization signal.

Our chosen demodulation has kept signal component energy in the frequency range , as required.

Figure 2*d* shows the final result using both forward and reverse demodulation steps. Most of the artefacts visible in figure 2*b* have been completely eliminated, and the result is excellent. Moreover, the instantaneous frequencies are also plotted as dash–dotted lines in this figure, but cannot be distinguished from the Hilbert spectrum calculated with our algorithm.

## 4. Standard deviation of the instantaneous frequency estimate in the presence of noise

In §2*e*, we gave the steps in the demodulation algorithm, ending with the calculation of instantaneous frequency (and thus the Hilbert spectrum). Hypothetically, some bias could be introduced in the calculation of the instantaneous frequency of a component, through the steps of the deterministic algorithm of §2*e*. However, the whale vocalization was shown to exhibit no significant bias, and so it is not inevitable. Minimization of wavelet packet filter leakage effects by choosing *J*_{0} as described in §2*f*, and choosing *s*′_{l} to align the signal components comfortably within (i.e. not on the edge of) the passbands will always act to reduce bias.

We will now derive an expression for the variance of the *estimated* instantaneous frequency when the signal is observed in the presence of additive white noise. It does not refer to any formal uncertainty principle associated with our algorithm of §2*e*. Starting with white noise, we examine the effects of the various steps of the algorithm on the correlation structure of the real and imaginary parts of the resulting complex vectors.

### (a) General correlation structure

We can write the DHT matrix defined by the algorithm in Marple (1999) as the circulant(4.1)where (e.g. Hahn 1996, p. 143) the periodized filter is given by(4.2)for *l*=0,…,*N*−1. However,where ↔ denotes ‘Fourier transform pair’ and is given by(4.3)where *k*/*N* denotes a Fourier frequency. The definition of *q*_{l} means that is, importantly, skew-symmetric (i.e. =−^{T}).

Setting * x*=

**ϵ**, an

*N*-length vector of independent and identically distributed (IID) Gaussian noise (mean zero, variance ), we firstly create the analytic vector , say, where . Then,The approximation for the second variance follows because of the transfer function structure in equation (4.3)—energy at Fourier frequencies 0 and one-half are removed by the DHT. (Because zero-mean white noise will have no energy at frequency zero, only the latter frequency will matter.) In fact (Olhede & Walden 2004

*b*),(4.4)Thus, provided that

*N*is large, then for practical purposes, we can replace the approximation by an equality, and this is done hereafter. Other useful results areand

Step 2 of the algorithm in §2*e* consists of the demodulation , *l*=0,…,*N*−1. Since and are both diagonal matrices they are symmetric (so that e.g. =^{T}) and commute with each other. Also, will commute with any matrix which has all diagonal entries of zero, and similarly for ; hence, both and will commute with . Then, with , we obtainIf we now write , we getwhere we have made use of the symmetry and commutative properties of and . Now, and ^{T}+=**0** since is skew-symmetric, so we see that . In a similar way, we obtain and

Therefore, we have shown the covariance structure is unaffected by the demodulation, as would be expected.

Step 3 of the algorithm in §2*e* creates a new analytic vector , say, where . We might expect the real and imaginary parts of * z* to have the same covariance properties as the real and imaginary parts of

*, because the demodulation has not changed these second-order properties. However, there is an important difference;*

**y***is the analytic version of a complex-valued vector, whereas*

**z***is the analytic version of a real-valued vector. , thus,*

**y**The final line follows from the skew-symmetry of . Now, as shown in appendix B, ^{T} is given by the symmetric Toeplitz matrix with the first row(4.5)and as *N*→∞, we see that ^{T}→* I*. Hence, for large

*N*,(4.6)and similarly, for large

*N*,(4.7)Importantly, we see that, apart from the factor of 4, the variances and covariances of the real and imaginary parts of

*are the same as for the real and imaginary parts of*

**z***.*

**y**In step 4 of the algorithm, we compute MODWPT projections .

Since **P**_{j,n} is real-valued,say. Then,and

It was shown in Olhede & Walden (2004*b*) that for any real-valued vector, * x*, so that ,

*l*=1,…,

*N*, that is, all diagonal entries of the covariance matrix are zero.

A simple approximate expression for the elements of is derived in equation (C 3), with the same expression for . Likewise, in equation (C 2), a simple approximate expression for the elements of , is derived.

Step 5 of the algorithm consists of the reverse demodulation: , which, as we showed for the forward step, will not alter the covariance structure. Hence, if we write , then the variances and covariances of and are the same as for and above.

Hence, starting with * x*=

**ϵ**, a zero-mean IID Gaussian noise vector, we have found the (approximate) covariance structure of the complex-valued random vector

_{j,n}resulting from the five steps of the algorithm.

### (b) Instantaneous frequency

Since all steps of the algorithm up to and including reverse demodulation consist of linear matrix operations, we know that if our original input signal * x* consists of both deterministic and zero-mean white noise components, then the result of the algorithm at step 5 is given by , where, henceforth, the superscripts

*and*

**D***indicate the deterministic and noise components, respectively, and has a mean of zero. We can express a component of the deterministic part in terms of its amplitude and phase:(4.8)However, from equation (2.2), the*

**ϵ***l*th component of the estimated instantaneous phase vector at level

*j*frequency band index

*n*isand a first-order Taylor expansion about gives(4.9)

From equation (4.8), we know that . Since the variances and covariances of and are the same as for and , we use equations (C 2) and (C 3) to see that the variance of the *estimated* phase—since there is now additive noise—is

The estimated instantaneous frequency is got by substituting the estimated phase into equation (2.3). Hence, its variance takes the form(4.10)

Now, using equation (4.9), can be expressed as

Thus, is approximatelywhere the symmetric autocovariance sequence, , and approximate skew-symmetric cross-covariance sequence, , are defined in equations (C 2) and (C 3). However,and similarly for the other terms. Our approximate form for becomes

Finally, our approximation for is thus given by(4.11)

When calculating this expression with real data, the sequences of amplitudes, , and phases, , are unknown, but these can be replaced by their estimates.

In order to show the validity of the above variance expression, consider the linear chirp *x*_{l}=sin(*a*_{1}*l*^{2}+*a*_{2}*l*), *l*=0,…,1023, with *a*_{1}=0.02*π*/1024 and *a*_{2}=0.36*π*. This has an instantaneous frequency increasing from 0.18 to 0.2 over the 1024 time-points. White Gaussian noise was added to achieve a signal-to-noise variance ratio of 10. A level *J*_{0}=2 MODWPT was used; the signal lies in band *n*=1, frequency band [0.125,0.25]. was calculated using equation (4.11) and the known amplitudes and phases and is shown in figure 3*a*. Figure 3*b* shows the mean standard deviation obtained at each time from 200 independent repeat simulations, again using equation (4.11), but with the amplitudes and phases estimated from the noisy data of each simulation. Finally, figure 3*c* compares the result shown in figure 3*b* with the empirical standard deviation computed from the 200 independent repeat simulations of the estimation of instantaneous frequency at each time. It can be seen that there is good agreement between all three cases, supporting the validity of equation (4.11).

When calculating equation (4.11) with real data, not only must the sequences of amplitudes, , and phases, , be estimated, but so too must the variance of the white noise, , occurring in the formula for in equation (C 3) and in the formula for in equation (C 2). This variance can be estimated from the MODWPT details in an appropriate signal-free level and band. For example, for figure 3*b*, we used *j*=*J*_{0}=2 and *n*=3, which corresponds to the highest frequency band. When incorporating the demodulation and reverse demodulation steps we must proceed with caution: energy moved out of the positive frequency region of time-frequency into the negative region (via wraparound) by the demodulation in step 2 of the algorithm in §2*e* will be eliminated in the creation of the new analytic vector in step 3. Of course, we assume that the demodulation is chosen so that the signal is not eliminated in this way, but some noise will inevitably be, so that when estimating in this case, care should be taken not to choose MODWPT details in a signal-free region where the noise has been affected by the demodulation.

## 5. Summary

We have described a flexible approach for the time-frequency analysis of multicomponent signals. It involves the use of analytic vectors and demodulation. The demodulated analytic signal is projected onto the time-frequency plane so that, as closely as possible, each component contributes exclusively to a different ‘tile’ in a wavelet packet tiling of the time-frequency plane, and at any time the contribution to each tile definitely comes from no more than one component. A single reverse demodulation is then applied to all projected components. Two examples of how to choose the form of the demodulation sequence *s*_{l} were presented in some detail within the context of the real and simulated data examples. Following the above procedures, the instantaneous frequency of each component in each tile is well defined and may be calculated, followed by the Hilbert energy spectrum.

In order to better understand the effect of additive noise, the approximate variance of the estimated instantaneous frequency in any tile has been formulated by starting with pure noise and studying its evolving covariance structure through each step of the algorithm. The validity and practical utility of the resulting expression was demonstrated by comparing the theoretical standard deviation with all parameters taken as known, with its mean over 200 simulations with unknown quantities estimated from the noisy data, and with the empirical standard deviation computed from 200 simulations of the estimation of instantaneous frequency.

The site http://www.ma.imperial.ac.uk/statistics/research/wavelets/ contains Matlab code for our algorithm.

## Acknowledgments

The authors wish to thank Curtis Condon, Ken White and Al Feng of the Beckman Centre at the University of Illinois for the bat data and for permission to use them in this paper. The authors thank the referees for their very helpful comments leading to an improved exposition.

## Footnotes

- Received January 26, 2004.
- Accepted February 17, 2005.

- © 2005 The Royal Society