## Abstract

Empirical mode decomposition (EMD), an adaptive data analysis methodology, has the attractive feature of robustness in the presence of nonlinear and non-stationary time series. Recently, in this journal, Pegram and co-authors (Pegram *et al*. 2008 *Proc. R. Soc. A* **464**, 1483–1501), proposed a modification to the EMD algorithm whereby rational splines replaced cubic splines in the extrema envelope-fitting procedure. The modification was designed to reduce variance inflation, a feature frequently observed in cubic spline-based EMD components primarily due to spline overshooting, by introducing a spline tension parameter. Preliminary results there demonstrated the proof of concept that increasing the spline tension parameter reduces the variance of the resultant EMD components. Here, we assess the performance of rational spline-based EMD for a range of tension parameters and two end condition treatments, using a global dataset of 8135 annual precipitation time series. We found that traditional cubic spline-based EMD can produce decompositions that experience variance inflation and have orthogonality concerns. A tension parameter value of between 0 and 2 is found to be a good starting point for obtaining decomposition components that tend towards orthogonality, as measured by an orthogonality index (OI) metric. Increasing the tension parameter generally results in: (i) a decrease in the range of the OI, which is offset by slight increases in (ii) the median value of OI, (iii) the number of intrinsic mode function components, (iv) the average number of sifts per component, and (v) the degree of amplitude smoothing in the components. The two end conditions tested had little influence on the results, with the reflective case being slightly better than the natural spline case as indicated by the OI. The ability to vary the tension parameter to find an orthogonal set of components, without changing any sifting parameters, is a powerful feature of rational spline-based EMD, which we suggest is a significant improvement over cubic spline-based EMD.

## 1. Introduction

Empirical mode decomposition (EMD) in conjunction with Hilbert spectral analysis forms an adaptive data analysis methodology termed the Hilbert–Huang transform (HHT). Introduced by Huang *et al*. (1998), this methodology is used across a wide range of research areas, including the geophysical sciences, where its attractive feature of robustness in the presence of nonlinear and non-stationary time series is frequently exploited. The HHT is a two-part process and consists of an EMD of a time series, followed by Hilbert transformation of the decomposition results, to produce a graphical representation of the frequency–energy–time space that encapsulates the spectral properties of the time series. A recent review of the HHT methodology and its application to geophysical problems has been provided in Huang & Wu (2008), where they discuss outstanding open problems associated with the HHT methodology. Three of these open problems: (i) choice of spline type, (ii) effects of end conditions on EMD, and (iii) best intrinsic mode function (IMF) selection and uniqueness (the choice of spline type strongly affects (ii) and (iii)) are addressed in this paper.

Spline type selection for use in the EMD algorithm is a key outstanding problem of the HHT methodology. Huang *et al*. (1998) originally suggested using cubic splines to define the upper and lower envelopes (respectively, fitted through the local maxima and minima of the time series) in each sift. The average of the two splines formed the envelope mean, which was subtracted from the time series being sifted. The impact of using alternative interpolation techniques to cubic splines in EMD is not well understood or well reported. Huang *et al*. (1998) noted that trials with taut splines showed only marginal improvement over cubic splines. Based on some unreported experiments, Rilling *et al*. (2003) suggested that cubic splines are preferred to linear (polygonal) interpolation.

Cubic spline interpolation has proved to be a difficulty in the development of a mathematic basis for EMD. Hawley *et al*. (2008) replaced cubic spline interpolation with trigonometric interpolation with the aim of facilitating analysis of EMD's theoretical basis. Chen *et al*. (2006) proposed an alternative envelope mean approach to facilitate the mathematical study of EMD. Rather than fitting splines separately to the upper and lower local extrema, they combined the extrema and took the moving average of a linear combination of cubic B-splines through the local extrema. The moving average is then treated as the envelope mean and the EMD algorithm continues as usual. Also seeking a theoretical basis for EMD, Deléchelle *et al*. (2005) proposed a fourth-order parabolic partial differential equation-based method for directly estimating the envelope mean, dispensing with the upper and lower envelope steps of the sifting process. Moving away from envelopes, sifting and cubic splines altogether, Frei & Osorio (2007) proposed an alternative to EMD called intrinsic time-scale decomposition that directly estimates a ‘baseline’ (like an EMD envelope mean) and then subtracts the baseline from the signal to obtain a residual, again as in EMD. A concern of Frei & Osorio (2007) with cubic spline-based EMD is that the over- and undershooting in the upper and lower envelopes using cubic splines can ‘generate spurious extrema, and may shift or exaggerate existing’ extrema.

Recently, in this journal, Pegram *et al*. (2008), expanding on Peel *et al*. (2007), proposed a modification to the EMD algorithm, whereby rational splines replaced cubic splines in the extrema envelope-fitting procedure. The rational spline modification was designed to reduce the variance inflation frequently observed in cubic spline-based EMD components, primarily due to spline overshooting, by introducing a spline tension parameter. Based on three exemplary annual rainfall time series, Pegram *et al*. (2008) demonstrated the proof of concept that increasing the spline tension parameter reduced the variance of the resultant EMD components. Furthermore, it was noted that varying the spline tension parameter influenced the total number of sifts required to decompose a time series as well as the total number of resultant components.

In this paper, we assess the impact of using rational splines as an extension and generalization of cubic splines in EMD applications. We apply rational spline-based EMD for a range of tension parameters and two alternative end condition treatments to a global dataset of 8135 annual precipitation time series and compare the decomposition results with those from cubic spline-based EMD. Owing to the recursive nature of the EMD algorithm, the construction of synthetic time series that demonstrate cubic spline overshooting at different stages of the sifting process is difficult. Here, we use a large dataset of real-world time series, from a wide range of climatic conditions, in order to make a general assessment of the performance of rational spline-based EMD when applied to time series where cubic spline overshooting may or may not occur during the sifting process. However, as a consequence of using real-world signals, rather than synthetic signals, we are unable to comment upon the generation of spurious extrema in the EMD process because we do not know the true structure of the signals *a priori*. Following this section, in §2, we discuss the background to EMD analysis, identify the problem, define the orthogonality index (OI) and describe the global precipitation dataset used in the application. In §3, the rational spline interpolation approach (fully described in Pegram *et al*. 2008) and the end conditions adopted are briefly outlined. The application of the spline and end condition algorithms to the 8135 annual time series is presented in §4. In §5, we discuss the results of the application and draw conclusions.

## 2. EMD background, the problem and data

### (a) Background

EMD adaptively decomposes nonlinear and non-stationary time series into a set of components comprising nearly orthogonal IMFs and a residual. A detailed description of EMD can be found in Huang *et al*. (1998) and Pegram *et al*. (2008) and will not be repeated here. For the purposes of this paper, it suffices to say that there are four steps in computing an estimate of an IMF. As an example, consider the first sift of the first residual, to obtain the second IMF of the annual precipitation time series at Ottapalam (India) in figure 1. The first residual (‘residual 1’ in figure 1) of the EMD process was obtained in a prior application of the procedure, by subtracting the first IMF from the original raw annual precipitation time series. This example illustrates the problem of over- and undershooting when cubic splines are fitted to extrema, which will be discussed in §2*b*.

The procedure (illustrated in the above example) is accomplished in four steps. First, the local maxima and minima associated with residual 1 are identified. Second, splines are fitted to the sequences of maxima and minima. In the third step, the mean of the two splines is computed (‘envelope mean’ in figure 1) and, finally at the fourth step, the envelope mean is subtracted from residual 1 giving a new potential residual that is the input to the second sift for the second IMF. The above process of sifting is repeated until a sifting stopping rule is satisfied.

There are several rules that have been developed to guide the analyst to stop the sifting process and to adopt the final residual as the IMF. Details about sifting stopping rules can be found in Huang *et al*. (1998, 2003), Rilling *et al*. (2003), Peel *et al*. (2005) and Huang & Wu (2008). The stopping rule suggested by Huang *et al*. (2003) is adopted here. The rule is that the sifting procedure should be stopped if the number of zero crossings and extrema is (i) equal to or differs by 1 and (ii) stays the same for *S* consecutive sifts; in this paper the *S*=5 stopping rule is adopted, as they have suggested. Thus at least five and generally more sifts are required to extract each IMF.

### (b) The problem

An important property of the EMD algorithm is that the IMFs and the final residual are expected to be mutually orthogonal (Huang *et al*. 1998). The Ottapalam annual precipitation time series has been chosen to illustrate the EMD technique as it is an example in which the procedure, when using cubic splines, does not preserve the independence of the decomposition components. The results of applying EMD to Ottapalam annual precipitation (*N*=63) for two cases are presented in table 1. Case 1 is for a cubic spline, whereas case 2 (to be discussed later) is an example of a rational spline with a tension parameter set at 0.6.

Case 1 (table 1) shows that the annual time series for Ottapalam can be decomposed into three IMFs and a residual. The average period is 3.2 years for IMF1, 9 years for IMF2 and 18 years for IMF3. In column (3), the variances accounted for in each IMF and the residual, relative to the variance in the original time series (254 000 mm^{2} for this rainfall series), are listed. For Ottapalam the sum of the variance ratios of the components of the decomposition equals 2.489, a value far greater than 1, which indicates that the decomposition is unsatisfactory. In particular, in column 3 of table 1, the variance of IMF2 (case 1) is 0.990, which is virtually the variance of the original time series. A relatively large sum of the variances of the components as a ratio of the observed variance is indicative of overshooting of the envelopes, but it does not indicate another property we look for—orthogonality.

We would like to define a metric that indicates a tendency to orthogonality. We note that, because the IMFs are amplitude and frequency modulated, they are very unlikely to be strictly orthogonal to each other and to the residual, unlike in a Fourier series decomposition, where the components are orthogonal by construction. If the IMFs and residual were mutually independent, then the sum of the IMFs and residual variances plus twice the covariances would equal the variance of the original time series, since the covariance terms between the IMFs and residual would equal zero. However, this criterion is not strictly reversible. If the sum of the variances of the IMFs and residual equals that of the original time series, then the IMFs and residual might be mutually independent. It is possible that negative and positive covariances between the IMFs and residual cancel each other out, resulting in decomposition components not being mutually independent, but when summed the component variances give a total close to the observed variance. To avoid any ambiguity in assessing orthogonality, we propose an OI, involving only the cross product terms, based on a suggestion by G. Rilling (2009, personal communication), and defined in equation (2.1),(2.1)In equation (2.1), *N* is the record length; NI is the number of IMFs; IMF_{ik} is the *i*th IMF at time step *k*; *x*_{k} is the original signal; and *r*_{k} is the EMD residual component at time step *k*. In this analysis, the EMD residual is deliberately not included in the OI calculation, since it embodies any trend in the annual precipitation time series. All the IMFs should have near zero mean by construction, as any trend remains in the residual which is carried forward to the end of the decomposition. The OI compares cross products between purely ‘zero mean’ components and is scaled by the detrended original series, allowing a fair comparison. We suggest that it does not make sense to compute the covariance between a signal with zero mean and one with a trend and so restrict the OI to the covariances between the IMFs. The advantage of the OI is that, by taking the absolute value of the IMF cross products, the possibility of covariance terms of opposite sign cancelling each other out is removed. Thus, OI→0 indicates a decomposition approaching orthogonality.

For Ottapalam, using cubic spline-based EMD, the decomposition components are clearly not satisfactory, since (i) the sum of the IMF and residual variances divided by the original time-series variance is much greater than 1 and (ii) OI=0.73, which is far from the zero expected for full orthogonality. The cause of the excess variance in the IMFs is primarily due to the significant over- and undershooting when the cubic spline is fitted to the extrema in the EMD sifting process (figure 1, near the year 1920), which results in significant non-zero covariance terms between the IMFs and residual. Reduction in variance inflation is of considerable importance for EMD applications where the IMFs and residual are used individually or combined into frequency filters (for example: in filtering, Gloersen & Huang 2003; in stochastic data generation, McMahon *et al*. 2008; in two-dimensional spatial analysis, Sinclair & Pegram 2005; Fauchereau *et al*. 2008).

In Pegram *et al*. (2008), we demonstrated the usefulness of the rational spline as an alternative approach to the fitting of cubic splines to extrema in the sifting process associated with EMD analysis. The rational spline procedure introduces a tension parameter (*p*), which varies for practical purposes from 0 (the cubic spline case) to large values in which the spline approaches a linear (polygonal) interpolator. A second parameter (*r*) controls the end conditions of the fitted splines and is either 0 or 1, with *r*=0 indicating a ‘natural’ spline end condition and *r*=1 indicating a ‘reflective’ end condition (see Pegram *et al*. (2008) for details). In §4, we apply the rational spline-based EMD algorithm to a global dataset of annual precipitation station data and examine how the tension (*p*) and end condition (*r*) parameters affect three variables: (i) the OI, (ii) the number of IMFs, and (iii) the number of sifts per IMF.

### (c) Data

The EMD algorithm is applied to annual time series of 8135 precipitation stations from v. 2 of the Global Historical Climatology Network database (Peterson & Vose 1997; Vose *et al*. 2005). These stations have been chosen on the basis that each station has at least 30 years of continuous (no missing values and uniform sampling rate) record. The average record length is 59.1 years, with a maximum record length of 299 years. These stations measure rainfall under a representative set of global climatic conditions (figure 2) and provide a test of the EMD algorithm's ability to decompose a wide range of natural time series.

## 3. Rational spline interpolation

Following Pegram *et al*. (2008) and using the same nomenclature, the rational spline can be described as follows. Given a set of discrete data (*X*_{j}, *Y*_{j}), *j*=1, 2, …, *N*, the purpose is to fit spline interpolant segments, to either their local maxima or minima at {*x*_{k}}, in intervals (*x*_{k}, *x*_{k+1}) *k*=1, 2, …, *n*−1, (*n*<*N*); note that {*x*_{k}} is a decimation of {*X*_{j}}. The individual segment functions are given by rational splines of the following form:(3.1)where *t*=(*x*−*x*_{k})/*h*_{k}=(*x*−*x*_{k})/(*x*_{k+1}−*x*_{k}), *u*=1−*t* and *p* is the tension parameter.

The parameter *p* controls the spline tension in the following way. The last two terms of equation (3.1) have denominators (1+*pt*) and (1+*pu*). If these go to zero anywhere in the interval as *t* and *u* go from 0 to 1, then *s*_{k}(*x*) is undefined. This will not happen if *p*≥−1. When *p* is set to −1 the denominators become, respectively, *u* and *t* reducing equation (3.1) to a quadratic spline interpolant, which usually fluctuates more wildly than even the cubic, so will not be used in this application. As *p* increases the last two terms in *s*_{k}(*x*) diminish, so that, in the limit, the spline becomes a linear (polygonal) interpolator.

### (a) End condition

In the EMD algorithm, a method for extrapolating the spline, from the last known extremum to the end of the time series (at *x*_{1} and *x*_{n}), is required. Several end condition methods have been used to deal with the beginning and end of the spline segments. Originally, Huang *et al*. (1998) added ‘characteristic waves’ to each end, which were based on the amplitude and frequency of the two nearest internal extrema. Rilling *et al*. (2003) used a ‘mirrorizing’ method, whereby an extra extremum was added to the data at each end by reflecting the extrema closest to the ends. Dätig & Schlurmann (2004) list various methods of dealing with cubic spline end conditions in their application to water waves that include both data addition and non-data addition methods. In their investigations they found that their proposed non-reflective, but wavelength-based, natural spline end condition performed adequately. Chiew *et al*. (2005) adopted an ‘averaging’ condition by using the mean of the two extrema closest to each end of the spline.

In Pegram *et al*. (2008), two end conditions were presented in combination with rational spline-based EMD. The first is the reflective or mirrored spline end condition. Here the first derivative at the end of the original series (at either or both *x*_{1} or *x*_{n}) is zero because a virtual point (at either or both *x*_{0} or *x*_{n+1}), having the same value and distance from the end as the penultimate point (*x*_{2} or *x*_{n−1}), is used to terminate the spline curve sequence. This is achieved by forcing the virtual point to have the same second derivative as the penultimate point it mirrors. This end condition is similar to the mirrorizing data addition method of Rilling *et al*. (2003). This method has been successfully used in recent analyses (Peel & McMahon 2006). The second end condition, suggested in Pegram *et al*. (2008), is known as the natural spline ending. Here, the second derivative of the virtual points (defined the same way as in the reflective case, either or both of *x*_{0} or *x*_{n+1}) is set to zero, so that the first derivative of the spline at the end of the original series (*x*_{1} or *x*_{n}) tends to be close to zero (although it is not constrained to be). The reflective end condition is obtained by setting the end condition parameter (*r*) to 1 and the natural end condition is obtained by setting *r* to 0 (Pegram *et al*. 2008). We will use this notation throughout this paper.

## 4. Application

The main objective of this paper is to investigate how varying the rational spline tension parameter *p* and the two spline end conditions governed by *r* influence the OI, the number of IMFs and the number of sifts required to identify the IMFs, when rational spline-based EMD is applied to 8135 annual precipitation time series. Following this analysis, recommendations will be made for ensuring that resultant IMFs and the residual are likely to be independent (orthogonal) and achieved in an efficient manner.

### (a) Orthogonality

Revisiting the Ottapalam time series, figure 3 shows the OI for a range of tension values *p* and for the natural spline end condition, *r*=0. At *p*=0 (not plotted, due to the log scale), the OI for Ottapalam is 0.73 (table 1, case 1) and, as observed in figure 3, the OI drops sharply from 0.68 to 0.12 when *p*≈0.05, reaches a minimum at *p*=0.6 (table 1, case 2), rises to approximately 0.18 when *p*=3.5 and 5 and then fluctuates about 0.14 when *p* is between 7 and 10 000. Also shown in figure 3 is the number of IMFs required for each decomposition. When *p* lies between 0 and 2 there are three IMFs, for *p*=3.5 and 5 there are five IMFs (coinciding with a jump in OI) and when *p*=7 to 10 000 there are four IMFs. The lowest OI values near *p*≈0.5 require only three IMFs to decompose the time series. The increase in OI for *p*=3.5 and 5 is due to the number of IMFs required for increasing from 3 to 5, and the subsequent reduction in OI when *p*>5 is due to the number of IMFs required for reducing from 5 to 4. The sharp drop in OI near *p*≈0.05 is due to the emergence of a riding wave in the first residual (near 1915 in figure 1), which introduces two extra extrema that reduce the spline over- and undershooting associated with decompositions when *p*<0.05.

Figure 4 shows the resultant IMFs and residual for (*a*) *p*=0 and (*b*) *p*=0.6 cases, the characteristics of which are presented in table 1. For case 2 in table 1 (shown in figure 4*b*), we note that IMF1 accounts for 65 per cent of the observed variance, IMF2 for 27 per cent, IMF3 23 per cent and the residual 20 per cent. Contrast these values with case 1, where IMF2 dominates with 99 per cent of the variance of the original series. Even though the minimum OI (0.087), for the tested *p* values, was achieved in case 2, the sum of the relative variances is relatively large at 1.333, which indicates that the sum of the relative variances is not a good indicator of orthogonality, although it may well be an indicator of spline overshooting. Figure 4*a* illustrates how the over- and undershooting in the cubic spline evident at the beginning of the series in figure 1 propagates through the subsequent IMFs and the residual. The influence of over- and undershooting is removed in figure 4*b* when the higher tension parameter value is used and the resultant IMFs and residual are much better behaved.

The general results of applying the revised methodology to the global dataset are presented in figures 5–11. Here, rational spline-based EMD was applied to the 8135 annual precipitation records for tension parameter settings: *p*={0, 0.5, 1, 2, 5, 10 and 20}. Each tension parameter setting was run for the two end conditions *r*=0 and 1.

The interaction between the OI and the rational spline tension parameter is explored in figure 5. Since the distribution of OI at a given *p* value is often highly positively skewed, the results are presented in box plot form in figure 5. First, we observe that the inter-quartile and 95% ranges of OI across the 8135 stations reduce as *p* increases, which is consistent with the proof of concept results of Pegram *et al*. (2008). In the cubic spline case (*p*=0), the inter-quartile and 95% ranges are the widest of all the *p* values tested. Furthermore, the upper and lower limits of the inter-quartile and 95% ranges are, respectively, the highest and lowest for all the *p* values tested. Second, the median OI is at a minimum for *p*=0.5 for both end conditions (0.076 for *r*=0 and 0.072 for *r*=1) and then gradually rises as *p* increases and is the highest for *p*=20.

A possible explanation for the decrease in the 95% range of OI as *p* increases is that the variance inflation introduced by over- and undershooting associated with cubic and near cubic interpolation is removed as the interpolation becomes more linear (polygonal) with increasing *p* values. Finally, we observe that the natural spline end condition (*r*=0) generally produces a median value of OI and inter-quartile ranges slightly higher than the reflective end condition over the range of *p* tested. Although the median values for the two end conditions indicate little difference between the two methods; in general, the reflective end condition results in more orthogonal sets of IMFs than the natural spline end condition, for a given value of *p*. This is reflected in the lower median and inter-quartile range values.

The results summarized in figure 5 are confirmed when observing which *p* value setting had the maximum or minimum OI value, for a given end condition, at each of the 8135 stations (table 2, note that each row sums to 8135). The cubic spline (*p*=0) had the greatest number of stations with the maximum (2114 and 2058) and the minimum (2085 and 2010) OI values for *r*=0 and 1, respectively, which is consistent with the wide 95% range seen in figure 5. For almost a quarter of all stations, cubic spline-based EMD provides the most orthogonal decomposition and for another quarter of the stations it provides the least orthogonal decomposition. When *p*=20 the number of stations with the maximum (2055.5 and 2036) OI value is similar to that for *p*=0. However, the number of stations with the minimum (1240 and 1179.5) OI value for *r*=0 and 1, respectively, is considerably less than for when *p*=0. The tendency for *p*=20 to produce less orthogonal results is consistent with the rise in median OI and associated inter-quartile ranges from *p*=2 as seen in figure 5. Generally, for *p* between 0.5 and 5, the number of stations with the minimum OI value is greater than the number of stations with the maximum OI value for a given *p*. The least number of stations with the maximum OI value was achieved when *p*=1 for both end conditions. Based on the results presented in figure 5 and table 2, using a tension parameter of between 0.5 and 2 to identify an orthogonal set of IMFs provides a balance between (i) reducing the influence of over- and undershooting associated with cubic spline-based EMD and (ii) the increase in median (but not the range of) OI associated with higher values of *p*.

### (b) Minimize number of IMFs

An important objective of EMD analysis is to identify physically meaningful intrinsic oscillatory modes within a time series and extract them as IMFs. As discussed in Pegram *et al*. (2008), a form of EMD that decomposes a time series into the smallest number of IMFs is believed to be more likely to identify any meaningful signal than a form that produces more IMFs. Figure 6 shows the mean and ±1 s.d. (*s*) of the number of IMFs (for the *r*=0 case) against the tension parameter for the two end conditions. We observe that the average number of IMFs across the 8135 stations increases as *p* increases. For the *r*=0 end condition, the average number of IMFs for the cubic spline case (*p*=0) is 3.32, while for *p*=20 the average number of IMFs is 4.08.

The two end condition average values are both within 1 s.d. of each other, indicating little difference between the two methods. However, over the range plotted, the reflective end condition appears to produce fewer IMFs on average than the natural spline end condition. Also of note is that the standard deviation of the number of IMFs increases slightly as *p* increases (for *r*=0, when *p*=0, *s*=0.77; when *p*=20, *s*=1.09). From this figure and figure 5, a trade-off between the orthogonality of the IMFs as identified by the OI and the number of IMFs is apparent. On the one hand, we observe the minimum number of IMFs is produced when *p*=0, but we also observe that when *p*=0 the OI fluctuates across the widest range. However, as *p* increases, the number of IMFs increases and the median OI value also increases. In EMD analysis, we suggest that the criterion of orthogonality is more important than a smaller number of IMFs while noting that a small number of IMFs is almost certain to produce a lower value of OI than a large number of IMFs.

### (c) Minimize number of sifts

Another significant objective of EMD analysis is to minimize the number of sifts per IMF. Oversifting tends to produce smooth amplitude IMFs, while undersifting may not reveal all the hidden riding waves within a time series. The sifting stopping rule applied to all EMD runs in this paper is the *S*=5 rule of Huang *et al*. (2003), which effectively places a lower limit of 5 on the number of sifts per IMF. Like the OI results, the average number of sifts per IMF at each station is highly positively skewed across all 8135 stations, so in figure 7 the results are presented in box plot form. Figure 7 reveals that the median of the average number of sifts per IMF required at a station is constant up to *p*=2, and then increases slightly as *p* increases thereafter. For the natural spline end condition, the inter-quartile and 95% ranges increase as *p* increases over the range of *p*. For the reflective end condition, the inter-quartile and 95% ranges decrease to a minimum at *p*=2, and then increase as *p* increases. The 95% range for the *r*=0 end condition is always wider than the *r*=1 end condition and the inter-quartile range for *r*=0 is greater than for *r*=1 after *p*=1. Across all 8135 stations, the *r*=1 end condition on average required approximately 5 per cent fewer sifts per IMF than the *r*=0 end condition.

However, G. Rilling (2008, personal communication) noted that the interpolation method used influences the degree of amplitude smoothing of the resultant IMF for a given number of sifts. This effect is seen in figure 8, where the estimate of the first IMF of the Ottapalam annual precipitation series is shown for three values of rational spline tension parameters *p*=0, 5 and 20 after 5, 15 and 30 sifts using the natural (*r*=0) end condition. Clearly, as the interpolation becomes more linear (*p*=20), the degree of amplitude smoothing of the IMF estimate increases relative to the cubic interpolation (*p*=0) for the same number of sifts. This amplitude smoothing feature as the tension parameter value increases should be kept in mind when selecting a value of *p*.

The relationship between the number of IMFs and the number of sifts is explored further by plotting the total number of sifts versus the total number of IMFs for all 8135 stations, for the range of tension parameter values (*p*=0, 0.5, 1, 2, 5, 10 and 20) and both end conditions, in figure 9. The figure shows a roughly linear relationship between total number of IMFs and total number of sifts (approx. seven sifts per IMF) for the range of *p* tested and the two end conditions, with the total number of IMFs and sifts increasing with *p* for both end conditions. This result is consistent with the gradual increase in the average number of IMFs with increasing *p* observed in figure 6 and the relatively constant median of the average number of sifts per IMF seen in figure 7. With an almost constant median of the average number of sifts per IMF (between 6.7 and 7.0) across the range of tension values, the increase in total sifts is primarily due to the increase in the number of IMFs required at each station with increasing *p* value. The slight increase in the median of the average number of sifts per IMF when *p*>2 (figure 7) appears to be a secondary influence on the total number of sifts (figure 8).

Figure 10, a plot of OI versus the average number of sifts per IMF for *p*=20 and *r*=1, shows that the OI is virtually independent of the average number of sifts per IMF. This result is consistent across the range of *p* tested and for both end conditions, which indicates that stations experiencing non-orthogonality issues do not require any more or less sifting to decompose than orthogonal stations. However, the OI is not independent of the number of IMFs (figure 11) across the full range of *p* and the two end conditions. For low values of *p* (approx. less than 2), the OI is virtually independent of the number of IMFs (not shown), but as *p* and the number of IMFs increase (figure 6), the OI also slowly increases. This increase in OI with the number of IMFs is possibly due to the increasing number of absolute values of cross product terms in the OI calculation (equation (2.1)), at a rate of (NI[NI−1]/2), as the number of IMFs (NI) increases.

## 5. Discussion and conclusions

In the recent review of the HHT methodology, Huang & Wu (2008) have discussed outstanding open problems with HHT. Spline type, end effects and best IMF selection and uniqueness were among the problems listed and these issues are addressed in this paper. Using a dataset of 8135 annual precipitation time series, the relationships between spline type, end condition, orthogonality of IMFs and residual, number of IMFs and number of sifts required to decompose the time series have been investigated. Rational spline-based EMD (of which traditional cubic spline-based EMD is a special case) was applied to the annual precipitation series for a range of rational spline tension parameter values and two end condition settings.

With regard to best IMF selection and uniqueness, Huang & Wu (2008) noted that EMD is capable of generating ‘infinite sets of IMFs’ by varying the sifting parameters (in this case *S*). In rational spline-based EMD, there is a variable tension parameter, *p*, as well. The question then arises as to which set of IMFs of all the possible sets (by varying *S* and *p*) is the best. Here, we contend that a set of IMFs that is more nearly orthogonal (lowest OI) and achieved with the least number of IMFs and sifts is the best. In this analysis, *S* was held constant (*S*=5), and the influence of *p* and two end conditions on orthogonality, number of IMFs and sifts was investigated.

The closeness to orthogonality of a set of IMFs was assessed through the OI; if the cross product terms equal zero, then the sum of the absolute values of the cross products will be zero and the IMFs will be a truly orthogonal set. Based on median results from 8135 annual precipitation time series, we found (as shown in figure 5 and table 2) that a tension parameter of *p* between 0 and 2 provides a good starting point for identifying orthogonal sets of IMFs for both end conditions. Increasing *p* leads to a progressively lower range of OI and a higher median value of OI for both end conditions.

The ability to vary the tension parameter towards finding an orthogonal set of IMFs is a powerful feature of rational spline-based EMD over traditional cubic spline-based EMD. Using traditional cubic spline-based EMD, the only available option for obtaining an orthogonal set of IMFs was to increase the sifting parameter *S*, which progressively leads to oversifted IMFs.

Once orthogonal sets of IMFs have been identified, the set with the least number of IMFs and sifts is considered the best set. Figure 6 revealed that increasing *p* leads on average to more IMFs for both end conditions, and in figure 7 the median of the average number of sifts per IMF was constant up to *p*=2, after which it increased slightly with *p*. A trade-off exists between increasing the *p* value to achieve orthogonality of IMFs and increasing the number of IMFs and to a lesser extent increasing the number of sifts per IMF. A further consideration is that the lower the value of *p*, the less amplitude smoothing of the resultant IMFs for a given number of sifts.

The influence of the end condition specification on the results was generally small. However, figures 5–7 and 9 all suggest that the reflective end condition (*r*=1) is slightly better, resulting on average in lower OI, fewer IMFs and fewer sifts per IMF at a station than the natural (*r*=0) end condition.

In summary, based on the results of this analysis, we have the following recommendations for EMD practitioners. We note that the effects of two of the adjustable parameters (*p* and *r*) of rational spline EMD (in this analysis *S* was set to 5 throughout) are largely independent of each other, and specifically that the tension parameter (*p*) has the greatest effect on overshooting and orthogonality. Thus, when searching for the best set of IMFs for a natural time series using rational spline-based EMD (with *S*=5), we recommend starting with the following settings: (i) set the tension parameter (*p*) to 0 and (ii) choose the reflective end condition (*r*=1). As the search progresses, we recommend varying only the tension parameter and holding the other variables constant. Progressively, increase the tension parameter (*p*) and, for each decomposition, note (i) the value of OI, (ii) the number of IMFs, and (iii) the number of sifts per IMF. We suggest that the best set of IMFs is achieved with a *p* value that minimizes these three indicators.

## Acknowledgments

Review comments from Gabriel Rilling were particularly helpful in improving this paper, along with those of an anonymous reviewer. This research was financially supported by Australian Research Council grant DP0773016 and a CSIRO Flagship Collaborative Research Fund grant.

## Footnotes

- Received August 29, 2008.
- Accepted February 25, 2009.

- © 2009 The Royal Society