## Abstract

In the field of blind source separation, Jacobi-like diagonalization-based approaches constitute an important tool for independent component analysis (ICA). Recently, the simultaneous diagonalization of third- and fourth-order cumulant matrices has been studied by a number of authors. In this work, we present an optimally parametrized composition of these cumulants that places two classical contrasts, namely the cumulant-based ICA (CuBICA) contrast and the weighted fourth-order contrast, in a common framework. It is shown that the optimal weight parameter depends on *a priori* statistical knowledge of the original mixing sources. Following the same spirit as the classical ICA algorithm, we derive the analytical solution for the case of two sources. Finally, a number of computer simulations illustrate the behaviour of the Jacobi-like iterations for the maximization of the proposed parametrized contrast.

## 1. Introduction and motivations

The problem of *blind source separation* (BSS) arises in many signal processing applications, such as communications, array processing, speech analysis and speech recognition (Hyvärinen *et al*. 2001; Cichocki & Amari 2002). In all these instances, the underlying assumption is that several linear mixtures of unknown, random, zero-mean and statistically independent signals, called *sources*, are observed. The BSS problem consists of recovering the original sources from their mixtures without *a priori* knowledge of either the mixture coefficients or the sources. The principle involved in the solution of this problem is nowadays called *independent component analysis* (ICA), which can be viewed as an extension of the widely known *principal component analysis* (PCA). ICA has become an active field of research that has attracted great interest because of its large number of applications in diverse fields.

The condition for separability has been studied in Comon (1994*a*), and it was pointed out that, for statistically independent non-Gaussian sources, the separation can be achieved by restoring the *independence*. The independence between the recovered sources can be measured by their *mutual information* (MI), which measures the information that one variable contains about another. Most importantly, the MI is zero if and only if the sources are independent. A natural criterion to measure the mutual independence between *M* variables (say *y*_{1}, …, *y*_{M}) is the divergence between the joint probability density and the product of the marginal ones. If the Kullback–Leibler divergence is used, one ends up with the MI (Comon 1994*a*),

$$I(\mathbf{y}) = \int p(\mathbf{y}) \log \frac{p(\mathbf{y})}{\prod_{i=1}^{M} p_{i}(y_{i})}\,\mathrm{d}\mathbf{y}, \tag{1.1}$$

where *p*(**y**) and *p*_{i}(*y*_{i}) are the multivariate and marginal probability density functions (PDFs) of **y** and *y*_{i}, respectively. In Comon (1994*a*), the Edgeworth expansion was used to approximate the PDF in the MI criterion. The Edgeworth expansion of the MI of the standardized (i.e. whitened) real variables, up to an additive constant *I*_{0} and as a function of standardized *cumulants*, is given as (Comon 1994*a*)

$$I(\mathbf{y}) \approx I_{0} - \sum_{i=1}^{M}\left[\frac{1}{12}\kappa_{iii}^{2}(\mathbf{y}) + \frac{1}{48}\kappa_{iiii}^{2}(\mathbf{y})\right], \tag{1.2}$$

where *κ*_{iii}(**y**) and *κ*_{iiii}(**y**) are, respectively, the third- and fourth-order marginal cumulants of each entry of **y**, i.e. *κ*_{iii}(**y**)=E[*y*_{i}^{3}] and *κ*_{iiii}(**y**)=E[*y*_{i}^{4}]−3.1

A cumulant of order higher than two qualifies as a *contrast* (Comon 1994*a*). By definition (Comon 2004), a contrast *Ψ*(**y**) is a mapping from the set of densities {*p*_{i}(*y*_{i}), **y**∈ℝ^{M}} to ℝ, where *M* is the number of sources, such that, if **y** has independent components, then *Ψ*(**y**)≥*Ψ*(**Ay**), ∀**A** non-singular, with equality if and only if **A** is non-mixing; also, *Ψ*(**y**) is invariant to permutation or scaling of the components of **y**. Thus, the maximization of specific cumulants results in a successful blind separation for a particular type of sources.2 For example, if the sources are asymmetrical (skewed), then the maximization of the third-order cumulants is enough to ensure successful separation; similarly, for symmetric sources, the maximization of the fourth-order cumulants is sufficient (Comon 1994*a*), i.e.

$$\Psi_{3}(\mathbf{y}) = \sum_{i=1}^{M}\kappa_{iii}^{2}(\mathbf{y}) \quad\text{and}\quad \Psi_{4}(\mathbf{y}) = \sum_{i=1}^{M}\kappa_{iiii}^{2}(\mathbf{y}). \tag{1.3}$$

Equation (1.3) is discriminating over the set of random vectors **y**, provided there is at most one null third-order (respectively, fourth-order) marginal cumulant for skewed (respectively, symmetrical) sources (Comon 1994*a*,*b*). Recently, Blaschke & Wiskott (2004) studied the simultaneous use of third- and fourth-order cumulants as a better approximation of the MI,

$$\Psi_{3,4}(\mathbf{y}) = \frac{1}{12}\sum_{i=1}^{M}\kappa_{iii}^{2}(\mathbf{y}) + \frac{1}{48}\sum_{i=1}^{M}\kappa_{iiii}^{2}(\mathbf{y}). \tag{1.4}$$

The Blaschke–Wiskott ICA algorithm is known as cumulant-based ICA (CuBICA).3 Moreover, experimental results reported in Blaschke & Wiskott (2004) provided some evidence that better results may be obtained by using the contrast (1.4) rather than (1.3). Although (1.4) can handle both symmetric and asymmetric distributions simultaneously, it is not guaranteed to *always* perform better than (1.3): there exist scenarios where the contrast comprising only third-order cumulants performs better than (1.4).
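The marginal-cumulant quantities entering (1.3) and (1.4) are straightforward to estimate from data. The following NumPy sketch (the paper's own simulations used Matlab; the function names here are ours, not the paper's) computes the sample third- and fourth-order marginal cumulants of standardized signals and evaluates the CuBICA-style composite contrast (1.4):

```python
import numpy as np

def marginal_cumulants(y):
    """Sample third- and fourth-order marginal cumulants of the rows of y.

    y : (M, N) array whose rows are assumed zero-mean and unit-variance.
    Returns (kappa3, kappa4), each of shape (M,).
    """
    kappa3 = np.mean(y**3, axis=1)          # kappa_iii  = E[y_i^3]
    kappa4 = np.mean(y**4, axis=1) - 3.0    # kappa_iiii = E[y_i^4] - 3
    return kappa3, kappa4

def contrast_34(y):
    """Composite contrast (1.4): (1/12) sum k3^2 + (1/48) sum k4^2."""
    k3, k4 = marginal_cumulants(y)
    return np.sum(k3**2) / 12.0 + np.sum(k4**2) / 48.0
```

For standardized rows, *κ*_{iii} is simply the sample skewness and *κ*_{iiii} the sample excess kurtosis of each component.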

To enhance the efficacy of cumulant-based contrasts, several researchers have proposed *weighted* contrasts. The rationale is twofold: first, a weighted sum of contrasts is also a contrast (Moreau *et al*. 1999; Comon 2004); and second, the weights can be obtained from the true or estimated statistics of the original mixing sources by optimizing a certain performance measure. Moreau & Thirion-Moreau (1999) suggested a weighted contrast using fourth-order statistics, given by (1.5), where *ϵ* indicates the sign of the kurtosis and the *w*_{i} are free weight parameters. In (1.5), it was assumed that all sources had the same sign of kurtosis, and the optimal weight parameter was obtained for the case *M*=2. In the following year, Stoll & Moreau (2000) proposed another generalized form of weighted contrast, (1.6), which is a weighted combination of fourth-order auto- and cross-cumulants. In general, auto- and cross-cumulants are maximized and minimized, respectively. It was experimentally shown that better performance can be achieved by selecting appropriate binary values for the weight parameters; it was, however, not shown analytically how the optimal set of binary or real weights can be obtained for given source statistics. Also, using *weighted* centroids, closed-form near-optimal/optimal fourth-order ICA estimators have been obtained by Murillo-Fuentes & Gonzalez-Serrano (2004) and Zarzoso *et al*. (2006).

Interestingly, all of the aforementioned weighted contrasts are based on fourth-order statistics, which makes them more suitable for symmetric sources than for asymmetric ones. Note that asymmetric sources arise in many practical scenarios, such as sonar signal processing (Nikias & Petropulu 1993) or source separation of urban images (Ziegaus & Lang 1997); see also Karvanen & Koivunen (2004). In some cases, digitized speech signals have non-zero skewness, and the separation of such signals benefits from third-order statistics (Choi *et al*. 1998). Also, in biomedical applications, skewness may become more important than kurtosis for certain categories of signals: certain artefacts (e.g. eye-blinking) and some components in electrocardiograms and electroencephalograms are not symmetric in nature.

Based on such considerations, in this work we present a weighted contrast comprising third- and fourth-order cumulants, capable of handling symmetric and asymmetric sources jointly in an *optimal* manner. The proposed weighted contrast, the derivation of the optimal weight parameter and the resulting ICA algorithm are described in §2. Experiments and results are provided in §3, and we conclude briefly in §4. All simulations were carried out with Matlab; the analytical calculations in §2*d* were supported by the Matlab symbolic toolbox.

## 2. A weighted composite-order contrast and ICA algorithms

### (a) Notation and conditions

Consider an *M*-input *M*-output memoryless channel described by **x**(*n*)=**As**(*n*), where *n*∈ℤ is the discrete time; **x**(*n*) is an *M*×1 vector of the observed sources; **s**(*n*) is an *M*×1 vector of the original sources; and **A**∈ℝ^{M×M} is an unknown (invertible) mixing matrix. The goal of BSS is to determine a separation matrix **B**∈ℝ^{M×M}, such that **y**(*n*)=**Bx**(*n*)=**BAs**(*n*)=**Cs**(*n*) recovers the original sources up to a permutation and scaling, where **C** is a global matrix representing the mixing and non-mixing structures.

Source separation is typically carried out in two steps. In the first step, whitening or standardization projects the observed vector **x**(*n*) on the source subspace and yields a set of second-order decorrelated, normalized sources **z**(*n*)=**Wx**(*n*), such that E[**zz**^{T}]=**I**_{M}. As a result, the original and whitened sources are related through a unitary transformation **z**(*n*)=**Qs**(*n*). The separation problem thus reduces to the computation of the unitary matrix **Q**, which is accomplished in a second step. The ICA approach to BSS consists of computing **Q** such that the entries of the separator output are as independent as possible.4

### (b) Givens rotation

The algebraic nature of cumulants is tensorial (McCullagh 1987); owing to the multilinearity of cumulants, the cumulants *κ*…(**y**) can be expressed in terms of *κ*…(**z**), so that a cumulant-based contrast becomes an implicit function of the elements of the unitary matrix **Q**. The unitary transformation matrix is modelled as a product of *M*(*M*−1)/2 Givens (or plane) rotation matrices. Each is a rotation around the origin within the plane of two selected components *μ* and *ν*, and has the matrix form

$$[\mathbf{G}^{(\mu,\nu)}]_{ab} = \delta_{ab} + (\cos\phi_{\mu\nu}-1)(\delta_{a\mu}\delta_{b\mu} + \delta_{a\nu}\delta_{b\nu}) + \sin\phi_{\mu\nu}\,(\delta_{a\mu}\delta_{b\nu} - \delta_{a\nu}\delta_{b\mu}), \tag{2.1}$$

with Kronecker symbol *δ*_{ab} and rotation angle *ϕ*_{μν}. The estimation of the plane rotation *ϕ*_{μν} is obtained by an iterative Jacobi method over the set of orthonormal matrices. The orthonormal transforms are thus obtained as a sequence of plane rotations. Each plane rotation is applied to a pair of coordinates, such that *y*_{μ}=*z*_{μ} cos *ϕ*_{μν}+*z*_{ν} sin *ϕ*_{μν} and *y*_{ν}=−*z*_{μ} sin *ϕ*_{μν}+*z*_{ν} cos *ϕ*_{μν}, while leaving the other coordinates unchanged.5 Thus, the Jacobi approach considers a pairwise sequence of two-dimensional ICA problems. Restricted to the subspace of the two selected components, the Givens rotation matrix becomes

$$\mathbf{G} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}. \tag{2.2}$$
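A minimal sketch of the resulting pairwise (Jacobi) scheme is given below, assuming an externally supplied routine that returns the rotation angle maximizing the chosen two-dimensional contrast (function names are ours, not the paper's):

```python
import numpy as np

def givens_matrix(M, mu, nu, phi):
    """Embed the plane rotation (2.1)/(2.2) for the pair (mu, nu) in an MxM identity."""
    G = np.eye(M)
    c, s = np.cos(phi), np.sin(phi)
    G[mu, mu] = c
    G[nu, nu] = c
    G[mu, nu] = s
    G[nu, mu] = -s
    return G

def jacobi_sweep(Z, angle_for_pair):
    """One Jacobi sweep over all M(M-1)/2 pairs.

    Z              : (M, N) whitened data matrix.
    angle_for_pair : callback (z_mu, z_nu) -> phi maximizing the 2-D contrast.
    Returns the rotated data Y and the accumulated orthonormal transform Q.
    """
    M = Z.shape[0]
    Y = Z.copy()
    Q = np.eye(M)
    for mu in range(M - 1):
        for nu in range(mu + 1, M):
            phi = angle_for_pair(Y[mu], Y[nu])
            G = givens_matrix(M, mu, nu, phi)
            Y = G @ Y          # rotates only the (mu, nu) coordinates
            Q = G @ Q
    return Y, Q
```

In practice, sweeps are repeated until all estimated angles become negligible; each sweep costs *M*(*M*−1)/2 plane rotations.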

### (c) Proposed contrast and optimal weight

A generic weighted composition of third- and fourth-order cumulants is given as

$$\Psi_{w}(\mathbf{y}) = \sum_{i=1}^{M} w_{S,i}\,\kappa_{iii}^{2}(\mathbf{y}) + \sum_{i=1}^{M} w_{K,i}\,\kappa_{iiii}^{2}(\mathbf{y}). \tag{2.3}$$

The contrast (2.3) does not arise directly from the MI criterion; rather, it is a weighted combination of two solutions (cf. (1.3)) that are not only contrasts, but also provide a good approximation to the MI under specific assumptions. In the absence of those assumptions, it is possible to obtain better results using (2.3) by appropriately selecting the values of the free parameters; this improved efficacy is possible because the contrasts (1.3) are only *approximate* MI solutions. For the purpose of pairwise Jacobi-like iteration, we consider *M*=2 in (2.3),

$$\Psi_{w}(\mathbf{y}) = w_{S,1}\,\kappa_{111}^{2}(\mathbf{y}) + w_{S,2}\,\kappa_{222}^{2}(\mathbf{y}) + w_{K,1}\,\kappa_{1111}^{2}(\mathbf{y}) + w_{K,2}\,\kappa_{2222}^{2}(\mathbf{y}). \tag{2.4}$$

Note that the number of weight parameters to be determined in any iteration has been reduced to four; it can be further reduced to three if the contrast is normalized by any one of the four parameters. In the blind scenario, where we usually have no *a priori* knowledge of the mixing sources, the estimation of these weight parameters (whether three or four) is not trivial. In an earlier attempt, A. K. Nandi (2000, unpublished work) studied the contrast (2.4) for a two-source scenario. He limited his search to *two* weight parameters and put forward the suggestions: (i) *w*_{S,1}=*w*_{S,2} and *w*_{K,1}=*w*_{K,2}; (ii) specific couplings between *w*_{S,i} and *w*_{K,j}, for *i*, *j*=1, 2; and (iii) *αw*_{S,i}+*w*_{K,i}=1 (normalized), *i*=1, 2 and *α*∈ℝ. As examples, he provided the simple and suboptimum expressions (2.5*a*) and (2.5*b*) (*i*=1, 2). These expressions did provide acceptable results in a number of experiments; however, owing to their heuristic nature, it is difficult to consider them a suitable choice for a general ICA problem.

If it is known that the mixing sources are highly skewed, then we can limit the search space to a *single* parameter by setting *w*_{S,1}=1 and *w*_{K,1}=*w*_{K,2}=0, and look for the optimal value of *w*_{S,2}. Similarly, if it is known that the sources are symmetrical, then we can set *w*_{S,1}=*w*_{S,2}=0 and *w*_{K,1}=1, and look for the optimal value of *w*_{K,2}. However, this strategy of weight optimization makes use of *either* third-order *or* fourth-order cumulants, but not both. Another possibility is to select *w*_{S,1}=*w*_{S,2}=1 and *w*_{K,1}=*w*_{K,2}=*β*, which leads to

$$\Psi_{\beta}(\mathbf{y}) = \kappa_{111}^{2}(\mathbf{y}) + \kappa_{222}^{2}(\mathbf{y}) + \beta\left[\kappa_{1111}^{2}(\mathbf{y}) + \kappa_{2222}^{2}(\mathbf{y})\right]. \tag{2.6}$$
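For two standardized signals, the single-parameter contrast (2.6) can be evaluated directly from sample cumulants; a small sketch (our naming, not the paper's):

```python
import numpy as np

def contrast_beta(y1, y2, beta):
    """Weighted composite contrast (2.6),
    Psi_beta = k111(y)^2 + k222(y)^2 + beta * (k1111(y)^2 + k2222(y)^2),
    for zero-mean, unit-variance signals y1 and y2."""
    k3 = lambda y: np.mean(y**3)         # third-order marginal cumulant
    k4 = lambda y: np.mean(y**4) - 3.0   # fourth-order marginal cumulant
    return k3(y1)**2 + k3(y2)**2 + beta * (k4(y1)**2 + k4(y2)**2)
```

Setting *β*=0 recovers the purely third-order criterion, while large *β* approaches the purely fourth-order one.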

Interestingly, the study of the single weight parameter for the optimized use of a fourth-order contrast function has been reported in some recent articles including Moreau & Thirion-Moreau (1999), Murillo-Fuentes & Gonzalez-Serrano (2004) and Zarzoso *et al*. (2006). From these studies, we know that the optimal value of *β* can be obtained by performing *small-error analysis*, i.e. the value of *β* is optimum if it minimizes the asymptotic (large-sample) mean square error (m.s.e.) performance.

Owing to Moreau & Thirion-Moreau (1999), this analysis can easily be carried out. First, we consider that the mixing matrix is orthonormal, so that the prewhitening stage is not necessary. Furthermore, the asymptotic analysis is carried out for the case of two real sources, subject to the planar (Givens) rotation. In such a case, it is assumed that a first stage has already realized the normalization of the observation vector, i.e. **z** is supposed to be a whitened vector. Thus, we have to estimate an angle *ϕ* according to the maximization of *Ψ*(·), i.e. $\hat{\phi}=\arg\max_{\phi}\hat{\Psi}(\phi)$, where $\hat{\phi}$ is an estimate of the true (separation) value $\phi^{\circ}$. In practice, the maximization of the contrast function does not provide the exact value of the parameter, since the true cumulants are actually approximated by their sample estimates. Replacing the expectations by sample averages leads to the empirical version of *Ψ*(**y**), denoted $\hat{\Psi}$ and given by (2.7), in which the cumulants are replaced by their sample estimates. As a result, an estimation error $\Delta\phi=\hat{\phi}-\phi^{\circ}$ is involved in the estimation of the true value. The estimated angle is actually the solution of the estimating equation $\hat{\Psi}'(\hat{\phi})=0$. Approximating this derivative around the true value by means of its Taylor series expansion yields $\hat{\Psi}'(\hat{\phi})\approx\hat{\Psi}'(\phi^{\circ})+\hat{\Psi}''(\phi^{\circ})\,\Delta\phi$, where $\hat{\Psi}'=\mathrm{d}\hat{\Psi}/\mathrm{d}\phi$ and $\hat{\Psi}''=\mathrm{d}^{2}\hat{\Psi}/\mathrm{d}\phi^{2}$. Assuming $\hat{\phi}$ to be in the neighbourhood of $\phi^{\circ}$, we obtain $\Delta\phi\approx-\hat{\Psi}'(\phi^{\circ})/\hat{\Psi}''(\phi^{\circ})$. The m.s.e. is given by

$$\mathrm{m.s.e.} = \mathrm{E}\big[(\hat{\phi}-\phi^{\circ})^{2}\big] \approx \frac{\mathrm{E}\big[(\hat{\Psi}'(\phi^{\circ}))^{2}\big]}{\big(\mathrm{E}[\hat{\Psi}''(\phi^{\circ})]\big)^{2}}. \tag{2.8}$$

When $\hat{\phi}=\phi^{\circ}$, **y**=**s**. The m.s.e. expression (2.8) is general, and is thus valid for any two-dimensional contrast for ICA problems. Furthermore, the *strong law of large numbers* ensures that $\hat{\Psi}''$ converges with probability 1 to its expected value. As *N*→∞, we obtain the limits (2.9*a*) and (2.9*b*), and hence the asymptotic m.s.e., whose constituent quantities are given by (2.10*a*)–(2.10*c*).

The m.s.e. depends on the statistics of the sources and on the parameter *β*. We can now derive the optimum value of *β*, denoted *β*^{*}, such that the m.s.e. is minimum, by solving the equation d(m.s.e.)/d*β*=0; this yields (2.11), which indicates that *β*^{*} depends on the statistics of the mixing sources and is independent of the coefficients of the unknown mixing matrix. Hence, given the source statistics, we can obtain a contrast with minimum asymptotic m.s.e. In the scenario where nothing is known *a priori* about the source statistics, a simple strategy is to use the statistical properties of the whitened version of the observed sources for the evaluation of *β*^{*}, and to repeat the separation until *β*^{*} converges.

Note a resemblance among the expressions (2.5*a*), (2.5*b*) and (2.11): the denominators in these expressions are equal if *α*=3*A*_{2}/(4*A*_{1}). Also note that, even if the two sources *s*_{1} and *s*_{2} have the same statistics, the value of *β*^{*} is not *necessarily* equal to 1; compare this with Moreau & Thirion-Moreau (1999), where the optimal weight is shown to be equal to unity for sources with similar statistics. In the next section, we discuss the details of the ICA algorithms derived from the proposed contrast (2.6).

### (d) Proposed ICA algorithms

We consider the case of real mixtures and assume that the angle of rotation lies in the interval (−*π*/2, *π*/2]. With pairwise estimation of the angle of rotation, the optimization problem reduces to a one-dimensional search. A careful look at the optimization criterion (2.4) reveals that its stationary points can be obtained by a mere polynomial rooting technique (as initially suggested by Comon (1994*a*)). First, we reformulate the cumulant expressions in matrix form for ease of derivation, as given by (2.12*a*) and (2.12*b*). To avoid trigonometric functions, we adopt the following form of the rotation matrix:

$$\mathbf{G} = \frac{1}{\sqrt{1+\theta^{2}}}\begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix}, \tag{2.13}$$

where *θ* is an auxiliary variable defined as *θ*=tan *ϕ*. Now, we expand the squares of the third-order cumulants as a function of *θ*,

$$w_{S,1}\,\kappa_{111}^{2}(\mathbf{y}) + w_{S,2}\,\kappa_{222}^{2}(\mathbf{y}) = \frac{\sum_{k=0}^{6} c_{k}\theta^{k}}{(1+\theta^{2})^{3}}, \tag{2.14}$$

where the coefficients *c*_{0}, …, *c*_{6} are polynomial functions of the weights *w*_{S,1}, *w*_{S,2} and of *b*_{0}=*κ*_{111}(**z**), *b*_{1}=3*κ*_{112}(**z**), *b*_{2}=3*κ*_{122}(**z**) and *b*_{3}=*κ*_{222}(**z**). Similarly, we expand the squares of the fourth-order cumulants,

$$w_{K,1}\,\kappa_{1111}^{2}(\mathbf{y}) + w_{K,2}\,\kappa_{2222}^{2}(\mathbf{y}) = \frac{\sum_{k=0}^{8} d_{k}\theta^{k}}{(1+\theta^{2})^{4}}, \tag{2.15}$$

where the coefficients *d*_{0}, …, *d*_{8} are polynomial functions of *w*_{K,1}, *w*_{K,2} and of *a*_{0}=*κ*_{1111}(**z**), *a*_{1}=4*κ*_{1112}(**z**), *a*_{2}=6*κ*_{1122}(**z**), *a*_{3}=4*κ*_{1222}(**z**) and *a*_{4}=*κ*_{2222}(**z**). Combining (2.14) and (2.15) over the common denominator, we obtain

$$\Psi_{w}(\theta) = \frac{\sum_{k=0}^{8} e_{k}\theta^{k}}{(1+\theta^{2})^{4}}, \tag{2.16}$$

where *e*_{8}=*c*_{6}+*d*_{8}, *e*_{7}=*c*_{5}+*d*_{7}, *e*_{6}=*c*_{4}+*d*_{6}+*c*_{6}, *e*_{5}=*c*_{3}+*d*_{5}+*c*_{5}, *e*_{4}=*c*_{2}+*d*_{4}+*c*_{4}, *e*_{3}=*c*_{1}+*d*_{3}+*c*_{3}, *e*_{2}=*c*_{0}+*d*_{2}+*c*_{2}, *e*_{1}=*d*_{1}+*c*_{1} and *e*_{0}=*d*_{0}+*c*_{0}. Taking the derivative of (2.16) with respect to *θ* and setting it to zero, we obtain

$$\sum_{k=0}^{8} f_{k}\theta^{k} = 0, \tag{2.17}$$

where *f*_{8}=*e*_{7}, *f*_{7}=2*e*_{6}−8*e*_{8}, *f*_{6}=3*e*_{5}−7*e*_{7}, *f*_{5}=4*e*_{4}−6*e*_{6}, *f*_{4}=5*e*_{3}−5*e*_{5}, *f*_{3}=6*e*_{2}−4*e*_{4}, *f*_{2}=7*e*_{1}−3*e*_{3}, *f*_{1}=8*e*_{0}−2*e*_{2} and *f*_{0}=−*e*_{1}. The polynomial (2.17) is of eighth order; however, the choice *w*_{S,1}=*w*_{S,2} and *w*_{K,1}=*w*_{K,2} reduces the problem to fourth order. By selecting *w*_{S,1}=*w*_{S,2}, we note that *c*_{6}=*c*_{0}, *c*_{5}=−*c*_{1}, *c*_{3}=0 and *c*_{4}=*c*_{2}; similarly, with *w*_{K,1}=*w*_{K,2}, we obtain *d*_{8}=*d*_{0}, *d*_{7}=−*d*_{1}, *d*_{6}=*d*_{2} and *d*_{5}=−*d*_{3}, which allows us to write (2.16) in the symmetric form (2.18). Owing to Comon (1994*a*), the substitution *ξ*=*θ*−1/*θ* can then be made, which simplifies (2.18) into a reduced-order form as a function of the auxiliary variable *ξ*,

$$\Psi_{w}(\xi) = \frac{h_{4}\xi^{4} + h_{3}\xi^{3} + h_{2}\xi^{2} + h_{1}\xi + h_{0}}{(\xi^{2}+4)^{2}}, \tag{2.19}$$

where *h*_{4}=*c*_{0}+*d*_{0}, *h*_{3}=−(*c*_{1}+*d*_{1}), *h*_{2}=5*c*_{0}+*c*_{2}+4*d*_{0}+*d*_{2}, *h*_{1}=−(4*c*_{1}+3*d*_{1}+*d*_{3}) and *h*_{0}=*d*_{4}+4*c*_{2}+4*c*_{0}+2*d*_{0}+2*d*_{2}. Taking the derivative of (2.19) with respect to *ξ* and setting it to zero, we obtain the polynomial

$$\Omega(\xi) = g_{4}\xi^{4} + g_{3}\xi^{3} + g_{2}\xi^{2} + g_{1}\xi + g_{0} = 0, \tag{2.20}$$

where *g*_{4}=−*h*_{3}/8, *g*_{3}=2*h*_{4}−*h*_{2}/4, *g*_{2}=3*h*_{3}/2−3*h*_{1}/8, *g*_{1}=*h*_{2}−*h*_{0}/2 and *g*_{0}=*h*_{1}/2. The explicit analytical solutions to the roots of the quartic (2.20) can be found by a standard algebraic procedure, such as Ferrari's formula. At most two of the roots correspond to maxima of *Ψ*_{w}(*ξ*). Similar to the findings of Comon & Cardoso (1990), there are, in general, only two real roots of the polynomial *Ω*(*ξ*), and the contrast *Ψ*_{w}(*ξ*) admits, in general, a single maximum. After finding the roots of (2.20) and the corresponding values of the contrast, we retain the *ξ* that maximizes the contrast, compute *θ* by solving *θ*^{2}−*ξθ*−1=0, retain the *θ* that satisfies *θ*∈(−1, 1], and finally compute the rotation matrix (2.13). Next, we consider two cases of the proposed ICA algorithm as follows:

- First, we assume constant values for the free parameters: *w*_{S,1}=*w*_{S,2}=4 and *w*_{K,1}=*w*_{K,2}=1 (or, equivalently, *w*_{S,1}=*w*_{S,2}=1 and *w*_{K,1}=*w*_{K,2}=1/4). This results in the same contrast as (1.4); however, unlike CuBICA, we use a root-finding method for the maximization of the contrast. The resulting ICA algorithm is named the *composite-order ICA algorithm* (based on third- and fourth-order cumulants), COICA.
- Secondly, we assume *w*_{S,1}=*w*_{S,2}=1 and *w*_{K,1}=*w*_{K,2}=*β*, where *β* is the optimized weight parameter derived in §2*c*. The resulting ICA algorithm is named the *optimized composite-order ICA algorithm* (based on third- and fourth-order cumulants), OCOICA.
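The root-finding step shared by COICA and OCOICA can be sketched as follows, assuming the coefficients *h*_{0}, …, *h*_{4} of (2.19) have already been computed from the sample cumulants (we use `numpy.roots` in place of Ferrari's formula; the names are ours):

```python
import numpy as np

def best_rotation_theta(h):
    """Maximize the reduced contrast (2.19), Psi(xi) = H(xi)/(xi^2 + 4)^2 with
    H(xi) = h4 xi^4 + ... + h0, by rooting the quartic (2.20), then map the
    winning xi back to theta via theta^2 - xi*theta - 1 = 0 with theta in (-1, 1]."""
    h0, h1, h2, h3, h4 = h
    H = np.poly1d([h4, h3, h2, h1, h0])
    psi = lambda xi: H(xi) / (xi**2 + 4.0) ** 2
    # coefficients of Omega(xi), equation (2.20)
    g = [-h3 / 8.0, 2.0 * h4 - h2 / 4.0,
         1.5 * h3 - 0.375 * h1, h2 - h0 / 2.0, h1 / 2.0]
    cand = [r.real for r in np.roots(g) if abs(r.imag) < 1e-9]
    xi = max(cand, key=psi)                  # retain the xi maximizing the contrast
    d = np.sqrt(xi**2 + 4.0)
    # of the two roots of theta^2 - xi*theta - 1 = 0, exactly one lies in (-1, 1]
    theta = next(t for t in ((xi + d) / 2.0, (xi - d) / 2.0) if -1.0 < t <= 1.0)
    return xi, theta
```

The retained *θ* then defines the Givens rotation via (2.13), since the two roots of *θ*^{2}−*ξθ*−1=0 have product −1 and so exactly one of them falls in (−1, 1].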

It should be noted that the aforementioned formulation of the proposed algorithms has some similarities with that of the ICA algorithms ICAj(3,4) and ICAs(3,4) presented in Moreau (2001). However, we use only auto-cumulants, whereas Moreau (2001) uses both auto- and cross-cumulants; furthermore, unlike us, Moreau (2001) does not use any weight parameter.

## 3. Experiments and results

In order to illustrate the potential benefits of the proposed algorithms, some computer simulations are now presented. We compare the performance of COICA and OCOICA with the joint diagonalization of third-order cumulant matrices (Com3) proposed by Comon (1994*b*), the joint diagonalization of fourth-order cumulant matrices (Com4) proposed by Comon (1994*a*) and the joint diagonalization of third- and fourth-order cumulant matrices (CuBICA) proposed by Blaschke & Wiskott (2004). The performance measure, the interference-to-signal ratio (ISR) introduced by Chevalier (1995), is used in our simulations to characterize the restitution quality quantitatively. The performance index is given by (3.1), where *c*_{ij} represents the element (*i*, *j*) of the global mixing and non-mixing matrix **C**. Also, the ISR approximates the m.s.e. of the angle estimates around a valid separation solution (Zarzoso *et al*. 2006).
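Chevalier's exact definition is the one given in (3.1); as an illustrative sketch (our naming and a common variant, not necessarily the paper's exact formula), one can measure, for each output of the global matrix **C**, the residual power of the interfering entries relative to the dominant one:

```python
import numpy as np

def isr(C):
    """A common interference-to-signal ratio for a global matrix C = B A:
    per output, the power of the non-dominant entries relative to the
    dominant (signal) entry, averaged over outputs. Zero for a perfect
    separation, i.e. when C is a scaled permutation matrix."""
    P = np.abs(np.asarray(C, dtype=float)) ** 2
    dominant = P.max(axis=1)                  # per-row signal power
    interference = P.sum(axis=1) - dominant   # per-row residual power
    return float(np.mean(interference / dominant))
```

Any scaled permutation matrix yields an ISR of zero, consistent with the inherent permutation/scaling ambiguity of BSS.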

We consider two cases of sources.

- *A parametric source.* We borrow a parametric source *s*(*α*) from Moreau (2001). This is a discrete, independently and identically distributed source that takes its values in the set {−1, 0, *α*} with respective probabilities {1/(1+*α*), (*α*−1)/*α*, 1/(*α*(1+*α*))}. The real parameter *α*≥1 is called the cumulant parameter. Note that E[*s*]=0, E[*s*^{2}]=1, *κ*_{3}(*s*)=*α*−1 and *κ*_{4}(*s*)=*α*^{2}−*α*−2. Various (discrete) distributions can be obtained with appropriate values of *α*: for instance, *α*>2 gives leptokurtic (*κ*_{4}>0), *α*<2 platykurtic (*κ*_{4}<0), *α*=2 mesokurtic (*κ*_{4}=0), *α*>1 asymmetrical (*κ*_{3}≠0) and *α*=1 symmetrical (*κ*_{3}=0) distributions.
- *Synthetic sources.* Random sources with desired skewness and kurtosis are generated by a *power distribution method* (Fleishman 1978). The sources used in the simulation are labelled 1–30 and are listed in table 1. Except for sources 5, 6, 7 and 11, all sources are skewed; similarly, except for sources 7, 12 and 13, all sources have non-zero kurtosis. All sources are drawn with zero mean and unit variance.
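The parametric source is easy to realize numerically; the sketch below (our naming) builds its support and probabilities, draws i.i.d. samples, and exposes exact moments for checking the stated cumulants:

```python
import numpy as np

def parametric_source(alpha):
    """Support and probabilities of the Moreau (2001) parametric source s(alpha)."""
    values = np.array([-1.0, 0.0, alpha])
    probs = np.array([1.0 / (1.0 + alpha),
                      (alpha - 1.0) / alpha,
                      1.0 / (alpha * (1.0 + alpha))])
    return values, probs

def moment(values, probs, p):
    """Exact p-th order moment E[s^p] of the discrete distribution."""
    return float(np.dot(probs, values**p))

def sample(alpha, N, rng):
    """Draw N i.i.d. samples of s(alpha)."""
    values, probs = parametric_source(alpha)
    return rng.choice(values, size=N, p=probs)
```

The exact moments reproduce the quoted cumulants, e.g. *κ*_{3}=E[*s*^{3}]=*α*−1 and *κ*_{4}=E[*s*^{4}]−3=*α*^{2}−*α*−2.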

### (a) Experiment 1: ISR versus cumulant parameter

In this experiment, we used parametric sources and estimated the ISR performance as a function of the cumulant parameter *α* for a fixed sample size. The mixing matrix *A*_{2×2} is fixed for all experiments, with *A*_{11}=*A*_{12}=*A*_{22}=1 and *A*_{21}=0.9; its condition number is 38, which is reasonably high. The ISR is computed for 50 different values of *α* ranging from 1 to 2.5, as depicted in figure 1. The sample size *N* is taken to be 5000 for all algorithms, and each ISR trace is averaged over 700 Monte Carlo realizations.

Note that the ISR floor of Com3 is decreasing with an increase in *α*, which is quite natural, as the magnitude of the third-order cumulant increases with *α*. For *α* close to 1, the poor performance of Com3 is due to the fact that the third-order cumulants do not bring sufficient statistical information since their values are near zero. For *α*>2, however, the performance of Com3 can be seen to be much better than that of Com4 and slightly better than that of CuBICA and COICA. The better performance of Com3 over that of Com4 (for *α*>2) is quite justified based on the findings of Herrmann & Nandi (2000), where it was shown that, by considering the asymmetric nature of sources, one can gain better performance over solely fourth-order schemes. Moreover, the better performance of Com3 in comparison with that of CuBICA or COICA makes it clear that merely a joint (un-optimized) use of third- and fourth-order cumulants cannot guarantee a better performance.

Note that Com4 exhibits poor performance in the neighbourhood of *α*=2. This is not surprising because the fourth-order cumulants do not bring sufficient statistical information since their values are near zero. Moreover, in spite of the difference in their algorithmic formulation, the performances of CuBICA and COICA can be noted to be similar for all values of *α*.

Note that the performance of the OCOICA, in comparison with that of Com3, Com4, CuBICA and COICA, is almost insensitive to the variation in the statistics of the sources. For 1≤*α*<1.5, the ISR floor of OCOICA is quite close to those of Com4, CuBICA and COICA. Moreover, for *α*≥1.5, the ISR floor of OCOICA is lower than those of others by at least 5 dB. The performance gain, achieved by OCOICA, is significant. It is important to note that the *β* in OCOICA was estimated based on the sample statistics of the whitened version of the observed mixtures and no *a priori* statistical information of sources was exploited.

### (b) Experiment 2: ISR versus cumulant parameter for various sample sizes

This experiment provides a detailed account of the ISR performances that were partially investigated in experiment 1. Here, we obtain the ISR performances of Com3, Com4, COICA and OCOICA for various sample sizes (*N*=125×2^{i}, *i*=0, 1, …, 7) versus the cumulant parameter *α*∈[1, 3]. The results obtained for Com3, Com4, COICA and OCOICA are depicted in figures 2*a*,*b* and 3*a*,*b*, respectively. Note in figure 2*a* that Com3 exhibits a satisfactory performance for all values of *N* (even as small as *N*=125). However, when *α*→1, the distribution comes close to symmetry, Com3 stops working, and a high ISR results. So, unless it is known *a priori* that the sources are not symmetrical (the symmetric case corresponding to *α*→1), Com3 is not an appropriate choice of ICA algorithm.

Note similarly in figure 2*b* that Com4's performance deteriorates significantly, not only at *α*=2 (where *κ*_{4}=0), but also in a large vicinity around it (1.5<*α*<2.5). So, unless it is known *a priori* that the sources are not mesokurtic, Com4 is not an appropriate choice of ICA algorithm. Also note that, for *N*<500, Com4 performs unsatisfactorily in the limit *α*→1.

Note the performance of the proposed algorithm COICA in figure 3*a*: COICA exhibits an impressive behaviour for all values of *N*, with no significant deterioration observed in the vicinity of either *α*=1 or *α*=2. Although the ISR floors are lifted a little around *α*=2, the performance in this vicinity is still as good as that of Com3.

Note in figure 3*b* that, for sample sizes less than 500, OCOICA exhibits a relatively weak performance in the vicinities of *α*=1 and *α*=2, the points where the skewness and kurtosis vanish, respectively. A possible reason is that the value of *β*^{*}, which requires the computation of fifth- and sixth-order moments, cannot be estimated with reasonable accuracy from a small set of samples. Also, the true *β*^{*} requires the statistics of the original mixing sources, which are not available in the given blind scenario.

For larger sample sizes, note that, after being held constant and staying comparatively low over a range of *α*, the ISR floor of OCOICA disappointingly starts to increase. However, also note that the value of *α* at which the ISR floor takes off is larger for larger sample sizes. This is an important observation: it implies that, if a large enough sample set is ensured, then, irrespective of the value of *α*, OCOICA is capable of giving a successful source separation, with a separation quality much better than those of Com3, Com4 and CuBICA.

The parametric source under consideration has the property that its higher-order moments increase exponentially with *α*. Specifically, we need sample-based fifth- and sixth-order moments of the source in the estimation of *β*^{*}. We also know (Bourin & Bondon 1995) that the efficiency of the sample estimate of the *p*th-order moment is inversely proportional to the square of the *p*th-order moment and directly proportional to its variance, and that the variance of the sample *p*th-order moment is directly proportional to the magnitude of the 2*p*th-order moment. This implies that the variance in the estimation of *β*^{*} is very high for large *α*. In future work, we aim to quantify this estimation variance to gain more insight into OCOICA.

### (c) Experiment 3: contrast parameter *β* versus cumulant parameter *α*

For parametric sources, the optimal value of *β* can be presented in closed form. Using E[*s*^{5}]=*α*^{3}−*α*^{2}+*α*−1 and E[*s*^{6}]=*α*^{4}−*α*^{3}+*α*^{2}−*α*+1, we obtain the closed-form expression (3.2). To confirm the analytical value of *β*^{*} in (3.2), we simulated its values under the same mixing scenario as specified in experiment 1, except that *N*=500 000. As figure 4 shows, the analytical and simulated values agree for all values of *α*.
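The fifth- and sixth-order moment identities used in (3.2) can be checked directly against the discrete distribution of *s*(*α*); a small verification sketch (our naming):

```python
import numpy as np

def exact_moments_5_6(alpha):
    """Exact fifth- and sixth-order moments of the parametric source s(alpha),
    computed directly from its discrete distribution."""
    values = np.array([-1.0, 0.0, alpha])
    probs = np.array([1.0 / (1.0 + alpha),
                      (alpha - 1.0) / alpha,
                      1.0 / (alpha * (1.0 + alpha))])
    m5 = float(np.dot(probs, values**5))
    m6 = float(np.dot(probs, values**6))
    return m5, m6

def closed_form_5_6(alpha):
    """Closed forms quoted in the text:
    E[s^5] = a^3 - a^2 + a - 1 and E[s^6] = a^4 - a^3 + a^2 - a + 1."""
    return (alpha**3 - alpha**2 + alpha - 1.0,
            alpha**4 - alpha**3 + alpha**2 - alpha + 1.0)
```

Both reduce to the geometric-sum identity E[*s*^{k}]=[(−1)^{k}+*α*^{k−1}]/(1+*α*), which also yields the lower-order moments quoted earlier.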

### (d) Experiment 4: m.s.e. versus contrast parameter *β*

The m.s.e. expression (2.8) can be used to obtain the m.s.e. performance of the proposed algorithm OCOICA for the parametric source, as given by (3.3). Note from (3.3) that, irrespective of the value of *α*, if *β*=*β*^{*}, then the m.s.e. is analytically zero. This set of experiments compares the analytical m.s.e. (3.3) with values obtained from computer simulations for two cases: *α*<2 (sub-Gaussian) and *α*>2 (super-Gaussian). In both cases, we select two sources with similar statistics and mix them through a unitary (Givens) transformation with *ϕ*=15°. The resulting m.s.e. performances are depicted in figure 5*a*,*b* for *α*=1.8 and 2.2, respectively; the sample sizes and numbers of Monte Carlo runs are given in the figures. It is clear from these results that our analytical findings are in complete agreement with the simulation results, and that the use of the optimal weight parameter *β* provides a significant improvement for both types of distributions (sub- and super-Gaussian).

### (e) Experiment 5: ISR versus sample size (synthetic sources)

In this experiment, we estimate the ISR performance as a function of sample size *N* for a number of random combinations of synthetic sources. The mixing matrices *A*_{M×M} are taken to be of order 2×2, 5×5 and 10×10, and are generated from normal distributions with zero mean and unit variance; the condition number of ** A** is constrained to be less than 50. The mixture is first whitened via PCA based on the singular value decomposition of the observed data matrix. The curves have been averaged over *ν* independent Monte Carlo runs, with the product *νN* selected to be 1×10^{8}, 1×10^{6} and 5×10^{5} for *M*=2, 5 and 10, respectively. The value of the weight parameter has been computed from sample-based statistics of the whitened version of the observed mixtures. Results are depicted in figures 6 and 7 for *M*=2 and *M*>2, respectively. Each of these figures compares the performances of COICA and OCOICA with those of Com3, Com4 and CuBICA. Note that the proposed OCOICA algorithm outperforms the other algorithms for all combinations. Also note that the improvement in ISR achieved by OCOICA is consistent in all cases, even though the weight parameter has been computed from the whitened version of the observed mixtures.
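The data-generation and whitening steps of this experiment can be sketched as follows (a minimal sketch assuming uniform synthetic sources; the ISR evaluation and the COICA/OCOICA rotations are omitted). The mixing matrix is redrawn until its condition number is below 50, and whitening is performed via the SVD of the observed data matrix, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 5, 10000

# Zero-mean, unit-variance synthetic sources (uniform, for illustration).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(M, N))

# Draw A with N(0, 1) entries, constraining its condition number to be < 50.
A = rng.standard_normal((M, M))
while np.linalg.cond(A) >= 50:
    A = rng.standard_normal((M, M))

X = A @ S
X = X - X.mean(axis=1, keepdims=True)   # centre the observations

# Whitening via PCA based on the SVD of the data matrix:
# X = U diag(s) Vt, so W = sqrt(N) diag(1/s) U.T gives cov(W X) = I.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = np.sqrt(N) * (U / s).T
Z = W @ X                                # whitened mixtures

print(np.allclose(Z @ Z.T / N, np.eye(M)))  # True: identity covariance
```

After this step the residual mixing is orthogonal, so the remaining task for the Jacobi-like iterations is only the rotation *Q*^{T}.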

## 4. Conclusions

We have studied an optimal parametric composition of third- and fourth-order cumulants for BSS based on Jacobi-like diagonalization, and derived the analytical solution for the case of two sources. Five different computer simulations have been provided, demonstrating the successful separation of two and more sources. The resulting optimized composite-order ICA algorithm is shown to perform better than three existing ICA algorithms. The estimate of the optimal contrast parameter was obtained from the sample-based statistics of the whitened version of the observed mixtures. The simulations show that one can exploit the weight parameter to derive an optimal contrast using composite-order statistics. Note that the proposed algorithms can handle symmetrically and asymmetrically distributed sources simultaneously, and provide better performance with little additional computational overhead. The effect of Gaussian/non-Gaussian background noise on the performance has not been studied here and is left as a possible direction for future work.

## Acknowledgments

One of the authors, S.A., would like to acknowledge the financial support of the Overseas Research Studentship Awards Scheme, UK, the University of Liverpool, UK, and the COMSATS Institute of Information Technology (CIIT), Islamabad, Pakistan. S.A. is on leave of absence from the Electrical Engineering Department, CIIT, for higher studies.

## Footnotes

↵For zero-mean random variables *X*_{i}, *X*_{j}, *X*_{k} and *X*_{l}, the third- and fourth-order cumulants are defined, respectively, as *κ*_{ijk}(** X**)=E[*X*_{i}*X*_{j}*X*_{k}] and *κ*_{ijkl}(** X**)=E[*X*_{i}*X*_{j}*X*_{k}*X*_{l}]−E[*X*_{i}*X*_{j}]E[*X*_{k}*X*_{l}]−E[*X*_{i}*X*_{k}]E[*X*_{j}*X*_{l}]−E[*X*_{i}*X*_{l}]E[*X*_{j}*X*_{k}].

↵If a certain criterion for BSS qualifies as a contrast, it does not necessarily also approximate the MI. Similarly, if a certain statistical quantity approximates the MI, it does not necessarily qualify as a contrast. Note that expression (1.2), although a fairly good approximation to the MI, is not a proven contrast as a whole.
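The cumulant definitions in the first footnote translate directly into sample estimators. The following sketch (function names and sample size are ours) checks that the fourth-order auto-cumulant of a unit-variance uniform variable is close to its theoretical value of 9/5−3=−1.2.

```python
import numpy as np

def cum3(xi, xj, xk):
    """Third-order cross-cumulant of zero-mean variables: E[Xi Xj Xk]."""
    return np.mean(xi * xj * xk)

def cum4(xi, xj, xk, xl):
    """Fourth-order cross-cumulant of zero-mean variables, per the footnote:
    E[Xi Xj Xk Xl] - E[Xi Xj]E[Xk Xl] - E[Xi Xk]E[Xj Xl] - E[Xi Xl]E[Xj Xk]."""
    return (np.mean(xi * xj * xk * xl)
            - np.mean(xi * xj) * np.mean(xk * xl)
            - np.mean(xi * xk) * np.mean(xj * xl)
            - np.mean(xi * xl) * np.mean(xj * xk))

rng = np.random.default_rng(2)
# Zero-mean, unit-variance uniform sample: kappa_iiii = 9/5 - 3 = -1.2,
# and the third-order cumulant vanishes by symmetry.
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=200000)
print(cum4(x, x, x, x))  # approximately -1.2
print(cum3(x, x, x))     # approximately 0
```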

↵We are referring to Blaschke & Wiskott (2004) for their simple presentation; the joint use of third- and fourth-order cumulants for ICA had been investigated earlier by a number of researchers, including Moreau (2001, 2006).

↵Cardoso (1999) emphasizes that components which are as independent as possible according to some measure of independence are not necessarily uncorrelated, for the reason that exact independence cannot be achieved in most practical applications. Thus, if decorrelation is desired, it must be enforced explicitly. Interestingly, this approach leads to a simple implementation: the whitening matrix *W* can be obtained straightforwardly by computing the matrix square root of the inverse covariance matrix of ** x**, and the orthonormal matrix *Q*^{T} can be obtained by the Jacobi technique subject to the maximization of some suitable criterion of independence. In short, whitening (second-order independence) solves the BSS problem up to an orthogonal transformation.

↵This data-based Jacobi algorithm for ICA works through a sequence of sweeps on the whitened data until a given orthogonal contrast is optimized; a sweep is defined to be one complete pass through all possible pairs of distinct indices. In simple words, the Jacobi iteration spans the whole set of rotation matrices in a sequential manner. The updating step on a pair (*μ*, *ν*) partially undoes the effect of previous optimizations on pairs containing either *μ* or *ν* (Cardoso 1999). For this reason, it is necessary to go through several sweeps before the optimization is completed.

- Received September 22, 2008.
- Accepted January 9, 2009.

- © 2009 The Royal Society