## Abstract

A new method for obtaining multivariate distributions for sub-images of natural images is described. The information in each sub-image is summarized by a measurement vector in a measurement space. The dimension of the measurement space is reduced by applying a random projection to the truncated output of the discrete cosine transforms of the sub-images. The measurement space is then reparametrized, such that a Gaussian distribution is a good model for the measurement vectors in the reparametrized space. An Ornstein–Uhlenbeck process, associated with the Gaussian distribution, is used to model the differences between measurement vectors obtained from matching sub-images. The probability of a false alarm and the probability of accepting a correct match are calculated. The accuracy of the resulting statistical model for matching sub-images is tested using images from the Middlebury stereo database with promising results. In particular, if the probability of accepting a correct match is relatively large, then there is good agreement between the calculated and the experimental probabilities of obtaining a unique match that is also a correct match.

## 1. Introduction

One of the fundamental tasks in computer vision is to find statistical models for the patterns of pixel values in images. These statistical models have many applications, including object detection (Torralba & Oliva 2003; Charpiat *et al*. 2005), edge detection (Konishi *et al*. 1999), image classification (Srivastava *et al*. 2003; Torralba & Oliva 2003), image denoising (Srivastava *et al*. 2003; Tan & Jiao 2007), super resolution (Tappen *et al*. 2003) and the matching of patterns between images. Statistical models for images can be used to make quantitative predictions about the performance of algorithms. For example, if the distribution of a particular class of patterns is known, then it is possible to estimate the number of such patterns that are likely to be found in an image, and hence to estimate the probability of a false match between patterns or the probability of a false detection of an object associated with a pattern.

It is convenient to restrict the patterns of pixel values to rectangular regions or sub-images within a larger image. Let *J* be an image of size *n*_{1}×*n*_{2} pixels. Then a sub-image *w* of *J* is any *m*_{1}×*m*_{2} rectangle of pixel values in *J*, where 1≤*m*_{i}≤*n*_{i}, *i*=1, 2. In most applications *m*_{1} and *m*_{2} are much smaller than *n*_{1}, *n*_{2}. Each sub-image corresponds to a point in a measurement space specified by a vector with *m*_{1}*m*_{2} components, corresponding to the pixels in *w*. For a wide range of images, the description of a general sub-image is simplified by making an appropriate choice of basis in ℝ^{*m*_{1}*m*_{2}}. The basis associated with the discrete cosine transform (DCT) is a convenient and simple choice (Gonzalez & Woods 2002). The vector *v*(*w*) of DCT components of *w* tends to have many components with relatively small absolute values and only a few components with relatively large absolute values. The few relatively large components can be retained and the remaining components set to zero with only a small loss of information (Field 1987). The components set to zero may be different for different vectors *v*(*w*), but this difficulty is overcome using a random projection to a new lower dimensional measurement space ℝ^{k}. Experiments reported in §3*b* below show that this method of reducing the dimension of the measurement space consistently outperforms principal components analysis (PCA) on a number of natural images.

There is a second reason for using a random projection: the properties of the projected vectors can be summarized by a simple statistical model. The directions of the projected measurement vectors tend to be uniformly distributed on the unit hypersphere in ℝ^{k}, centred at the origin (Diaconis & Freedman 1984; Dasgupta 1999, 2000). The fit of the directions of the projected vectors to a uniform distribution is improved by reparametrizing ℝ^{k}. In the reparametrized space, the distribution of the measurement vectors is modelled by the uniform distribution on the unit hypersphere for their directions and an empirical distribution for their lengths. The lengths of the vectors are then scaled such that the resulting vectors are compatible with the Gaussian distribution *N*(0, *I*(*k*)) with expected value 0 and covariance *I*(*k*), where *I*(*k*) is the *k*×*k* identity matrix. This reparametrization does not lose any information.

The distribution *N*(0, *I*(*k*)) is applied to the matching of sub-images, or more precisely, to the matching of the measurement vectors obtained from sub-images with given dimensions *m*_{1}×*m*_{2}. A pair (*h*_{1}, *h*_{2}) of measurement vectors in ℝ^{k} is accepted as a match if *h*_{2} is near to *h*_{1}. In order to define ‘near’ in this context, the difference *h*_{2}−*h*_{1} between *h*_{1} and a correct match, *h*_{2}, is modelled by an Ornstein–Uhlenbeck process on ℝ^{k} with limiting distribution *N*(0, *I*(*k*)). A false alarm occurs for a given *h*_{1} if *h*_{2} is sampled from *N*(0, *I*(*k*)) and by chance is near to *h*_{1}. The distribution for sub-images is used to calculate various probabilities, including the probability of a false alarm and the probability of accepting a correct match. In §6, the distribution for sub-images is tested by using it to predict the performance of a stereo matching algorithm on images from the Middlebury stereo database (Scharstein & Pal 2007), with a good agreement between theory and experiment. To the author's knowledge, this is the first time that a statistical model for sub-images has been used to predict the performance of a stereo matching algorithm.

All the calculations in this paper were carried out using Mathematica (Wolfram 1999).

The remainder of this paper is organized as follows. Related work is described in §2. The reduction in the dimension of the measurement space using a random projection is described in §3. The probability density function for sub-images is obtained in §4 and applied in §5 to the task of finding matching sub-images. The experiments with images from the Middlebury stereo database are reported in §6 and some concluding remarks are made in §7.

## 2. Related work

The use of a random linear projection to reduce the dimension of a measurement space is one of the main themes of the recent work on compressive sensing. An introduction to compressive sensing is given by Baraniuk (2007) and there is a much more detailed discussion by Donoho (2006). Applications of random projections to dimension reduction for data mining and database search are described by Bingham & Mannila (2001). Dasgupta (1999) showed that the eccentricity of a high-dimensional Gaussian distribution is reduced under a random linear projection and Diaconis & Freedman (1984) showed that for many sets of vectors in high-dimensional spaces, the projections of the vectors to a one- or two-dimensional space are modelled accurately by Gaussian distributions. The application of random projections to face recognition is described by Goel *et al*. (2005) and by Han & Jin (2007). Their experiments show that random projections compare well with PCA. Good accounts of PCA are given by Gonzalez & Woods (2002) and Forsyth & Ponce (2003).

Many authors have shown that there are accurate univariate statistical models for the filter responses obtained from natural images (Field 1987; Huang & Mumford 1999; Huang 2000; Grenander & Srivastava 2001; Srivastava *et al*. 2003). More recently, the emphasis has moved to multivariate statistical models. Grenander & Srivastava (2001) showed how a bivariate distribution can be built up from marginal distributions. Huang & Mumford (1999) modelled multivariate sets of filter responses using generalized Laplace distributions. Tan & Jiao (2007) used elliptically contoured distributions, among which are included the isotropic distributions, to model wavelet components in natural images, with applications to image denoising and image restoration.

Lee *et al*. (2003) made a detailed analysis of the distribution of high contrast 3×3 sub-images of natural images. These sub-images tend to cluster near to a two-dimensional manifold that corresponds to ideal images of edges. Srivastava *et al*. (2003) reviewed the statistical modelling of natural images and described statistical models for whole images and for sub-images. They emphasized the non-Gaussian nature of traditional representations of sub-images, as reported by Field (1987), and noted that in many cases sub-images cluster near to low-dimensional nonlinear submanifolds of the feature space.

Torralba & Oliva (2003) obtained statistics from the Fourier spectra of entire images. They showed that these statistics are affected by the large-scale properties of scenes in systematic ways, and that as a result images can be assigned to broad categories without the necessity of a detailed segmentation or accurate object detection. A general stochastic model for natural images is described by Mumford & Gidas (2001).

The general view emerging from the above work on image statistics by many authors over the past 20 years is that the filter responses for natural images are remarkably predictable, in that they can be modelled by relatively simple leptokurtic distributions that depend on only a small number of parameters. The purpose of this paper is to extend this work on image statistics by defining a useful multivariate statistical model for sub-images, and then applying it to the matching of sub-images.

The algorithm used in §5 below to test the statistical model for image matching is an example of a dense two-frame stereo algorithm. A comprehensive review of such algorithms is given by Scharstein & Szeliski (2002). The algorithm in §5 has some similarity to the first entry in Scharstein and Szeliski's table 1: traditional sum of squared differences (SSD) method, with aggregation over a square window.

## 3. Method for reducing the dimension of the measurement space

A new method for reducing the dimension of a measurement space is described in §3*a*, and an experimental comparison of the method with PCA is described in §3*b*. In the experiments, all RGB images are reduced to grey scale images by taking the average of the RGB values at each pixel.

### (a) DCT components

Let *w* be an *m*_{1}×*m*_{2} sub-image of an image *J*. It is known that the discrete cosine transform (DCT) of *w* tends to have only a few components with relatively large absolute values (Field 1987). Let the *m*_{1}×*m*_{2} matrix DCT(*w*) be the discrete cosine transform of *w*, indexed such that DCT(*w*)_{11} is proportional to the mean grey level in *w*. Let *v*(*w*) be the (*m*_{1}*m*_{2}−1)-dimensional vector obtained by listing the elements of DCT(*w*) row by row, except that DCT(*w*)_{11} is omitted. Let *c*_{i}(*v*), 1≤*i*≤*m*_{1}*m*_{2}−1, be the components of *v*(*w*) in decreasing order by absolute value. Thus, *c*_{i}(*v*)=*v*_{j(i)} for some integer *j*(*i*) and

|*c*_{1}(*v*)| ≥ |*c*_{2}(*v*)| ≥ ⋯ ≥ |*c*_{*m*_{1}*m*_{2}−1}(*v*)|.

Figure 1*a* shows image number 1 from the Rocks1 sequence of the 2006 Middlebury stereo datasets (Scharstein & Pal 2007). Figure 1*b* shows the corresponding graph of the average value of |*c*_{i}(*v*(*w*))| as a function of *i*, where *w* ranges over a set of 7×7 sub-images of image 1. It is apparent from figure 1*b* that the average value of |*c*_{i}(*v*)| drops rapidly as *i* increases away from 1, and then decreases less rapidly for larger values of *i*. It is known that curves of this type are obtained from a wide range of natural images, and that they are well approximated by power laws (Field 1987; Ruderman 1997).

Figure 1*b* suggests that the information in *w* can be summarized, to within a small error, by recording the values of the *c*_{i}(*v*) for 1≤*i*≤*j*, where *j* is a fixed integer less than *m*_{1}*m*_{2}−1. A difficulty arises because a DCT component *v*_{i}(*w*), with *i* fixed, may be large for one choice of *w* but small for another choice of *w*. This difficulty is overcome using random measurements (RM). Let *l* be the number of DCT components of *v*(*w*) that are to be retained. Let *v*(*l*, *w*) be the vector in ℝ^{*m*_{1}*m*_{2}−1} obtained by setting to zero the *m*_{1}*m*_{2}−*l*−1 components in *v*(*w*) with the smallest absolute values. The vector *v*(*l*, *w*) thus has at most *l* non-zero entries. The number of random measurements is made strictly greater than the number of DCT components that are retained. This ensures that with probability 1, the vectors *v*(*l*, *w*) that cannot be recovered exactly from their associated random measurements form a set of measure 0 (Baron *et al*. 2005). This result holds because the set of *v*(*l*, *w*), for varying *w*, is a union of manifolds of dimension *l* or less, and each measurement imposes one constraint on those *v*(*l*, *w*) compatible with it.

In this application the number, *k*, of random measurements is equal to *l*+1. Let *Φ* be a fixed *k*×(*m*_{1}*m*_{2}−1) matrix with entries *Φ*_{ij} sampled independently from the Gaussian distribution *N*(0, 1) with expected value 0 and variance 1. The components of the projected measurement vector *Φv*(*k*−1, *w*) constitute the *k* random measurements. The fact that *Φv*(*k*−1, *w*) uniquely determines *v*(*k*−1, *w*) for almost all *v*(*k*−1, *w*) is sufficient for the applications made below. It is not necessary to undertake the difficult task of recovering *v*(*k*−1, *w*) from *Φv*(*k*−1, *w*).
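The measurement pipeline of this subsection can be sketched in a few lines of NumPy. The code below is illustrative rather than the paper's implementation: the patch is random, the function names are mine, and the DCT is built explicitly as an orthonormal DCT-II matrix.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: rows are the cosine basis vectors.
    C = np.array([[np.cos(np.pi * i * (2 * j + 1) / (2 * n)) for j in range(n)]
                  for i in range(n)])
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def measurement_vector(w, l, Phi):
    # 2-D DCT of the patch; DCT(w)_11 is the DC (mean grey level) term.
    m1, m2 = w.shape
    D = dct_matrix(m1) @ w @ dct_matrix(m2).T
    v = np.delete(D.flatten(), 0)          # drop the DC term, length m1*m2 - 1
    keep = np.argsort(np.abs(v))[-l:]      # indices of the l largest |components|
    v_l = np.zeros_like(v)
    v_l[keep] = v[keep]                    # v(l, w): at most l non-zero entries
    return Phi @ v_l                       # k = l + 1 random measurements

rng = np.random.default_rng(0)
m1 = m2 = 7
l = 11                                     # retained DCT components
k = l + 1                                  # number of random measurements
Phi = rng.normal(size=(k, m1 * m2 - 1))    # entries i.i.d. N(0, 1)
w = rng.random((m1, m2))                   # stand-in for a natural-image patch
h = measurement_vector(w, l, Phi)
```

The orthonormality of the DCT matrix is what makes the r.m.s. comparisons of §3*b* legitimate, since the transform preserves vector norms.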

### (b) Numerical comparison between PCA and random measurements

PCA is applied to *n*=5000 sub-images sampled from image 1 in the Rocks1 sequence. Each sub-image *w* is of size 7×7. Let *C* be the covariance of the *n* vectors *v*(*w*), under the assumption that the mean value of the vectors is 0, and let *e*(*i*), 1≤*i*≤48, be the eigenvectors of *C*, ordered by decreasing eigenvalue.

The root mean square error, r.m.s.(PCA, *i*), of *i* measurements is defined by

r.m.s.(PCA, *i*) = (*n*^{−1}∑_{*w*}‖*v*(*w*)−∑_{*j*=1}^{*i*}(*e*(*j*)^{⊤}*v*(*w*))*e*(*j*)‖^{2})^{1/2}.

Similarly, the root mean square error, r.m.s.(RM, *i*), of *i* random measurements is defined by

r.m.s.(RM, *i*) = (*n*^{−1}∑_{*w*}‖*v*(*w*)−*v*(*i*−1, *w*)‖^{2})^{1/2},

because *i* random measurements suffice to retain *i*−1 DCT components. It is noted that

r.m.s.(PCA, 0) = r.m.s.(RM, 1) = (*n*^{−1}∑_{*w*}‖*v*(*w*)‖^{2})^{1/2}.

The calculation of the r.m.s. error using the vectors *v*(*w*) is legitimate because the DCT preserves the norms of vectors. The graphs of *i*↦r.m.s.(PCA, *i*) and *i*↦r.m.s.(RM, *i*) are shown in figure 2 for the Rocks1 image. The graph of *i*↦r.m.s.(PCA, *i*) is the upper one at *i*=20. Define *g*(PCA, *r*), *g*(RM, *r*) by

*g*(PCA, *r*) = min{*i* : r.m.s.(PCA, *i*) < *r* r.m.s.(PCA, 0)},

and similarly for *g*(RM, *r*). Thus, *g*(PCA, *r*) is the least number of measurements required by PCA to ensure that the normalized r.m.s. error is strictly less than *r*. Experiments with a range of images show that *g*(RM, *r*) is significantly smaller than *g*(PCA, *r*). The result is particularly noteworthy, because the use of the r.m.s. error favours PCA, due to the fact that PCA minimizes the r.m.s. error over all projections onto a space with a given dimension (Forsyth & Ponce 2003). Some values of *g*(PCA, *r*) and *g*(RM, *r*) are compared in table 1 for three images from the Middlebury stereo database (Scharstein & Szeliski 2002; Hirschmüller & Scharstein 2007), an image from the Groningen natural image database (van Hateren & van der Schaaf 1998) and the Lena (http://www-ece.rice.edu/∼wakin/images/lena512.bmp) image. The terms Illum2, Exp2, etc. are explained on the Middlebury web pages accessible from http://vision.middlebury.edu/Stereo/data/scenes2006.

The variable *r* in the second column of table 1 is an upper threshold on the normalized r.m.s. error. The first image, from Rocks1, is figure 1*a* and the four remaining images are shown in figure 3.
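A small numerical experiment in the spirit of this comparison can be set up as follows. The surrogate data and the exact form of r.m.s.(RM, *i*) (truncation of each vector to its *i*−1 largest-magnitude components, which *i* random measurements can encode, as in §3*a*) are my assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 48
# Surrogate for DCT coefficient vectors: heavy-tailed components whose
# scale falls off with frequency index, roughly like a power law.
scales = np.arange(1, d + 1) ** -1.0
V = rng.laplace(size=(n, d)) * scales

def rms_pca(V, i):
    # r.m.s. error of projecting each vector onto the top-i eigenvectors
    # of the (uncentred) covariance.
    C = V.T @ V / len(V)
    w, E = np.linalg.eigh(C)
    E = E[:, ::-1][:, :i]
    R = V - (V @ E) @ E.T
    return np.sqrt(np.mean(np.sum(R ** 2, axis=1)))

def rms_rm(V, i):
    # r.m.s. error when i random measurements encode, for each vector,
    # its i - 1 largest-magnitude components (the rest are set to zero).
    l = max(i - 1, 0)
    d = V.shape[1]
    err = [np.sum(np.sort(np.abs(v))[:d - l] ** 2) for v in V]
    return np.sqrt(np.mean(err))
```

The key difference is that PCA fixes one subspace for all vectors, whereas the RM error allows a different support for each vector, which is why RM can win despite PCA's optimality among fixed projections.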

## 4. A multivariate distribution for sub-images

A model is found for the distribution of the measurement vectors obtained from the sub-images of a natural image. The strategy is to reparametrize the measurement space, in order to improve the compatibility of the sampled measurement vectors with an isotropic distribution, and then scale the lengths of the sampled vectors in order to make them compatible with the Gaussian distribution *N*(0, *I*(*k*)). If a second natural image is given, similar to the first (for example, the two images might be a stereo pair), then it is assumed that the measurement vectors from the second image are modelled by *N*(0, *I*(*k*)), after reparametrization and scaling using the functions learnt from the first image.

Statistical tests to verify the compatibility of the reparametrized and scaled measurement vectors with *N*(0, *I*(*k*)) are reported in §4*b*, and some remarks on statistical modelling are made in §4*c*.

### (a) Isotropic statistical models

Let *J* be an image, let *m*_{1}, *m*_{2} be given strictly positive integers and let *Φ* be a *k*×(*m*_{1}*m*_{2}−1) matrix with entries *Φ*_{ij} sampled independently from *N*(0, 1). Let

*V* = {*v*(*k*−1, *w*) : *w* an *m*_{1}×*m*_{2} sub-image of *J*},   (4.1)

where *v*(*k*−1, *w*) is as defined in §3*a*. Let |*V*| be the number of elements in *V*, define the matrix *C* by

*C* = |*V*|^{−1}∑_{*v*∈*V*}(*Φv*)(*Φv*)^{⊤}

and define the set *U* by

*U* = {*C*^{−1/2}*Φv* : *v*∈*V*}.   (4.2)

The subtraction of the mean value from the vectors in *V* or in {*Φv*, *v*∈*V*} prior to calculating *C* does not improve the results. Most of the vectors in *V* and {*Φv*, *v*∈*V*} are clustered about the origin, but the mean is strongly affected by a few vectors far from the origin. If the mean is subtracted out, then the directions of the resulting vectors are less isotropic.

The results of Diaconis & Freedman (1984) and Dasgupta (1999) suggested that an isotropic Gaussian distribution is a possible model for the set *U* of vectors defined by (4.2). With this in mind, a function *v*↦*f*(*v*) is sought, such that the directions of the vectors *f*(*v*), *v*∈*U*, are compatible with the uniform distribution on *S*^{k−1}. The uniform distribution on *S*^{k−1} has expected value 0 and covariance *k*^{−1}*I*(*k*). Let *v* be a non-zero vector in ℝ^{k} and consider a perturbation of the form

*v* ↦ *v*+*Mv*,   (4.3)

where *M* is a symmetric *k*×*k* matrix such that the Frobenius norm, ‖*M*‖, is small. The antisymmetric part of *M* is set to zero because to first order its effect on *v*+*Mv* is a rotation, which does not change the fit of a set of vectors to an isotropic distribution. In detail,

‖*v*+*Mv*‖^{2} = ‖*v*‖^{2}+2*v*^{⊤}*Mv*+*O*(‖*M*‖^{2}),

and the antisymmetric part of *M* makes no contribution to ‖*v*+*Mv*‖, to first order, as claimed. Let *U*_{M} = {*v*+*Mv* : *v*∈*U*}. The covariance *C*(*M*) of the directions of the perturbed vectors in *U*_{M}, omitting any zero vector in *U*_{M}, is

*C*(*M*) = *C*(0)+|*U*|^{−1}∑_{*v*∈*U*, *v*≠0}(‖*v*‖^{−2}(*Mvv*^{⊤}+*vv*^{⊤}*M*)−2‖*v*‖^{−4}(*v*^{⊤}*Mv*)*vv*^{⊤})+*O*(‖*M*‖^{2}).   (4.4)

The matrix *M* is estimated by omitting the *O*(‖*M*‖^{2}) term from the right-hand side of (4.4), setting the remaining terms equal to *k*^{−1}*I*(*k*), and then solving for *M*. The matrices *tI*(*k*) for *t*∈ℝ are excluded from the solution space because the term in (4.4) linear in *M* vanishes if *M*=*tI*(*k*).

The perturbation (4.3) is iterated. In practice, four iterations are sufficient to ensure that the covariance of the directions of the resulting vectors is close to *k*^{−1}*I*(*k*). Let the four *k*×*k* matrices obtained as successive values of *M* be *M*_{i}, 1≤*i*≤4, define the matrices *A*_{i}, 1≤*i*≤4, by *A*_{i}=*I*(*k*)+*M*_{i}, and set *A*=*A*_{4}*A*_{3}*A*_{2}*A*_{1}. Then *v*↦*f*(*v*) is defined by *f*(0)=0 and

*f*(*v*) = ‖*v*‖‖*Av*‖^{−1}*Av*, *v*≠0.   (4.5)
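The successive maps *A*_{i} are estimated in the paper from a linearized equation; a closely related and simpler alternative, sketched below, iteratively whitens the direction covariance (a Tyler-type fixed-point iteration). This is not the paper's algorithm, but it drives the covariance of the directions towards *k*^{−1}*I*(*k*) in the same way, and each iterate is a symmetric linear map:

```python
import numpy as np

def isotropize_directions(V, iters=8):
    # Repeatedly whiten the covariance of the unit direction vectors
    # v/||v||; at the fixed point the direction covariance is I(k)/k.
    k = V.shape[1]
    A_total = np.eye(k)
    for _ in range(iters):
        U = V / np.linalg.norm(V, axis=1, keepdims=True)
        C = U.T @ U / len(U)                          # covariance of directions
        w, E = np.linalg.eigh(C)
        A = E @ np.diag(1.0 / np.sqrt(k * w)) @ E.T   # symmetric map (k C)^(-1/2)
        V = V @ A.T
        A_total = A @ A_total
    return V, A_total

rng = np.random.default_rng(2)
# Strongly anisotropic test data in k = 6 dimensions.
V = rng.normal(size=(6000, 6)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5, 0.25])
V_iso, A = isotropize_directions(V)
```

Rescaling by *k*^{−1/2} inside each step changes only the lengths, not the directions, so it does not affect the fit of the directions to the uniform distribution.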

An isotropic probability density function in ℝ^{k} is determined by the density for the lengths of vectors (Samorodnitsky & Taqqu 1994; Tan & Jiao 2007). Thus, a given isotropic density can be converted to any other isotropic density by an appropriate scaling of the lengths of vectors. The vectors *f*(*v*), for *v* in the set defined by (4.2), are scaled such that the resulting vectors are modelled by the distribution *N*(0, *I*(*k*)). The scaling function is estimated using the empirical distribution of the lengths of the vectors *f*(*v*) or equivalently the lengths of the vectors *v*, because *f* preserves lengths.

Define the integer valued function *r*↦*n*(*r*) by

*n*(*r*) = |{*v*∈*U* : ‖*v*‖≤*r*}|,

where *U* is the set defined by (4.2). If *x* is sampled from *N*(0, *I*(*k*)), then

*P*(‖*x*‖≤*r*) = 1−*Γ*(*k*/2, *r*^{2}/2)/*Γ*(*k*/2),

where (*a*, *z*)↦*Γ*(*a*, *z*) is the incomplete gamma function (Abramowitz & Stegun 1965; Wolfram 1999). The required scaling *r*↦*ρ*(*r*), *r*≥0, is obtained by numerical solution of the equation

*n*(*r*)/|*U*| = 1−*Γ*(*k*/2, *ρ*(*r*)^{2}/2)/*Γ*(*k*/2).

The scaling of the lengths of the vectors *f*(*v*) is

*f*(*v*) ↦ *ρ*(‖*f*(*v*)‖)‖*f*(*v*)‖^{−1}*f*(*v*).

The resulting function *ϕ* from *V* to vectors in ℝ^{k} compatible with *N*(0, *I*(*k*)) is

*ϕ*(*v*) = *ρ*(‖*f*(*u*)‖)‖*f*(*u*)‖^{−1}*f*(*u*), *u* = *C*^{−1/2}*Φv*, *v*∈*V*.   (4.6)
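The length rescaling can be approximated without special functions by matching the empirical quantiles of the lengths to the quantiles of ‖*x*‖ for *x*∼*N*(0, *I*(*k*)), estimated here by Monte Carlo rather than by the incomplete gamma function. The synthetic data and sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 12, 4000

# Synthetic vectors: isotropic directions, but exponential (non-chi) lengths.
U = rng.normal(size=(n, k))
U /= np.linalg.norm(U, axis=1, keepdims=True)
r = rng.exponential(scale=3.0, size=n)

# Reference sample of ||x|| for x ~ N(0, I(k)): a Monte Carlo stand-in
# for the incomplete-gamma expression for P(||x|| <= r).
ref = np.sort(np.linalg.norm(rng.normal(size=(100000, k)), axis=1))

# rho maps each empirical length quantile n(r)/|V| to the matching
# quantile of the distribution of ||x||.
ranks = (np.argsort(np.argsort(r)) + 0.5) / n
rho = ref[np.minimum((ranks * len(ref)).astype(int), len(ref) - 1)]
H = U * rho[:, None]          # scaled vectors, compatible with N(0, I(k))
```

After the scaling, the norms of the vectors follow the chi distribution of ‖*x*‖ while the directions are untouched, which is exactly the conversion between isotropic densities described above.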

### (b) Statistical tests

Let *H*={*ϕ*(*v*), *v*∈*V*}, where *V* is defined by (4.1) and *ϕ* is defined by (4.6). The distribution of the vectors in *H* is tested for compatibility with *N*(0, *I*(*k*)). There are three tests. In the first test, the vectors in *H* are separated by a random hyperplane through the origin. If *H* is modelled by *N*(0, *I*(*k*)), then the fraction of vectors in a given half space should be close to 0.5, with a small standard deviation. The second test is similar, except that the hyperplane is chosen at a given non-zero distance from the origin. In the third test, the vectors in *H* are projected onto a random one-dimensional subspace. If *H* is modelled by *N*(0, *I*(*k*)), then the empirical standard deviation of the projected vectors should be close to 1.

Let *u* be a random vector uniformly distributed on *S*^{k−1}, let *H*_{1}(*u*) be the set of vectors *h*∈*H* for which *u*^{⊤}*h*>0 and let *Y*(*u*)=|*H*_{1}(*u*)|/|*H*|. The hypersphere *S*^{k−1} was sampled 1000 times and the empirical standard deviation, *σ*(*Y*), was calculated. If *N*(0, *I*(*k*)) is a good model for the elements of *H*, then *σ*(*Y*) should be near to zero with a high probability.

Let *u*∈*S*^{k−1} and define *A*(*u*, *d*) for *d*≥0 by *A*(*u*, *d*) = {*h*∈*H* : *u*^{⊤}*h*>*d*}. Let *Z*(*u*) = |*A*(*u*, *d*)|/|*H*|, for *d*=0.8. The empirical standard deviation *σ*(*Z*) was found for 1000 values of *u* sampled independently and uniformly from *S*^{k−1}. If *N*(0, *I*(*k*)) is a good model for the elements of *H*, then *σ*(*Z*) should be near to zero.

The values of the empirical standard deviations *σ*(*Y*) and *σ*(*Z*), from the first and second tests, respectively, are shown in table 2 for the same five images as in table 1. In each case, *k*=12, the sub-images are of size 7×7 and |*H*|=9000.

Typical results for a set *H* consisting of 9000 samples drawn randomly from *N*(0, *I*(*k*)) are comparable with the values of *σ*(*Y*) and *σ*(*Z*) obtained from the five images.
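The first two tests are easy to reproduce on synthetic data. In the sketch below, *H* is drawn from the ideal model *N*(0, *I*(*k*)), so the computed standard deviations indicate the values expected when the model holds exactly; the function name and trial counts are mine:

```python
import numpy as np

rng = np.random.default_rng(4)
k, n_h, trials = 12, 9000, 1000
H = rng.normal(size=(n_h, k))     # ideal case: H drawn from N(0, I(k))

def sigma_fraction(H, d):
    # Std, over random directions u, of the fraction of H with u.h > d
    # (test 1 when d = 0, test 2 when d > 0).
    u = rng.normal(size=(trials, H.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    fractions = (H @ u.T > d).mean(axis=0)
    return fractions.std()

s1 = sigma_fraction(H, 0.0)   # near 0: each fraction is close to 0.5
s2 = sigma_fraction(H, 0.8)   # near 0: each fraction is close to P(N(0,1) > 0.8)

# Third test: the projection onto a random direction should have std close to 1.
u0 = rng.normal(size=k)
u0 /= np.linalg.norm(u0)
proj_std = (H @ u0).std()
```

With |*H*|=9000, the binomial sampling noise alone gives standard deviations of the order of 0.005, so values much larger than this for a real image would indicate a departure from the model.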

In the final test, the projections of *H* are examined. If *u*∈*S*^{k−1} and the elements of *H* are sampled from *N*(0, *I*(*k*)), then the elements of *H*(*u*) = {*u*^{⊤}*h* : *h*∈*H*} are sampled from *N*(0, 1). Conversely, it is a consequence of the Cramér–Wold theorem that if *X* is a random variable defined on ℝ^{k} such that *u*^{⊤}*X*∼*N*(0, 1) for all *u*∈*S*^{k−1}, then *X*∼*N*(0, *I*(*k*)) (Moran 1984). As in the case of table 2, *k*=12, *m*_{1}=*m*_{2}=7 and |*H*|=9000. The normalized histogram of *H*(*u*) for the Rocks1 image in figure 1*a* is shown in figure 4, superposed on the density function for *N*(0, 1). The vector *u* was sampled from the uniform distribution on *S*^{k−1}. There is a good agreement between the histogram and the density.

The standard deviation of *H*(*u*) is estimated under the assumption that the elements of *H*(*u*) are sampled from a zero mean Gaussian distribution. Let *σ* be a candidate value for the standard deviation. The real line, ℝ, is divided into 10 intervals such that each interval is assigned a probability of 1/10 by the distribution *N*(0, 1). Let *H*_{i}(*u*) be the number of elements of *H*(*u*) in the *i*th interval, 1≤*i*≤10. An application of Bayes rule yields

*P*(*σ*|*H*_{1}(*u*), …, *H*_{10}(*u*)) = *P*(*H*_{1}(*u*), …, *H*_{10}(*u*)|*σ*)*P*(*σ*)/*P*(*H*_{1}(*u*), …, *H*_{10}(*u*)).   (4.7)

Let *q*_{i} be the probability mass assigned by *N*(0, *σ*^{2}) to the *i*th interval, 1≤*i*≤10. The probability *P*(*H*_{1}(*u*), …, *H*_{10}(*u*)|*σ*) is obtained using the multinomial distribution,

*P*(*H*_{1}(*u*), …, *H*_{10}(*u*)|*σ*) = |*H*(*u*)|!(∏_{*i*=1}^{10}*H*_{i}(*u*)!)^{−1}∏_{*i*=1}^{10}*q*_{i}^{*H*_{i}(*u*)},

and *P*(*σ*) is obtained from the scale invariant prior, *P*(*σ*)=*σ*^{−1} (Jaynes 2003). An estimate *σ̂* of *σ* is obtained by maximizing the numerator on the right-hand side of (4.7),

*σ̂* = argmax_{*σ*} *P*(*H*_{1}(*u*), …, *H*_{10}(*u*)|*σ*)*σ*^{−1}.
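A minimal implementation of this estimate, under the assumptions above (decile intervals of *N*(0, 1), multinomial likelihood, prior *σ*^{−1}), might look as follows; the grid search and the bisection quantile routine are implementation choices, not from the paper:

```python
import numpy as np
from math import erf, log

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / 2.0 ** 0.5))

def norm_quantile(p, lo=-10.0, hi=10.0):
    # Inverse of phi by bisection (no inverse CDF in the stdlib).
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Interval edges: each of the 10 intervals has mass 1/10 under N(0, 1).
edges = [-float("inf")] + [norm_quantile(i / 10) for i in range(1, 10)] + [float("inf")]

def estimate_sigma(x, grid=np.linspace(0.5, 2.0, 301)):
    # Maximize the multinomial likelihood of the interval counts under
    # N(0, sigma^2), times the scale-invariant prior 1/sigma.
    counts = np.histogram(x, bins=edges)[0]
    best, best_lp = None, -np.inf
    for s in grid:
        q = np.diff([phi(e / s) for e in edges])
        lp = np.sum(counts * np.log(np.maximum(q, 1e-300))) - log(s)
        if lp > best_lp:
            best, best_lp = s, lp
    return best

rng = np.random.default_rng(5)
sigma_hat = estimate_sigma(rng.normal(size=9000))
```

The multinomial coefficient is independent of *σ* and can be dropped from the maximization, which is why only ∑*H*_{i} log *q*_{i} − log *σ* appears in the code.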

Values of *σ̂* were obtained for 100 samples *u* from *S*^{k−1}. The mean and the standard deviations of the 100 values of *σ̂* are shown in table 3 for the five images listed in table 2.

Typical results for a set *H* consisting of 9000 samples from *N*(0, *I*(*k*)) are mean 0.9987 and standard deviation 0.0097, both of which are comparable with the appropriate entries in table 3.

### (c) Discussion

The fact that the empirical covariance of *H* is equal to *I*(*k*) and the fact that the results of the tests in §4*b* are compatible with the hypothesis that *H* is modelled by *N*(0, *I*(*k*)) do not prove that the vectors in *H* are sampled from *N*(0, *I*(*k*)). There is always the possibility that *H* inherits from the original image structures that are not modelled by *N*(0, *I*(*k*)). However, the use of random measurements and the reduction of the dimension of the measurement space from *m*_{1}*m*_{2}−1 to *k* both tend to improve the compatibility of *H* with *N*(0, *I*(*k*)), and the results on stereo matching, reported in §6 below, confirm that the model *N*(0, *I*(*k*)) does contain a significant amount of useful information about *H*.

## 5. Theory of matching sub-images

The distribution for sub-images obtained in §4*a* is applied to the matching of sub-images. An Ornstein–Uhlenbeck process (Karatzas & Shreve 1988) is used to model the difference between a given measurement vector and a matching measurement vector. Formulae for the probability of a single false alarm and the probability of a single correct match with no false alarm are obtained.

### (a) Statistical model for the differences between sub-images

Let *J*_{1}, *J*_{2} be a pair of images and consider the task of matching a sub-image sampled from *J*_{1} with a sub-image sampled from *J*_{2}. Let the random variables *W*_{1}, *W*_{2} model the sub-images of *J*_{1}, *J*_{2}, respectively, and let the random variable *D*(*W*_{1}) model the difference between *W*_{1} and the matching random sub-image in *J*_{2}. It follows that *W*_{2}=*W*_{1}+*D*(*W*_{1}).

It is assumed that *W*_{1} and *W*_{2} have the same distribution. This assumption places a strong constraint on *D*(*W*_{1}). The correct choice of distribution for *D*(*W*_{1}) becomes apparent on considering the random vectors *G*_{i}=*ϕ*(*v*(*k*−1, *W*_{i})), *i*=1, 2, where *ϕ* is defined by (4.6). Let *D*_{G} be the perturbation such that

*G*_{2} = *G*_{1}+*D*_{G},   (5.1)

and suppose that *G*_{1}∼*N*(0, *I*(*k*)). It is necessary to define *D*_{G} such that *G*_{2}∼*N*(0, *I*(*k*)). The random vectors *G*_{1}, *G*_{2} are regarded as nearby states in a stochastic process on ℝ^{k}, and *D*_{G} is regarded as an increment in the process. If the process has *N*(0, *I*(*k*)) as a limiting distribution, then this distribution is preserved under the increment *D*_{G}, giving *G*_{2}∼*N*(0, *I*(*k*)), as required. The simplest option is to use the Ornstein–Uhlenbeck process *s*↦*X*_{s}, 0≤*s*, in ℝ^{k} (Karatzas & Shreve 1988), as described by the Itô stochastic differential equation

d*X*_{s} = −*X*_{s} d*s*+√2 d*B*_{s}.

The process *B* is a standard Brownian motion on ℝ^{k}. If *X*_{0} is given, then the expected value *m*_{s} and covariance *C*_{s} of *X*_{s} are

*m*_{s} = exp(−*s*)*X*_{0}, *C*_{s} = (1−exp(−2*s*))*I*(*k*).
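The transition law of this process can be sampled exactly, which is convenient for checking that the increment preserves *N*(0, *I*(*k*)). The following sketch assumes the stationary form of the SDE given above (drift −*X*_{s}, diffusion coefficient √2), for which the exact transition is Gaussian:

```python
import numpy as np

rng = np.random.default_rng(6)
k, n, t = 12, 50000, 0.3

# Exact transition of dX_s = -X_s ds + sqrt(2) dB_s:
# X_t | X_0  ~  N(exp(-t) X_0, (1 - exp(-2t)) I(k)).
X0 = rng.normal(size=(n, k))      # G1 ~ N(0, I(k))
Xt = np.exp(-t) * X0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * rng.normal(size=(n, k))
```

Because *N*(0, *I*(*k*)) is the stationary distribution, the sample *X*_{t} should again have zero mean and identity covariance, which is exactly the property required of *G*_{2}.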

The diffusion model for image matching preserves the symmetry between the two images *J*_{1}, *J*_{2}, in that the random vectors *G*_{1}, *G*_{2} are required to have the same distribution, *N*(0, *I*(*k*)). The symmetry is broken when a single sub-image, *w*_{1}, is chosen in *J*_{1} and a matching sub-image is sought in *J*_{2}. The diffusion model biases the matching in favour of those sub-images that appear frequently in *J*_{2}. This bias is more extreme if the vector, *h*_{1}, in ℝ^{k} obtained from *w*_{1} is in a region of ℝ^{k} in which the probability density function for *N*(0, *I*(*k*)) takes very small values.

The two random vectors *G*_{1}, *G*_{2} are regarded as states of a single Ornstein–Uhlenbeck process *X*. The pair (*h*_{1}, *h*_{2}) is defined to be a correct match, or equivalently a realization of (*G*_{1}, *G*_{2}), if *X*_{0}=*h*_{1} and *X*_{t}=*h*_{2}, where *t* is a fixed time, to be determined. Let *x*↦*f*_{t}(*h*_{1}, *x*) be the probability density function for *X*_{t}, given that *X*_{0}=*h*_{1}. The density *f*_{t}(*h*_{1}, *x*) is Gaussian with expected value *m*_{t}(*h*_{1})=exp(−*t*)*h*_{1} and covariance *C*_{t}=(1−exp(−2*t*))*I*(*k*). Let *b*(*δ*) be defined for 0≤*δ*<1 such that

∫_{‖*x*−*m*_{t}(*h*_{1})‖≤*b*(*δ*)} *f*_{t}(*h*_{1}, *x*) d*x* = *δ*.   (5.2)

A pair (*h*_{1}, *h*_{2}) is accepted as a match if ‖*h*_{2}−*m*_{t}(*h*_{1})‖≤*b*(*δ*). The quantity *δ* is the probability that (*h*_{1}, *h*_{2}) is accepted, given that (*h*_{1}, *h*_{2}) is a correct match.

It remains to estimate *t*. A training set (*h*_{i1}, *h*_{i2}), 1≤*i*≤*m*, of correctly matched pairs of vectors is required. It is assumed that *h*_{i2} is sampled from the density *x*↦*f*_{t}(*h*_{i1}, *x*) and *t* is estimated numerically using maximum likelihood,

*t̂* = argmax_{*t*} ∑_{*i*=1}^{*m*} ln *f*_{t}(*h*_{i1}, *h*_{i2}).   (5.3)
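With the Gaussian transition density above, the maximum-likelihood estimate of *t* reduces to a one-dimensional search. The sketch below recovers *t* from synthetic matched pairs; the grid search is an implementation choice, not the paper's numerical method:

```python
import numpy as np

rng = np.random.default_rng(7)
k, m, t_true = 12, 1000, 0.25

# Synthetic training set of correctly matched pairs (h_i1, h_i2).
H1 = rng.normal(size=(m, k))
H2 = np.exp(-t_true) * H1 + np.sqrt(1.0 - np.exp(-2.0 * t_true)) * rng.normal(size=(m, k))

def log_lik(t, H1, H2):
    # Log-likelihood under the Gaussian transition density f_t(h1, .):
    # mean exp(-t) h1, covariance (1 - exp(-2t)) I(k).
    s2 = 1.0 - np.exp(-2.0 * t)
    R = H2 - np.exp(-t) * H1
    return -0.5 * np.sum(R ** 2) / s2 - 0.5 * R.size * np.log(2.0 * np.pi * s2)

grid = np.linspace(0.01, 2.0, 400)
t_hat = grid[np.argmax([log_lik(t, H1, H2) for t in grid])]
```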

The above matching algorithm is related to the sum of squared differences (SSD) algorithm, as listed by Scharstein & Szeliski (2002) in their table 1. The difference is that the above algorithm does not calculate the SSD using the original sub-images but instead calculates a version of the SSD in a measurement space, , in which the measurement vectors have the distribution (0, *I*(*k*)).

It is noted that other diffusion models for sub-images are possible. For example, the feature space could be parametrized such that the probability density function for sub-images is the uniform density on the unit ball. The natural diffusion for sub-images is then a Brownian motion in the unit ball with reflection at the boundary. Unfortunately, the numerical calculations associated with this diffusion are extremely complicated.

### (b) Probabilities of false alarm and acceptance

Let *E*(*h*_{1}, *h*_{2}) be the event that (*h*_{1}, *h*_{2}) is a correct match and let *A*(*h*_{1}, *h*_{2}) be the event that (*h*_{1}, *h*_{2}) is accepted as a match. It follows from (5.2) and the criterion for accepting a match that

*P*(*A*(*h*_{1}, *h*_{2})|*E*(*h*_{1}, *h*_{2}), *h*_{1}) = *δ*.   (5.4)

It is assumed that there are *c* candidate matches to *h*_{1} and that the correct match is included among them. It follows that

*P*(*E*(*h*_{1}, *h*_{2})|*h*_{1}) = *c*^{−1}.

The probability *P*(*A*(*h*_{1}, *h*_{2})|*E*(*h*_{1}, *h*_{2}), *h*_{1}) is affected by the assumption that the correct match is included in the *c* candidate matches, but this effect is neglected under the assumption that *c* is large.

If (*h*_{1}, *h*_{2}) is not a correct match, then it is assumed that *h*_{2} is sampled from *N*(0, *I*(*k*)), independently of *h*_{1}. A false alarm occurs if (*h*_{1}, *h*_{2}) is accepted, even though it is not a correct match. Let *Ē*(*h*_{1}, *h*_{2}) be the complement of the event *E*(*h*_{1}, *h*_{2}). The probability of a false alarm is given by

*P*(*A*(*h*_{1}, *h*_{2})|*Ē*(*h*_{1}, *h*_{2}), *h*_{1}) = *P*(‖*h*_{2}−*m*_{t}(*h*_{1})‖≤*b*(*δ*)), *h*_{2}∼*N*(0, *I*(*k*)).

Let *p*(*h*_{1}) = *P*(*A*(*h*_{1}, *h*_{2})|*Ē*(*h*_{1}, *h*_{2}), *h*_{1}). A short calculation yields *p*(*h*_{1}) as the value at *b*(*δ*)^{2} of the cumulative distribution function of a noncentral *χ*^{2} random variable with *k* degrees of freedom and noncentrality parameter ‖*m*_{t}(*h*_{1})‖^{2}. Let *N*(*h*_{1}) be the event that no match to *h*_{1} is accepted. It follows that

*P*(*N*(*h*_{1})|*h*_{1}) = (1−*δ*)(1−*p*(*h*_{1}))^{*c*−1}.

Suppose that vectors *H*={*h*_{i1}, 1≤*i*≤*n*} are obtained from sub-images of *J*_{1}, and each vector *h*_{i1} has *c* candidate matches obtained from sub-images of *J*_{2}, of which one is a correct match to *h*_{i1}. The average probability, *P*(*N*(*H*)|*H*), that no match is accepted is

*P*(*N*(*H*)|*H*) = *n*^{−1}∑_{*i*=1}^{*n*}(1−*δ*)(1−*p*(*h*_{i1}))^{*c*−1}.   (5.5)

Let *F*(*h*_{1}) be the event that the correct match to *h*_{1} is not accepted and that there is a single false alarm. It follows that

*P*(*F*(*h*_{1})|*h*_{1}) = (1−*δ*)(*c*−1)*p*(*h*_{1})(1−*p*(*h*_{1}))^{*c*−2},

and

*P*(*F*(*H*)|*H*) = *n*^{−1}∑_{*i*=1}^{*n*}*P*(*F*(*h*_{i1})|*h*_{i1}).   (5.6)

Let *T*(*h*_{1}) be the event that the correct match to *h*_{1} is accepted and that there are no false alarms. It follows that

*P*(*T*(*h*_{1})|*h*_{1}) = *δ*(1−*p*(*h*_{1}))^{*c*−1},

and

*P*(*T*(*H*)|*H*) = *n*^{−1}∑_{*i*=1}^{*n*}*P*(*T*(*h*_{i1})|*h*_{i1}).   (5.7)
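Under the independence assumptions of this subsection, the event probabilities for a single *h*_{1} have simple closed forms, which can be checked by simulation. The closed forms below are my reading of the model (one correct candidate accepted with probability *δ*, and *c*−1 independent incorrect candidates, each a false alarm with probability *p*_{f}); the value of *p*_{f} is an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(8)
delta, c, n_mc = 0.9, 40, 20000
p_f = 0.02     # assumed single-candidate false-alarm probability

# Closed forms for the three events.
P_N = (1.0 - delta) * (1.0 - p_f) ** (c - 1)                  # no match accepted
P_F = (1.0 - delta) * (c - 1) * p_f * (1.0 - p_f) ** (c - 2)  # single false alarm only
P_T = delta * (1.0 - p_f) ** (c - 1)                          # correct match, no false alarm

# Monte Carlo check of the same three events.
correct = rng.random(n_mc) < delta
alarms = rng.binomial(c - 1, p_f, size=n_mc)
mc_N = np.mean(~correct & (alarms == 0))
mc_F = np.mean(~correct & (alarms == 1))
mc_T = np.mean(correct & (alarms == 0))
```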

In §6, the calculated probabilities *P*(*N*(*H*)|*H*), *P*(*F*(*H*)|*H*), *P*(*T*(*H*)|*H*) are compared with their empirical estimates.

## 6. Experiments on stereo matching

The theory of matching described in §5 was tested experimentally on stereo images from the Middlebury stereo database. The aim of the test was to show that it is possible to predict the performance of the matching algorithm.

### (a) Experiments

All RGB colour images were reduced to grey scale images by taking the average of the RGB values at each pixel. The Rocks1 image sequence contains seven images numbered from 0 to 6, and rectified such that if (*x*_{1}, *y*_{1}) and (*x*_{2}, *y*_{2}) are matching points in different images then *x*_{1}=*x*_{2}. The disparities *y*_{5}−*y*_{1} for the matches between images 1 and 5 are available, rounded to integer values, in the file disp1.pgm. Image 1 is shown above in figure 1*a*, with the *y*-axis horizontal.

Suitable values for *k*, *m*_{1}, *m*_{2} were chosen and the function *ϕ* in (4.6) was calculated using 9000 *m*_{1}×*m*_{2} sub-images sampled from image 1 at the vertices of a square grid. Stereo matching was carried out between images 1 and 5 using 500 sub-images sampled from image 1, with the centres of the sub-images at the vertices of another square grid. The 500 sub-images are shown in figure 5 for *m*_{1}=*m*_{2}=15.

Let *D*(*x*_{1}, *y*_{1}) be the disparity, rounded to an integer, at (*x*_{1}, *y*_{1}) in image 1. If the disparity at (*x*_{1}, *y*_{1}) is not defined, for example, due to occlusion, then *D*(*x*_{1}, *y*_{1})=0. A match (*x*_{1}, *y*_{1})↔(*x*_{5}, *y*_{5}) was regarded as correct if

|*y*_{5}−*y*_{1}−*D*(*x*_{1}, *y*_{1})| ≤ 2.   (6.1)

The bound of 2 in (6.1) is reasonable, first because *y*_{1}, *y*_{5} and *D*(*x*_{1}, *y*_{1}) are rounded to integers and second because the correlations between nearby sub-images lead to additional matches in the neighbourhood of a correct match. A consequence of (6.1) is that correct matches to (*x*_{1}, *y*_{1}) occur in groups of five.

Let *D*_{max}, *D*_{min} be, respectively, the maximum and minimum of the non-zero entries of *D*. The number of candidate matches to (*x*_{1}, *y*_{1}) is *D*_{max}−*D*_{min}+1. The number *c* of candidate matches, as introduced in §5*b*, is defined pessimistically by *c*=*D*_{max}−*D*_{min}−3. This definition of *c* is obtained by subtracting four from *D*_{max}−*D*_{min}+1, to take account of the fact that five of the matches are grouped together and considered to be a single correct match. The value of *c* is adjusted for the relatively small number of sub-images that are near to the boundaries of the images. The parameter *t* was estimated using (5.3) applied to 1000 sub-images sampled from image 1 and their matching sub-images in image 5.
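The pessimistic candidate count can be sketched as below, assuming the disparity map is a nested list of rounded integer disparities with 0 marking undefined entries. The boundary adjustment mentioned above is omitted; the function name is an assumption for the example.

```python
def candidate_count(disparity_map):
    """Pessimistic number c of candidate matches.

    c = D_max - D_min - 3, i.e. the (D_max - D_min + 1) candidate
    disparities minus four, because the group of five disparities
    satisfying the correctness bound counts as one correct match.
    Zero entries (undefined disparity) are ignored.
    """
    nonzero = [d for row in disparity_map for d in row if d != 0]
    d_max, d_min = max(nonzero), min(nonzero)
    return d_max - d_min - 3
```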

### (b) Results

The results of the experiments are shown in tables 4–8. The same random projection matrix *Φ* was used throughout. The variable *δ* in the first column of each table is the probability (5.4) that a correct match is accepted. The average probabilities *P*(*N*(*H*)|*H*), *P*(*F*(*H*)|*H*), *P*(*T*(*H*)|*H*) are as defined by (5.5)–(5.7), respectively, and *P̂*(*N*(*H*)|*H*), *P̂*(*F*(*H*)|*H*), *P̂*(*T*(*H*)|*H*) are the empirical estimates of these average probabilities.

The empirical estimates were calculated as follows. Any sub-image with 11 or more false alarms was discarded on the grounds that such a large number of false alarms was incompatible with the statistical model for matching described in §5*a*. To give a numerical example, in the case of the Rocks1 image, if *k*=12 and *δ*=0.95, then (5.2) yields *b*(*δ*)=2.6328, giving a probability of a false alarm less than 0.1379, and an exceedingly small probability of 11 or more false alarms. If, nevertheless, 11 or more false alarms are observed, then this is a sign that the model has failed for the particular sub-image under consideration. Let *n* be the number of the remaining sub-images and let *n*(*F*) be the number of these sub-images for which no match was found. Then *P̂*(*F*(*H*)|*H*) is defined by *P̂*(*F*(*H*)|*H*)=*n*(*F*)/*n*, and *P̂*(*N*(*H*)|*H*), *P̂*(*T*(*H*)|*H*) are defined analogously. The values of *n* are given in the second column of each table.
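The estimation procedure above can be sketched as follows. The per-sub-image records, the outcome labels `'N'`, `'F'`, `'T'` and the function name are encodings assumed for this example, not notation from the paper.

```python
def empirical_estimates(results, max_false_alarms=10):
    """Sketch of the empirical estimates of the matching probabilities.

    results: list of (num_false_alarms, outcome) pairs, one per
    sub-image, with outcome one of 'N', 'F', 'T' (hypothetical
    encoding of the three events N(H), F(H), T(H)).

    Sub-images with 11 or more false alarms are discarded; each
    estimate is then the fraction of the remaining n sub-images
    with the corresponding outcome, e.g. n(F)/n for 'F'.
    Returns (n, {outcome: estimate}).
    """
    kept = [o for (fa, o) in results if fa <= max_false_alarms]
    n = len(kept)
    return n, {o: kept.count(o) / n for o in ('N', 'F', 'T')}
```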

The final column in tables 4–8 gives the value *σ*=(*p*(1−*p*)/*n*)^{1/2} of the standard deviation in the estimate of the probability *p*=*P*(*T*(*H*)|*H*) by the mean value of *n* independent Bernoulli random variables, each of which takes the value 1 with probability *p*. The prediction of *P̂*(*T*(*H*)|*H*) by *P*(*T*(*H*)|*H*) is the best possible if |*P*(*T*(*H*)|*H*)−*P̂*(*T*(*H*)|*H*)|≤*σ*.
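The standard deviation in the final column is a one-line computation; the sketch below only restates the formula *σ*=(*p*(1−*p*)/*n*)^{1/2}, with a function name chosen for the example.

```python
import math

def bernoulli_sd(p, n):
    """Standard deviation of the mean of n independent Bernoulli(p)
    random variables: sigma = sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)
```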

The probability 1−*P̂*(*T*(*H*)|*H*) is similar to the percentage of unoccluded badly matched pixels defined by Scharstein & Szeliski (2002) in their §5.1. The values of 1−*P̂*(*T*(*H*)|*H*) in tables 4–8 are similar to the values of that percentage, in spite of the fact that the values of *c* are much larger than the maximum disparity of 15 given by Scharstein & Szeliski in their table 2.

The agreement between *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*) is good in tables 4–6 for the larger values of *δ*. In contrast to *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*), the differences between *P*(*N*(*H*)|*H*) and *P̂*(*N*(*H*)|*H*) and between *P*(*F*(*H*)|*H*) and *P̂*(*F*(*H*)|*H*) in tables 4–6 are small at low values of *δ*, and tend to increase as *δ* increases. The increase in the value of *k* for table 5 led to increases in *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*), when compared with table 4, but to a reduction in the accuracy with which *P*(*T*(*H*)|*H*) predicted *P̂*(*T*(*H*)|*H*).

There is less agreement between *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*) in table 7 for the Bowling2 sequence. The values of *n* are low, showing that a large number of sub-images have 11 or more false alarms. In addition, *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*) are very low. The Bowling2 image, as shown in figure 3*b*, is dominated by the uniform surface of the bowling ball. This suggests that the results in table 7 arise because the perturbations in (5.1) are comparable in size with the differences between the pairs of 7×7 sub-images sampled from the area covered by the bowling ball. Table 8 shows that larger values of *n* and larger values of *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*) are obtained if the sub-images are increased in size to 15×15, and *k* is increased proportionately to 56. (Note that for this large value of *k*, the covariance of the directions of the vectors in (see (4.2)) is near to *k*^{−1}*I*(*k*) and thus the matrix *A* in (4.5) can be set equal to *I*(*k*).) The agreement between *P*(*T*(*H*)|*H*) and *P̂*(*T*(*H*)|*H*) is less in table 8 than in tables 4–6.

## 7. Conclusion

The sub-images of an image have been mapped to a measurement space in which they are modelled by a Gaussian distribution *N*(0, *I*(*k*)). The differences between the measurement vectors of matching sub-images are modelled using an Ornstein–Uhlenbeck process which has *N*(0, *I*(*k*)) as a limiting distribution. The resulting statistical model for image matching is tested by using it to calculate certain probabilities that measure the performance of a stereo matching algorithm. The calculated probabilities are compared with those obtained experimentally by applying the algorithm to images from the Middlebury stereo database. The performance of the algorithm is successfully predicted. In particular, if the probability of accepting a correct match is relatively large, then there is good agreement between the calculated and the experimental probabilities of obtaining a unique match that is also a correct match.

The probability density function for sub-images and the associated Ornstein–Uhlenbeck process has many potential applications, including image registration, object detection and stereo matching. In the case of stereo matching it may be possible to predict the amount of information contributed by each part of a matching algorithm, and in this way find out which parts of the algorithm are optimal and which parts could be improved.

## Footnotes

URL http://vision.middlebury.edu/Stereo/data/scenes2006, full size image, illumination 2, exposure 2.

The original Groningen image, imk00038.iml, at http://hlab.phys.rug.nl/imlib/l1_200/index.html was converted to the JPEG format for ease of use.

- Received May 23, 2008.
- Accepted November 14, 2008.

- © 2008 The Royal Society