## Abstract

We describe a fast algorithm to propagate, for any user-specified accuracy, a time-harmonic electromagnetic field between two parallel planes separated by a linear, isotropic and homogeneous medium. The analytical formulation of this problem (ca 1897) requires the evaluation of the so-called Rayleigh–Sommerfeld integral. If the distance between the planes is small, this integral can be accurately evaluated in the Fourier domain; if the distance is very large, it can be accurately approximated by asymptotic methods. In the large intermediate region of practical interest, where the oscillatory Rayleigh–Sommerfeld kernel must be applied directly, current numerical methods can be highly inaccurate without indicating this fact to the user. In our approach, for any user-specified accuracy *ϵ*>0, we approximate the kernel by a short sum of Gaussians with complex-valued exponents, and then efficiently apply the result to the input data using the unequally spaced fast Fourier transform. The resulting algorithm has computational complexity , where we evaluate the solution on an *N*×*N* grid of output points given an *M*×*M* grid of input samples. Our algorithm maintains its accuracy throughout the computational domain.

## 1. Introduction

A measurement system can be no more accurate than the least accurate of its constituent parts. A critical part of many computational optical systems is a numerical algorithm to propagate a time-harmonic electromagnetic field between two parallel planes separated by a linear, isotropic and homogeneous medium. Within the experimental community, it is well understood that algorithms used for this purpose give approximate solutions. However, virtually none of the current algorithms provide a mechanism to control their error and, for this reason, may generate inaccurate results without indicating this fact to the user. This state of affairs is somewhat surprising because one might expect that in the computer age, of all the sources of error in an optical system, numerical error ought to be the easiest to eliminate.

At the end of the nineteenth century, Rayleigh [1] (see also [2,3]) described wave propagation via the integral
1.1where . Given the field *u*(**y**,0)=*f*(**y**) in the plane *z*=0, (1.1) describes the field *u*(**x**,*z*), *z*>0, that satisfies the Sommerfeld radiation condition. Expressing all distances in wavelengths, we note that if the propagation distance is small, then the kernel of this integral operator is highly oscillatory, but the computation can then proceed in an accurate manner in the Fourier domain. On the other hand, if the distance is very large, then application of this kernel asymptotically reduces to a scaled Fourier transform. The computational difficulties arise in the intermediate region where, in order to obtain an accurate solution, it is necessary to apply this oscillatory kernel as it is. Currently, in this intermediate region, the standard practice is to replace the kernel by its Fresnel approximation. We show that this approximation yields only limited accuracy even near the optical axis and that the accuracy deteriorates significantly away from the optical axis. Perhaps what is most troubling is that the accuracy of approximation is not controlled.

In this paper, we present a fast algorithm to evaluate the Rayleigh–Sommerfeld integral (1.1) with any user-specified accuracy. We approximate the kernel by a short sum of Gaussians with complex-valued exponents. The number of terms in our approximation is nearly minimal for a given accuracy *ϵ*. The resulting approximate kernel is then efficiently applied to input data using the unequally spaced fast Fourier transform (USFFT) [4–6], yielding an algorithm of computational complexity , where we evaluate the solution on an *N*×*N* grid of output points given a grid of *M*×*M* input samples, the same order of complexity as algorithms based on the Fresnel approximation. Our approach also significantly increases the size of the output region where the evaluation of (1.1) is accurate.

The Fresnel approximation is an important, often-used and frequently referenced, approximation to the propagated field. While our method can be viewed as a generalization of the Fresnel approximation, the two approximations are not directly related. The Fresnel approximation replaces the Rayleigh–Sommerfeld kernel by a single Gaussian with a purely imaginary exponent. In our method, for any user-specified target accuracy, we use a nonlinear algorithm to approximate the kernel as a short linear combination of Gaussians with complex-valued exponents. In both the Fresnel approximation and our method, the use of Gaussians leads to a fast algorithm to apply the approximate kernel. However, accuracy control in our approach relies on analysis-based approximations rather than directly on analytical formulae. Equally as important as the approximation of the kernel with respect to accuracy control is a careful discretization of the resulting integrals using quadratures for band-limited functions (see §2*b*).

Given the known limitations of the Fresnel approximation (e.g. [7–9]), many researchers have sought methods to improve its accuracy, e.g. by constructing series expansions of the propagated field where the Fresnel approximation appears as the first term in the series [10,11]. Unfortunately (to the best of our knowledge), such an approach does not lead to a fast algorithm with controlled accuracy. The expansions derived in these papers can be used in a limited number of cases if the boundary data are known analytically. However, if the boundary data are provided numerically (e.g. measured or produced by a computational procedure such as phase recovery), then such analytical expansions can yield only a limited accuracy. We further comment on this topic in the electronic supplementary material, §C.4.

The need for an accurate propagation algorithm arises in areas such as computational holography [12], optical component design [13] and antenna design [14]. A particularly interesting application area is X-ray diffraction microscopy [15] and related techniques, where one attempts to form an image of a microscopic sample from measurements of the magnitude of its diffraction pattern. These inverse problems are usually solved by iterative methods that include a light propagation algorithm. Therefore, the accuracy of the propagation algorithm ultimately limits the accuracy of the reconstructed image. The speed of a propagation algorithm is obviously also of critical importance for applications using iterative methods.

The numerical algorithms that we use are designed to yield any user-specified accuracy. This includes controlled accuracy in the rapid computation of integrals. The methods that we use for this purpose (specifically the USFFT and generalized Gaussian quadratures for band-limited functions) can significantly improve the performance and accuracy of even the standard methods for light propagation (see the electronic supplementary material, §§A and B).

The paper is organized as follows. The necessary mathematical preliminaries are reviewed in §2. We describe our new algorithm in §3, then discuss its region of validity in §4. In §5, we provide several numerical examples, then summarize our results in §6. By introducing this new algorithm, we hope to stimulate accuracy improvements in computational optical systems by essentially eliminating numerical errors.

## 2. Preliminaries

### (a) The Rayleigh–Sommerfeld formula

The behaviour of a time-harmonic electromagnetic field in a linear, isotropic and homogeneous medium is described by the scalar Helmholtz equation,
2.1where the wavenumber *k*=2*π*/*λ*, *λ* is the wavelength, and *u*(*x*,*y*,*z*) is the complex amplitude of one component of the vector-valued electric field at a point . We may consider each component of the field separately because their governing equations decouple in an isotropic homogeneous medium, allowing us to work with the scalar form of the Helmholtz equation instead of its vector form.

It is convenient to associate one coordinate of the three-dimensional Cartesian system with the optical axis—we choose the *z*-coordinate for this purpose, and will often represent a point as (**x**,*z*_{0}), where lies in the plane *z*=*z*_{0} transverse to the optical axis. We find it natural to measure distances in the units of wavelengths and therefore, for the remainder of this paper, set the wavenumber *k*=2*π*.

The Rayleigh–Sommerfeld integral (1.1) yields the solution *u*(**x**,*z*) of the Dirichlet problem for (2.1) in the half-space *z*>0 that satisfies the Sommerfeld radiation condition [16,17],
Given the boundary data *u*(**x**,0)=*f*(**x**), we rewrite (1.1) as
2.2where the Rayleigh–Sommerfeld kernel *K*_{z}(*r*) is given by
2.3Denoting the Fourier transform of the boundary data as
we write (2.2) in the Fourier domain as
2.4where the Fourier transform of the Rayleigh–Sommerfeld kernel (cf. [18] and references therein) is given by
2.5

Our goal is to evaluate (2.2) accurately in such a way that the computational cost does not increase with the distance *z*. It is clear that the spatial kernel *K*_{z}(*r*) is a highly oscillatory function of *r* when *z* is small and that the Fourier domain kernel is a highly oscillatory function of *ρ* when *z* is large. For many physically interesting choices of the distance *z* in the intermediate region, both *K*_{z}(*r*) and are highly oscillatory, making the direct numerical computation of *u* using either (2.2) or (2.4) impractical. In §3, we show how to approximate (2.3) with controlled error and then describe a fast and accurate algorithm to apply the resulting approximate Green function to boundary data. Our algorithm mainly addresses the propagation problem for intermediate and large values of *z*. For small values of *z*, it is well known that the problem may be solved using Fourier methods and for very large values of *z*, the problem may be solved using asymptotic methods (see the electronic supplementary material, §§A and B).

### Remark 2.1

Given the normal derivative of the boundary data Rayleigh's formula for the Neumann problem reads 2.6With minor modifications, our approach is also applicable to evaluating (2.6).

### (b) Slepian functions

All physically realistic fields must eventually decay in space and, at the same time, are essentially band-limited in the Fourier domain. An appropriate mathematical description of such fields was initiated by Slepian and collaborators [19–23] by considering a space-limiting and band-limiting integral operator and using its eigenfunctions to identify a class of functions that have controlled concentration in both the space and the Fourier domains. Slepian *et al.* showed that this integral operator commutes with the differential operator of classical mathematical physics describing the prolate spheroidal wave functions, i.e. both operators share the same eigenfunctions.

For our purposes, we use eigenfunctions with controlled concentration in a square in the spatial domain and band-limited to a disc in the Fourier domain. The construction of such eigenfunctions is described in Beylkin *et al*. [24]; it differs from the traditional construction because there is no differential operator available in this case.

Denoting a square in the spatial domain by *A*=[−*a*/2,*a*/2]^{2} and selecting a disc of radius *c* in the Fourier domain, following [24], let us define the space-limiting and band-limiting operator ,
where *J*_{1} is the first-order Bessel function of the first kind. It is shown in Beylkin *et al*. [24] that, similar to the classical case, the eigenvalues of this operator,
quantify the fraction of energy (*L*^{2}-norm) of *ψ*_{j} outside of *A*,
The eigenvalues satisfy 0<*μ*_{j}<1 and we order them in decreasing order, *μ*_{0}>*μ*_{1}≥*μ*_{2}≥⋯>0. As they have a sharp transition from being nearly one to being nearly zero (see [24]), for a user-specified accuracy *ϵ*, we select a linear subspace of the eigenfunctions, with corresponding eigenvalues *μ*_{j}≥1−*ϵ*. Given boundary data *f*, we project *f* onto this subspace, where the choice of parameters, i.e. the domain *A* and the bandlimit *c*, is described in §2*d*.

Identifying this subspace allows us to accurately evaluate integrals involving the boundary data. Following [24,25] (see also [26]), we have

### Theorem 2.2

*Let W=[−w/2,w/2]*^{2} *be a square output window and A= [−a/2,a/2]*^{2} *be the spatial domain. Then for all functions* *and for any target accuracy ϵ, we can use the algorithms in* [24]*;* [25] *to obtain a (nearly optimal) tensor product grid of quadrature nodes* **y**_{mm′}*=(y*_{m}*,y*_{m′}*)∈A, m,m′=1,…,M, and corresponding weights τ*_{m}*τ*_{m′}*>0, so that
**The number of quadrature nodes required to obtain accuracy ϵ depends on the space-bandwidth product (a+w)c as* *. These quadratures are known as generalized Gaussian quadratures for band-limited functions.*

### (c) The unequally spaced fast Fourier transform

We need to evaluate trigonometric sums of the form
at output points **x**_{nn′}=(*x*_{n},*x*_{n′}), where . Such sums can be evaluated rapidly, for any user-specified accuracy *ϵ*, using the USFFT (see [4–6]) with computational complexity .

### (d) Band-limiting the boundary data

For a given accuracy *ϵ*, there exists some square region *A*=*A*(*ϵ*)=[−*a*/2,*a*/2]^{2} such that the values of the boundary data *f* in (2.2) outside of *A* may be neglected,
2.7In this paper, we refer to the region *A* where the field is concentrated as an aperture.

Let us determine the highest spatial frequency *c* that must be propagated in order to evaluate (2.2) accurately. It follows from (2.5) that evanescent waves corresponding to spatial frequencies above *ρ*=∥**p**∥>1 are attenuated exponentially fast as a function of the propagation distance *z*. This implies that, for a given distance *z* and accuracy *ϵ*, there exists some bandlimit *c*_{e}>1 such that frequencies greater than *c*_{e} may be neglected,
A good estimate of this bandlimit is obtained by setting so that
2.8

It may happen that the boundary data *f* has a bandlimit much larger than *c*_{e}. In such cases, we set *c*_{f}=2*c*_{e} and replace *f* by its band-limited version,
where the window function *h*(*ρ*) satisfies |*h*(*ρ*)−1|≤*ϵ* for 0≤*ρ*≤*c*_{e} and drops smoothly to zero in the interval *ρ*∈(*c*_{e},2*c*_{e}]. The function will be band-limited to the disc of radius *c*_{f} and concentrated in a square aperture that is somewhat larger than the original aperture *A*. This spreading can be controlled by an appropriate choice of the function *h*; one convenient choice is a linear combination of shifted Gaussians. We use this new, larger, aperture in place of the original aperture and therefore set . It may also happen that the bandlimit *c*_{f} of boundary data is known *a priori* and is less than *c*_{e}, so it is not necessary to propagate spatial frequencies with magnitudes *ρ*=∥**p**∥∈[*c*_{f},*c*_{e}]. In either case, we set the highest spatial frequency that must be propagated to *c*=*c*_{f}, where *c*_{f} is defined as just described.

### (e) Approximation of functions by linear combinations of exponentials and Gaussians

We use an algorithm in the study of Beylkin & Monzón [27] (see also [28]) to approximate, for a target accuracy *ϵ*, a smooth function *f*(*x*) by a nearly optimal linear combination of Gaussians. As the algorithm in Beylkin & Monzón [27] finds a nearly minimal number of exponential terms, we apply it to the function . Changing variables back, *t*↦*x*^{2}, yields an approximation by Gaussians with a (nearly) minimal number of complex-valued weights *w*_{ℓ} and exponents *η*_{ℓ}, such that
2.9For completeness, we recall this algorithm for approximation by exponentials in the electronic supplementary material, §C.1.

For the functions *f*(*x*) considered in this paper, the number of terms *L* in approximation (2.9) satisfies . This behaviour is typical and occurs for a wide variety of functions encountered in applications.

### (f) Decompositions of low-rank matrices

In order to compute the singular value decomposition (SVD) of a low-rank matrix , where **S** has numerical rank *k* for a given accuracy *ϵ*, we use algorithms described in earlier studies [29,30]. The computational complexity of these algorithms is (cf. for the direct approach using a rank-revealing QR factorization).

### (g) The approximations of Fresnel and Fraunhofer

Our method of approximating kernel (2.3) resembles the approach that leads to the Fresnel approximation, which we now recall. If the propagation distance is significantly larger than both the spatial extent of the input field and the desired output region, so that *r*=∥**x**−**y**∥<*z*, it is common to use this assumption to make the (rather dramatic) approximations in (2.3),
2.10and
2.11The Fresnel approximation uses this approximate kernel in place of the Rayleigh–Sommerfeld kernel in (2.2), yielding
2.12(e.g. [31], §4.2). As the latter integral can be computed using the fast Fourier transform (FFT), this approximation is widely used despite its potential low accuracy (it turns out that the poor approximation of the kernel's phase in (2.11) is especially deleterious, see §5*c*). In §§4 and 5, we demonstrate that the accuracy of the Fresnel approximation rapidly deteriorates away from the optical axis.

When *z* is much larger than the spatial extent of *f*(**y**), it is common to make the further approximation e^{i(π/z)∥y∥2}≈1, which, when used in (2.12), leads to the Fraunhofer (sometimes called far-field) approximation
2.13(e.g. [31], §4.3). The Fraunhofer approximation, which relates the output field to the scaled Fourier transform of the input field, is especially common in antenna design and X-ray diffraction microscopy. A more accurate approximation in the far field is given by
2.14which may be evaluated via the USFFT. We further discuss the Fraunhofer approximation and derive (2.14) in the electronic supplementary material, §B.

## 3. A new algorithm for fast and accurate light propagation

In this section, we describe a fast algorithm to compute, for a fixed propagation distance *z* and any user-specified accuracy *ϵ*>0, the field *u*(**x**,*z*) in a square output window *W*=[−*w*/2,*w*/2]^{2}. We assume that the boundary data *f* has already been replaced with its space-limited and band-limited version, as described in §2*d*. Hence, *f* is band-limited with some bandlimit *c* and concentrated in a square aperture *A*=[−*a*/2,*a*/2]^{2} so that, according to (2.2), we need to compute
3.1where *K*_{z} is the Rayleigh–Sommerfeld kernel (2.3). Our algorithm comprises three steps. First, in §3*a*, we accurately approximate the Rayleigh–Sommerfeld kernel by a linear combination of Gaussians using the algorithm briefly described in §2*e*. Second, in §3*b*, we use the resulting approximation in (3.1) and accurately discretize the ensuing integrals via the generalized Gaussian quadratures for band-limited functions from theorem 2.2. Finally, in §3*c*, we use the algorithms referred to in §2*f* for computing the SVDs of low-rank matrices to rearrange the resulting sums for rapid and accurate evaluation via the USFFT (see §2*c*).

### (a) Approximation of the kernel with controlled error

The key observation behind the Fresnel approximation is that the phase of the kernel (2.3) is approximately quadratic (cf. (2.11)), at least for small values of *r*/*z*. We also use this observation but, in addition, exploit the fact that the rest of the phase can be accommodated via an approximation with controlled error, valid throughout a large computational domain.

Owing to the finite sizes of the output window *W* and input aperture *A*, it is only necessary to approximate the kernel *K*_{z}(*r*) on the interval . We demonstrate how to obtain, for any user-specified accuracy *ϵ*_{K}>0, an approximation such that
3.2We emphasize that in (3.2) the desired accuracy *ϵ*_{K} is scaled by the propagation distance *z* since the magnitude of the kernel decays like *z*^{−1} along the optical axis.

Inspired by the Fresnel approximation, we rewrite the kernel as
where
3.3Having removed the factor e^{i(π/z)r2} capturing most of the oscillatory behaviour of the kernel, the function *A*_{z} is non-oscillatory over a large region of space. We use the algorithm in §2*e* to compute, for a desired accuracy *ϵ*_{K}>0, complex-valued weights *w*_{ℓ} and exponents *η*_{ℓ} such that
3.4leading to the approximation
3.5satisfying (3.2). We define to be the result of using the approximate kernel in (3.1),
3.6The following proposition bounds the absolute error of the approximation and is an immediate consequence of the preceding discussion.

### Proposition 3.1

*Let* *be the function defined in* (3.6), *with weights w*_{ℓ} *and exponents η*_{ℓ}, ℓ=1,…,*L*, *as in* (3.4). *Then*,
3.7*where the field u*(** x**,

*z*)

*is given by*(3.1).

### (b) Discretization of integrals

Letting and , where *η*_{ℓ}, ℓ=1,…,*L*, are as in (3.4), we rearrange (3.6) as
3.8A straightforward estimate of the bandlimit of the integrands (see the electronic supplementary material, §C.3.2) may be bounded (for each term, independently of ℓ) by
where *c* is the bandlimit of the input function *f*. Using the bandlimit *c*′, we discretize the integrals in (3.8), for a desired accuracy *ϵ*_{Q}, using the quadratures from theorem 2.2.

Let **y**_{mm′}=(*y*_{m},*y*_{m′})∈*A*, *m*,*m*′=1,…,*M*, be the *M*×*M* tensor product grid of quadrature nodes with the corresponding quadrature weights *τ*_{m}*τ*_{m′}. We select an *N*×*N* grid of output locations **x**_{nn′}=(*x*_{n},*x*_{n′})∈*W*, *n*,*n*′=1,…,*N*. We then apply the quadrature from theorem 2.2 to the integrals in (3.8) and obtain an approximation to the output field at the desired locations as
3.9In (3.9), the *N*×*N*×*M*×*M* fourth-order tensors **T**^{(ℓ)}, ℓ=1,…,*L*, are given by
3.10where *n*,*n*′=1,…,*N* and *m*,*m*′=1,…,*M*, and the *N*×*M* second-order tensors (matrices) **S**^{(ℓ)}, ℓ=1,…,*L*, are given by
3.11where *n*=1,…,*N* and *m*=1,…,*M*. From theorem 2.2, we obtain the bound
3.12where is given by (3.6) and *u*_{nn′} by (3.9).

### (c) Rapid evaluation of the field

In the Fresnel approximation of the kernel, the exponent in the quadratic phase factor is purely imaginary, making it easy to compute (2.12) via either the FFT or the USFFT. In our approach, the exponents in approximation (3.5) are complex valued, although the magnitude of their real parts is small relative to the aperture and output window sizes (we describe below how to ensure that this is the case). This observation allows us to develop a fast algorithm to evaluate (3.9).

In order to evaluate the inner summations in (3.9) rapidly, we look for an approximation of in a form where the output indices *n*,*n*′ are split from the input indices *m*,*m*′. As the first step, we use the SVD to write the matrices in (3.11) as a sum of outer products,
3.13where the singular values are arranged in decreasing order and the columns of matrices **U**^{(ℓ)} and **V**^{(ℓ)} are orthonormal. By properly selecting parameters as described in §3*d*, we ensure that the *N*×*M* matrices **S**^{(ℓ)} have a low numerical rank (typically less than 25). We then use the algorithms described in §2*f* to compute these SVDs rapidly and apply the result to approximate by a low-separation-rank tensor with indices *n*,*n*′ split from the indices *m*,*m*′. The error estimate is provided by lemma 3.2 (see the electronic supplementary material, §C.2 for a proof).

### Lemma 3.2

*Let* *and* *where ℓ*=1,…,*L*, *n*=1,…,*N and m*=1,…,*M*, *be as in* (3.13). *For a desired accuracy ϵ*_{R}>0, *let I*^{(ℓ)}, *ℓ*=1,…,*L*, *be the smallest integer such that*
*Then, for ℓ*=1,…,*L*, *n*=1,…,*N*, and *m*=1,…,*M*, *we have the approximations*

Using lemma 3.2, we approximate **T**^{(ℓ)} in (3.10) as , with ℓ=1,…,*L*, *n*,*n*′=1,…,*N* and *m*,*m*′=1,…,*M*, where we have re-indexed the resulting double summation using a single index, and also have collected terms that depend on the output coordinate **x**_{nn′} as the *N*×*N*×*R*^{(ℓ)} tensors **P**^{(ℓ)} and terms that depend on the input coordinate **y**_{mm′} as the *M*×*M*×*R*^{(ℓ)} tensors **Q**^{(ℓ)}. Lemma 3.2 implies that
3.14We define to be the result of using approximation (3.14) in (3.9),
3.15It follows from (3.14) and the estimate
that
3.16where *u*_{nn′} is given by (3.9). We also have
where *b* is a small constant (this can be shown using the techniques from the electronic supplementary material, §C.3). Incorporating *b* into *ϵ*_{R}, the bound (3.16) becomes
3.17Combining the error bounds (3.7), (3.12) and (3.17), we obtain theorem 3.3.

### Theorem 3.3

*The error of computing the field u from (3.1) using (3.15) is bounded by*3.18

The expression for in (3.15) allows us to evaluate the field rapidly. We first apply as a pre-factor to the input samples *f*(**y**_{mm′}), then compute the inner sums using the USFFT and finally apply to the result as a post-factor.

In the three steps of deriving the final approximation of the field (3.15), we used three different accuracies, *ϵ*_{K}, *ϵ*_{Q} and *ϵ*_{R}, in order to emphasize these as separate steps. In practice, we choose these accuracies to be the same and set *ϵ*_{K}=*ϵ*_{Q}=*ϵ*_{R}=*ϵ*/3 to achieve the final accuracy *ϵ*.

### Remark 3.4

It is not necessary for the aperture and output window to be square. Indeed, the USFFT allows us to place the output coordinates at arbitrary locations. We have used a tensor product grid here for simplicity; with minor modifications, our algorithm may be used to compute the field anywhere in the output window with the same computational cost. The input aperture may also have any shape, provided that accurate quadrature rules are used to discretize the integrals in (3.8). We note that near optimal quadratures for circular apertures are described in [24].

### Remark 3.5

*Simplifications for separable boundary data.* As with the Fresnel approximation, our approach simplifies in the case of boundary data that are separable in Cartesian or polar coordinates. For example, suppose the function *f* is separable in Cartesian coordinates, viz.,
3.19for some functions and , *s*=1,…,*S*. In such cases, the application of the approximate kernel (3.5) simplifies to the evaluation of several one-dimensional USFFTs. Substitute (3.19) into (3.8) and rearrange to obtain an approximation for the field *u* in a separated form,
where the functions and , ℓ=1,…,*L*, *s*=1,…,*S*, can be evaluated by one-dimensional integrals. We obtain similar formulae if the boundary data are concentrated in a disc and separable in polar coordinates.

### (d) Computational cost

The number of terms in (3.4) may be estimated as , where
3.20(see the electronic supplementary material, §C.3.1). In order to control the number of terms *L*, we restrict the parameter *γ* by the empirically determined constant
3.21This, in turn, limits the domain where our approximation is valid, although this domain is significantly larger than that of the Fresnel approximation. We discuss this further in §4. This bound also implies that the ratio |*α*_{ℓ}|/(*aw*) is small, leading to a low numerical rank of the matrices **S**^{(ℓ)} in (3.11).

The cost of evaluating (3.15) depends on the number of USFFTs, *R*=*R*^{(1)}+⋯+*R*^{(L)}, which is estimated as . Hence, the overall computational cost of our algorithm is . For actual computing times, see §5*d*.

## 4. Size of the output region

In §3*d*, we ensured that our algorithm is efficient by requiring *γ* from (3.20) to satisfy (3.21). The practical impact of this requirement is to establish a relationship between the input aperture side-length *a*, propagation distance *z*, and output window side-length *w*. In particular, for a fixed aperture size and propagation distance, the largest output window that our algorithm can accommodate is
4.1provided that this number is positive. If it is negative, then the propagation distance is small with respect to the aperture size; in such cases, the propagation problem under consideration should be treated in the Fourier domain or using near-field methods (see the electronic supplementary material, §A).

Using the same reasoning, we also define the quantity as
4.2which, for a fixed aperture size *a*, gives the minimum propagation distance before our algorithm can be used.

For comparison, let us find analogues of (4.1) and (4.2) for the Fresnel approximation (2.12). Recall that the only mechanism to control the error when using the Fresnel approximation is to restrict the size of the output region. We first determine the analogue of (4.1), that is, for a given accuracy *ϵ*, let us find , the largest possible output window where the Fresnel approximation is guaranteed to achieve accuracy *ϵ*. As the Fresnel approximation replaces the phase of the Rayleigh–Sommerfeld kernel (2.3) with e^{i2πz} e^{i(π/z)r2}, we find the maximum value of *r*′_{max} such that
Using , we obtain an analogue of (4.1) for the Fresnel approximation,
giving the largest possible square output window for a square aperture with side-length *a*. The analogue of (4.2) for the Fresnel approximation is
which gives the minimum propagation distance.

To illustrate the difference between and for our method and and for the Fresnel approximation, let us choose *ϵ*=10^{−3}. If *a*=5000 wavelengths, then after propagating *z*=5×10^{6} wavelengths, we find that
so the largest side-length of our output window is approximately 17 times larger than that of the Fresnel approximation. If the propagation distance is only *z*=250 000 wavelengths, then wavelengths while is negative, implying that three-digit accuracy of the Fresnel approximation cannot be guaranteed in *any* output window. In fact, for this accuracy, the minimum propagation distance for the Fresnel approximation is wavelengths, compared with for our method.

If we choose the accuracy threshold to be *ϵ*=10^{−6}, then the minimum propagation distance for the Fresnel approximation increases to wavelengths, whereas the minimum distance for our method does not depend on the desired accuracy, and therefore remains unchanged at .

## 5. Numerical examples

### (a) A Gaussian beam

To demonstrate the accuracy of our algorithm, we choose boundary data that allows the field to be accurately computed by an alternative approach. We select the boundary data with a Gaussian profile given by
5.1where *σ* determines the width of the beam measured in the units of wavelengths. It can be shown that the propagating (i.e. non-evanescent) portion of the field is given by (cf. (2.4))
5.2where *j*_{k} is the *k*th-order spherical Bessel function of the first kind, is the normalized *k*th degree Legendre polynomial, and the coefficients *f*_{k} are defined as
5.3These coefficients decay rapidly once *k* is sufficiently large, so that we may truncate the sum in (5.2) to obtain a simple formula to compute the non-evanescent portion of the field to any desired accuracy. The error committed by neglecting the evanescent waves may be bounded by
5.4Provided that (5.4) is less than the accuracy sought, we may disregard the evanescent portion of the field entirely and regard (5.2) as a formula to compute the field generated by the boundary data (5.1) for the desired accuracy.

In our example, we choose *σ*=5 wavelengths, a square aperture of size *a*=50 wavelengths, a propagation distance of *z*=1000 wavelengths, a square output window of size *w*=450 wavelengths and a desired accuracy of *ϵ*=10^{−6}. We then use our algorithm to evaluate the (axially symmetric) field at *N*=256 points along the *x*-axis using *M*×*M*=512×512 input samples. With this choice of parameters, the number of terms needed to approximate the kernel is *L*=8, and the number of USFFTs required to evaluate the field is *R*^{(1)}+⋯+*R*^{(8)}=52.

To determine the accuracy of the result, we first compute (5.4) and find that the evanescent part of the solution is undetectable, viz., |*u*_{e}(**x**,1000)|≤8.7×10^{−113} for all **x**. We also find that the coefficients (5.3) decay to |*f*_{k}|≤10^{−15} once *k*≥200, so we truncate the sum (5.2) after 200 terms and use it to determine the accuracy of our algorithm. We display the results in figure 1 and note that the obtained accuracy is better than the accuracy goal of 10^{−6} (the bound in lemma 3.2 is not tight).

### (b) An aperture in the near field of a source

We consider an aperture in the near field of a source and thereby demonstrate that our method maintains accuracy even when evanescent waves are present in the aperture field. We arrange Helmholtz point sources in the plane *z*=−1 so that the resulting field is given by (cf. (2.6))
where **r**_{j} and *ϱ*_{j}, *j*=1,…,*J*, give the position and intensity, respectively, of each point source. The function specifying the boundary data is given by *f*(**x**)=*u*(**x**,0); as shown in figure 2*a*, to high accuracy, it is confined to a square aperture of side-length *a*=550 wavelengths. We select the propagation distance *z*=1000 wavelengths, desired accuracy *ϵ*=10^{−3}, and apply our algorithm to compute the field in a square output window of side-length *w*=100 wavelengths. Because the aperture plane is only one wavelength from a collection of point sources, evanescent waves are present in the boundary data. In this case, we sample the boundary data on a grid of size *M*×*M*=1486×1486. In figure 2, we compare our approximate solution to the true solution *u*(**x**,1000) and, as requested, it is correct to slightly over three digits. For comparison, we also show the error of the Fresnel approximation , which has only about 1.5 digits of accuracy. Observe that the conditions of this numerical experiment are favourable for the Fresnel approximation because the maximum angle that any point in the output window makes with the optical axis is only approximately 4^{°}.

### (c) Focusing waves and the Fresnel approximation

Next, we compare the field computed by our algorithm to that obtained via the Fresnel approximation by considering the boundary data
representing a spherical wave restricted to a square aperture and converging to the point (**r**_{0},*z*_{0}). In figure 3, we show the magnitude of the resulting field, |*u*(**x**,*z*_{0})|, in the plane *z*=*z*_{0} transverse to the optical axis and containing the focal point, for the choice of parameters **r**_{0}=(0,0), *z*_{0}=100 000 wavelengths, and *a*=2500 wavelengths; as expected, the field magnitude is approximately a scaled version of the function |sinc(*x*_{1}) sinc(*x*_{2})|.

Now, let us move the focal point away from the optical axis. We fix the propagation distance to *z*_{0}=100 000 wavelengths and set the focal point to
where *θ* is the angle between the optical axis and the ray from the origin to the focal point (**r**_{0},*z*_{0}). We select accuracy *ϵ*=10^{−3} and compare, for several values of *θ* in the range 0^{°}–5^{°}, the field computed by our algorithm, , and the field computed by the Fresnel approximation, , near the focal point **x**=**r**_{0}. Results displayed in figure 4 demonstrate that the accuracy of the Fresnel approximation deteriorates rapidly as the focal point moves away from the optical axis. We also display the diffraction pattern computed by our algorithm and the pattern computed by the Fresnel approximation for *θ*=5^{°} in figure 5. The diffraction pattern obtained by the Fresnel approximation is both shifted and blurred when compared to the correct one.

In figure 6, we compare the error of each method at the focal point, i.e. and , as a function of the angle *θ* (we determined the true value *u*(**r**_{0},*z*_{0}) by direct numerical integration). Our method maintains its accuracy for all *θ*∈[0^{°},5^{°}], while the Fresnel approximation is accurate to approximately three digits for *θ*=0^{°}, but has essentially no accurate digits for *θ*>4^{°}. This example demonstrates that a belief that the Fresnel approximation produces accurate results at angles up to 18^{°} off the optical axis [32,33] is not true in general.

### Remark 5.1

From figure 4, it may appear tempting to attempt to ‘correct’ the Fresnel approximation by introducing a change of variable **x**↦*g*(**x**), where the function would be selected with the goal of rescaling the field computed by the Fresnel approximation, , to more closely match the true field, *u*(**x**,*z*). In effect, the strategy would be to rescale the *x*-axis for the dashed lines in figure 4 to better align the peaks of the solid and dashed lines. Unfortunately, our example shows that the Fresnel approximation incorrectly computes the shape of the focal spot, in addition to its position (compare the nulls between the main lobe and side lobes in figure 4*d*).

### (d) Representative examples of computational cost

The computational cost of our algorithm depends on the number of USFFTs required in (3.15), i.e. *R*=*R*^{(1)}+⋯+*R*^{(L)}, where *L* is the number of terms needed to approximate the kernel in (3.5). As it turns out, *R* decreases with increasing *z*, which is expected because the application of the Rayleigh–Sommerfeld kernel asymptotically reduces to a single scaled Fourier transform as , cf. (2.14). On the other hand, for smaller values of *z* the field changes rapidly, and many USFFTs are required to compute the field accurately in these computationally challenging regions.

Let us fix the aperture size *a*=2000 wavelengths, the desired accuracy *ϵ*=10^{−3}, and set the number of input samples and output samples to *M*×*M*=*N*×*N*=512×512. We now examine the dependence of *R* on the propagation distance *z* for two different choices of output window size:

— a fixed output window of size

*w*=10 000 wavelengths and— a variable output window , where was defined in (4.1) and is the largest output window that our method can accommodate for a given propagation distance

*z*.

In table 1, we provide the number of terms, *L*, needed to approximate the kernel for several propagation distances, as well as the number of USFFTs, *R*, required to compute the field for these two choices of output window size.

In table 1, we also provide timing results for a MATLAB-based implementation of the algorithm. These timings were obtained on a laptop computer with a 2.1 GHz AMD N950 processor and 8 GB of RAM. No effort was made to optimize the code, and we expect that a careful implementation of the algorithm will be significantly faster. We also note that all USFFTs in the evaluation of (3.15) may be computed in parallel, so that the total computational time can be reduced substantially on a multi-processor computer system.

## 6. Conclusion

We have described a fast algorithm for the propagation of coherent light between parallel planes separated by a linear, isotropic and homogeneous medium. By contrast to current algorithms, our algorithm achieves any user-specified accuracy. As a consequence, we can rapidly and accurately compute the field in non-paraxial regions, i.e. regions far from the optical axis, with computational complexity proportional to that of the FFT. The overall result is a fast algorithm that can achieve any user-specified accuracy over a large computational domain.

## Funding statement.

This research was partially supported by NSF grant nos. DMS-1009951, DGE-0801680, DMS-0602284 and DOE/ORNL grant no. 4000038129.

## Acknowledgements

We thank Dr Bradley Alpert from NIST for providing many useful comments and suggestions.

- Received May 16, 2013.
- Accepted August 13, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.