## Abstract

An increasing number of satellites are being launched to observe the atmospheric concentrations of a variety of trace species. They cover a wide area at once and are expected to provide far more extensive information than the sparse ground-based concentration measurements. This paper introduces an adjoint technique to retrieve emissions based on a recent concept of renormalization. The technique is applied to a set of synthetic column-averaged measurements, generated for an idealized satellite from a prescribed ground-level source. The Indian region is considered under two contrasting meteorological conditions, in the months of January and July, corresponding to the winter and monsoon seasons. Since it is not feasible to handle a large volume of satellite data in the inversion, owing to the time involved in the computation of the matrices, a preprocessing step is suggested to extract a manageable data set representative of the whole.

Considering a limited number of observations, it is shown that the emissions are underestimated both without and with the renormalization procedure, the underestimation being greater for the non-renormalized estimates. The non-renormalized estimate is degraded further when the resolution of the model is refined, whereas the renormalized estimate is not altered appreciably. Preprocessing based on aggregation of the data is found to retrieve up to 75% of the prescribed emissions in the month of January and 90% in the month of July. The computationally expensive renormalization may be avoided except in the case of partial visibility of the area of interest, due to cloud cover or a technical constraint. A simple criterion for the optimum design of a monitoring network is suggested.

## 1. Introduction

Since the late 1970s, understanding of atmospheric composition and chemistry has been enhanced by the availability of space-borne data such as those obtained from the *total ozone mapping spectrometer* (TOMS) or the *stratospheric aerosol and gas experiment* (SAGE). These observations are significant for the study of short-lived species and of man's action on the climate, owing to their global coverage at a high frequency of a few days. A 12 hour global coverage and 1°×1° horizontal resolution were already available on the *Tiros operational vertical sounder* (TOVS), operated since 1978 by the National Oceanic and Atmospheric Administration. This information has proved useful (Chédin *et al*. 2002) for the study of greenhouse gases. The raw signal obtained for horizontal pixels by the instruments is generally a radiance in selected ultraviolet, visible or infrared channels with various geometries from nadir to limb. To retrieve the concentration distribution, the raw signal has to be assimilated. Several assimilation techniques have been proposed in the literature. The commonly used Bayesian technique has been employed, for instance, to assimilate the data from the *measurements of pollution in the troposphere* (MOPITT; Deeter *et al*. 2003) and the *atmospheric infrared sounder* (AIRS; Engelen *et al*. 2004). An approach based on neural networks was used for data from AIRS (Crevoisier *et al*. 2004) and TOVS (Chédin *et al*. 2003). The merits and demerits of these techniques were discussed by Clerbaux *et al*. (1998) in the context of satellite meteorology.

Researchers have long been engaged in deriving the source of a tracer from measured concentrations (Penenko & Baklanov 2001). The availability of space-borne observations opens new perspectives on this problem (Arellano *et al*. 2004; Houweling *et al*. 2004; Pétron *et al*. 2004). Since the pioneering work of Rayner & O'Brien (2001), studies related to the computation of tracer sources from satellite data have been based upon the Bayesian technique (Tarantola 1987). A well-known limitation of this technique is the determination of the pivotal background error covariance matrix.

In the present paper a non-Bayesian approach is used to deal with the under-determination of the system arising in the inverse problem. As pointed out by Ustinov (2001), the treatment of the measurements is pursued without using the conventional cost function employed in most of the literature on assimilation. The main point in the approach advocated here is a *renormalization* of the functions adjoint to the measurements (Issartel 2005). No background statistics or background error covariance matrix is assumed. Indeed, *a priori* properties of the sought source are described by a *fundamental* geometry from which a *renormalized* geometry is derived in order to account for the visibility effectively provided by the observations. In its present form the theory applies to linear species, for instance carbon dioxide at the global scale and carbon monoxide or methane at the regional scale. Besides the quality of the reconstructions, the renormalization displays several advantages. The fundamental and renormalized geometries are easily realizable. The conceptual framework is self-consistent and allows one to efficiently address various issues related to the handling of a large amount of data.

The aim of this paper is to introduce an adjoint technique of inverse modelling for estimating the sources from the space-borne concentration measurements. For this purpose, a set of data is synthesized for an idealized satellite by column averaging of the concentrations generated from an atmospheric transport model for a Gaussian distribution of emissions at ground level. The synthetic data are used because (i) errors associated with the dispersion model and measurements at receptors are minimized and (ii) it is relatively easier to validate the inversion technique by comparing the source estimated with that originally prescribed.

Section 2 is a reminder about forward and backward atmospheric dispersion of the species. The Bayesian theory commonly used is examined in §3. Section 4 revisits the renormalization. The synthesized data are described in §5. The numerical experiments are presented in §6. The results are discussed in §7. Section 8 summarizes the conclusions of the study.

## 2. Forward and backward transport

Let *σ*(*x*, *y*, *z*, *t*) be the intensity of the source, in unit amount of tracer released per unit mass of ambient air and per unit time, as a function of position (*x*, *y*, *z*) and time *t*, emitting into the atmosphere *Ω* during a period *T*. Here *x*, *y* and *z* are, respectively, the components along the zonal, meridional and vertical directions. We shall be concerned in this work with ground-level sources with an intensity *s*(*x*, *y*, *t*) in unit amount of tracer per unit area and per unit time; these are related to volume sources as

$$\sigma(x, y, z, t) = \frac{s(x, y, t)}{\rho}\,\delta\bigl(z - h(x, y)\bigr), \tag{2.1}$$

where *h*(*x*, *y*) is the altitude of the ground at (*x*, *y*); *ρ* is the dry air density; and *δ* is the Dirac delta function.

The resulting distribution of the tracer in the atmosphere is often described in terms of concentration. However, it is appropriate to use the mixing ratio *χ* in place of the concentration when dealing with inverse transport modelling (Issartel & Baverel 2003). The mixing ratio is governed by the following advection–diffusion equation, also denoted in operator form as *Lχ*=*σ*:

$$\frac{\partial \chi}{\partial t} + \boldsymbol{u}\cdot\nabla\chi - \frac{1}{\rho}\nabla\cdot(\rho \boldsymbol{K}\,\nabla\chi) + \gamma\chi = \sigma, \tag{2.2}$$

where $\boldsymbol{u}$ is the wind; $\boldsymbol{K}$ is the diffusion tensor; and *γ* is a decay coefficient. This equation is subject to appropriate initial and boundary conditions (Pudykiewicz 1998) in such a way that the linear operator *L* becomes one-to-one and invertible.

When the source is known, it is easy to integrate equation (2.2) to compute *χ* and deduce a set of finitely many mixing ratio measurements *μ*_{i}, *i*=1, …, *n*. Conversely, when a finite number of mixing ratio measurements is known, the purpose of assimilation is to determine a source *σ* from which they might have originated. In this, the calculation of the functions adjoint to the measurements is a preliminary step. Adjoint functions are necessarily defined through some scalar product. The following one, weighted by the air density, is particularly convenient from a computational point of view (Issartel & Baverel 2003):

$$(f, g) = \int_T\!\!\int_\Omega \rho\, f\, g\;\mathrm{d}\Omega\,\mathrm{d}t. \tag{2.3}$$

To each measurement, we associate a sampling function *π*_{i} such that *π*_{i}(*x*, *y*, *z*, *t*) is the rate at which the air is sampled around the point (*x*, *y*, *z*), per unit mass of ambient air and per unit time. In order to account for an observed mixing ratio, the mass of the sample should be normalized as (*π*_{i}, 1)=1. For example, in the case of a measurement at a point (*x*_{i}, *y*_{i}, *z*_{i}) and time *t*_{i}, the sampling function will take the form *π*_{i}=*δ*(*x*−*x*_{i})*δ*(*y*−*y*_{i})*δ*(*z*−*z*_{i})*δ*(*t*−*t*_{i})/*ρ*. The measurement in a general framework may be written as

$$\mu_i = (\pi_i, \chi) = \int_T\!\!\int_\Omega \rho\, \pi_i\, \chi\;\mathrm{d}\Omega\,\mathrm{d}t. \tag{2.4}$$

This expression implies that the global mixing ratio of the sample is averaged from the mixing ratios of the contributions described by *π*_{i}.

The adjoint function *r*_{i} is obtained by spreading the sampling function *π*_{i}, like a source, back in time according to the following backward equation:

$$-\frac{\partial r_i}{\partial t} - \boldsymbol{u}\cdot\nabla r_i - \frac{1}{\rho}\nabla\cdot(\rho \boldsymbol{K}^{\mathrm T}\nabla r_i) + \gamma r_i = \pi_i, \tag{2.5}$$

where the superscript 'T' denotes transposition. The operator *L*^{*} adjoint to *L* simply corresponds to a backward advection–diffusion equation. For this reason, the adjoint functions *r*_{i} defined for the scalar product (2.3) are called *retroplumes* (figure 1). An adjoint expression is obtained for the measurements (Marchuk 1992) directly in terms of the source *σ*, corresponding to the Lagrange identity (*π*_{i}, *χ*)=(*r*_{i}, *σ*),

$$\mu_i = (r_i, \sigma) = \int_T\!\!\int_\Omega \rho\, r_i\, \sigma\;\mathrm{d}\Omega\,\mathrm{d}t. \tag{2.6}$$

There are infinitely many sources orthogonal to the *r*_{i}, and, thus, the real source cannot be retrieved from a finite number of measurements; the purpose of assimilation is rather to construct a best estimate. Several approaches have been proposed in the literature to estimate a source from the observation vector $\boldsymbol{\mu}=(\mu_1, \ldots, \mu_n)^{\mathrm T}$. A common feature in all of them is the use of the adjoint functions.
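The Lagrange identity linking a measurement to the source can be checked numerically with a small discrete analogue. The transport matrix, air masses and sampling function below are arbitrary illustrative assumptions, not outputs of a dispersion model; a single transposed matrix product stands in for the backward integration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                               # grid cells of a toy space-time domain
m = rng.uniform(0.5, 2.0, N)         # air mass per cell: weights of product (2.3)

# Hypothetical discrete transport operator mapping a source to mixing ratios
T = 0.1 * rng.uniform(0.0, 1.0, (N, N))
sigma = rng.uniform(0.0, 1.0, N)     # an arbitrary source
chi = T @ sigma                      # forward run: discretised mixing ratio field

# Sampling function, normalised so that the sample has unit mass
pi = rng.uniform(0.0, 1.0, N)
pi /= np.sum(m * pi)
mu = np.sum(m * pi * chi)            # the measurement, as in eqn (2.4)

# Retroplume: adjoint of T for the mass-weighted product, here obtained by a
# transposed matrix product instead of a backward model integration
r = (T.T @ (m * pi)) / m

# Lagrange identity: the same measurement expressed directly from the source
mu_adjoint = np.sum(m * r * sigma)
```

The two evaluations of the measurement agree to machine precision, which is the discrete counterpart of the identity (*π*_{i}, *χ*)=(*r*_{i}, *σ*).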

## 3. Bayesian approach

Let *σ*_{r} be an unknown real source of a tracer with a corresponding initial guess *σ*_{b}, such that Δ*σ*_{r}=*σ*_{r}−*σ*_{b} is the departure from it. *σ*_{b} is also known as the *background source term*, and Δ*σ*_{r} is the corresponding *background error*. Let $\boldsymbol{\mu}$ be the measurements effectively observed to reduce this error, **R** be the real observation operator linearly linking the measurements to the sources and **R**_{m} be the corresponding operator associated with the transport model. The theory is traditionally formulated in a discrete form (Cohn 1997). Accordingly, a source *σ* represents a column vector of emissions in a vector space of finite dimension *N*, the number of grid points in space and time. Since there are *n* measurements, the observation matrix is given by

$$\boldsymbol{H} = (r_{ij})_{\,1\le i\le n,\;1\le j\le N}, \tag{3.1}$$

where *r*_{ij} is the value of the adjoint function associated with *μ*_{i} at the *j*th grid point. Let *ε*_{r} be the *representation* error (Lorenc 1986) incurred when the adjoint functions *r*_{mi} are estimated from the backward integration of a numerical dispersion model, *ε*_{m} be the measurement error due to the detectors and *p*(*ε*) be the probability distribution of the total error *ε*. Let $\boldsymbol{\mu}_{\mathrm b}$ be the value of the measurements calculated for the background term. The *innovation* $\Delta\boldsymbol{\mu}=\boldsymbol{\mu}-\boldsymbol{\mu}_{\mathrm b}$ is the departure of the observation from this calculated value. Substituting *σ*_{r}=*σ*_{b}+Δ*σ*_{r}, one obtains

$$\Delta\boldsymbol{\mu} = \boldsymbol{H}\,\Delta\sigma_{\mathrm r} + \varepsilon. \tag{3.2}$$

Note that the system of equations in (3.2) is under-determined, as the number of unknowns (*N*) is larger than the number of equations (*n*). The classical theory overcomes this problem by introducing *a priori* statistical knowledge of the background error with a probability distribution *p*_{b}. Once the innovation in equation (3.2) has been measured, a conditional probability distribution *p*_{a} is introduced for the background error (Lorenc 1986) using Bayes' theorem (Papoulis 1984). Generally, *p*_{a} and *p*_{b} are assumed Gaussian with a measurement error covariance matrix **Q** (dimension *n*×*n*) and a background error covariance matrix **B** (dimension *N*×*N*). In this case,

$$p_{\mathrm a}(\Delta\sigma\,|\,\Delta\boldsymbol{\mu}) = C \exp\Bigl[-\tfrac{1}{2}\Delta\sigma^{\mathrm T}\boldsymbol{B}^{-1}\Delta\sigma - \tfrac{1}{2}(\Delta\boldsymbol{\mu}-\boldsymbol{H}\Delta\sigma)^{\mathrm T}\boldsymbol{Q}^{-1}(\Delta\boldsymbol{\mu}-\boldsymbol{H}\Delta\sigma)\Bigr], \tag{3.3}$$

where *C* is a normalizing coefficient. In equation (3.3), the first term in the exponent corresponds to the background information, whereas the second term represents the information from the measurements. The classical estimate is the most likely value of the background error, minimizing the corresponding cost function (e.g. Rodgers 2000, eqns (2.44) and (2.45)),

$$\Delta\sigma_{\mathrm{est}} = \boldsymbol{G}\,\Delta\boldsymbol{\mu}, \qquad \boldsymbol{G} = \boldsymbol{B}\boldsymbol{H}^{\mathrm T}\bigl(\boldsymbol{H}\boldsymbol{B}\boldsymbol{H}^{\mathrm T}+\boldsymbol{Q}\bigr)^{-1}, \tag{3.4}$$

where the matrix $\boldsymbol{G}$ is often referred to as the *gain matrix*.

The major problem of the Bayesian approach is the determination of the background statistics. The background matrix **B** is considered a physical property of a meteorological scenario; however, no description of its dependence on the meteorological field variables is ever provided. The theory only describes how a prior estimation is improved *a posteriori* when successive sets of measurements are assimilated (e.g. Tarantola 1987, eqn (7.85); Rodgers 2000, eqn (4.7); Kalnay 2003, eqn (5.6.9)). The difficulties arise in the evaluation of the background matrix **B**, and researchers invariably make assumptions to simplify it. **B** is commonly assumed diagonal or proportional to the identity matrix (Arellano *et al*. 2004; Pétron *et al*. 2004; Yumimoto & Uno 2006). To analyse the problem associated with this simplification, equation (3.4) is rewritten as

$$\Delta\sigma_{\mathrm{est}} = \boldsymbol{B}\boldsymbol{H}^{\mathrm T}\boldsymbol{\lambda}, \qquad \boldsymbol{\lambda} = \bigl(\boldsymbol{H}\boldsymbol{B}\boldsymbol{H}^{\mathrm T}+\boldsymbol{Q}\bigr)^{-1}\Delta\boldsymbol{\mu}. \tag{3.5}$$

The estimate is a linear combination of the functions **B***r*_{i}. If **B** is diagonal and *b* is the scalar function read on its diagonal, $\Delta\sigma_{\mathrm{est}} = b \sum_i \lambda_i r_i$.

Note that the retroplume is computed back in time as a concentration spread from inside the volume of the receptor. If this volume is relatively small, the initial concentration is high. In the case of a point measurement characterized by the Dirac delta function, the volume is zero and the retroplume is initially singular. In calculations with a transport model, the volume of the receptor is at least one grid mesh, and the singularity is limited to a peak, especially if the resolution is low. Unless special precautions are taken, first, the diagonal coefficients of the *n*×*n* matrix $\boldsymbol{H}\boldsymbol{B}\boldsymbol{H}^{\mathrm T}$ become large or infinite (Issartel 2005), and, second, the peak or singularity is transmitted to Δ*σ*_{est}. In practice, when the resolution is refined, Δ*σ*_{est} is subjected to two competing tendencies. On the one hand, the coefficients *λ*_{i} go to zero, followed everywhere by Δ*σ*_{est} except at the positions and times of the observations. On the other hand, at the positions and times of the observations, this tendency is balanced by the divergence of the retroplumes. Finally, this computation of Δ*σ*_{est} amounts to saying that the observed values are due to infinitesimal sources focused within the point detectors. We would like to stress that, in this, the covariance matrix **Q** of the measurement errors plays no significant role, as it is overcome by the divergence of the retroplumes. Of course, this dependence of Δ*σ*_{est} on model resolution is irrelevant for assimilation purposes.
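As a minimal numerical sketch of the classical estimate (3.4)–(3.5), the following uses randomly generated retroplumes standing in for real adjoint computations (the dimensions, covariances and innovation are illustrative assumptions). It verifies that, with a diagonal **B**, the estimate is the scaled linear combination of the retroplumes discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200                        # n measurements, N grid points, n << N

H = rng.uniform(0.0, 1.0, (n, N))    # rows: discretised retroplumes r_i
d_mu = rng.uniform(0.0, 1.0, n)      # innovation vector
b = 0.3                              # B = b * identity, the common simplification
B = b * np.eye(N)
Q = 1e-3 * np.eye(n)                 # measurement error covariance

# Classical estimate: gain matrix applied to the innovation, eqn (3.4)
gain = B @ H.T @ np.linalg.inv(H @ B @ H.T + Q)
d_sigma = gain @ d_mu

# Equivalent form of eqn (3.5): with diagonal B the estimate is a linear
# combination of the retroplumes, scaled by the diagonal function b
lam = np.linalg.solve(H @ B @ H.T + Q, d_mu)
d_sigma_combo = b * (H.T @ lam)
```

Any peak in a row of the matrix of retroplumes is transmitted directly to the estimate through the term *b*∑*λ*_{i}*r*_{i}, which is the mechanism behind the artefacts discussed in the text.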

In the case of a point measurement at (*x*_{i}, *t*_{i}), the difficulties are avoided only if the diagonal function *b* vanishes at the location and time of the measurement. However, the choice of the function *b* is never motivated by the removal of the peaks. Peaks are indeed observed by Houweling *et al*. (2004). They are illustrated in the present paper by the upper part of figures 6–8.

Owing to the importance of the background term in defining the background error covariance matrix, the Bayesian assimilation is formulated in terms of departures denoted with a symbol Δ. This formulation is not maintained in §4. In view of the difficulties associated with the Bayesian hypothesis of background statistics, we adopt a renormalization approach.

## 4. Renormalization

The present approach of assimilation relies on the specification of a vector space of *acceptable* sources among which an estimate will be sought for the real unknown source, and this vector space is associated with a scalar product called the *fundamental product*. The space of acceptable functions and the fundamental product jointly constitute the physically relevant frame accounting for a general physical, biological or social knowledge of the system.

In this study the acceptable sources are the ground-level sources *s*(*x*, *y*, *t*) defined on *Σ*×*T*, in which *Σ* is the projection of the atmospheric domain on the *x*–*y* plane. This vector space is associated with the following fundamental product:

$$(s_1, s_2)_1 = \int_T\!\!\int_\Sigma s_1\, s_2\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t. \tag{4.1}$$

We wish to make the following remarks about this choice of fundamental product.

First, in the integral appearing in equation (4.1), all portions of the surface and time are equally weighted, implying that no region is given a preference. Otherwise, a weighted product $(s_1, s_2)_w = \int_T\!\!\int_\Sigma w\, s_1\, s_2\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t$ with weights *w* should be used.

Second, the fundamental product (4.1) is defined by a simple integral over *Σ*×*T*. Hence, the releases from the sought source at any two different positions and times are regarded as uncorrelated. Such correlations would have been described using a product defined by a double integral over pairs of positions and times with a kernel *k*. Uncorrelated releases from different positions and times further imply that no particular smoothness is expected of the sought source.

Substituting equation (2.1) in equation (2.4), one obtains

$$\mu_i = \int_T\!\!\int_\Sigma r_i\bigl(x, y, h(x, y), t\bigr)\, s(x, y, t)\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t = (r_i, s)_1. \tag{4.2}$$

This expression shows that, for the fundamental product (4.1), the function adjoint to the measurement *μ*_{i} corresponds to the retroplume evaluated at ground level (hereafter also denoted *r*_{i}). The measurement *μ*_{i} is the product through (, )_{1} of the source *s* with *r*_{i}.

As a consequence, the sources fitting a given measurement vector $\boldsymbol{\mu}$ are described as *s*=*s*_{‖}+*s*_{⊥}, where *s*_{⊥} is (, )_{1}-orthogonal to the *r*_{i}, and *s*_{‖} is a linear combination of the *r*_{i}. The coefficients *c*_{i} of the combination are obtained from the equations

$$H_1\,\boldsymbol{c} = \boldsymbol{\mu}, \qquad (H_1)_{ij} = (r_i, r_j)_1, \tag{4.3}$$

where $\boldsymbol{c}$ is the column vector of elements *c*_{i}. Among the sources fitting $\boldsymbol{\mu}$, *s*_{‖} is the one with least norm (*s*_{‖}, *s*_{‖})_{1}. The estimation of the source using equation (4.3) has been discussed by Issartel (2005). It was shown that *s*_{‖} contains artefacts in the form of peaks whose influence increases as the resolution of the model improves. As explained in §3, these peaks correspond to the singularity of the retroplumes associated with point measurements.
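The least-norm property of *s*_{‖} can be demonstrated with a small discrete sketch (the retroplumes and measurement vector below are random placeholders, and the fundamental product is discretised as a plain dot product over grid points):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 5, 300                        # measurements and surface grid points
R = rng.uniform(0.0, 1.0, (n, N))    # rows: ground-level retroplumes r_i
mu = rng.uniform(0.0, 1.0, n)        # measurement vector

# Gram matrix of the retroplumes for the (discretised) fundamental product
H1 = R @ R.T
c = np.linalg.solve(H1, mu)          # coefficients of the combination, eqn (4.3)
s_par = R.T @ c                      # estimate in the span of the retroplumes

# Any other source fitting the data differs by a component orthogonal to the
# retroplumes and therefore has a larger fundamental norm
s_perp = rng.standard_normal(N)
s_perp -= R.T @ np.linalg.solve(H1, R @ s_perp)   # project out span of the r_i
other = s_par + s_perp
```

Both `s_par` and `other` reproduce the measurements exactly, but `s_par` has the smaller norm, in line with the Pythagorean decomposition *s*=*s*_{‖}+*s*_{⊥}.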

The artefacts are removed through a process called *renormalization*, which amounts to optimizing a weighting function *f*(*x*, *y*, *t*)>0, with total weight $\int_T\!\!\int_\Sigma f\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t = n$, in the following transformation of the fundamental product into a weighted product:

$$(s_1, s_2)_f = \int_T\!\!\int_\Sigma \frac{s_1\, s_2}{f}\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t. \tag{4.4}$$

The least *f*-norm estimate fitting the measurements is

$$s_f = f \sum_{i} \bigl(H_f^{-1}\boldsymbol{\mu}\bigr)_i\, r_i, \qquad (H_f)_{ij} = \int_T\!\!\int_\Sigma f\, r_i\, r_j\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t. \tag{4.5}$$

We deduce from the definition of *H*_{f} that $H_f^{-1} H_f = I_n$. Taking the trace on each side of this equation, we obtain a simple identity,

$$\int_T\!\!\int_\Sigma f\, \boldsymbol{r}^{\mathrm T} H_f^{-1} \boldsymbol{r}\;\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}t = n, \qquad \boldsymbol{r} = (r_1, \ldots, r_n)^{\mathrm T}. \tag{4.6}$$

On the r.h.s., *n* is the number of observed pieces of information. Note that *f* is chosen positive and the matrix *H*_{f} is positive definite because it is a Gram matrix (e.g. Boseck 1967). Thus, on the l.h.s. of equation (4.6), the integrand is non-negative and vanishes exactly at points not seen by any of the receptors, i.e. where all the *r*_{i} vanish. This integrand may be regarded as a density describing the distribution of information associated with the weights *f*. As such it is called *illumination* and denoted as *E*_{f},

$$E_f(x, y, t) = f\, \boldsymbol{r}^{\mathrm T} H_f^{-1} \boldsymbol{r}. \tag{4.7}$$

On reconsidering equation (4.6), we note that the integrand on the l.h.s. may be addressed in two ways: (i) if the integral is considered from the point of view of the fundamental geometry, the integrand is indeed the illumination *E*_{f} (equation (4.7)); (ii) the integral may be considered alternatively from the point of view of the renormalized geometry, in which the function *f* belongs to the elementary volume and the integrand becomes *E*_{f}/*f*.

An optimum function *φ*, best at removing the inversion artefacts, is obtained from this alternative interpretation of the integrand by requiring that the density of information be homogeneous from the point of view of the weighted geometry (Issartel 2005),

$$E_\varphi(x, y, t) = \varphi(x, y, t), \tag{4.8}$$

for all (*x*, *y*, *t*) in *Σ*×*T*. This optimal function *φ* is calculated as the converged value of *f*_{k} from the following iterative scheme:

$$f_{k+1} = f_k \sqrt{\boldsymbol{r}^{\mathrm T} H_{f_k}^{-1} \boldsymbol{r}} = \sqrt{f_k\, E_{f_k}}. \tag{4.9}$$

We observe that a maximum of nine iterations is adequate to obtain *φ* satisfying equation (4.8) within 1% of relative error,

$$\max_{(x,\, y,\, t)} \frac{|E_{f_k} - f_k|}{f_k} \le 0.01. \tag{4.10}$$

Once the optimum function *φ* is obtained, the unknown source is estimated as *s*_{φ} from equation (4.5), with *f* taken as *φ*, for a given set of measurements.
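A discrete sketch of the renormalization loop follows. The retroplumes are random placeholders, and the damped multiplicative update *f*_{k+1}=*f*_{k}(**r**^{T}*H*_{f}^{−1}**r**)^{1/2} is our reading of the iterative scheme, chosen because it is consistent with the fixed point *E*_{φ}=*φ*; the exact scheme used in the paper may differ in its damping. The underflow floor is a purely numerical safeguard for points the receptors barely see:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 4, 60                         # measurements and surface grid points
R = rng.uniform(0.1, 1.0, (n, N))    # rows: retroplumes on the grid

def gram(f):
    # H_f: Gram matrix of the retroplumes weighted by f (discretised eqn (4.5))
    return (R * f) @ R.T

def illumination(f):
    # E_f = f * r^T H_f^{-1} r, the density of information, eqn (4.7)
    Hf_inv = np.linalg.inv(gram(f))
    return f * np.einsum('ij,ik,kj->j', R, Hf_inv, R)

f = np.full(N, n / N)                # uniform start with total weight n
for _ in range(300):
    psi = illumination(f) / f        # r^T H_f^{-1} r, equal to 1 at the fixed point
    f = np.maximum(f * np.sqrt(psi), 1e-12)   # damped step; floor avoids underflow

phi = f                              # converged weighting function
E = illumination(phi)                # renormalised illumination, ideally E = phi
```

Whatever the weights, the illumination integrates exactly to *n* (the trace identity (4.6)); at convergence it also coincides pointwise with *φ*, which is the homogeneity condition (4.8).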

We hasten to point out that in the Bayesian approach, the evaluation of the background matrix ** B** is updated at successive steps of assimilation (Tarantola 1987). However, in the present approach, the renormalized product is not an updated version of the fundamental product.

## 5. Synthetic satellite data

The space-borne observations are increasingly aimed at investigating the natural and anthropogenic sources of greenhouse gases and pollutants. However, the observations are directly related to the mixing ratio of the species rather than to their emission sources, so validating the assimilation technique used to retrieve the sources is not easy. Ideally, the source estimated from inversion should be validated against the real source. In such a validation, the use of real data is not feasible because (i) the real source, in principle, is not known, and (ii) noise in the real data affects the estimation, making the interpretation relatively difficult. The use of synthetic data is an alternative that overcomes these difficulties. The synthetic data are generated for a prescribed source using an atmospheric transport model. This helps in validating the inversion technique by comparing the source estimated with that originally prescribed. In addition, the synthetic data minimize the errors associated with the transport model and with the measurements by the sensor.

In order to design an appropriate set of synthetic data, we first briefly describe the characteristic features of the sensors mounted on existing satellite platforms. The National Aeronautics and Space Administration's instrument MOPITT (Warner *et al*. 2001) on the Sun-synchronous platform TERRA has been observing carbon monoxide and methane since 1999. The solar channels retrieve column amounts with an accuracy of 10% for carbon monoxide and 1% for methane. The horizontal resolution is 22×22 km at the nadir. The swath of 640 km provides a global coverage time at the equator of about 3 days. Since 2002, on board the European Space Agency's Sun-synchronous platform *environmental satellite* (ENVISAT), ozone, nitrogen monoxide and nitrogen dioxide have been monitored by the instrument *scanning imaging absorption spectrometer for atmospheric chartography* (SCIAMACHY; Bovensmann *et al*. 1999). Two modes, nadir and limb, are operated along its 960 km swath, providing a horizontal resolution of 60×30 km and a vertical resolution of 3 km. The global coverage frequency is 6 days at the equator and shorter at higher latitudes. Similar or better performances are planned for the next generation of instruments (Bovensmann *et al*. 2004; Crisp *et al*. 2004), with horizontal resolutions of a few square kilometres and a regional or global coverage frequency of less than 1 hour. The vertical resolution is not expected to be finer than 2 km for the satellites presently operating or planned for the near future.

In the present study a ground-level source *s*_{G} (figure 2*a*) of strength *M* is prescribed in terms of a Gaussian distribution function given by

$$s_{\mathrm G}(x, y, t) = \frac{M}{(2\pi)^{3/2} L^2 \tau} \exp\left(-\frac{l^2}{2L^2} - \frac{t^2}{2\tau^2}\right), \tag{5.1}$$

in which *l* is the distance from the centre point (*x*_{0}, *y*_{0}); *t* is the time measured with reference to a time *t*_{0}; and *L* and *τ* are the standard deviations in space and time.

Let *χ*_{G} be the field of mixing ratio modelled for the source *s*_{G}. The synthetic data are prepared as a mixing ratio averaged vertically up to a height *H* at a point (*x*_{α}, *y*_{β}) and time *t*_{θ},

$$\mu_{\alpha\beta\theta} = \frac{\int_0^H \rho\, \chi_{\mathrm G}(x_\alpha, y_\beta, z, t_\theta)\;\mathrm{d}z}{\int_0^H \rho\;\mathrm{d}z}. \tag{5.2}$$

The corresponding sampling function is

$$\pi_{\alpha\beta\theta} = \frac{\mathbf{1}_{[0,H]}(z)}{\int_0^H \rho\;\mathrm{d}z}\;\delta(x - x_\alpha)\,\delta(y - y_\beta)\,\delta(t - t_\theta), \tag{5.3}$$

in which the function $\mathbf{1}_{[0,H]}$ has value 1 in the interval [0, *H*] and 0 elsewhere.
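The prescribed source can be set up numerically from the values quoted later in this section (*M*=1000 units, *L*=220 km, *τ*=6 hours, a 0.5° window of 17×17 pixels, five images 6 hours apart); the sketch below is an illustration of eqn (5.1) on that grid, with 1°≈110 km as an assumed conversion, and checks that the discretized release sums to nearly the full strength *M* (a few per cent is lost to the finite window):

```python
import numpy as np

# Values taken from the study: M = 1000 units, L = 220 km, tau = 6 h
M, L, tau = 1000.0, 220.0, 6.0
deg = 110.0                          # km per degree (approximate)

# 17 x 17 pixel centres of the 8.5° x 8.5° window at 0.5° resolution, in km,
# and five observation times 6 h apart centred on the peak time t0 = 0
x = (np.arange(17) - 8) * 0.5 * deg
t = np.arange(-12.0, 12.1, 6.0)
X, Y = np.meshgrid(x, x)

def s_G(X, Y, t):
    # eqn (5.1): Gaussian in the distance l from the centre and in time
    l2 = X**2 + Y**2
    norm = M / ((2.0 * np.pi) ** 1.5 * L**2 * tau)
    return norm * np.exp(-l2 / (2.0 * L**2) - t**2 / (2.0 * tau**2))

# Total release, summed over the window and the five observation times
dx, dt = 0.5 * deg, 6.0
total = sum(np.sum(s_G(X, Y, tk)) * dx * dx * dt for tk in t)
```

The column-averaged measurements of eqn (5.2) would then be obtained by running a transport model on this source and averaging the resulting mixing ratio field below *H*.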

In order to capture the expected variability of the source, a collection of *n* points (*x*_{α}, *y*_{β}) at times *t*_{θ} will be chosen with integral labels in such a way that the spacings between successive points are reasonably small compared with *L*, and the intervals between successive times small compared with *τ*. To be consistent with the theoretical framework, the measurements will also be referred to as *μ*_{i}, or jointly in column vector form as $\boldsymbol{\mu}$.

The features of an idealized satellite are chosen consistently with those in operation currently. The satellite is assumed to observe an area of 8.5°×8.5° within a swath of 940 km. The observations are assumed to be generated six hourly, which is in the range of frequency available with the current and next generation satellites. The resolution is taken as 0.5°×0.5° comparable with the horizontal resolutions in the existing space-borne sensors. However, for the sensitivity analysis, a finer resolution of 0.25°×0.25° has also been taken.

The source is prescribed with the same centre as the observed window and a standard deviation *L* in space of 220 km (2° latitude). Thus, the source is entirely embedded within the observed window, and the scale of its variations is larger than the resolution of the data. The standard deviation *τ* in time is 6 hours, so that its complete evolution may be captured by five successive observations at an interval of 6 hours in which the third one is referred to as *t*_{0}. The total amount *M* of tracer released is taken as 1000 units.

The observational window corresponding to the Indian region with centre (77° E, 17° N) in the state of Maharashtra is taken as an area of 8.5°×8.5° extending between 72.75° E and 81.25° E, and 12.75° N and 21.25° N.

The computations with the dispersion model POLAIR3D (Boutahar *et al*. 2003; Mallet & Sportisse 2004) are carried out for a horizontal resolution of 0.5°×0.5°, assumed to be the same as that of the satellite. Similarly, for the sensitivity analysis, the resolution of both model and satellite is taken as 0.25°×0.25° for a window of 8.25°×8.25° between 72.875° E and 81.125° E, and 12.875° N and 21.125° N. The vertical layers used in the model are bounded above the surface by the levels 64, 176, 416, 716, 1061, 1446, 1861, 2301, 2761, 3236, 3726, 4231, 4756, 5296 and 5761 m. The time step is 15 min and *H* is taken as 5296 m. The meteorological fields, reanalysed by the European Centre for Medium-Range Weather Forecasts (ECMWF), were kindly provided by Météo France. The vertical eddy diffusivity is parametrized according to Louis (1979). The horizontal eddy diffusivity is assumed constant with a value of 20 000 m^{2} s^{−1}, and the decay coefficient *γ* is taken as zero. For the computation of the retroplumes by the dispersion model, the domain between 64° E and 91° E, and 6° N and 31° N, containing the observation window, is considered and the model is run for a period of 84 hours between 8 January/July 2004, 00 universal time (UT) and 11 January/July, 12 UT. The total number of observations in five successive images is 17^{2}×5=1445 with the resolution 0.5°×0.5° and 33^{2}×5=5445 with the resolution 0.25°×0.25°.

The time *t*_{0} of the peak in equation (5.1) is taken as 11 January/July 2004, 00 UT. In order to analyse the influence of the meteorological conditions on the inversion, two contrasting meteorological patterns are used, in January and July, representative of the winter and monsoon seasons.

In January, the winds close to the surface (figure 2*b*(i)) are westward with a speed of less than 3 m s^{−1}, associated with stable atmospheric conditions during the nights. Owing to the low winds and stable conditions, the diffusion of the species is relatively weak. In July, on the other hand, the monsoon winds close to the surface are eastward (figure 2*b*(ii)) with a speed varying from 5 to 10 m s^{−1} over land and from 10 to 20 m s^{−1} over the ocean. Compared with the month of January, unstable conditions are observed for a longer period, implying stronger diffusion. The vertically averaged mixing ratios produced by the source *s*_{G} are represented in figure 3.

## 6. Numerical experiments

As mentioned in §5, the total numbers of synthetic observations generated for five successive images at an interval of 6 hours are *n*_{t}=1445 and 5445 for resolutions of 0.5°×0.5° and 0.25°×0.25°, respectively. The assimilation of these *n*_{t} values would require significant computational time and memory, as equation (4.9) is solved iteratively to obtain *φ*, and each matrix *H*_{f} involves the computation of *n*_{t}×*n*_{t} elements. Even the computation of the single matrix *H*_{1} (equation (4.3)) for the 1445 data corresponding to the low resolution is not feasible. Studies are performed at various scales from global to local; the resolution appropriate to the local scale is adopted for the measurements from the space-borne sensors, leading to massive amounts of data for analysis at regional scales. In general, only a limited number of observations can be handled in the inverse problem. Thus, in order to deal with the massive set of data, preprocessing is required to extract a manageable data set representative of the whole. Here, we have adopted preprocessing based on (i) sorting of a limited subset of the data, and (ii) aggregation of the whole data set into a limited number of averaged values.

The sequence of the calculations in the inversion is described in figure 4. The calculations are essentially carried out in three steps: (i) The computation of the retroplumes associated with the preprocessed pixel or aggregated observations based on the prevailing meteorological conditions using the transport model POLAIR3D. The retroplumes are stored hourly for the lowest layer only. (ii) The generation of the synthetic measurements associated with the source *s*_{G}. (iii) The inversion of the measurements which, in turn, involves the computation of (*a*) non-renormalized estimate from equation (4.3), (*b*) *φ* as converged value of *f*_{k} in equation (4.9), and (*c*) renormalized estimate from equation (4.5) in which *f* is taken as *φ*.

The organization of the numerical experiments is summarized in table 1. They fall broadly into five categories, schematically described in figure 5, with the corresponding results shown in figures 6–10. The categories differ essentially in the horizontal resolution of the dispersion model, the preprocessing approach used to reduce the set of data, the effective resolution of the reduced data set and the influence of partial availability of the observations for various reasons, such as the presence of clouds. In each category, the simulations are carried out for the two months, January and July, representing the winter and monsoon seasons, both without and with the renormalization procedure. The resolution of the dispersion model is the same as that of the satellite observations.

In table 1, the numbers of satellite observations selected are 25 and 89 for each fully visible image, corresponding to preprocessed resolutions of 2°×2° and 1°×1°, respectively; accordingly, the dimensions of the inverse problem are 125 and 445, based on five images taken successively at an interval of 6 hours. In the cases corresponding to figures 6–8, a subset of the pixels is selected and the contribution from the neighbouring pixels is ignored; the information inverted is thus selected inhomogeneously. In the cases represented in figures 9 and 10, the information from the neighbouring pixels is aggregated with weights normally distributed around the previously sorted pixels. The standard deviation of the weights is chosen as 110 km, or 1° of longitude. In the last case, the observational window is divided into two smaller windows by a meridional band, 2.5° wide, considered invisible. The pixels in the invisible band are ignored and the dimension of the inverse problem reduces to 100. The times required for the computation of each retroplume and for the inversion are given in table 1. The total time in each experiment is calculated as the sum of (i) the time to generate the retroplumes from the dispersion model, and (ii) the time for the inversion.
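The smooth aggregation can be sketched as follows for a single image. The pixel values are random placeholders for column-averaged measurements; the 17×17 grid, the 5×5 set of 2° aggregation centres and the 110 km standard deviation follow the configuration described above, while the 1°≈110 km conversion is an assumed approximation:

```python
import numpy as np

rng = np.random.default_rng(4)
deg = 110.0                          # km per degree of longitude (approximate)

# One image: 17 x 17 pixels of 0.5° over the 8.5° window, with arbitrary values
x = (np.arange(17) - 8) * 0.5 * deg
X, Y = np.meshgrid(x, x)
pix = rng.uniform(0.0, 1.0, X.shape)

# 5 x 5 aggregation centres at 2° spacing: 25 aggregated values per image
cx = np.arange(-4.0, 4.1, 2.0) * deg

def aggregate(x0, y0, std=110.0):
    # weights normally distributed around the sorted pixel (x0, y0), std 110 km
    w = np.exp(-((X - x0)**2 + (Y - y0)**2) / (2.0 * std**2))
    return np.sum(w * pix) / np.sum(w)

agg = np.array([[aggregate(a, b) for a in cx] for b in cx])
```

Each aggregated value is a weighted average of the full-resolution pixels, so the 289 pixels of an image are condensed into 25 values while every pixel still contributes to the inversion.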

We stress that the non-renormalized inversion coincides with the classical Bayesian inversion under the hypotheses that

(i) there are no measurement and model errors, **Q**=**0**,

(ii) the background errors are purely superficial, evenly distributed and uncorrelated: **B** reduces to an identity matrix for errors at ground level, so that $\boldsymbol{H}\boldsymbol{B}\boldsymbol{H}^{\mathrm T}$ in equation (3.5) coincides with the non-renormalized *H*_{1} of equation (4.3), and

(iii) the background state is chosen as reference zero, so that $\Delta\boldsymbol{\mu}$ coincides with $\boldsymbol{\mu}$.

Under these hypotheses, the Bayesian estimate given by the expression (3.5) coincides with the non-renormalized estimate of the expression (4.3).
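This coincidence can be checked directly in a discrete sketch, with random placeholders standing in for the retroplumes and the measurements:

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 4, 120                        # measurements and surface grid points
R = rng.uniform(0.0, 1.0, (n, N))    # rows: retroplumes (observation operator)
mu = rng.uniform(0.0, 1.0, n)        # measurement vector

# Non-renormalised estimate of eqn (4.3): least-norm source via the Gram matrix
s_nr = R.T @ np.linalg.solve(R @ R.T, mu)

# Bayesian estimate of eqn (3.5) under the hypotheses above: B = identity,
# Q = 0 and a zero background, so the innovation is the measurement itself
B = np.eye(N)
Q = np.zeros((n, n))
s_bayes = B @ R.T @ np.linalg.solve(R @ B @ R.T + Q, mu)
```

The two estimates agree to machine precision, confirming that the non-renormalized inversion is the degenerate Bayesian case rather than a separate technique.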

## 7. Results and discussion

In this paper we investigate the feasibility of retrieving the source of a tracer using the concentrations observed by satellite. Using appropriate techniques of inverse modelling, it is possible to optimize the resolution expected from the inversion of the data within the available computational time. The numerically expensive renormalization is shown to be theoretically relevant and its practical utility is established in the case of partial visibility. The smooth aggregation reduces significantly the amount of data handled in the inversion. The computations highlight the ability of this theory to retrieve the source of the tracer. A homogeneous illumination indicates that the information detected is appropriately distributed through space and time.

The source estimated for the month of July without and with renormalization is seen in figure 6*b*,*d*, respectively. The non-renormalized releases are underestimated and strongly influenced by the geometry of the observations around the 125 pixel data (figure 6*b*(iii),(iv)). Three hundred and forty-two units of tracer were originally released between *t*_{0}−6 and *t*_{0} hours (figure 2*a*(iii)). The amounts retrieved were 116 units (figure 6*b*(iii)) and 242 units (figure 6*d*(iii)) without and with the renormalization procedure, respectively. The underestimation in the non-renormalized case is also visible in figure 6*b*(vi),*d*(vi), where the estimated variation of the total rate of release is compared with the theoretically prescribed profile. These results are consistent with the interpretation of the illumination as a distribution of information. The non-renormalized illumination *E*_{1}, integrated between *t*_{0}−6 and *t*_{0} hours (figure 6*b*(vii)), is inhomogeneous. On the other hand, the renormalized illumination *E*_{φ}=*φ* is relatively more homogeneous (figure 6*d*(vii)). Without renormalization, the information is focused on the data points; with renormalization, it is transmitted to the neighbouring points, resulting in a homogeneous distribution of the estimated releases. Similar behaviour is observed during the month of January (figure 6*a*,*c*).
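As a numerical illustration of how a renormalizing weight spreads the information carried by the measurements, the following sketch discretizes the problem in an abstract one-dimensional way. The synthetic retroplume matrix `R`, the geometric-mean fixed-point iteration for the condition *E*_{φ}=*φ* and all variable names are our own illustrative assumptions, not the authors' implementation of equations (4.4)–(4.9).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 40                      # n measurements, m discrete source cells
R = rng.random((n, m))            # rows: synthetic discretized retroplumes r_i

def illumination(phi, R):
    """Discrete illumination E_j = phi_j * r_j^T H_phi^{-1} r_j, with the
    weighted Gram matrix H_phi = sum_j phi_j r_j r_j^T. By the trace
    identity, sum_j E_j = n for any positive weight phi."""
    H = (R * phi) @ R.T           # H_phi (n x n, symmetric positive definite)
    Hinv_R = np.linalg.solve(H, R)
    return phi * np.einsum('ij,ij->j', R, Hinv_R)

# Damped fixed-point iteration towards E_phi = phi (illustrative scheme):
phi = np.full(m, n / m)           # start from a uniform weight, sum(phi) = n
for _ in range(200):
    E = illumination(phi, R)
    phi = np.sqrt(phi * E)        # geometric-mean damping keeps phi > 0
    phi *= n / phi.sum()          # keep the total weight equal to n

# Weighted minimum-norm estimate: s_hat = phi * R^T H_phi^{-1} mu.
s_true = np.zeros(m)
s_true[m // 2] = 1.0              # a point release in the middle cell
mu = R @ s_true                   # synthetic, noise-free observations
lam = np.linalg.solve((R * phi) @ R.T, mu)
s_hat = phi * (R.T @ lam)         # reproduces the observations: R @ s_hat = mu
```

Whatever weight φ is used, the estimate satisfies the observation constraints exactly, since *R*(φ⊙*R*^{T}λ)=*H*_{φ}λ=μ; the choice of φ only redistributes the estimated release over the cells.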

### (a) Sensitivity analysis

The retrieval of the source depends on various parameters such as the resolution of the data available from the satellite (and used here as model resolution), the lower resolution of the data effectively used for inversion and the method chosen to reduce the amount of data from satellite observations. We carry out a sensitivity analysis to infer the influence of each of these on the source estimated.

#### (i) Model resolution

We consider a model resolution of 0.25°×0.25° instead of 0.5°×0.5°, while still retaining 125 pixel data (figure 5, diagram 7) for computational purposes. The comparison of figure 7*b*(vi) with figure 6*b*(vi) shows that the finer resolution considerably worsens the underestimation in the non-renormalized estimate. Indeed, of the 342 units of tracer originally released between *t*_{0}−6 and *t*_{0} hours, 75 units (figure 7*b*(iii)) are now retrieved in place of 116 units (figure 6*b*(iii)) when the resolution is refined from 0.5°×0.5° to 0.25°×0.25°. On the other hand, the renormalized inversion is stable at the high resolution, as the amount retrieved, 247 units (figure 7*d*(iii)), is almost the same as that retrieved (242 units; figure 6*d*(iii)) with the model resolution of 0.5°×0.5°. When the resolution is refined, the non-renormalized illumination becomes singular at the data points (figures 6*b*(vii) and 7*b*(vii)). The same trend is observed in the evolution of the total illumination with time (figures 6*b*(vi) and 7*b*(vi)). By contrast, the renormalized illuminations are similar at the low (figure 6*d*(vi),*d*(vii)) and high (figure 7*d*(vi),*d*(vii)) model resolutions. With the increased resolution of the model, the source retrieved by the process of renormalization is stable, whereas artefacts are introduced in the retrieval without renormalization. This signifies the importance of the renormalization proposed here for the retrieval of the source.

#### (ii) Resolution of the extracted data

In figure 6, the 125 data used for inversion are extracted from the raw satellite data with a resolution of 2°×2°. Now we analyse how the retrieval of the source is influenced by refining the resolution of the extraction from 2°×2° to 1°×1°, which increases the number of data points from 125 to 445 (figure 5, diagram 8), corresponding to 25 and 89 points in each image, respectively. The resolution of the model is still taken as 0.5°×0.5°. The source estimated from the increased number of data compares better with that prescribed (figure 8*b*,*d*). Between *t*_{0}−6 and *t*_{0} hours, 226 units (figure 8*b*(iii)) and 322 units (figure 8*d*(iii)) of tracer are now respectively retrieved without and with the renormalization procedure. We observe that, without renormalization, the releases are still relatively underestimated and focused at the data points. Thus, in spite of the improved quality of the inversion, the comparison of the non-renormalized and renormalized estimates follows the features pointed out earlier. The computational cost involved in the inversion grows with the square of the number of data: the cost of the inversion is larger for 445 data without the renormalization procedure than for 125 data with renormalization (table 1). Further, any improvement of the non-renormalized estimate will depend on the model resolution.

#### (iii) Aggregation of the data

We consider the same model resolution of 0.5°×0.5° as in figure 6, with the same number of 125 data, now extracted with the help of the neighbouring points using a Gaussian distribution as weights, with a standard deviation of 110 km (figure 5, diagram 9). Between *t*_{0}−6 and *t*_{0} hours, 296 units (figure 9*b*(iii)) and 300 units (figure 9*d*(iii)) are respectively retrieved without and with the renormalization in the month of July. The estimated evolution of the releases with time is significantly improved too (figures 9*b*(vi),*d*(vi) and 6*b*(vi),*d*(vi)). The renormalized and non-renormalized estimates are reasonably close to each other and compare well with the source prescribed. The estimates obtained in this case are better, and obtained at a lower computational cost, than those calculated using 445 pixel data (figure 8*b*(vi),*d*(vi)). The computational time required with the 125 data based on Gaussian samples is almost the same as that for the same number of data in figure 6; thus, the computational cost of the inversion is driven by the number of data rather than by the way the data have been extracted. When the Gaussian samples are used, the illumination, whether renormalized or not, is homogeneously distributed in space (figure 9*b*(vii),*d*(vii)) and time (figure 9*b*(vi),*d*(vi)). An evenly distributed illumination is apparently a criterion for the optimum design of a monitoring network. We wish to point out that, with a smooth aggregation of data, the renormalization procedure may not be needed: as the illumination is evenly distributed through the region and interval of interest, the renormalizing equation *E*_{φ}=*φ* is almost achieved by the fundamental geometry corresponding, up to a constant factor, to *φ*=1.
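The smooth aggregation itself is straightforward to reproduce. The sketch below, with hypothetical grids and function names of our own, averages fine-resolution column data around each retained sampling point with Gaussian weights of standard deviation about 1° (roughly 110 km), optionally restricted to a visibility mask as in the partial-visibility experiment.

```python
import numpy as np

def gaussian_aggregate(lon, lat, values, centres, sigma_deg=1.0, visible=None):
    """Average fine-resolution pixel values around each sampling centre,
    weighting pixels with a Gaussian of standard deviation sigma_deg
    (about 110 km of longitude at these latitudes). The optional boolean
    mask `visible` restricts the average to pixels seen by the satellite."""
    lon, lat, values = map(np.asarray, (lon, lat, values))
    if visible is None:
        visible = np.ones(values.shape, dtype=bool)
    out = []
    for clon, clat in centres:
        d2 = (lon - clon) ** 2 + (lat - clat) ** 2   # squared distance (deg^2)
        w = np.exp(-0.5 * d2 / sigma_deg ** 2) * visible
        out.append((w * values).sum() / w.sum())
    return np.array(out)
```

One easy sanity check on such a scheme: for a spatially constant field the aggregated value equals the field value, with or without a mask, because the Gaussian weights are normalized by their own sum.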

#### (iv) Partial visibility

Keeping in view the advantages associated with the aggregation of data, we consider the same approach, excluding a region in the centre of each image where most of the tracer was released. This excluded region physically corresponds to a band invisible to the satellite due to a technical problem or cloud cover. The number of data extracted is 100, based on the fact that 20 points are chosen in each image (figure 5, diagram 10). The averaging of the observed mixing ratios around each point is restricted to the visible area. It is expected that the inversion will retrieve the source in the invisible area. We observe that when the central band is invisible, the estimated intensity of the releases reduces slightly, compared with the case of complete visibility, both without and with renormalization. For example, without renormalization, between *t*_{0}−6 and *t*_{0} hours, 296 units are retrieved in the case of complete visibility (figure 9*b*(iii)) and 274 units in the case of partial visibility (figure 10*b*(iii)) in the month of July. With the renormalization, 300 units (figure 9*d*(iii)) and 296 units (figure 10*d*(iii)) are retrieved with complete and partial visibility, respectively. The problem with the non-renormalized estimation lies not so much in the intensity of the estimated releases as in their spatial distribution. The source estimated without the renormalization procedure is split into two parts, the main one downwind of the invisible region: to the left of the central band in the month of January (figure 10*a*(iv),(v)) and to the right in the month of July (figure 10*b*(iv),(v)). The maximum intensity of the non-renormalized emissions is also shifted downwind of its prescribed position (figure 10*b*(iii),(iv)). In the month of July, the wind blowing from the left drives the tracer to the right; as a result, relatively clean air is seen in the left visible window and an increased concentration in the right visible window. The non-renormalized inversion thus leads to the incorrect conclusion that the tracer was emitted within the window where it was seen. This artefact is largely suppressed in the renormalized estimates (figure 10*c*,*d*).

#### (v) Seasonal variations

For analysing the seasonal influence on the retrieval of the sources, we have considered the months of January and July, corresponding to the winter and monsoon seasons, respectively. The wind speed and mixing are relatively weaker in the month of January (§5). For the month of July, the amount retrieved is relatively greater and closer to that prescribed; similar behaviour is observed in all the cases (figures 6–10), with and without renormalization. This is expected, since the transport of the tracer is enhanced by the relatively strong winds and mixing in the month of July compared with those in January. The meteorology plays a key role in transporting the tracer from the emissions to the receptors, and, thus, the retrieval of the source is significantly affected by the seasonal variability of the meteorological conditions.

### (b) Assumptions and limitations

We point out the following limitations of the present study.

The work is based upon synthetic data, so that the retrieval of the source is not affected by the uncertainties associated with the concentration measurements. The theory proposed here needs to be validated further when real data become available; the errors associated with real observations are expected to influence the accuracy of the retrieval. In this study we have used column-averaged data, corresponding to the techniques commonly used in operational satellites. Efforts are being made to launch satellites with vertical resolution of the concentration measurements; thus, it is desirable to develop appropriate techniques for inverting such profile data. This will help in minimizing the errors involved in the process of inversion.

We have considered the inversion of satellite data from a limited area and for a limited interval of time, and the source of tracer to be retrieved was fully contained in this area and interval. However, the space-borne observations often provide regional or global coverage on a time scale varying from days to years. Such data could help in retrieving the global emissions of carbon dioxide (CO_{2}), methane (CH_{4}), aerosols, etc. In principle, the theory is even applicable for retrieving the sources on a global scale. The theory proposed here carries out the inversion by considering all data at once; however, due to the enormous amount of data, the computation time involved in this approach restricts its applicability at the global scale. Alternative techniques might be based on decomposing the global coverage into a finite number of regions and the time into a finite number of sub-intervals, such that the inversion can be performed in each sub-region with a feasible amount of data. In this approach, the issues related to the assemblage of all these partial inversions may be further explored.

The inversion technique proposed here is based on the assumption that the relationship between the source and the receptors is linear. In other words, this relationship is characterized by a linear partial differential equation describing the dispersion of a tracer. This assumption is valid as long as either the tracer is chemically inert or its removal is linear.

The theory proposed here is applicable to both ground and elevated sources. However, it is applied here to ground-level sources by redefining the inner product in equation (4.1) from equations (2.6) and (2.1). The time required for computing the retroplumes (equation (2.5)) is the same for ground and elevated sources. However, for the elevated sources, one needs to consider the vertical description of the atmosphere in the inversion process, increasing the computational time in proportion to the number of vertical layers. In addition, the information collected from the measurements would be diluted in this larger domain, so that the source retrieved would be less satisfactory in comparison with that at ground level. In general, the nature of the source is not known *a priori*, and we expect that both ground and elevated sources have contributed to the measurements. In this paper we have restricted our analysis to ground-level emissions.

The issues related to the theory of renormalization, as well as to the commonly used theory, may be addressed based on the concept of *a priori knowledge*, developed differently in each case and leading to distinct mathematical structures.

Classically, *a priori* knowledge is of a statistical nature (e.g. Cohn 1997). It takes the form of a background error covariance matrix *B* accounting for the statistical behaviour of the differences between the real source and its first guess. The form of *B* is subjected to the known physical properties of the sought source. Its evaluation is, however, never explicit, and it is apparently a theoretical requirement to improve it by inferring the above statistics during successive steps of assimilation. To facilitate comparison with the alternative approach, it must be stressed that the part of the cost function corresponding to *B* (equation (3.3)) is a scalar product weighted by the inverse of *B*.

In the theory of renormalization, *a priori* knowledge is based on a set of assumptions reflecting the physical properties that the sought source is known to follow. For instance, the source is emitted continuously from the surface over land rather than the ocean. These assumptions are accounted for by a vector space of acceptable sources associated with a fundamental product (,)_{1}. This approach is not statistical in nature. The acceptable functions and fundamental product are explicitly deduced from the physical assumptions only, once and for all, independent of the measurements. In addition, a concept of visibility is introduced to describe the regions well or poorly observed. This visibility depends on the number, position and time of the observations and on the prevailing meteorological conditions. The concept of visibility is expressed in terms of a renormalizing function *φ* equivalent to a scalar product (equation (4.4)), and *φ* is fully computable (equations (4.8) and (4.9)). Thus, the approach based on the classical theory involves the uncertainties associated with the evaluation of the background matrix *B*, whereas such approximations are avoided in the present technique of renormalization.

The non-renormalized inversions shown here, which are subject to artefacts, have been related in §5 to the Bayesian framework under the conditions that there are no measurement errors (*Q*=**0**) and the background errors are purely superficial, evenly distributed and uncorrelated. The discussion in §3 suggests that a diagonal *B* would always be subjected to similar artefacts. In order to minimize the artefacts, it appears that *B* should not be diagonal, and, thus, further studies are required for the case of a non-diagonal background matrix.

## 8. Conclusions

An assimilation technique is described here to retrieve sources from a set of concentration measurements. A time-dependent three-dimensional transport model, POLAIR3D, is used for computing the corresponding retroplumes. For this purpose, a set of artificial or synthetic column-averaged measurements for an idealized satellite, corresponding to a prescribed ground-level source, is chosen. Since it is not feasible to handle such a large volume of data in the inversion, due to the time involved in the computation of the matrices, a preprocessing is carried out to extract a manageable data set as a representative of the whole data. A sensitivity analysis is carried out to infer the influence on the retrieval of the source of several parameters: the resolution of the satellite data; the resolution of the preprocessed data; the preprocessing method; the complete or partial visibility of the region; and the prevailing meteorological conditions. For this study, a region over India was chosen and the inversion was carried out for a period of 3 days in each of the months of January and July, representative of the winter and monsoon seasons, respectively. The retrieval of the source was performed with and without the renormalization procedure. The source estimated from the inversion was compared in each case with that originally prescribed.

Considering a limited number of observations, it is shown that the emissions are underestimated both without and with the renormalization procedure, the degree of underestimation being greater for the non-renormalized estimates. The non-renormalized estimate is degraded further by a refined resolution of the model, whereas the renormalized estimate is not altered appreciably. The improvement in the retrieval of the source obtained with an increased number of observational points is not found to be cost effective. The preprocessing based on the aggregation of data significantly improves the estimation of the emissions with and without renormalization: we are able to retrieve 75% of the emissions in the month of January and 90% in the month of July. When the source lies in a region not completely visible from the satellite, the aggregation, along with the renormalization procedure, is shown to provide a reasonable estimation of the source. This leads to the conclusion that an evenly distributed illumination function is a quantitative criterion for a well-designed monitoring system. The time taken by the inversion is significantly higher in the case of renormalization; it is concluded that, with aggregation of data associated with complete visibility, renormalization may be avoided. In addition, seasonal variations are found to affect significantly the estimation of the emissions.

The above results have been obtained, however, under the assumptions that the source is at ground level, that the dispersion of the tracer is linear and that the synthetic data are free from noise. Thus, we propose to validate the inversion technique described in the present manuscript when real data become available.

## Acknowledgments

This work is partially supported by travel grants provided by the Cultural and Scientific Service, French Embassy, New Delhi. The authors wish to express their thanks to Prof. Bruno Sportisse for his continuous support and encouragement. The first author also wishes to acknowledge the support and encouragement received from Ingénieur Général Norbert Fargère and Mr Christophe Pezron.

## Footnotes

- Received February 21, 2007.
- Accepted May 25, 2007.

- © 2007 The Royal Society