## Abstract

The use of T-rays, or terahertz radiation, to identify substances by their spectroscopic fingerprints is a rapidly moving field. The dominant approach is presently terahertz time-domain spectroscopy. However, a key problem is that ambient water vapour is ubiquitous and the consequent water absorption distorts the T-ray pulses. Water molecules in the gas phase selectively absorb incident T-rays at discrete frequencies corresponding to their molecular rotational transitions. When T-rays propagate through an atmosphere, this results in prominent resonances spread over the T-ray spectrum; furthermore, in the time domain, fluctuations after the main pulse are observed in the T-ray signal. These effects are generally undesired, since they may mask critical spectroscopic data. So, ambient water vapour is commonly removed from the T-ray path by using a closed chamber during the measurement. Yet, in some applications, a closed chamber is not always feasible. This situation, therefore, motivates the need for an optional alternative method for reducing these unwanted artefacts. This paper represents a study on a computational means that is a step towards addressing the problem arising from water vapour absorption over a moderate propagation distance. Initially, the complex frequency response of water vapour is modelled from a spectroscopic catalogue. Using a deconvolution technique, together with fine tuning of the strength of each resonance, parts of the water vapour response are removed from a measured T-ray signal, with minimal signal distortion, thus providing experimental validation of the technique.

## 1. Introduction

T-rays lie between the infrared and millimetre wave parts of the spectrum, centred around the so-called ‘terahertz gap’ and can be defined as occupying 0.1–10 THz (Chamberlain 2004; Abbott & Zhang 2007). Emerging applications in non-invasive sensing are motivated by the fact that many molecules contain a rich set of characteristic resonances in this frequency range (Crowe *et al*. 2004; Fischer *et al*. 2007). The applications typically rely on femtosecond laser-based terahertz time-domain spectroscopy systems.

Terahertz time-domain spectroscopy (THz-TDS) has received much attention from researchers due to its decent spectroscopic performance in the terahertz regime (Mittleman *et al*. 1999; Withayachumnankul *et al*. 2007*b*). With the assistance of ultrafast femtosecond laser technology, a THz system generates and detects a short pulse, which has a corresponding frequency range spanning a few hundred gigahertz to a few terahertz, or more. This frequency range has a relatively high black body radiation background, prohibiting a high signal-to-noise ratio (SNR) detection with a conventional detector. However, this problem is addressed by adopting a coherent detection scheme.

In the T-ray frequency region, many polar gases of general interest possess unique rotational transition energies, which give rise to spectral resonances (Mittleman *et al*. 1998). Owing to these unique fingerprints, THz-TDS proves useful for gas classification and recognition. This property, on the other hand, affects the THz-TDS capability in an open air setting, in which water vapour is ubiquitous.

Water vapour, the third most abundant gas in nature (Tripoli 2003), is known to have numerous rotational resonances in the T-ray band (van Exter *et al*. 1989). T-ray spectroscopy of a sample, in open air, therefore often results in a combination of the sample's spectral features and water vapour resonances in the frequency domain. In the time domain, this results in field fluctuations after the main T-ray pulse. Mostly, these effects are undesirable, since they can obscure spectroscopic results of interest.

These water vapour effects are easily removable during the measurement by purging an enclosed T-ray path with dry air or a non-polar gas such as nitrogen, which does not have transition energy levels in the T-ray regime (Grischkowsky *et al*. 1990); alternatively, a vacuum is sometimes used. In some applications, it is not always possible to enclose the entire T-ray beam path, for example, in applications where stand-off detection is required (Federici *et al*. 2005; Kemp *et al*. 2006). The effects can be partially cancelled out by calculation, if two measurements, one for the sample and the other for the reference, are available. Not all situations can provide both measurements and thus this motivates the need for a numerical method, to alleviate the effect of water vapour on the data (Kemp *et al*. 2006).

Knowledge of the resonance characteristics in the T-ray frequency range allows a numerical estimation of the complex response of water vapour. Theoretically, the estimated complex response can be deconvolved directly from a received T-ray signal. But pragmatically, fitting of a modelled complex response to a measured response is complicated by many factors, e.g. the limited dynamic range (Jepsen & Fischer 2005), limited frequency resolution (Xu *et al*. 2003) and measurement uncertainties (Withayachumnankul *et al*. 2007*a*, in press). Besides, the exact model requires precise knowledge of geometric and atmospheric conditions during the measurement. Moreover, if direct deconvolution is carried out, there is no measure to validate the result.

Fine tuning the strengths of complex resonances based on a brute-force search is introduced in this work. Each accurately modelled resonance is tuned in magnitude within a predefined range and then deconvolved from a measured signal. A tuning criterion is met when the ratio of the fluctuation energy to the main pulse energy of an adjusted signal is minimized. Repeatedly tuning the strength line-by-line ultimately results in the mitigation of water vapour-induced effects. Constrained by the condition of minimum fluctuation energy, the algorithm always provides optimum results.

The proposed algorithm, based on signal processing, has merits in that (i) it is sufficiently general that the spectral response of a sample is minimally affected in a number of cases of practical interest and (ii) the exact measurement conditions, including humidity, temperature, pressure and propagation length, are not crucial in order to remove the water vapour response.

This paper is organized as follows. Section 2 elaborates the effects of water vapour on T-ray signals and spectra. In §3, the model characteristics of each complex resonance is determined and validated with a measurement. Section 4 considers the entire THz-TDS measurement as a system with several complex responses, and the direct deconvolution method using a modelled water vapour response is analysed to enable comparison with our method. In §5, the fine-tuning algorithm for the removal of water vapour effects is introduced. This algorithm is verified by experiments in §6, followed by discussions and a conclusion in §7.

## 2. Effects of water vapour on pulsed T-ray signals

Water vapour contains many modes, and the details of its interaction with an electromagnetic wave depend on the frequency. Water vapour exhibits pure rotational modes of energy transitions spanning the millimetre wave to mid-infrared, or approximately 3 GHz–19.5 THz (Kemp *et al*. 1978; Bernath 2002). In the infrared region, both pure rotational line transitions and rotation–vibration bands are noticeable (Gasster *et al*. 1988). These transitions cause absorption and re-emission of wave energy in a number of narrow frequency bands, unique to water molecules.

Figure 1 shows T-ray signals and spectra measured by THz-TDS under nitrogen and water vapour atmospheres at ambient temperature and pressure. In the time domain, the signal recorded in the presence of water vapour undergoes long fluctuations after the main pulse. This is due to the energy re-emission by rotational transitions of water molecules. In the frequency domain, the water vapour causes several sharp resonances at discrete frequencies, as a result of quantized rotational transition energies. In terms of Fourier theory, the pair of time-domain fluctuations and frequency-domain resonances is based on the principle that a sharp feature in one domain is related to a broad feature in the other domain.

In addition to the vapour-induced fluctuations in the time-domain signal and the resonance effects in the frequency domain, the analysis shows that the T-ray energy loss by water absorption calculated between 0.0 and 4.0 THz is as high as 10% for the spectrum in figure 1. The loss is probably due to the non-directional energy re-emission of water molecules. In the time domain, the ratio of the main pulse energy to the tail (vapour-induced fluctuation) energy is calculated for both of the time-domain signals (for more details on the calculation, see §5*c*). The energy ratio for the nitrogen measurement is 429.98, whereas the ratio for the water vapour measurement is 18.82. This measure will be used later on to construct a criterion for fluctuation removal.

## 3. Model of water vapour absorption and dispersion

It is required that the water vapour response be modelled appropriately before a further step in water vapour removal is taken. Modelling of absorption and dispersion for a rotational resonance involves selecting a suitable lineshape and calculating the line position, line strength and linewidth. Fortunately, the line position and line strength are, only to a small extent, dependent on atmospheric conditions, i.e. the line position is constant at low pressures (Townes & Schawlow 1955) and the peak absorption is not a function of the pressure (Townes & Schawlow 1955; Harde *et al*. 1997). In fact, the line position and line strength have been well parametrized in many publications. Thus, these two parameters are readily available, leaving only the linewidth and lineshape to be determined. In the following subsections, these parameters are discussed in detail.

### (a) Line strength and line position

Water vapour rotational resonances in the T-ray regime have been extensively measured and analysed via either conventional Fourier transform infrared spectroscopy (Kauppinen *et al*. 1978; Messer *et al*. 1983), THz-TDS (van Exter *et al*. 1989; Cheville & Grischkowsky 1999) or other techniques (Helminger *et al*. 1983; Matsushima *et al*. 1995; Chen *et al*. 2000)—the extensive study is probably due to the abundance of water vapour, which involves many physical processes in nature. Subsequently, the computer-accessible spectroscopic parameters of water vapour, including the line position, line strengths and linewidth, are available from many research groups. Here in this work, the catalogue published by the JPL group (Pickett *et al*. 1998) is adopted. As this catalogue has been regularly updated since before 1985 (Poynter & Pickett 1985), it is unlikely that the high-intensity lines of water vapour are absent from the list.

According to the JPL catalogue, the line strength is reported in terms of the integrated line intensity at 300 K, *I*_{a} (300 K), and is scalable with the temperature, *T*. Despite the temperature dependency, the line strength is independent of pressure (Townes & Schawlow 1955). Also included in the catalogue is the line position. The shifting of the position is an order of magnitude lower than the line broadening (Podobedov *et al*. 2004), and thus the position is presumably pressure and temperature dependent (Townes & Schawlow 1955).

### (b) Linewidth

A rotational absorption resonance is possibly broadened by a number of factors (Townes & Schawlow 1955; Bernath 2005), for instance, self or foreign gas collisions, molecule wall collisions, the Doppler effect and natural lifetime broadening. Among these factors, at a standard pressure, the collisional broadening is predominant in determining the spectral linewidth (Gopalsami *et al*. 1996; Harde *et al*. 1997).

More specifically, the collision broadened linewidth is dependent on pressure (Bernath 2005) and temperature (Gasster *et al*. 1988; Cheville & Grischkowsky 1999; Podobedov *et al*. 2004). This dependency yields the Benedict & Kaplan relationship (Benedict & Kaplan 1964; Kemp *et al*. 1978),(3.1)where Δ*ω*_{0} is the full width at half maximum (FWHM) of the line at pressure *p*_{0} and temperature *T*_{0}; Δ*ω* is the FWHM at pressure *p* and temperature *T*; and *m* is the temperature index, varying between 0.5 and 1.0. An index of *m*=0.68 can be used in the calculation (Rothman *et al*. 2005).

Determined from the experiment, the average FWHM, Δ*ω*_{0}/2*π*, of water vapour resonance at ambient temperature and standard pressure is 6 GHz. This value is in agreement with an estimate of 10 MHz Torr^{−1} or 7.6 GHz at 1 atm (De Lucia 2003; Bernath 2005) within the limit of measurement uncertainty.

### (c) Lineshape

The rotational absorption lineshape can be modelled either by Lorentz (1906), Van Vleck & Weisskopf (1945) or Gross (Emery 1972; Kemp *et al*. 1978) profiles. The choice depends on the frequency range of interest and the nature of the line broadening. Notwithstanding this, no significant difference is observable among these profiles at, or near, a T-ray resonance, with the linewidth nearly three orders of magnitude lower than the resonance frequency. In this paper, a Lorentzian profile is adopted in modelling the water vapour response in the T-ray frequency range.

Lorentz absorption and dispersion profiles are given, respectively, by (Kemp *et al*. 1978)(3.2a)and(3.2b)where *ω*_{a} denotes the *a*th transition frequency and Δ*ω*_{a} denotes the half width at half maximum of the profile at that frequency. The absorption and dispersion profiles are linked together via the Kramers–Krönig relation.

### (d) Ensemble of rotational transition resonances

This subsection gives a complete model, in terms of optical constants, as a result of several rotational transition resonances in the frequency range of interest. The model is a function of the lineshape, linewidth, line strength and line position, which are discussed in the previous subsections. Later on, the model is compared with the measurement to verify the accuracy.

The model extinction coefficient and index of refraction are the summation of an offset and a set of Lorentzian profiles, or(3.3a)and(3.3b)where the line strength *m*_{a} is proportional to the integrated line intensity, *I*_{a}, given in the JPL catalogue, divided by the line position, *ω*_{a}, or *m*_{a}∝*I*_{a}/*ω*_{a}. The offset is partially a result of electronic and molecular vibrational resonances at high frequencies (Kemp *et al*. 1978) and also a contribution from the tails of other rotational resonances unaccounted for by the model.

Typically, at the low-frequency range, from DC to a few hundred gigahertz, THz-TDS cannot produce sufficient energy to overcome the noise. As a result, the resolved optical constants in this frequency range are unreliable. A phase extrapolation technique is usually introduced to correct the unwrapping process, forcing the phase to start at zero. Thus, it is not necessary in the model to consider the index offset, *n*(0), which is derived from the phase.

The absorption of water vapour, shown in figure 2, is determined from the measured data (figure 1) and the Lorentzian model in the frequency range between 0.0 and 4.0 THz. It can be clearly seen that, below 1.6 THz, the model closely resembles the measurement. This match is possible since the resonances in this low-frequency range have their strengths beneath the maximum absorption coefficient measurable by the system. However, as the frequency goes beyond 1.6 THz, the model cannot track the measurement due to the dynamic range of the system. In this situation, the system can no longer measure the absorption coefficient correctly, and the clipped absorption peaks are obvious, in particular, in the range from 2.0 to 4.0 THz. Furthermore, the clipped lines become less distinct when the T-ray power reaches the noise floor beyond 3.0 THz.

### (e) Continuum absorption

The continuum absorption, unaccounted for so far, is defined as the excess measured absorption unable to be quantified by the resonance spectrum. The mechanism underpinning the continuum has not been fully understood (Podobedov *et al*. 2005), and most of the mathematical models are fitted to experimental observations.

It is known that the continuum absorption is pressure and frequency dependent. A continuum for H_{2}O collisions, derived from an experiment in the T-ray frequency range, between 0.4 and 1.83 THz, is estimated at 4.22×10^{−8} (dB km^{−1}) (hPa GHz)^{−2} (Podobedov *et al*. 2005). A normal atmospheric measurement in a relatively short path length is insignificantly influenced by the continuum absorption (Clough *et al*. 1989; Podobedov *et al*. 2005).

## 4. Removal of H_{2}O response by direct deconvolution

This section introduces a straightforward way to remove the water vapour response from T-ray signals. Direct deconvolution of the modelled water vapour response from the measured spectrum is theoretically possible. Nevertheless, the capability of the method relies on the achievable SNR of the system and the precision of the physical conditions. Direct deconvolution is carried out here for purposes of comparison and highlighting drawbacks, thus motivating the need for our water vapour removal method in §5.

### (a) Water vapour as a black box

During a THz-TDS measurement, an emitted T-ray signal evolves, based on several factors, e.g. the sample response, the water vapour response, the instrument response and the noise. These can be modelled as a system, as illustrated in figure 3.

Given that *x*(*t*) denotes the input T-ray pulse, which is immediately deployed from a transmitter, *s*(*t*) is the impulse response of a sample, *w*(*t*) is the impulse response of water vapour, *u*(*t*) is the impulse response of the instrument, *n*(*t*) is the noise and *r*(*t*) is the time windowing, the received pulse, *y*(*t*), can be described as(4.1)where * is the convolution operator. The noise is simply additive and can thus be treated as an extra term in equation (4.1).

Via Fourier transform, the above equation in the frequency domain is(4.2)where *Y*(*ω*), *X*(*ω*), *S*(*ω*), *W*(*ω*), *U*(*ω*), *R*(*ω*) and *N*(*ω*) represent complex frequency responses, as the Fourier pairs of *y*(*t*), *x*(*t*), *s*(*t*), *w*(*t*), *u*(*t*), *r*(*t*) and *n*(*t*), respectively.

The water vapour frequency response can be expressed by(4.3)Here, *L* denotes the T-ray propagation length in free space, excluding the space occupied by the sample(s). The water vapour's complex index of refraction, , contains the refractive index *n*_{vap} and the extinction coefficient *κ*_{vap}, which are modelled in equations (3.3*a*) and (3.3*b*). The water vapour response is also determined by the humidity (Yuan *et al*. 2003), where *ρ*(*T*) is the vapour density at temperature *T* and *ρ*_{0}(*T*) is the saturation vapour density at the same temperature.

### (b) Deconvolution of model H_{2}O response from the sample signal

In order to remove the effects of water vapour from a measurement, the water vapour response, *W*(*ω*), in equation (4.2) must be replaced by the vacuum response, *V*(*ω*), with the same propagation length,(4.4)In the case of a noise-free signal, the direct deconvolution can be performed,(4.5)The deconvolution can be performed successfully with no hindrance from either the window response, *R*(*ω*), which causes the limit spectral resolution, or the system response, *U*(*ω*). However, it is only feasible with an ideal signal. If the deconvolution is carried out with a noisy signal, the outcome is degraded.

From equation (4.2), the noisy response can be modelled in two parts,(4.6)Assuming the noise level is unity and the model water vapour response matches well with the measurement, performing the direct deconvolution would yield(4.7)Thus, removal of the water vapour response fails at |*X*.*S*.*W*|≤|*N*| or *D*(*ω*)≤|*S*.*W*|^{−1}, where *D*(*ω*)=|*X*/*N*| is the system's dynamic range. This usually occurs at the peaks of the absorption resonances, as shown in figure 2.

### (c) Limitation of the algorithm

From the previous subsections, the deconvolution to remove the water vapour response from the sample signal is effective, if only the measured signal, |*X*.*S*.*W*|, is higher than the noise level, |*N*|, at all frequencies of interest. Only a small part of the spectrum violating this condition prohibits the whole deconvolution. This condition is not always attainable due to the sharp absorption resonances and nature of T-ray spectral magnitude, which is relatively low and tends to decline at higher frequencies.

Apart from the noise issue, the applicability of direct deconvolution in §4*b* critically depends on the knowledge of the atmospheric and geometric parameters during the measurement, i.e. the temperature, pressure, humidity and propagation length, which are indispensable for modelling the water vapour response.

Although all water resonances are well above the noise floor and all required parameters are obtained exactly, significant discrepancies between the model and the measurement can still occur. In that case, there is no means to quantify the accuracy of the results produced by the direct deconvolution technique.

## 5. Removal of H_{2}O response by strength tuning

This section proposes an alternative to the direct deconvolution technique presented in §4. By introducing the strength-tuning algorithm, two difficulties, the noise and the requirement for atmospheric parameters are simultaneously mitigated.

### (a) Water vapour as black boxes

The water vapour response can be separated into several components, as shown in figure 4, and the separation is formulated as(5.1)Each component corresponds to a complex resonance and has its frequency response derived from equations (3.3*a*), (3.3*b*) and (4.3), or(5.2)The line strength, *m*_{a}, now incorporates the humidity ratio, *ρ*(*T*)/*ρ*_{0}(*T*), and the propagation length, *L*. It can be inferred from equations (5.1) and (5.2) that all components or resonances can be removed from the measured spectrum separately and independently. Consequently, the deconvolution at the noisy parts of the spectrum can be avoided, and it is also possible to tune the *m*_{a} of each resonance to the measurement without accurate information of the atmospheric and geometric parameters. In order to achieve this parameter relaxation, a criterion indicating the fit of tuning is essential, this is now addressed in §5*b*.

### (b) Fluctuation-removal algorithm

The fluctuation-removal algorithm is shown in figure 5. The counters *a* and *b* for the line position and line strength are reset. The H_{2}O spectral line parameters, including intrinsic line positions and strengths, in the frequency range of interest, are then fetched from an existing database. A complex resonance profile, *n*(*ω*)−*jκ*(*ω*), of a rotational transition at *ω*_{a=0} is modelled according to the Lorentzian profile in equations (3.2*a*) and (3.2*b*). The modelled profile multiplied by the initial strength factor *m*_{a=0,b=0} is then temporarily deconvolved from the measured complex response, *y*(*ω*). The deconvolved time-domain signal, *y*(*t*), is estimated for its fluctuation ratio (see §5*c*). The procedure repeats to find the fluctuation ratio at other predefined line strengths *m*_{a=0,b}; *b*∈{1, 2, 3, …}, which may increase by a discrete step in the vicinity of the database's line strength.

Once the tuning range, *m*_{a=0,b}, is covered, the algorithm picks up an optimal strength within *m*_{a=0,b}, which gives the minimum fluctuation ratio, if any, and permanently removes that optimal complex resonance from the measured signal. It is possible that the change in the time-domain fluctuation is so subtle that the minimum fluctuation ratio cannot be found. In that case, the removal of a current line is skipped. The tuning procedure starts again, but with the next transition resonance, *ω*_{a=1}, and is repeated until all resonances are optimized and possibly removed. It should be noted that a strong resonance, which suffers from noise, i.e. |*X*.*S*.*W*|<|*N*| at and around the peak of resonance, is avoided by the tuning process, because performing deconvolution of such an ill-defined resonance gives rise to a high fluctuation ratio. Since there might be dependencies among the resonances, with regard to time-domain fluctuations, the whole process starts again until the fluctuation ratio no longer decreases.

It is advisable not to perform zero padding prior to the Fourier transform in the removal process, as the interpolated spectrum does not exactly reflect the reality; this is especially the case for points at resonances that have high variability. Otherwise, discrepancy between the model and the spectrum would cause a large remnant in the spectrum after deconvolution. Also note that to speed up the algorithm, the resonances that have a strength lower than the amplitude uncertainty can be skipped.

### (c) Fluctuation ratio

A criterion related to the quality of a T-ray signal is necessary in selecting the optimal resonance line strengths. The ‘quality’ here is defined as having a high pulse energy with low fluctuations in the time domain, which implies the absence of water vapour resonances in the frequency domain.

In order to quantify the quality, one must be able to evaluate the total energies of the main pulse and of the fluctuations. One potential way is to window the time-domain signal with a Gaussian profile to obtain the amplitude at a desired portion. A Gaussian window is selected because a T-ray main pulse is essentially derived from an optical pumping pulse, which takes on a Gaussian pulse profile (Duvillaret *et al*. 2001).

Figure 6*a* shows a Gaussian window, *g*_{σ}(*t*), overlapping the T-ray signal, *y*(*t*). A Gaussian window with appropriate width and position would eliminate the fluctuating tail of a signal, as shown in figure 6*b*. On the other hand, the complement of a Gaussian window could also be used to remove the main pulse, as shown in figure 6*c*. With the assistance of a Gaussian window, the fluctuation energy normalized by the main pulse's total energy, called the *fluctuation ratio*, can be formulated as(5.3)The integration is carried out over the time duration of a recorded T-ray signal. A Gaussian window is given by(5.4)where *t*_{0} is the peak position of the T-ray signal, *y*(*t*), and *σ* multiplied by is the FWHM of a Gaussian window.

### (d) Generality and limitation of the algorithm

The algorithm is plausibly general, in the sense that it can be applied to any T-ray signal with minimal disturbance to the desired signal features. This generality is due to the following facts.

First of all, a T-ray spectrum is altered by the algorithm only at the frequencies at which water vapour absorption lines are situated. The broad spectral features of a time-domain transient are unlikely to be significantly affected by narrow water line removal. The reflections, for example, which exhibit fringes over a broad frequency range, are not affected by narrow-band resonance removal.

Second, uncorrelated fluctuations are ignored. Because the algorithm senses the fluctuation change, caused by strength tuning, rather than the fluctuation itself, any uncorrelated fluctuations cannot deceive the algorithm.

Finally, the causality of a signal is preserved. The absorption and dispersion profiles, used in the model of water vapour response, exactly comply with the Kramers–Krönig relation. Deconvolution of the measured spectrum by this causal response would yield the causal spectrum.

A limitation of the algorithm is when there is overlapping of sharp resonances, i.e. a resonance of unrelated gas species exactly collocates with that of water vapour. In such a situation, both resonances are removed. Despite this limitation, the algorithm is nevertheless applicable to a large number of specific cases where overlap does not significantly occur. Scenarios of interest can always be benchmarked against sealed-chamber measurements in the laboratory.

## 6. Results

In this section, the proposed algorithm is tested with experimental T-ray data. The results from the algorithm are benchmarked directly with the experimental data recorded using a sealed chamber purged with nitrogen, which is an ideal case. This benchmarking is only to demonstrate the efficacy of the algorithm.

### (a) Free-path measurement

The first T-ray signal, subject to the water vapour removal algorithm, is measured in free space without the presence of any material. The signal has a temporal resolution of 66.7 fs and a total duration of 136.63 ps, providing the spectrum with a spectral resolution of 7.3 GHz.

The strength-tuning algorithm is carried out with a set of complex resonances in the frequency range between 0 and 4 THz, where the T-ray magnitude is relatively high. The sequence, in which the resonances are interrogated and removed, follows the order of line strength, i.e. from high to low strengths. Only those resonances that have a strength higher than 0.01 of the maximum strength are inspected. The FWHM of the Gaussian window is 3 ps and the iteration of the algorithm is set to 5.

As shown by the processed signal in figure 7, the algorithm can significantly reduce the time-domain fluctuations, which were previously located immediately after the main pulse, i.e. after 10 ps, of the unprocessed signal. The main pulses in the inset clearly illustrate the similarity between the fluctuation-free signal and the processed signal, despite a temporal shift that is not accounted for by the algorithm. The differences between the unprocessed/processed signal and the fluctuation-free signal are shown in figure 8. The cumulative difference in figure 8*b* clearly shows that the fluctuations after the main pulse are greatly reduced. In figure 9, the spectrum of the processed signal demonstrates success of the algorithm in removing, or reducing the strength of, the resonances in the frequency range from 0 to 2.5 THz. However, some resonances still persist, where there is spectral crowding or where the resonances are buried in noise.

### (b) dl-phenylalanine measurement

The second experiment is performed with a signal measured from a sample of dl-phenylalanine at 220 K. The cryostat containing the sample is evacuated. The interacting water vapour is in the air in the remaining T-ray beam path. The total scan is 68.3 ps with a step size of 66.7 fs. The spectral resolution is 14.6 GHz.

The sample is selected because: (i) it has a typical distribution of vibrational modes. This is to demonstrate that the method can be potentially applied to a large number of cases, although complete generality of the algorithm is the future challenge. (ii) The recorded signal particularly contains the reflections, caused by the cryostat's windows, made from a cyclo-olefin copolymer (Topas). This can be very interesting if the algorithm can remove the vapour-induced fluctuations in the time-domain signal while revealing the reflections.

The parameters set for the algorithm are similar to the previous test, except for the interrogated frequency range, which is between 0 and 2 THz.

The processed signal is shown in figure 10, along with the original. Again, it is obvious that the fluctuations located after the main pulse, beyond 15 ps, are remarkably reduced, in spite of the introduction of a small oscillation at the beginning. The reflection at 63 ps (point B) is not disturbed by the algorithm, and, surprisingly, the reflection at 32 ps (point A), which was initially buried in fluctuations, is now recovered. Figure 11 provides a further comparison between the fluctuations of the unprocessed and processed signal. The cumulative difference in figure 11*b* clearly illustrates the reduction of the fluctuation energy for the processed signal. In figure 12, for the processed spectrum, the algorithm can remove most of the H_{2}O resonances. There are still a few resonances that cannot be completely removed. These persistent resonances are approximately 1.1 and 1.7 THz, exactly the same positions as the unremoved resonances in the previous case (figure 9). A closer look at 1.1 THz shows that two strong resonances overlap and merge together, but only one resonance is removed. This is because the removal of one resonance results in the deformation of the other resonance from a Lorentzian shape, and thus the remaining deformed resonance cannot be appropriately deconvolved by a Lorentzian model. In spite of this, the recovery of reflections suggests a promising application of the algorithm in T-ray range finding.

### (c) Evaluation of the algorithm's performance

Table 1 shows the fluctuation ratio and mean squared error (m.s.e.) of the signals and the total energy of the spectra investigated in the previous subsections. Two additional results, not shown in this paper, are tested with lactose's reference and sample signals. The fluctuation ratio helps to measure a change in the fluctuation energy and main pulse energy, before and after the signals are processed. The m.s.e. evaluates the fitness of the improved signals to a fluctuation-free signal (*N*_{2}-atmosphere measurement). The m.s.e. is given by(6.1)where *y* is a fluctuation-free signal and *ý* is a compared signal. The error is summed and averaged over *m* temporal points. The total spectrum energy shows the recovery of the energy from resonance absorptions. Note that the fluctuation ratio and the spectral energy are normalized to those of a fluctuation-free signal.

From the table, the fluctuation ratio indicates that the fluctuation energy is greatly reduced in all cases, once the removal algorithm is implemented with the unprocessed signal. However, the m.s.e. still reflects a considerable difference between the processed signals and the fluctuation-free signals, in spite of the great improvement visualized in the previous subsections. This would probably be due to a whole signal shift in the time domain, caused by a constant refractive index difference unaccounted for by the algorithm.

The total spectral energy in table 1 reveals that the algorithm is successful in recovering a part of the signal's energy, which is absorbed by water vapour resonances during propagation. However, not all the energy is recoverable. With regard to the processed spectra shown in the previous subsections, the algorithm cannot completely remove the water vapour resonances in the region where the SNR is low and where the spectral lines are crowded. These two cases are analysed separately as follows.

In a low SNR region, distortions in the absorption and dispersion resonances by noise cause deviations of the model from the measurement. This situation is demonstrated in figure 2. Applying the removal algorithm to a low SNR region would be effective, provided that the distortion is not severe. Otherwise, the large deviation inhibits the removal via a suboptimal value of the fluctuation ratio. A possible means to alleviate the noise issue is to implement a signal enhancement methodology such as wavelet denoising (Ferguson & Abbott 2001) prior to the removal process; this might better recover water resonances and subsequently improve the removal efficiency.

Crowded resonance features result in overlapping and merging between two or more resonances. If the spectral resolution is too low, these blended resonances are rendered incorrectly, resulting in a distorted spectral profile. The algorithm would see them as a single stronger and wider resonance and, as a result, the removal does not perform well. Improving the spectral resolution would allow better removal performance. For example, the free space measurement in figure 9 shows a better result than the dl-phenylalanine measurement in figure 12 in the regions of approximately 1.1 and 1.7 THz; the former and latter measurements have spectral resolutions of 7.3 and 14.6 GHz, respectively. In addition to the spectral resolution problem, blended resonances cannot be represented exactly by a simple sum between two Lorentzian lines (Pickett 1980).

## 7. Conclusion

A numerical algorithm for the removal of water vapour effects in T-ray measurements is investigated in this paper. Given only a sample signal with no reference, theoretically, the removal can possibly be carried out by simple deconvolution of the model complex water vapour response from the signal. However, many factors limit the usability of the simple deconvolution. An exact model of water vapour resonance requires many physical parameters, including the pressure, temperature, humidity and propagation length. In addition, if the noise level is sufficiently high such that some strong resonances are distorted, the whole deconvolution process cannot be performed.

The proposed algorithm fine tunes the strength of each model spectral resonance. A criterion for strength tuning is met when the fluctuation ratio of the processed signal reaches the minimum. Once an optimal resonance is attained, it is removed from the signal. The algorithm then proceeds to the removal of the next resonance. This tuning scheme relaxes the requirement for precise information of the delicate physical conditions during the measurement. An extension of the algorithm to cope with changes in the linewidth is possible by performing fine tuning of a resonance in the two-dimensional space (linewidth and line strength spaces). Furthermore, the fluctuation ratio criterion inhibits any faulty deconvolution that might occur when the noise disturbs the quality of a measured signal.

In the experiments, the algorithm produces promising results. It can reduce a significant amount of vapour-induced fluctuations in the time-domain signal and spectral resonances with small disturbances to other non-related features, such as reflections or sample-induced resonances. Moreover, we have demonstrated with experimental data that some features, which are initially masked by fluctuations, can be recovered by the algorithm. Optical constants extracted from these signals are demonstrated to hold their known values, within acceptable precision.

The proposed algorithm is general in the sense that any other polar gas response could be targeted, in principle, and hence removed from a measured spectrum if desired. It is not necessary that the parameters of a targeted gas be present in a spectroscopic catalogue, as long as its pure response in the frequency range of interest is fully estimable. This scheme might have benefits in some particular situations, for instance, where molecular contaminant(s) are unavoidable in measurement.

Finally, it should be noted that the algorithm is not meant to serve as a full replacement to the ‘sealed chamber’ procedure executed during the measurement stage. In fact, it assists the measurement in cases where a sealed chamber is not feasible. Although the algorithm can apply to a large number of specific cases, it has its own limitations. Further improvement of the algorithm remains an attractive challenge.

## Acknowledgments

The authors gratefully acknowledge Morten Franz at the Department of Molecular and Optical Physics, University of Freiburg for his support in T-ray measurements and Brian W.-H. Ng and Samuel P. Mickan at the School of Electrical and Electronics Engineering, University of Adelaide for their useful technical discussion.

## Footnotes

- Received October 31, 2007.
- Accepted April 9, 2008.

- © 2008 The Royal Society