## Abstract

The theory of the sibilant fricative [s] is formulated and solved as a mathematical problem of aeroacoustics. Air is forced through the constriction between the tongue blade and the hard palate by intra-oral pressure, forming a jet that strikes the upper incisors and leaves the mouth through a gap between the upper and lower incisors. The principal source of sound is the ‘diffraction’ of jet turbulence pressure fluctuations by the incisors. The spectrum of these pressure fluctuations incident on the teeth is modelled analytically using an empirical formula adapted from turbulent boundary-layer theory. Predictions are made about the far field acoustic pressure spectrum with reference to measured and estimated values of vocal tract dimensions and intra-oral pressure. Predicted spectra compare well with observations. The principal spectral peaks are determined by vocal tract physiology anterior to the tongue–palate constriction. The theory furnishes the first correct prediction of the dependence of the overall sound pressure level on the intra-oral pressure.

## 1. Introduction

The sibilant fricative phonemes of English are /s/, as in ‘sad’, /z/, as in ‘zoo’, /sh/, as in ‘shade’, and /zh/, as in ‘beige’. They possess random noise sources that are comparatively loud in the higher audible frequencies (Ladefoged 1993). A sibilant fricative is produced by a jet of air striking the front teeth (Catford 1977). The jet emerges from a narrow channel formed by pressing the front part of the tongue (the tongue *blade*) against the hard palate, and air is forced through the channel by the intra-oral pressure Δ*p*_{o}, which is usually about 1% of atmospheric pressure for sibilant sounds. Articulation of /s/ and /z/ requires the end of the channel to be just behind the incisors, the teeth at the very front of the mouth. In many cases, the tongue tip opposes the alveolar ridge, the portion of the palate directly behind the upper incisors. The place of articulation for /sh/ and /zh/ is posterior to this.

A physical realization of /s/ is denoted by [s]. Schematic views of the human vocal tract during [s] production are shown in figure 1. Vocal tract geometry during running speech is always changing, but for fricative production these changes can be relatively slow. The figure is indicative of the geometry for [s] during the middle portion of its production. Figure 1*a* shows the vocal tract and surrounding head in the midsagittal plane (the vertical plane of symmetry), from the trachea below the glottis, through the pharynx, up to the mouth and the lips. The anterior portion of the tongue closest to the lips approximates the hard palate just behind the upper incisors. The view in figure 1*b* is a coronal section, orthogonal to the midsagittal plane, along the line ‘A’ in figure 1*a*. Figure 1*a*,*b* show how the posterior and anterior sections of the vocal tract are separated by a narrow air channel, or ‘constriction’. Two factors contribute to the creation of this constriction: the lateral doming of the hard palate, and the ability of the internal muscles of the tongue to depress the region along the centre-line of the tongue relative to the tongue tissue adjacent to the centre-line. Airflow is prevented from leaking laterally through the sides of the constriction by the tongue–palate contact.

There are small differences (‘allophonic variation’) in the way in which the same sibilant is articulated by speakers of the same language and also between instances of production for one speaker (e.g. Shadle 1985). For instance, English /s/ can be articulated with the tongue tip behind the upper incisors or behind the lower incisors. In the first case, the air channel ends and the jet exits at the most anterior portion of the tongue. In the latter case, which is less common, the air jet exits just behind the most anterior portion of the tongue (Borden & Gay 1975, 1978). In both cases a wall jet along the hard palate strikes the base of the upper incisors. The geometry of this situation is illustrated schematically in figure 2. This property appears to be common among the allophonic variants of /s/ in English. Allophonic variation produces acceptable instances in the pronunciation of English /s/, so that other symbols could be added to [s] to indicate tokens that would be understood as /s/. Small variations of this kind will not be addressed in this paper.

Fant (1960) and Heinz & Stevens (1961) applied the source-filter theory of speech acoustics to understand English sibilant production. The source was assumed to be a broadband noise source, and many of the spectral and linguistic features of fricatives were then explained in terms of the resonance properties of the vocal tract expressed in terms of a transfer function, whose functional form can be estimated from a knowledge of the cross-sectional area variations along the vocal tract.

The values of several important parameters involved in the production of fricatives are unknown. In particular, the cross-sectional area of the narrow channel between the tongue and the hard palate is not known with precision, nor is the interaction of the airflow with tongue and palate tissues fully understood. This has led to uncertainty about the causes of the observed dependence between Δ*p*_{o} and the far field overall sound pressure level (OASPL). Measurements by Hixon *et al*. (1967) for two speakers producing the sibilants [s] and [sh] indicated that OASPL∝(Δ*p*_{o})^{2.6} for [s] for both speakers, whereas data for [sh] gave OASPL∝(Δ*p*_{o})^{2.4} and OASPL∝(Δ*p*_{o})^{2.8} for the two speakers. Similar experiments on [s] with human participants by Badin (1989) and Badin *et al*. (1994) found that the OASPL scaled as (Δ*p*_{o})^{α}, where *α* varied between 2.05 and 2.8.
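
As a worked illustration of how such exponents are obtained (not a reproduction of the cited analyses), *α* can be estimated by a least-squares fit of log mean-square pressure against log Δ*p*_{o}. The sketch below assumes that ‘OASPL∝(Δ*p*_{o})^{α}’ refers to the mean-square far field pressure; the function names and the sample data are hypothetical.

```python
import math


def fit_power_law_exponent(dp, mean_square_pressure):
    """Least-squares slope of log(mean-square pressure) against log(intra-oral pressure).

    Assumes mean_square_pressure = K * dp**alpha, i.e. that "OASPL" scales like
    the mean-square (intensity-like) far field pressure.
    """
    if len(dp) != len(mean_square_pressure) or len(dp) < 2:
        raise ValueError("need at least two paired samples")
    xs = [math.log(v) for v in dp]
    ys = [math.log(v) for v in mean_square_pressure]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def level_change_db(alpha, dp_ratio):
    """Change in sound pressure level (dB) when dp is multiplied by dp_ratio."""
    return 10.0 * alpha * math.log10(dp_ratio)


# Hypothetical, noise-free data following alpha = 2.6 (pressures in Pa).
dp_pa = [600.0, 800.0, 1000.0, 1200.0]
ms = [1e-9 * p ** 2.6 for p in dp_pa]
print(round(fit_power_law_exponent(dp_pa, ms), 3))   # 2.6
print(round(level_change_db(2.6, 2.0), 1))            # 7.8
```

On this reading, doubling Δ*p*_{o} raises the level by about 6.2 dB for *α*=2.05 and about 8.4 dB for *α*=2.8, so the spread in reported exponents amounts to roughly 2 dB per doubling of intra-oral pressure.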

Stevens (1971) argued that sibilant fricatives are produced when moving air strikes the teeth to produce a turbulent wake. The wake generates a fluctuating surface force that constitutes an acoustic dipole source. Stevens estimated source strengths and spectral shapes from contemporary theoretical work and experimental data on sound production by flow past an obstruction in an open ended duct and deduced that the OASPL∝(Δ*p*_{o})^{3.0}. The finding by Hixon *et al*. (1967) that OASPL∝(Δ*p*_{o})^{2.6} for [s] was attributed to changes in the relation between the vocal tract transfer function spectrum and the noise source spectrum with changes in Δ*p*_{o} (see also Catford 1977). Stevens' (1971) general conclusions have been verified by model experiments performed by Shadle (1985, 1990, 1991). Badin (1991) used a direct method for inferring the vocal tract transfer function. He then applied inverse filtering to obtain the fricative sources for [s] and [sh] on the assumption that the source types are dipole and are located at the teeth.
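
A minimal sketch of why a compact dipole source leads to an exponent of 3, assuming that the jet speed *U* follows from Bernoulli's equation and that the mean-square far field pressure follows the standard compact-dipole (Curle) scaling. The symbols *L* (a source length scale) and *r* (the observer distance) are introduced here for illustration only, and the argument is a consistent reconstruction of the scaling rather than Stevens' own derivation:

$$
\begin{aligned}
U &\simeq \left(\frac{2\,\Delta p_{o}}{\rho_{o}}\right)^{1/2}, \qquad
\langle p^{2}\rangle \sim \frac{\rho_{o}^{2}\,U^{6}\,L^{2}}{c_{o}^{2}\,r^{2}}
\;\;\Longrightarrow\;\; \langle p^{2}\rangle \propto (\Delta p_{o})^{3},\\
\mathrm{OASPL}_{2}-\mathrm{OASPL}_{1} &= 30\log_{10}\!\left(\frac{\Delta p_{o,2}}{\Delta p_{o,1}}\right)\ \mathrm{dB}.
\end{aligned}
$$

Under these assumptions a doubling of Δ*p*_{o} would raise the OASPL by about 9 dB, compared with about 7.8 dB for the exponent 2.6 reported by Hixon *et al*. (1967).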

In this paper, a new aeroacoustic theory is proposed for the production of [s]. It is argued that Stevens' (1971) turbulence wake model is too simple to provide an accurate understanding of fricative production. It takes no account of the fact that the air jet that strikes the upper incisors is actually deflected downwards along the upper incisors and leaves the mouth through a narrow gap between the tips of the upper and lower incisors. The principal source of sound is the interaction of jet turbulence pressure fluctuations with the tip regions of the incisors, which modulates the unsteady volume flux from the gap between the teeth. Of course other sources are involved, including the force fluctuations themselves (dipoles) and turbulence volume quadrupoles, but these are much less important because they are not coupled with the resonant properties of the oral cavity. Random fluctuations in mass flux rate, or random monopole sources, have been postulated elsewhere in the theory of noise production in speech, such as during aspiration (Pastel 1987; Stevens 1998).

One of the first detailed measurements of the far-field spectrum of [s] was made by Shadle (1985), and subsequent detailed measurements were made by Badin (1989) and Shadle *et al*. (1991). Three spectra from these studies are shown in figure 3, all of which represent SPL summed over 40 Hz bins with pressure normalized by 0.0002 dynes cm^{−2}. There are substantial variations in observed spectral shapes because of morphological differences in the vocal tracts of different speakers and because of differences in measurement procedures, although the spectra in figure 3*b* are actually from the same subject. The Shadle (1985) and Badin (1989) spectra were measured at a distance of 20 cm from the lips, the former in a quiet but non-anechoic environment and the latter in an anechoic chamber. The level of background room noise in the Shadle experiment was measured and found to be highest in the lower frequencies. For example, between 1 and 2 kHz, the room noise was between 10 and 20 dB below the level for [s] sustained at normal level, which is the spectrum shown in figure 3*a*. Sound pressure measurements by Shadle *et al*. (1991) were made at 100 cm in an environment that was anechoic down to 170 Hz; these data have not been corrected for distance in figure 3*b*. The measurements in anechoic environments exhibit significant decreases in spectral level below about 3 kHz, by about 10 dB for Badin (1989) and 20 dB for Shadle *et al*. (1991). There are substantial peaks at 3 and 9 kHz in the Shadle (1985) spectrum, whereas the anechoic measurements both show strong peaks at 4.5 kHz. Detailed examination of data from other sources (Hughes & Halle 1956; Strevens 1960; Badin 1991) indicates that these differences at higher frequencies are characteristic of variations between speakers rather than between measurement conditions, although the latter must, of course, make some minor contribution.
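
The level conventions used for figure 3 can be illustrated with a short sketch (function names hypothetical). It assumes the measured quantity is a one-sided power spectral density in CGS units, and the distance correction assumes free-field spherical spreading from a point source, which is only approximate for sound radiated from the mouth of a talker. On that assumption, the 100 cm data of Shadle *et al*. (1991) would lie roughly 14 dB below equivalent measurements at 20 cm.

```python
import math

P_REF = 0.0002  # reference pressure in dyn cm^-2 (20 micropascals)


def spl_db(p_rms):
    """Sound pressure level in dB re 0.0002 dyn cm^-2 for an rms pressure in dyn cm^-2."""
    return 20.0 * math.log10(p_rms / P_REF)


def band_spl_db(psd, df, bin_width=40.0):
    """Sum a one-sided PSD (dyn^2 cm^-4 Hz^-1, frequency spacing df Hz) into
    contiguous bins of about bin_width Hz and return the level of each bin in dB."""
    per_bin = max(1, int(round(bin_width / df)))
    levels = []
    for start in range(0, len(psd) - per_bin + 1, per_bin):
        mean_square = sum(psd[start:start + per_bin]) * df
        levels.append(10.0 * math.log10(mean_square / P_REF ** 2))
    return levels


def spherical_spreading_db(r_far, r_near):
    """Free-field point-source level difference between distances r_near and r_far."""
    return 20.0 * math.log10(r_far / r_near)


print(round(spl_db(10.0), 2))                         # 1 Pa = 10 dyn cm^-2
print(round(spherical_spreading_db(100.0, 20.0), 2))  # 100 cm versus 20 cm
```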

These studies also differed in the instructions given to the participants and in the subsequent data processing. Shadle (1985) and Shadle *et al*. (1991) asked participants to produce a sustained [s], whereas Badin (1989) instructed participants to produce a sustained [as]. The power spectra plotted by Shadle (1985) were obtained by averaging eight disjoint 25 ms Hanning-windowed segments within a 5 s interval of the fricative, with a sampling rate of about 20 kHz. Badin (1989) averaged 80 overlapping 12.5 ms Hanning-windowed segments from 0.5 s of the fricative, with a sampling rate of 20 kHz. Shadle *et al*. (1991) averaged 25 disjoint 20 ms Hanning-windowed segments from the middle 3 s of the fricative, with a sampling rate of 44.1 kHz. These differences in measurement and processing suggest that the last of these studies probably provides the most statistically reliable estimate of spectral shape during [s] production.
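
The segment-averaging procedures described above are variants of Welch's method. The numpy sketch below averages Hanning-windowed periodograms over disjoint or overlapping segments; its function name, the removal of each segment's mean and the one-sided density scaling are our assumptions, not the processing actually used in the cited studies. The frequency resolution is the reciprocal of the segment duration, i.e. 40, 80 and 50 Hz for the 25, 12.5 and 20 ms segments above.

```python
import numpy as np


def averaged_psd(x, fs, seg_dur, n_segments, overlap=0.0):
    """Average Hanning-windowed periodograms (Welch's method); one-sided PSD.

    x          : 1D signal
    fs         : sampling rate in Hz
    seg_dur    : segment duration in seconds (e.g. 0.020 for 20 ms)
    n_segments : number of segments to average
    overlap    : fractional overlap between consecutive segments (0 = disjoint)
    """
    x = np.asarray(x, dtype=float)
    nperseg = int(round(seg_dur * fs))
    step = max(1, int(round(nperseg * (1.0 - overlap))))
    if len(x) < step * (n_segments - 1) + nperseg:
        raise ValueError("signal too short for the requested segmentation")
    window = np.hanning(nperseg)
    scale = 1.0 / (fs * np.sum(window ** 2))      # density scaling
    acc = np.zeros(nperseg // 2 + 1)
    for i in range(n_segments):
        seg = x[i * step:i * step + nperseg]
        seg = seg - np.mean(seg)                  # remove the segment mean
        acc += scale * np.abs(np.fft.rfft(seg * window)) ** 2
    psd = acc / n_segments
    # Fold negative frequencies: double every bin except DC (and Nyquist if present).
    if nperseg % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd
```

For example, `averaged_psd(x, 44100, 0.020, 25)` mimics the segmentation of Shadle *et al*. (1991), while averaging 80 segments of 12.5 ms within about 0.5 s, as in Badin (1989), requires an overlap of roughly one-half.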

We shall derive an expression for the overall acoustic pressure spectrum for [s] in the far field in terms of a solution of the acoustic wave equation that involves a ‘compact’ Green's function, and the wall pressure frequency spectrum of jet turbulence at the incisor surfaces. This solution accounts for the ‘diffraction’ of turbulence pressures by the teeth, scattering by the head and the resonance (filtering) properties of the vocal tract and mouth. A start is made on understanding the causes of speaker variability in [s], particularly in terms of individual vocal tract morphology.

The aeroacoustic problem is formulated in §2 and a general description is given of the various possible interactions of the jet with the incisors. The analysis is based on a simplified geometry (§3): close to the jet and the gap between the upper and lower incisors, the incisors are modelled by parallel, rigid half planes. The production of sound by the jet flowing through the gap is then ascribed to the diffraction of the hydrodynamic pressure field of the jet by the edges of the incisors. The diffraction problem is solved (in §3) by making use of an acoustic Green's function tailored to the incisor geometry and coupled to the resonant response of the vocal tract (§4). An empirical model for the jet turbulence (Howe 1998) is then used (§§4 and 5) to predict the spectrum of [s] in the far field, and the predictions are compared with experiment.

## 2. The jet–incisor interaction

Figures 1 and 2 illustrate in general terms the flow–structure interactions involved in the production of [s]. The anterior end of the constriction is generally less than 0.5 cm from the base of the upper incisors. The intra-oral pressure Δ*p*_{o} behind the constriction is essentially the same as the excess pressure in the lungs during [s] production, and is typically between 8 and 10 cm of H_{2}O (approximately 800–1000 Pa) during conversational speech. There appears to have been only one attempt to measure the cross-sectional area of the constriction (Narayanan *et al*. 1995, who used magnetic resonance imaging). More commonly, its value has been estimated from the ‘orifice equation’ (Batchelor 1967), which accounts for the loss of pressure head when air flowing in a tube encounters an abrupt expansion, where irrotational kinetic energy is transformed into turbulence in the exiting jet. Simultaneous measurements of pressure and steady volume velocity can be used in this way to estimate that the cross-sectional area of the constriction during [s] production is about 0.1 cm^{2}. The volume velocity for steady [s] production at this cross-sectional area has been measured by Badin (1989) and others to be up to 500 cm^{3} s^{−1}. Thus, the average velocity at the exit of the constriction can be as high as approximately 5000 cm s^{−1} for the moderate intra-oral pressure of 10 cm H_{2}O, corresponding to a Reynolds number (based on the mean diameter of the constriction) in excess of approximately 10^{4}.
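
The orifice-equation estimates above can be reproduced in a few lines. This is a minimal sketch assuming a discharge coefficient of 1, an air density of 1.14×10^{−3} g cm^{−3} and a kinematic viscosity of 0.15 cm^{2} s^{−1} (both assumed values, not taken from the cited measurements); the function names are hypothetical.

```python
import math

CM_H2O = 980.665  # 1 cm H2O expressed in dyn cm^-2 (98.0665 Pa)


def bernoulli_speed(dp_cm_h2o, rho=1.14e-3):
    """Jet speed (cm/s) from the orifice (Bernoulli) relation dp = rho * U**2 / 2.
    rho: air density in g cm^-3 (assumed value for warm, humid air)."""
    return math.sqrt(2.0 * dp_cm_h2o * CM_H2O / rho)


def orifice_area(volume_velocity, dp_cm_h2o, rho=1.14e-3):
    """Constriction area (cm^2) from a volume velocity (cm^3/s) and a pressure drop."""
    return volume_velocity / bernoulli_speed(dp_cm_h2o, rho)


def reynolds_number(speed, area, nu=0.15):
    """Reynolds number based on the diameter of a circle of equal area.
    nu: kinematic viscosity of air in cm^2/s (assumed)."""
    diameter = math.sqrt(4.0 * area / math.pi)
    return speed * diameter / nu


print(round(bernoulli_speed(10.0)))               # Bernoulli jet speed at 10 cm H2O
print(round(orifice_area(500.0, 10.0), 3))        # implied constriction area
print(round(reynolds_number(500.0 / 0.1, 0.1)))   # Re for U = Q/A = 5000 cm/s
```

With these assumed constants the Bernoulli speed at 10 cm H_{2}O is about 4100 cm s^{−1}, the implied area about 0.12 cm^{2}, and the Reynolds number for 5000 cm s^{−1} through 0.1 cm^{2} about 1.2×10^{4}, all consistent in order of magnitude with the values quoted above.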

The typical situation for humans is for the upper incisors to form an overbite with the lower incisors. That is, the upper incisors cover, at least partially, the lower incisors when the jaw is completely closed. From self-observation by the authors when producing [s], the jaw is open to the degree that the edges of the upper incisors are at nearly the same vertical level as the edges of the lower incisors, and the horizontal distance between the upper and lower incisor edges is about 0.2 cm. This distance and the exact geometric relation between the edges will depend on the speaker, the phonetic environment of [s] during running speech, and the orientations of the incisors relative to the jaw and skull (e.g. straight teeth versus ‘buck’ teeth).

Anterior to the gap between the teeth is the channel bounded by the teeth, gums and the soft tissue of the lips. The distance *ℓ*_{o}, say, from the gap between the incisors to the most anterior portion of the lips is about 1.5 cm. This distance also depends on the speaker and on the exact phonetic environment in which [s] is produced. There is a ‘lower mouth cavity’ below the tip of the tongue blade and to the rear of the lower incisors. The volume of this region is not known with any degree of precision; however, in §5 we shall use estimates based on measurements by Sundberg & Lindblom (1990) and Granqvist *et al*. (2003).

## 3. The acoustic pressure

### (a) The diffraction problem

The calculation of the sound radiated to the far field is based on the local model illustrated schematically in figure 2, and in more detail in figure 4. The turbulent jet impinging on the upper incisor is deflected downwards to pass through a gap of width *h* between the upper and lower incisors. The upper and lower incisors are modelled locally by rigid, parallel half-planes, which are defined in terms of the rectangular coordinates **x**=(*x*_{1}, *x*_{2}, *x*_{3}) by

upper incisor: *x*_{1}<0, *x*_{2}=*h*;  lower incisor: *x*_{1}>0, *x*_{2}=0.

It is convenient to assume that if Δ denotes the spanwise width of the jet (in the *x*_{3}-direction, out of the plane of the paper in figure 2), then the coordinate origin O is positioned on the edge of the lower incisor such that the jet occupies the interval −½Δ<*x*_{3}<½Δ.

The overall pressure fluctuations at **x** at time *t* can be partitioned into two components, *p*(**x**, *t*) and *p*_{I}(**x**, *t*). The latter is the ‘incident’ *hydrodynamic* pressure field of the jet, defined as the ‘direct’ unsteady pressure that would be generated by the vorticity convected in the jet if the presence of the boundary surfaces *S* (principally the incisors) were ignored. It will be assumed that *p*_{I}(**x**, *t*) is known or that it can be determined from an empirical representation of the jet turbulence. The component *p*(**x**, *t*) is the additional pressure fluctuation that arises from the interaction of *p*_{I} with *S*, and includes both acoustic and hydrodynamic components. Because the turbulence sources within the jet are responsible for the production of *p*_{I}(**x**, *t*), the ‘diffracted’ part *p*(**x**, *t*) may be taken to satisfy the homogeneous wave equation

$$\left(\frac{1}{c^{2}}\frac{\partial^{2}}{\partial t^{2}}-\nabla^{2}\right)p=0 \qquad (3.1)$$

everywhere in the fluid, including the region occupied by the jet. In equation (3.1), *c* is the speed of sound, which may in principle take different values outside and within the vocal tract.

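A formal representation of *p*(**x**, *t*) in terms of *p*_{I} will be obtained by first introducing a Green's function *G*(**x**, **y**, *t*−*τ*). This is defined as the *causal* solution of equation (3.1) when the right-hand side is replaced by the impulsive point source *δ*(**x**−**y**)*δ*(*t*−*τ*), subject to the condition ∂*G*/∂*x*_{n}=0 on *S*, where *x*_{n} is a local coordinate normal to *S* (directed into the fluid). Then, Kirchhoff's solution of equation (3.1) (Howe 1998) yields

$$p(\mathbf{x},t)=-\int_{-\infty}^{\infty}\oint_{S}G(\mathbf{x},\mathbf{y},t-\tau)\,\frac{\partial p}{\partial y_{n}}(\mathbf{y},\tau)\,\mathrm{d}S(\mathbf{y})\,\mathrm{d}\tau. \qquad (3.2)$$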

For high Reynolds number flow, the condition that the normal component of velocity must vanish on *S* is

$$\frac{\partial}{\partial x_{n}}\left(p+p_{I}\right)=0 \quad \text{on } S. \qquad (3.3)$$

Thus, ∂*p*/∂*y*_{n}=−∂*p*_{I}/∂*y*_{n} at points **y** on *S*, and equation (3.2) becomes

$$p(\mathbf{x},t)=\int_{-\infty}^{\infty}\oint_{S}G(\mathbf{x},\mathbf{y},t-\tau)\,\frac{\partial p_{I}}{\partial y_{n}}(\mathbf{y},\tau)\,\mathrm{d}S(\mathbf{y})\,\mathrm{d}\tau. \qquad (3.4)$$

This equation permits the acoustic pressure to be evaluated in terms of the hydrodynamic pressure fluctuations *p*_{I} ‘incident’ on *S*. Now the hydrodynamic pressure field of the jet decays very rapidly with distance from the jet, so that the integral in equation (3.4) is essentially confined to the region of strong surface interaction, where the jet flows past the edges of the upper and lower incisors and where *G* varies rapidly with position. The interaction that occurs where the jet impinges normally on the upper incisor is not a particularly strong source of acoustic excitation: a plane surface merely reflects an incident pressure field and has no means of transforming hydrodynamic energy into sound. Such transformations occur predominantly at geometrical irregularities, particularly at sharp edges where *G* is singular (Howe 1998).

### (b) Green's function

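To determine *G*, set

$$G(\mathbf{x},\mathbf{y},t-\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{G}(\mathbf{x},\mathbf{y},\omega)\,\mathrm{e}^{-\mathrm{i}\omega(t-\tau)}\,\mathrm{d}\omega, \qquad (3.5)$$

where *Ĝ* is the solution with outgoing wave behaviour of

$$\left(\nabla^{2}+\kappa^{2}\right)\hat{G}=-\delta(\mathbf{x}-\mathbf{y}), \qquad (3.6)$$

*κ*=*ω*/*c* being the acoustic wavenumber.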

We are primarily interested in evaluating from equation (3.4) the pressure in free space at large distances from the head when a typical source point **y** is just within the oral cavity, in the vicinity of the gap between the upper and lower incisors. Because **x** then lies in the far field, it is convenient to determine *Ĝ* by application of the reciprocal relation *Ĝ*(**x**, **y**, *ω*)=*Ĝ*(**y**, **x**, *ω*) (Rayleigh 1945; Crighton & Leppington 1971; Howe 1998). That is, we calculate the solution of equation (3.6) *as a function of* **y** *near the incisors when the source is removed to the far field point* **x**.

A point source at **x** generates a spherical wave with velocity potential at **y** equal to

$$\frac{1}{4\pi|\mathbf{x}-\mathbf{y}|}\exp\left(\frac{\mathrm{i}\omega|\mathbf{x}-\mathbf{y}|}{c_{o}}\right), \qquad (3.7)$$

where *c*_{o} is the ambient sound speed (assumed to be constant), which may differ from that within the vocal tract. This wave impinges on the head and produces audible pressure fluctuations at the mouth that are transmitted along the front channel between the lips and the incisors to a point **y** in the region of the source flow. The wavelength is typically large compared with the source distance |**y**| from the coordinate origin O of figure 2. The incident potential just in front of the mouth can be written to a sufficient approximation in the form (3.8), where *σ*(*ω*) is a frequency-dependent correction factor governed by the shape and size of the head and by the direction of the far field point **x**, **j** is a unit vector in the *x*_{2}-direction of figure 2, and *ℓ*_{o} is the effective length (introduced above) of the front channel. Far field observations are usually made in front of the head (where *x*_{2}>0 in figure 2), and the functional form of *σ*(*ω*) can then be estimated from the analytical prediction for a rigid sphere of radius *R* (see Morse & Ingard 1968). At low frequencies, for which the head is *acoustically compact*, *σ*(*ω*)∼1; in the high-frequency limit *σ*(*ω*)→2 for *x*_{2}>0, because the head then appears locally plane on the scale of the acoustic wavelength.
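
Under the assumption that *σ*(*ω*) can be approximated by the ratio of the total surface pressure on a rigid sphere of radius *R*, at the point facing a distant source, to the incident pressure, it can be computed from the classical partial-wave series for plane-wave scattering by a rigid sphere. The pure-Python sketch below (hypothetical function names, e^{−i*ω*t} convention) uses p/p_{inc} = (i/(κR)^{2}) Σ_{n} (2n+1)(−i)^{n}/h_{n}^{(1)′}(κR) and reproduces the two limits quoted above.

```python
import cmath


def spherical_hankel1(n_max, x):
    """Spherical Hankel functions h_n^(1)(x), n = 0..n_max, by upward recurrence (x > 0)."""
    h = [0j] * (n_max + 1)
    h[0] = -1j * cmath.exp(1j * x) / x
    if n_max >= 1:
        h[1] = -cmath.exp(1j * x) * (x + 1j) / x ** 2
    for n in range(1, n_max):
        h[n + 1] = (2 * n + 1) / x * h[n] - h[n - 1]
    return h


def sphere_pressure_ratio(ka, n_extra=20):
    """Total surface pressure / incident pressure on a rigid sphere at the pole
    facing a distant source (ka = wavenumber * radius); exp(-i*omega*t) assumed."""
    if ka <= 0.0:
        raise ValueError("ka must be positive")
    n_max = int(ka) + n_extra
    h = spherical_hankel1(n_max, ka)
    total = 0j
    phase = 1 + 0j  # (-i)**n, i.e. i**n times P_n(cos pi) = (-1)**n
    for n in range(n_max + 1):
        # h_0' = -h_1 and h_n' = h_{n-1} - (n + 1) / x * h_n for n >= 1
        dh = -h[1] if n == 0 else h[n - 1] - (n + 1) / ka * h[n]
        total += phase * (2 * n + 1) / dh
        phase *= -1j
    return 1j / ka ** 2 * total


for ka in (0.1, 1.0, 5.0, 30.0):
    print(ka, round(abs(sphere_pressure_ratio(ka)), 3))
```

If, purely for illustration, the head is taken as a sphere of radius 9 cm and *c*_{o}=34 300 cm s^{−1}, then *κR*=1 near 600 Hz, so over the frequency range relevant to [s] this estimate of |*σ*| lies in the high-frequency regime.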

Green's function can be interpreted as the velocity potential at **y** produced by the point source in equation (3.6). The potential fluctuations applied at the lips produce a fluctuating volume flux *V*_{t}, say, directed *into* the gap between the upper and lower incisors. This flux depends on the acoustic properties of the vocal tract when configured for the production of [s], and can be represented for a given subject in terms of the transfer function (3.9). A functional representation of *T*(*ω*) is derived below in §4*d*.

When the gap *h* between the incisors is very small compared with the acoustic wavelength, the flow between the incisors produced by the incident potential is essentially the same as for an incompressible fluid. It is also *irrotational* in the absence of further source terms in the Green's function equation (3.6). Furthermore, because the incident potential is independent of the spanwise coordinate *y*_{3}, the motion may be regarded as locally 2D. Thus, in the midsagittal plane, the streamlines of the flow into the oral cavity are of the form illustrated in figure 4*a* for an ideal fluid. This locally 2D potential flow can be calculated by a conformal transformation, in which the fluid region bounded by the nominally semi-infinite upper and lower incisors in the *z*=*y*_{1}+i*y*_{2} plane of the midsagittal section (*y*_{3}=0; see figure 4*b*) is mapped into the upper half of the complex *ζ*-plane of figure 4*c*. The required mapping (Kober 1957) is given by equation (3.10), with the quantities appearing in it defined by equation (3.11).
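
The compactness condition invoked at the start of this paragraph can be checked numerically. The short sketch below (hypothetical helper names) evaluates the Helmholtz number *κh*; it assumes a sound speed of 35 000 cm s^{−1} for warm, humid vocal-tract air and, for illustration only, takes *h* equal to the separation of about 0.2 cm between the incisor edges quoted in §2.

```python
import math


def acoustic_wavenumber(f_hz, c_cm_s=35000.0):
    """kappa = omega / c in cm^-1 (default c: assumed sound speed in warm, humid air)."""
    return 2.0 * math.pi * f_hz / c_cm_s


def helmholtz_number(f_hz, length_cm, c_cm_s=35000.0):
    """kappa * length; values well below 1 indicate an acoustically compact region."""
    return acoustic_wavenumber(f_hz, c_cm_s) * length_cm


# Hypothetical check: gap h ~ 0.2 cm and front channel length l_o ~ 1.5 cm.
for f in (3000.0, 4500.0, 10000.0):
    print(f, round(helmholtz_number(f, 0.2), 3), round(helmholtz_number(f, 1.5), 3))
```

On these assumptions *κh* stays below about 0.36 up to 10 kHz, which supports the incompressible, locally 2D treatment of the gap, whereas *κℓ*_{o} for *ℓ*_{o}=1.5 cm exceeds 1 above about 3.7 kHz, consistent with retaining the front channel as a resonant element in *T*(*ω*).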

The outer section A′D and the inner section DC of the upper incisor are mapped respectively onto the intervals (−∞, *ζ*_{D}) and (*ζ*_{D}, 0) of the negative real *ζ*-axis. Similarly the sections CB and BA of the lower incisor are mapped onto the intervals (0, *ζ*_{B}) and (*ζ*_{B}, +∞) of the positive real axis, where(3.12)The point at infinity in the lower *z*-plane (corresponding to positions behind the incisors at distances from O≫*h*) maps into the origin *ζ*=0. The desired velocity potential of flow in the midsagittal plane into the oral cavity produced by can therefore be ascribed to a sink at *ζ*=0, so that in the neighbourhood of the incisors must have the general form:(3.13)where *A* and *B* are constants.

The volume flux into the oral cavity per unit *spanwise* length of the incisors is therefore equal to −π*B*. Now, a representation of the form given by equation (3.13) is applicable locally along the whole spanwise length *ℓ*, say, of the upper and lower incisors (which is, of course, curvilinear). Thus, the net inward volume velocity *V*_{t}=−*ℓ*π*B*. Therefore,(3.14)

It will be seen below that the constant *A* in this formula plays no role in the calculation of the radiated sound and may thus be discarded. We shall do this forthwith and then substitute into equation (3.5) to obtain the following approximation for Green's function, valid in the neighbourhood of the impingement region of the jet on the upper and lower incisors:(3.15)where *ζ*≡*ζ*(*y*_{1},*y*_{2}) is given as an implicit function of *z*=*y*_{1}+i*y*_{2} by equation (3.10).

### (c) Calculation of the far field pressure

Consider first the contribution *p*_{L}, say, to the acoustic pressure integral in equation (3.4) from the lower incisor. The lower incisor lies in the plane *y*_{2}=0 with a *leading* edge at *y*_{1}=0. The jet turbulence flowing past the edge lies in the region 0<*y*_{2}<*h*. If the incisor were absent, the incident pressure *p*_{I}(**y**, *τ*), which is essentially an incompressible pressure field and has a characteristic length-scale that is typically much smaller than the width *h* of the jet, would satisfy Laplace's equation in the region *y*_{2}≤0 outside the jet and could therefore be represented in the neighbourhood of the incisor by the Fourier integral (3.16). The amplitude function in this formula is the space–time Fourier transform of the incident pressure in the plane of the lower incisor. Its statistical properties are assumed to be known, and are discussed further below.

Thus, we can take on the ‘outer’ and ‘inner’ sides (*y*_{2}=±0) of the lower incisor, and then use equations (3.15) and (3.16) to cast the integral (3.4) for the far field acoustic pressure generated by the lower incisor in the form(3.17)where is the retarded time, and(3.18)is the jump in the value of across the lower incisor, and we have used the formula .

The integral with respect to *y*_{1} in equation (3.17) is formally divergent and must be interpreted as a generalized function (Lighthill 1958). This is because our approximation to Green's function is valid only for source points **y** close to the gap where the jet passes between the incisors. We can obtain a convergent integral by replacing *k*_{1} by *k*_{1}+i*ϵ*, where *ϵ* is a small positive quantity that is subsequently allowed to vanish, and may be taken to represent the gradual decay of the turbulence sources as they convect past the edge of the incisor. We then find, after an integration by parts, the results (3.19) and (3.20), where the asterisk denotes the complex conjugate. The integral in equation (3.20) is taken along a path passing just above the positive real axis in the *ζ*-plane, and can be evaluated in terms of the Hankel function of complex order *ν* by making use of a contour integral representation of the Hankel function given by Erdélyi *et al*. (1953, p. 23, eqn (27)). By this means we obtain (3.21).

Equation (3.17) for the acoustic pressure radiated from the lower incisor now becomes(3.22)

A similar procedure can be used to determine the component *p*_{U}(**x**, *t*) of the far field acoustic pressure generated by the upper incisor. In this case the jet hydrodynamic pressure acts on a *trailing* edge at *y*_{1}=0, *y*_{2}=*h*, and *p*_{I}(**y**, *τ*) is given by (3.23), in which the amplitude function is the Fourier transform of the incident pressure *p*_{I}(*y*_{1}, *h*, *y*_{3}, *τ*) at *y*_{2}=*h*.

Proceeding as before, we find(3.24)where(3.25)

For brevity we have used the same notation for in equations (3.22) and (3.24), although it should be understood that the *effective* components of the pressure fluctuations on opposite sides of the jet (respectively, near *y*_{2}=0, *h*) are statistically independent so that *p*_{L} and *p*_{U} are also statistically independent. This is because only small scale components of the turbulence, satisfying make a significant contribution to the integral and to the measured acoustic spectrum (figure 3). Indeed, using typical values of *h* and *U* given below in equation (5.1), and putting *ω*=2π*f*, this condition is seen to imply that statistical independence is well assured provided *f* exceeds 3–4 kHz.

## 4. The acoustic spectrum

### (a) The upper incisor

The statistical independence of the pressure fields *p*_{U}(**x**, *t*) and *p*_{L}(**x**, *t*), respectively, radiated by the upper and lower incisors, implies that the spectrum of the overall radiated sound is simply the algebraic sum of the individual spectra.

Consider first the contribution from the upper incisor, given by equation (3.24). In this equation, the amplitude function is the Fourier transform of the ‘incident’ hydrodynamic pressure of the jet. The turbulent flow over the incisor shares many of the characteristics of a turbulent boundary layer, and far upstream of the trailing edge at *y*_{1}=0, *y*_{2}=*h*, the net hydrodynamic pressure on the rigid inner surface *y*_{2}=*h*−0 of the upper incisor is actually equal to 2*p*_{I} (the effect of the plane wall being simply to double the amplitude of the incident pressure). We shall set *p*_{S}=2*p*_{I}, and identify *p*_{S} with the *blocked* wall surface pressure beneath the turbulent boundary layer of a mean flow at speed *U*, whose statistical properties can be modelled analytically (Corcos 1967; Chase 1980; Blake 1986; Howe 1998). Then (4.1) holds, and for a turbulent flow whose lateral width is Δ, that of the jet, we have (4.2), where the angle brackets 〈 〉 denote an ensemble average, and *P*(*k*_{1}, *k*_{3}, *ω*)≥0 is the wall pressure wavenumber–frequency spectrum (Howe 1998).

Hence, squaring equation (3.24), we find that(4.3)Thus, if the far field acoustic pressure frequency spectrum *Φ*_{U} (**x**, *ω*) is defined by(4.4)then(4.5)

Now *P*(*k*_{1}, 0, *ω*) is sharply peaked near *k*_{1}=*ω*/*U*_{c}, where *U*_{c}∼0.7*U* is a characteristic eddy convection velocity (Corcos 1967; Chase 1980; Blake 1986; Howe 1998), and this permits the slowly varying factors in the integrand to be evaluated at *k*_{1}=*ω*/*U*_{c}. The wavenumber integral that remains can be expressed in the form (4.6), where *ℓ*_{3} is the Corcos (1967) spanwise turbulence correlation scale, and *Φ*_{pp}(*ω*) is the turbulence wall pressure frequency spectrum. Therefore, equation (4.5) becomes (4.7).

### (b) The lower incisor

A formally identical calculation can be performed using equation (3.22) to determine the far field acoustic pressure frequency spectrum for the lower incisor. In this case we introduce the hypothesis that the statistical characteristics of the incident pressure fluctuation *p*_{I} are similar to those for a turbulent boundary-layer flow at speed *U*. We then find (4.8), where *ℓ*_{3}, *Φ*_{pp} and so on are assumed to take the same values as in equation (4.7) for the upper incisor.

### (c) The overall acoustic spectrum

A simple calculation (using standard formulae given in Erdélyi *et al*. (1953)) reveals (4.9), where (4.10). Hence the overall acoustic pressure spectrum, obtained by adding the contributions (4.7) and (4.8) of the statistically independent upper and lower incisors, is given by (4.11).

### (d) The transfer function

An approximate analytical representation of the transfer function *T*(*ω*), defined in equation (3.9), can be obtained by the usual method (Stevens 1998) of modelling the gross acoustic properties of the vocal tract during [s] production as a one-dimensional (1D) branched system. Our model is based on data given in Narayanan *et al*. (1995), and is strictly applicable only at the lower frequencies where wavelengths exceed local tract diameters, but it is only at such frequencies that a coherent response of the vocal tract is likely to be significant. It is illustrated in figure 5, and is conveniently partitioned into five acoustically coupled elements labelled F, O, M, C and E in figure 5*b*. These labels represent, respectively: the near free space region F close to the lips; a uniform section O of cross-sectional area *A*_{o} and length *ℓ*_{o} between the lips and teeth; a lower mouth cavity M between the lower surface of the tongue blade and the floor of the mouth (of cross-section *A*_{m} and depth *ℓ*_{m}), formed when the front of the tongue makes contact with the hard palate just behind the upper incisors; a uniform constriction C of cross-section *A*_{c} and length *ℓ*_{c} between the tongue and hard palate; and a slowly diverging posterior vocal tract region E terminating in the vicinity of the glottis, which may be assumed to be acoustically ‘closed’. The posterior vocal tract is modelled as an ‘exponential horn’ of overall length *ℓ*_{e} and small growth rate *α*_{e}, whose cross-sectional area varies according to (4.12), where *z* denotes distance measured into E from the rear end of the constriction C, at which the cross-section is *A*_{e}.

In the reciprocal problem whose solution determines the acoustic Green's function, the component of the velocity potential of frequency *ω* in the free space region F can be approximated by (4.13), where the term in *β* is the spherical wave radiated from the mouth in response to forcing by the incident wave, and *r* is distance from the centre of the mouth.

Similarly, in the uniform section O between the teeth and lips, the velocity potential is given by equation (4.14), where *γ* and *δ* are suitable constants. The velocity potential and volume flux may be assumed to be continuous across the junction between O and F when the characteristic wavelength is large compared with the mouth diameter, so that the mouth is acoustically compact (see Lighthill 1978). Hence, equating the expressions for the local mean potential and volume flux obtained from equations (4.13) and (4.14) yields two conditions linking *β*, *γ* and *δ*. Eliminating *β* between these conditions then gives equation (4.15).

When the coefficients *γ* and *δ* are known, the volume flux *V*_{t} into the mouth through the teeth is given by equation (4.16), which can then be used to calculate the transfer function *T*(*ω*) via equation (3.9). To do this, a second equation, in addition to equation (4.15), is required to determine *γ* and *δ*. Its form is governed by the acoustic properties of the vocal tract to the rear of the teeth, which can be expressed in terms of the acoustic admittance *Y*_{t} of the vocal tract at the teeth, defined by equation (4.17). When *Y*_{t} is known, equations (4.15) and (4.17) may be solved for *γ* and *δ*, and equations (3.9) and (4.16) can then be used to obtain equation (4.18).

The admittance *Y*_{t} may be expressed in terms of the respective admittances *Y*_{m}, *Y*_{c} and *Y*_{e} of the acoustic elements M, C and E of figure 5*b*, using well known recursive techniques for 1D sound propagation through branching acoustic systems (Lighthill 1978, ch. 2). It will therefore suffice to state the results, equations (4.19)–(4.22), without proof. In these expressions *ρ*_{m}, *ρ*_{c}, *ρ*_{e} and *c*_{m}, *c*_{c}, *c*_{e} are the respective densities and sound speeds in sections M, C and E of the vocal tract.
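
The recursive technique can be sketched in Python for uniform tube segments. This is a generic transmission-line illustration under stated assumptions, not a reproduction of equations (4.19)–(4.22): it assumes time dependence e^{−iωt}, defines admittance as volume flux into an element divided by pressure, treats M as a closed side branch in parallel with C at the junction behind the teeth, and approximates the horn E by a uniform closed tube (ignoring its flare). The function names are hypothetical.

```python
import cmath


def char_admittance(area: float, rho: float, c: float) -> float:
    """Characteristic admittance A/(rho*c) of a uniform tube (volume flux per unit pressure)."""
    return area / (rho * c)


def input_admittance(y0: complex, k: complex, length: float, y_load: complex) -> complex:
    """Admittance looking into a uniform tube terminated by y_load at its far end.

    Assumes exp(-i*omega*t); k may be complex (positive imaginary part = damping).
    """
    t = cmath.tan(k * length)
    return y0 * (y_load - 1j * y0 * t) / (y0 - 1j * y_load * t)


def closed_tube_admittance(y0: complex, k: complex, length: float) -> complex:
    """Rigid (zero-admittance) far end, giving Y = -i*y0*tan(k*L)."""
    return input_admittance(y0, k, length, 0.0)


def admittance_at_teeth(omega, rho, c, A_m, l_m, A_c, l_c, A_e, l_e):
    """Hypothetical branched model: M in parallel with C, and C loaded by a closed E.

    E is approximated as a uniform closed tube of area A_e (horn flare ignored).
    """
    k = omega / c  # lossless wavenumber
    y_e = closed_tube_admittance(char_admittance(A_e, rho, c), k, l_e)
    y_c = input_admittance(char_admittance(A_c, rho, c), k, l_c, y_e)
    y_m = closed_tube_admittance(char_admittance(A_m, rho, c), k, l_m)
    return y_m + y_c  # branches meeting at a single junction: admittances add
```

Passing a complex wavenumber (positive imaginary part) into `input_admittance` is one way to include the wall losses described later in this section; `admittance_at_teeth` itself uses the lossless value *ω*/*c*.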

## 5. Numerical results

Equation (4.11) will now be used to predict the far field acoustic spectrum in terms of the empirical model (5.1) of the wall pressure frequency spectrum *Φ*_{pp} (Howe 1998), where *δ*_{*} is an equivalent boundary-layer *displacement* thickness, *ρ*_{o} is the mean air density and *v*_{*} is the equivalent boundary-layer friction velocity. For a high Reynolds number, smooth wall boundary layer, *v*_{*} is only a few per cent of *U*, but it takes larger values in rough wall flows and might be expected to be even larger for a ‘turning’ jet flow of the type envisaged in figure 2. The displacement thickness *δ*_{*} is approximately one-eighth of the boundary-layer thickness *δ* (Hinze 1975). In the absence of detailed measurements, the most rational assumption is that *δ*∼*h*/2, as for turbulent pipe flow, in which case *δ*_{*}≈*h*/16; this approximation is used in what follows. The spanwise correlation scale is taken to be *ℓ*_{3}≈*U*/*ω*.
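
The length scales implied by these approximations are easily evaluated. The sketch below uses *h* = 0.2 cm and *U* = 5000 cm s^{−1}, the values adopted for figure 6, together with an illustrative frequency of 4.5 kHz (an assumption chosen for the example); it also reports the nondimensional frequency *ωδ*_{*}/*U*, the usual scaling variable for boundary-layer wall pressure spectra.

```python
import math


def boundary_layer_scales(h_cm: float, U_cm_s: float, f_hz: float):
    """Return (delta, delta_star, ell3, omega*delta_star/U) from the approximations in the text."""
    delta = h_cm / 2.0            # boundary-layer thickness, as for turbulent pipe flow
    delta_star = delta / 8.0      # displacement thickness ~ delta/8, i.e. h/16
    omega = 2.0 * math.pi * f_hz  # radian frequency
    ell3 = U_cm_s / omega         # spanwise correlation scale ~ U/omega
    nondim_freq = omega * delta_star / U_cm_s
    return delta, delta_star, ell3, nondim_freq


if __name__ == "__main__":
    print(boundary_layer_scales(0.2, 5000.0, 4500.0))
```

At 4.5 kHz this gives *δ*_{*} = 0.0125 cm, *ℓ*_{3} ≈ 0.18 cm and *ωδ*_{*}/*U* ≈ 0.07, so the spanwise correlation scale is comparable with the jet height.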

The function *ℓ*_{3}*Φ*_{pp}(*ω*) is plotted in figure 6 for *h*=0.2 cm and *U*=5000 cm s^{−1}. According to Stevens (1998), the low frequency region (*f*<1 kHz), where the empirical model in equation (5.1) decreases rapidly, does not contribute significantly to the far field spectrum. Figure 6 also displays the variation of the ‘diffraction’ factor in equation (4.11). This factor represents the efficiency with which the different frequency components of *ℓ*_{3}*Φ*_{pp} contribute to the production of sound, and is seen to decrease smoothly by about 7 dB over the interval between 2 and 10 kHz. The remaining factors on the right of equation (4.11) depend predominantly on the overall acoustic properties of the vocal tract embodied within the transfer function *T*(*ω*). Our model may be contrasted with Stevens' (1971) model, in which the frequency dependence of the acoustic source strength was determined empirically from data relating to the fluctuating drag experienced by an obstacle in a duct (see figure 4*a*, which resembles the spectrum *ℓ*_{3}*Φ*_{pp} of our figure 6) and then modified by an ‘acoustic filter’ analogous to our transfer function *T*(*ω*).

Numerical predictions of the far field spectrum *Φ*(**x**,*ω*) (at a nominal distance of 20 cm directly in front of the lips) are given below for two cases, designated case (i) and case (ii). For both cases the jet and gap dimensions are assumed to satisfy equation (5.2), where Δ is the width of the jet and *ℓ* is the effective curvilinear spanwise length of the gap between the upper and lower incisors.

In our theory, the overall acoustic properties of the vocal tract are determined by the values of the cross-sectional areas *A*_{o}, *A*_{m}, *A*_{c}, *A*_{e} and lengths *ℓ*_{o}, *ℓ*_{m}, *ℓ*_{c}, *ℓ*_{e} in the 1D model of the vocal tract depicted in figure 5. Measurements reported by Sundberg & Lindblom (1990) and Granqvist *et al*. (2003) indicate that during [s] production, the volume of the lower mouth cavity (M in figure 5*b*) lies between 0.5 and 3.0 cm^{3}. Other dimensions have been estimated from data given by Narayanan *et al*. (1995) for subject MI. For numerical purposes we have therefore assumed the values given in table 1.

Results have been computed for the two cases (i) *v*_{*}=0.05*U* and (ii) *v*_{*}=0.07*U*. The actual effective friction velocity is not known with any precision, but these values are typical of the relatively high values encountered in rough wall turbulent flows (Hinze 1975).

The walls of the vocal tract are compliant, and sound waves in such a 1D branching system interact with the walls through momentum and thermal boundary layers. These effects modify the speed of sound and cause waves to be damped. Thus, the effective sound speeds *c*_{m}, *c*_{c}, *c*_{e} that enter equations (4.19)–(4.22) for the admittance *Y*_{t} must be appropriately adjusted.

For the purpose of calculation, the corresponding corrections to the wavenumbers *k*_{m}, *k*_{c}, *k*_{e} were deduced from empirical measures of vocal tract resonance (formant) bandwidths for vocal tracts producing various vowels (Fujimura & Lindqvist 1971; Fant 1972; Stevens 1998). The bandwidth of a resonance is the width of the peak, in hertz, between the frequencies on either side at which the level is 3 dB below the peak. To model dissipative losses, the wavenumber is assigned an imaginary part equal to *πB*/*c*, where *B* is the bandwidth and *c* is the local value of the adiabatic sound speed. The corrections were chosen to give bandwidths of 100, 40, 10, 16, 20 and 40 Hz at 0, 400, 800, 3000, 4000 and 15 000 Hz, respectively. These values were deduced from approximate measurements of the bandwidth during vowel articulation for the closed glottis condition, after removing the effect of radiation damping (e.g. Stevens 1998, pp. 162, 259). (The bandwidth at 15 000 Hz was not determined empirically, but was estimated from the bandwidth at 4000 Hz assuming a square-root dependence on frequency, as given by boundary-layer theory.) Bandwidths are also affected by the open glottis and by turbulence in the vocal tract. No attempt was made to model these effects, but the probable influence of turbulence on our predictions is indicated below (based on the procedure described by Stevens 1998, p. 163).
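
The loss model can be sketched as follows. For a resonance whose amplitude decays as e^{−σt}, the half-power (3 dB) bandwidth in hertz is *B* = *σ*/*π*, so a wave travelling at speed *c* is attenuated at the spatial rate *σ*/*c* = *πB*/*c*; that is the imaginary part given to the wavenumber in this sketch. The piecewise-linear interpolation between the quoted bandwidths and the sound speed of 35 000 cm s^{−1} are assumptions made for illustration, and the time dependence is taken as e^{−iωt}.

```python
import bisect
import math

# Bandwidth targets (Hz) at the frequencies (Hz) quoted in the text.
FREQS_HZ = [0.0, 400.0, 800.0, 3000.0, 4000.0, 15000.0]
BANDWIDTHS_HZ = [100.0, 40.0, 10.0, 16.0, 20.0, 40.0]
C_AIR = 3.5e4  # assumed adiabatic sound speed, cm/s


def sqrt_law_bandwidth(f_hz: float, f_ref: float = 4000.0, b_ref: float = 20.0) -> float:
    """Boundary-layer scaling B proportional to sqrt(f), anchored at the 4000 Hz value."""
    return b_ref * math.sqrt(f_hz / f_ref)


def bandwidth_hz(f_hz: float) -> float:
    """Piecewise-linear interpolation between tabulated bandwidths (assumed scheme)."""
    if f_hz <= FREQS_HZ[0]:
        return BANDWIDTHS_HZ[0]
    if f_hz >= FREQS_HZ[-1]:
        return BANDWIDTHS_HZ[-1]
    i = bisect.bisect_right(FREQS_HZ, f_hz) - 1  # FREQS_HZ[i] <= f_hz < FREQS_HZ[i+1]
    f0, f1 = FREQS_HZ[i], FREQS_HZ[i + 1]
    b0, b1 = BANDWIDTHS_HZ[i], BANDWIDTHS_HZ[i + 1]
    return b0 + (b1 - b0) * (f_hz - f0) / (f1 - f0)


def lossy_wavenumber(f_hz: float, c: float = C_AIR) -> complex:
    """k = omega/c + i*pi*B/c, so exp(i*k*x) decays with distance for exp(-i*omega*t)."""
    omega = 2.0 * math.pi * f_hz
    return complex(omega / c, math.pi * bandwidth_hz(f_hz) / c)
```

`sqrt_law_bandwidth(15000.0)` evaluates to about 38.7 Hz, which rounds to the 40 Hz adopted at 15 000 Hz.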

Figures 7 and 8 depict theoretical predictions of the far field acoustic pressure frequency spectra for cases (i) and (ii), respectively. Overlaid on these figures are the spectra shown previously in figure 3, measured by Badin (1989) and Shadle *et al*. (1991). Although the values of the intra-oral pressure for these measurements are unknown, an estimate can be made using aerodynamic data obtained by Badin (1989) on a different occasion. Referring to figure 7 (*v*_{*}=0.05*U*), there is broad agreement between theory and measurement for the major resonance peak at 4.5 kHz, whose magnitude is governed principally by the dimensions of cavities O and M of figure 5*b*. The dotted curve in figure 7 is our estimate of the effect of turbulence losses on the spectrum, using the empirical corrections proposed by Stevens (1998): the main change is a smoothing and flattening of the spectral peaks and a small reduction in amplitude, particularly in the frequency range 1–5 kHz. The results for case (ii) (*v*_{*}=0.07*U*) in figure 8 show the same overall spectral shape, but with the levels increased because of the larger friction velocity.

According to Hixon *et al*. (1967) and Badin (1989), the OASPL in the far field scales as (Δ*p*_{o})^{α}, where *α*∼2.05–2.8. The dependence of the predicted far field OASPL on the jet velocity *U* is shown in figure 9 for cases (i) and (ii), with no allowance for turbulence losses (which are actually found to make very little difference). In both cases, linear regression indicates that the OASPL varies as *U*^{4.4}. Because Δ*p*_{o}=½*ρ*_{o}*U*^{2}, so that *U*∝(Δ*p*_{o})^{1/2}, these results imply that *α*∼2.2. This appears to be the first occasion on which the measured trend in the OASPL has been predicted theoretically, and it is a consequence of including jet ‘diffraction’ in the theory. Indeed, the calculated exponent would be 1.8 if the source had consisted merely of the wall pressure spectrum *ℓ*_{3}*Φ*_{pp}, with no contribution from ‘diffraction’ of the jet pressure by the incisors, which corresponds to replacing the ‘diffraction’ factor in equation (4.11) by a constant. Stevens' (1971) conclusion that the OASPL grows more slowly than (Δ*p*_{o})^{3} (the value it would have for a free space dipole source) is confirmed by our results, and can be attributed to ‘filtering’ by the vocal tract.
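
The step from the regression exponent to *α* can be made explicit. Because the jet velocity follows from Bernoulli's relation Δ*p*_{o} = ½*ρ*_{o}*U*^{2}, a mean square pressure varying as *U*^{n} varies as (Δ*p*_{o})^{n/2}. The Python sketch below fits *n* to OASPL values in decibels and converts it; the levels it uses are synthetic (an exact *U*^{4.4} law), not the data of figure 9, and the air density of 1.2 × 10^{−3} g cm^{−3} is an assumed value.

```python
import math


def level_exponent(u_values, oaspl_db):
    """Least-squares slope n in OASPL = n * 10*log10(U) + const, i.e. p^2 proportional to U^n."""
    xs = [10.0 * math.log10(u) for u in u_values]
    mx = sum(xs) / len(xs)
    my = sum(oaspl_db) / len(oaspl_db)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, oaspl_db))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pressure_exponent(n: float) -> float:
    """U is proportional to dp^(1/2), so p^2 ~ U^n ~ dp^(n/2)."""
    return n / 2.0


def jet_velocity(delta_p: float, rho: float = 1.2e-3) -> float:
    """Bernoulli jet velocity (cm/s) from intra-oral pressure (dyn/cm^2); rho is assumed."""
    return math.sqrt(2.0 * delta_p / rho)


if __name__ == "__main__":
    # Synthetic levels obeying an exact U^4.4 law (illustration only).
    us = [3000.0, 4000.0, 5000.0, 6000.0]
    levels = [44.0 * math.log10(u) - 100.0 for u in us]
    n = level_exponent(us, levels)
    print(n, pressure_exponent(n))   # approximately 4.4 and 2.2
    print(pressure_exponent(6.0))    # free space dipole, U^6: 3.0
```

With this density, an intra-oral pressure of 1.5 × 10^{4} dyn cm^{−2} (about 1.5% of atmospheric pressure) gives *U* = 5000 cm s^{−1}, the jet speed used for figure 6.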

## 6. Conclusion

The sibilant fricative [s] is produced by a jet of air striking the upper incisors. The aeroacoustic theory of this paper assumes that the jet is deflected downwards to pass through the gap between the upper and lower incisors. Sound is generated by the modulation of the mean flow caused by the ‘diffraction’ of jet turbulence pressure fluctuations by the sharp edges of the incisors. The acoustic output of this source is modified by ‘acoustic filtering’ by the vocal tract. This output has previously been calculated (Stevens 1971, 1998) by modelling the source as a simple dipole filtered by the vocal tract.

The specific characteristics of [s] differ between speakers and, for the same speaker, from one utterance to the next. Those variations that depend on source flow morphology are incorporated (for the first time) in the theoretical model of this paper. For instance, the distance between upper and lower incisors generally varies from speaker to speaker. Further, the same speaker also tends to vary this distance by moving the jaw horizontally; a reduction in the sibilance of [s] is obtained by translating the jaw rearward, causing a reduction of the diffracted energy at the higher frequencies. Similarly, variations in the spectrum and sound pressure level of [s] will occur because of changes in the intra-oral pressure (i.e. in the jet velocity) and in the cross-sectional shape of the jet where it strikes the teeth. Our calculated SPL agrees well with measurements for reasonable values of the air volume flow rate and vocal tract dimensions. The predicted dependence of the overall SPL on intra-oral pressure is also in excellent accord with previously observed trends. This appears to be the first occasion on which absolute levels have been predicted without recourse to empirically defined source levels and correction factors (cf. Stevens 1998, pp. 398–399).

Nonetheless, it is obviously desirable that the model postulated in this paper, in which the air jet from the constriction is deflected downwards to pass through the gap separating the upper and lower incisors, should be confirmed by observation. Detailed flow measurements of this kind have hitherto been lacking, but a proper understanding of the flow is clearly crucial for successful prediction by aeroacoustic calculation.

The theory developed in this paper applies to humans with their incisors intact. However, other cases can be treated by the same method. For example, to treat the case of a child with missing baby teeth and without an adult set, it would be necessary simply to modify the potential flow model of §3 used in the evaluation of the acoustic Green's function and to take account of the changed nature of the jet flow between the upper and lower incisors. The conformal mapping procedure (§3) is applicable provided that the potential flow can be assumed to be locally 2D. The potential flow for more general, three-dimensional (3D) geometries would need to be investigated numerically.

## Acknowledgements

This work was supported by NIDCD grant no. 01247 to CReSS LLC.

## Footnotes

As this paper exceeds the maximum length normally permitted, the authors have agreed to contribute to production costs.

- Received December 5, 2003.
- Accepted September 14, 2004.

- © 2005 The Royal Society