## Abstract

We obtain the central limit theorem for fluctuations of Young diagrams around their limit shape in the bulk of the ‘spectrum’ of partitions *λ*⊢*n*∈ (under the Plancherel measure), thus settling a long-standing problem posed by Logan & Shepp. Namely, under normalization growing like , the corresponding random process in the bulk is shown to converge, in the sense of finite-dimensional distributions, to a Gaussian process with independent values, while local correlations in the vicinity of each point, measured on various power scales, possess certain self-similarity. The proofs are based on the Poissonization techniques and use Costin–Lebowitz–Soshnikov's central limit theorem for determinantal random point processes. Our results admit a striking reformulation after the rotation of Young diagrams by 45°, whereby the normalization no longer depends on the location in the spectrum. In addition, we explain heuristically the link with an earlier result by Kerov on the convergence to a generalized Gaussian process.

## 1. Introduction

*Partitions* of natural numbers (i.e. decompositions *λ*⊢*n*∈ into sums of non-negative integer parts, *n*=*λ*_{1}+*λ*_{2}+⋯, *λ*_{1}≥*λ*_{2}≥⋯≥0) and their geometric counterpart, *Young diagrams* (i.e. planar figures formed by consecutive columns with *λ*_{1}, *λ*_{2}, … unit square cells), are a classical subject of discrete mathematics dating back to Euler, which has since grasped the imagination of many mathematicians including G. H. Hardy and S. Ramanujan (see Andrews 1976). More recently, interest in asymptotic properties of a ‘typical’ partition *λ*⊢*n* as *n*→∞ has led to the idea to treat the set of partitions _{n}={*λ*⊢*n*} as a random ensemble equipped with a uniform probability measure (so that each partition of *n* is ‘equally likely’). The principal technical device that makes this idea work is based on a suitable randomization of the parameter *n* (thus replacing _{n} by =∪_{n}_{n}), which can be accomplished in such a way that the parts of a random partition *λ*∈ become statistically independent (Fristedt 1993; Vershik 1995, 1996).

The probabilistic approach has proved to be extremely powerful and efficient, offering a unified explanation of statistical properties obtained earlier and, more importantly, yielding many new, often surprising discoveries including the ‘limit shape’ of Young diagrams associated with uniform partitions (Vershik 1995, 1996). Similar methods have also led to deep results about the asymptotic structure of the constituting components (‘spectrum’) for many other classes of growing combinatorial objects (see Arratia & Tavaré 1994; Arratia *et al*. 2003).

Quite a different type of probability distribution on the space _{n}={*λ*⊢*n*} is the *Plancherel measure*, motivated by representation theory of the symmetric group _{n} (i.e. the group of permutations of order *n*). It is defined by (see Fulton 1997)(1.1)where *d*_{λ} is the number of *standard tableaux* of a given shape *λ* (i.e. arrangements of the numbers 1, …, *n* in the cells of the Young diagram *λ,* such that the numbers increase along the rows and up the columns). The well-known Robinson–Schensted–Knuth (RSK) correspondence (see Fulton 1997) establishes a bijection between permutations *σ*∈_{n} and pairs of standard tableaux of the same shape *λ*(*σ*)∈_{n}. This implies that , and so equation (1.1) determines a probability distribution on _{n}.

Besides representation theory, the Plancherel measure arises rather unexpectedly in some combinatorial and probabilistic problems (see Deift 2000). For example, the RSK correspondence implies that the Plancherel distribution of the partition's largest term *λ*_{1}=max{*λ*_{i}∈*λ*⊢*n*} coincides with the distribution of the length *L*_{n} of the longest increasing subsequence in a random (uniformly distributed) permutation *σ*∈_{n} (Baik *et al*. 1999; Deift 2000). The asymptotic behaviour of *L*_{n} was queried by Ulam's Problem (Deift 2000), ultimately solved by Vershik & Kerov (1977) who showed that *L*_{n} grows like (cf. (1.6)).

Logan & Shepp (1977) and, independently, Vershik & Kerov (1977) discovered that, under the Plancherel measure *P*_{n} as *n*→∞, a typical Young diagram *λ*⊢*n*, appropriately scaled, has a certain limit shape *y*=*ω*(*x*). More precisely, set(1.2)where ⌈*x*⌉≔min{*m*∈: *m*≥*x*} is the ceiling integer part of *x*, and let the function *y*=*ω*(*x*), *x*≥0, be defined on [0, 2] by the parametric equations(1.3)and continued as zero for all *x*>2. Then, the random process(1.4)satisfies the law of large numbers (Vershik & Kerov 1977)(1.5)In particular, since , formula (1.5) implies the law of large numbers for the maximum term in a random partition,(1.6)

Owing to the invariance of the Plancherel measure under the transposition of Young diagrams *λ*↔*λ*′ (when columns of the diagram *λ* become rows of the transposed diagram *λ*′ and vice versa), the same law of large numbers holds for (i.e. for the number of parts in the random partition *λ*).

## 2. Fluctuations of Young diagrams

A natural question about the limit distribution for fluctuations of Young diagrams under the Plancherel measure was posed by Logan & Shepp (1977). One might expect that a central limit theorem should hold, at least *in the bulk* of the partition spectrum (i.e. for *λ*_{i}∈*λ*⊢*n* such that ). However, the problem has remained open as yet, and it is the purpose of the present paper to tackle it. In fact, we will show that the random process *Δ*_{n}(*x*), *x*∈(0, 2), normalized by , converges in the sense of finite-dimensional distributions (FDDs) to a Gaussian process with zero mean and independent values (§3). Moreover, local correlations in the vicinity of each point, measured on various power scales, possess certain self-similarity properties (§§3*a* and 7). The randomization method (called ‘Poissonization’, see §4) will be crucial for our considerations, but unlike the uniform distribution case it does not remove the dependence between the parts. Instead, our argument is based on the fundamental fact that the correlation functions of the Poissonized point process {*λ*_{i}−*i*} have a *determinantal structure*, so one may apply a general central limit theorem for determinantal random point processes (§5). Unfortunately, the corresponding calculations are quite involved, so in order to keep this paper to a reasonable size we will only outline the proof, leaving most of the more technical details to a separate publication.

The FDD convergence to a Gaussian process that we establish in the present paper should be contrasted with an earlier result by Kerov (1993) who proved the asymptotic normality of *global* fluctuations (i.e. integrals of *Δ*_{n}(*x*) with respect to a suitable class of test functions). That is to say, the random process *Δ*_{n}(*x*) converges in distribution (without any normalization!) to a *generalized* Gaussian process, whose sample paths make sense only as generalized functions rather than ordinary functions (cf. (2.2)).

To state Kerov's result more precisely, it is convenient to pass to the coordinates *u*=*x*−*y*, *v*=*x*+*y*, corresponding to anticlockwise rotation by 45° and dilation by . In the new coordinates, the boundary of the Young diagram is determined by the corresponding piecewise-linear (continuous) function , and the limit shape is given by (Vershik & Kerov 1977)Then (Kerov 1993; see also Ivanov & Olshanski 2002), the random process(2.1)converges in distribution to a generalized Gaussian process , *u*∈[−2, 2], defined by the formal random series(2.2)where {*X*_{k}} are mutually independent random variables with standard normal distribution (0, 1).

Convergence in a generalized sense is a manifestation of increasing ‘roughness’ of Young diagrams, which is nevertheless suppressed after integration. Admittedly, this result might have cast doubt on the validity of the usual convergence, so it is quite surprising that an ordinary limit does exist (under a modest logarithmic normalization). Let us point out that the limiting Gaussian process appears to be a continuous-time ‘white noise’ (i.e. a continuum set of independent normal variables), which is very rough in its own right, with sample paths being nowhere measurable! Coexistence of both generalized and ordinary (though extremely irregular) versions of the limit must seem striking indeed, and we will comment more on this peculiar situation in §6.

Note that the asymptotic behaviour of fluctuations at the upper edge of the limiting spectrum (corresponding to *x*=0) is different from Gaussian. As was shown by Baik *et al*. (1999) for *λ*_{1} and by Borodin *et al*. (2000), Okounkov (2000) and Johansson (2001) for any *λ*_{k} with fixed *k*=1, 2, …,(2.3)where *F*_{k}(.) is the cumulative distribution function (CDF) of the *k*-th order statistic in the Airy point process discovered earlier by Tracy & Widom (1994) in connection with the limit distribution of largest eigenvalues of random matrices in the Gaussian unitary ensemble (GUE). In particular, *F*_{1}(.) is known as the *Tracy–Widom* CDF.

## 3. Main results

### (a) Limit theorems in the bulk

Recall that the random variable *Δ*_{n}(*x*) is given by equation (1.4), and set(3.1)where(3.2)is the value of the parameter *θ* in equations (1.3) corresponding to the coordinates *x* and *y*=*ω*(*x*).

*Suppose that x*_{n}→*x*∈(0, 2) *as n*→∞. *Then, the distribution of the random variable Y*_{n}(*x*_{n}) *with respect to the Plancherel measure P*_{n} *converges to the standard normal distribution* (0, 1).

The asymptotics of the FDDs of the random process *Δ*_{n}(*x*) are described by theorem 3.2. We write *c*_{n}≈1 if *c*_{n}*n*^{ϵ}→∞, *c*_{n}*n*^{−ϵ}→0 for any *ϵ*>0, and *a*_{n}≈*b*_{n} if *a*_{n}/*b*_{n}≈1.

*Suppose that x*_{n,i}→*x*_{i}∈(0, 2) *as n*→∞ (*i*=0, …, *m*) *and* , 0≤*s*_{i}≤1 (*i*=1, …, *m*). *Then, the random vector* (*Y*_{n}(*x*_{n,0}), …, *Y*_{n}(*x*_{n,m})) *converges in distribution to a Gaussian vector* *with zero mean and covariance matrix K with the elements* (

*i*<

*j*),

*K*

_{ii}=1.

Theorem 3.2 has a number of insightful implications. First of all, by choosing *s*_{1}=⋯=*s*_{m}=0, we obtain an FDD generalization of theorem 3.1.

*Suppose that x*_{n,i}→*x*_{i}∈(0, 2) *as n*→∞ (*i*=0, …, *m*), *where x*_{i}≠*x*_{j} (*i*≠*j*). *Then, the random vector* (*Y*_{n}(*x*_{n,0}), …, *Y*_{n}(*x*_{n,m})) *converges in distribution to a Gaussian vector with independent standard normal components*.

The local structure of correlations with regard to a given power scale in the vicinity of any point is described as follows.

*Fix x*∈(0, 2), *s*∈(0, 1], *and set x*_{n}(*t*)≔*x*+*tn*^{−s/2}. *Then, the random process Y*_{n}(*x*_{n}(*t*)) *converges, in the FDD sense, to a stationary Gaussian process Z*^{(s)}(*t*) (*t*∈) *with zero mean and the covariance function K*^{(s)}(*t*,*t*′)=*s* (*t*≠*t*′), *K*^{(s)}(*t*,*t*)=1.

Another interesting consequence is obtained for a variable spatial scale.

*Fix x*∈(0, 2), *a*≠0, *and set x*_{n}(*s*)≔*x*+*an*^{−s/2} (0≤*s*≤1). *Then, the random process Y*_{n}(*x*_{n}(*s*)) *converges, in the FDD sense, to a Gaussian process Z*(*s*) (0≤*s*≤1) *with zero mean and the covariance function K*(*s*,*s*′)=min{*s*,*s*′} (*s*≠*s*′), *K*(*s*,*s*)=1.

Results similar to theorems 3.1 and 3.2 were obtained by Gustavsson (2005) in the bulk of the spectrum of random matrices in the GUE.

By a well-known criterion of convergence in distribution (Billingsley 1999, theorem 2.3), the condition *x*_{n,i}→*x*_{i}∈(0, 2) in theorems 3.1 and 3.2 may be relaxed to .

### (b) ‘Rotated’ versions

It is instructive to reformulate theorems 3.1 and 3.2 in the ‘rotated’ coordinates *u* and *v* (see §2). To this end, one needs to find the sliding projection (divided by ) of the deviation (see (2.1)) onto the line *u*+*v*=0 along the tangent to the graph of *Ω*(*u*) at the point *u*. Differentiating equations (1.3), we getwhich implieswhere and *η*_{n}→0 (in probability). Hence, setting(3.3)we obtain the following elegant versions of theorems 3.1 and 3.2, whereby—quite surprisingly—the normalization does not depend on the location in the spectrum.

*If u*_{n}→*u*∈(−2, 2) *as n*→∞, *then the distribution of the random variable* *with respect to the Plancherel measure P*_{n} *converges to* (0, 1).

*Suppose that u*_{n,i}→*u*_{i}∈(−2, 2) *as n*→∞ (*i*=0, …, *m*) *and* , 0≤*s*_{i}≤1 (*i*=1, …, *m*). *Then, the random vector* *converges in distribution to a Gaussian vector* *with zero mean and the same covariance matrix K as in*

*theorem*3.2.

Of course, implications parallel to corollaries 3.3–3.5 are also valid here.

### (c) Limit theorems at the edges

Let us describe the asymptotics of the random function *Δ*_{n}(*x*) at the ends of the limit spectrum, i.e. for *x*=0 and *x*=2. By the definition (1.4), , and according to equation (2.3)where *F*_{1}(.) is the Tracy–Widom CDF (see §2). However, the limit distribution of appears to be discrete.

*Under the Plancherel measure P*_{n}, *for any z*≥0*where* [*z*] *is the integer part of z and the CDF F*_{k}(.) *is the same as in equation* (2.3).

Indeed, using the invariance of the measure *P*_{n} under the transposition *λ*↔*λ*′ (see remark 1.1), we have, due to equation (2.3),

In the rotated coordinates *u* and *v*, a similar result holds for both edges,(3.4)

## 4. Poissonization

The proof of our results is based on the *Poissonization* technique (see Baik *et al*. 1999). Denote (by convention, _{0} consists of the ‘empty’ partition of zero) and for *λ*∈, set . The Poissonization *P*^{t} (*t*>0) of the measure *P*_{n} is defined as follows:(4.1)Formula (4.1) defines a probability measure on the set , since for *λ*∈_{n} we have |*λ*|=*n* and hence

We first establish the Poissonized versions of theorems 3.1 and 3.2. Let *Y*_{t}(.) be given by formula (3.1) with *n* replaced by *t*.

*Suppose that x*_{t}→*x*∈(0, 2) *as t*→∞. *Then, the distribution of the random variable Y*_{t}(*x*_{t}) *with respect to the measure P*^{t} *converges to* (0, 1).

*In the notations of theorem* 3.2 (*with n replaced by t*), *the random vector* (*Y*_{t}(*x*_{t,0}), …, *Y*_{t}(*x*_{t,m})) *converges in distribution* (*with respect to P*^{t}) *to a Gaussian vector* *with zero mean and the same covariance matrix K*.

To derive theorems 3.1 and 3.2 from theorems 4.1 and 4.2, we use a *de-Poissonization* method. According to equations (1.1) and (4.1), *P*^{t} can be viewed as the expectation of the random measure *P*_{N}, where *N* is a Poisson random variable with parameter *t,*(4.2)Since *N* has mean *t* and standard deviation , formula (4.2) suggests that the asymptotics of the probability *P*_{n}(*A*) as *n*→∞ may be recovered from that of *P*^{t}(*A*) as *t*∼*n*→∞. More precisely, one can prove that *P*_{n}(*A*)∼*P*^{t}(*A*) as *t*∼*n*→∞, provided that variations of *P*_{k}(*A*) are small in the zone . In the context of random partitions, such a result was obtained by Johansson (see Baik *et al*. 1999).

## 5. Sketch of the proof of theorem 4.1

Using equations (1.3) and (3.2), the statement of theorem 4.1 is equivalent to saying that for any *z*∈(5.1)whereand *Φ*(.) is the CDF of the standard normal law (0, 1). Set *I*_{t}≔[*a*_{t},∞) and let #*I*_{t} be the number of points in the random set (*λ*∈) that fall in the interval *I*_{t}. Then, recalling the definition (1.2) of the function *λ*(.) and using that the sequence {*λ*_{i}−*i*} is strictly decreasing, it is easy to see that equation (5.1) is reduced to(5.2)

For *k*=1, 2, …, define the *k*-point correlation functions by(5.3)The key fact is that the functions have a *determinantal structure* (Borodin *et al*. 2000; Johansson 2001),(5.4)with the kernel of the form(5.5)where is the Bessel function of integral order *m*. Then, by a central limit theorem due to Soshnikov (2000), generalizing an earlier result by Costin & Lebowitz (1995) (see further generalization set in a purely probabilistic framework in Ben Hough *et al*. 2006), the distribution of converges to (0, 1), provided that Var(#*I*_{t})→∞. Thus, in order to derive the limit (5.2), it remains to obtain the asymptotics of the first two moments of #*I*_{t} with respect to the measure *P*^{t}. The lemma 5.1 is the main technical (and the most difficult) part of the work.

*As t*→∞,

For the proof of lemma 5.1, we first derive suitable expressions for the expectation and variance of #*I*_{t}. Recalling the definition of the correlation functions (5.3) together with the determinantal property (5.4) and using an identity from Borodin *et al*. (2000, proposition 2.9), we obtain(5.6)Similarly, for the variance we get (cf. Johansson 2001; Gustavsson 2005)(5.7)which again can be expressed in terms of the Bessel function using equation (5.5). The rest of the proof is based on a direct asymptotic analysis of equations (5.6) and (5.7), heavily exploiting the asymptotics of the Bessel function in various regions of variation of its parameters. Calculations are straightforward but quite laborious, and we omit them here.

The proof of theorem 4.2 follows the similar ideas using a central limit theorem by Soshnikov (2002) for linear statistics of the form .

## 6. Link with Kerov's result

Let us discuss the link between our results and the limit theorem by Kerov (1993) (see §2). In particular, our goal is to explain heuristically the mechanism of the effects that take place for the process .

Note that if |*u*−*u*′|=*n*^{−s/2}, then *s*=−2 log|*u*−*u*′|/log *n*. That is to say, ‘time’ *s* indexing the components of the limit Gaussian vector in theorem 3.7 has the meaning of the logarithmic distance between points *u* and *u*′, normalized by log *n*. From this point of view, theorem 3.7 suggests thatIn fact, in the course of the proof of theorems 3.2 and 3.7, we obtain that for any *ϵ*>0, the following estimates hold uniformly in *u*,*u*′∈(−2, 2):(6.1)

Now, setting , from equation (6.1) we havesince the function log|*u*−*u*′| is integrable at zero. Hence, is bounded in distribution as *n*→∞, which helps to understand why Kerov's (1993) result holds without any normalization (see §2).

Conversely, the limiting process defined in equation (2.2) can be used to render information contained in theorems 3.6 and 3.7 (at least heuristically). To this end, observe that since the number of parts in a typical partition is close to (see remark 1.1), it is reasonable to think that the number of ‘degrees of freedom’ of a random partition *λ*⊢*n* is of the order of , and hence , *u*∈(−2, 2), may be represented by a partial sum of the series (2.2)Then, for *u*=2 cos *θ* and *u*′=2 cos *θ*′, we get, using some trigonometry, that(6.2)As *m*→∞, the second sum on the right-hand side of equation (6.2) converges for all *θ*,*θ*′∈(0, *π*). For *θ*=*θ*′, from equation (6.2) we getand it follows that the distribution of converges to (0, 1), in agreement with theorem 3.6. Moreover, if *u*−*u*′≈*n*^{−s/2} (hence *θ*−*θ*′≈*n*^{−s/2}), then the first sum on the right-hand side of equation (6.2) is approximated by the integralInserting this into equation (6.2), we obtainas predicted by theorem 3.7.

## 7. Concluding remarks

The limiting random process *Z*^{(s)}(*t*) in corollary 3.4 can be represented (in the FDD sense) aswhere {*ζ*_{t}} are mutually independent random variables with normal distribution (0, 1). A similar representation for the process *Z*(*s*) of corollary 3.5 reads(7.1)where *B*(*s*) is a standard Brownian motion (i.e. with covariance function given by min{*s*,*s*′} and *B*(0)=0), independent of the family {*ζ*_{s}}. These decompositions show that the random processes *Z*^{(s)}(*t*) and *Z*(*s*) are highly irregular (e.g. stochastically discontinuous unless *s*=1), which is a manifestation of asymptotically fast oscillations of the process *Δ*_{n}(*x*) (as well as ) in the vicinity of each point 0<*x*<2 (respectively, −2<*u*<2).

Furthermore, decomposition (7.1) implies that the reverse-time increment process *Z*^{*}(*s*)≔*Z*(1−*s*)−*Z*(1) is represented asand it follows that *Z*^{*}(*s*) is a *self-similar process* with the Hurst parameter *H*=1/2 (see Embrechts & Maejima 2002),

Finally, let us briefly comment on the nature of convergence we have established. Let , *u*∈[−2, 2], be a random process with independent values, such that has a standard normal distribution for each *u*∈(−2, 2), whereas (with probability 1). Theorem 3.7 (cf. corollary 3.3) and equation (3.4) imply that the random process converges to in the FDD sense, and it is of interest to ask whether this can be extended to weak convergence (cf. Logan & Shepp 1977, p. 211). However, the answer appears to be negative, at least under the natural choice of the space of continuous functions *C*[−2, 2], because the necessary condition of tightness (Billingsley 1999) breaks down. Indeed, for any *δ*>0, *ϵ*>0 and all *u*,*u*′∈(−2, 2) such that |*u*−*u*′|≤*δ*, we haveThe analogous remark applies to the process *Y*_{n}(*x*), *x*∈[0, 2], considered in the space *D*[0, 2] of *cadlag* functions (i.e. right continuous with left limits).

## Acknowledgments

This work was done when Z. Su was visiting the University of Leeds under a Royal Society K. C. Wong Education Foundation International Incoming Fellowship. His research was also supported by the National Science Funds of China (grant no. 10 371109). The authors are grateful to Sir J. F. C. Kingman, A. Okounkov and Ya. G. Sinai for their interest in this work, and to A. V. Gnedin, J. T. Kent and Yu. V. Yakubovich for useful discussions. The authors also thank the unknown referees for their careful reading of the manuscript and helpful remarks.

## Footnotes

In memory of Sergei Kerov (1946–2000).

- Received August 17, 2006.
- Accepted December 15, 2006.

- © 2007 The Royal Society