## Abstract

Quantum mechanics imposes a fundamental trade-off between the accuracy of time measurements and the size of the systems used as clocks. When the measurements of different time intervals are combined, the errors due to the finite clock size accumulate, resulting in an overall inaccuracy that grows with the complexity of the set-up. Here, we introduce a method that, in principle, eludes the accumulation of errors by coherently transferring information from a quantum clock to a quantum memory of the smallest possible size. Our method could be used to measure the total duration of a sequence of events with enhanced accuracy, and to reduce the amount of quantum communication needed to stabilize clocks in a quantum network.

## 1. Introduction

Accurate time measurements are important in a variety of applications, including global positioning systems (GPS) [1], frequency standards [2] and astronomical observations [3,4]. But the accuracy of time measurements is not just a technological issue. At the most fundamental level, every clock is subject to an unavoidable quantum limit, which cannot be overcome even with the most advanced technology. The limit has its roots in Heisenberg’s uncertainty principle, which implies fundamental bounds on the accuracy of time measurements [5–7]. For an individual time measurement, the ultimate quantum limit can be attained by initializing the clock in a suitably engineered superposition of energy levels [8–10]. However, the situation is different when multiple time measurements are performed on the same clock (e.g. in order to measure the total duration of a sequence of events) or on different clocks (e.g. in GPS technology). In these scenarios, the errors accumulate, resulting in an inaccuracy that grows linearly with the number of measurements. To address this problem, one may try to minimize the number of measurements: instead of measuring individual clocks, one could store their state into the memory of a quantum computer, process all the data coherently, and finally read out the desired information with a single measurement. However, quantum memories are notoriously expensive and hard to scale-up [11]. This leads to the fundamental question: how much memory is required to record time at the quantum level?

Here, we derive the ultimate quantum limit on the amount of memory needed to record time with a prescribed accuracy. The limit is based on a Heisenberg-type bound, expressing the trade-off between the accuracy in the read-out of a given parameter and the size of the system in which the parameter is encoded. We show that the bound is tight, by constructing a protocol that faithfully transfers information from the system to a quantum memory of minimal size. The protocol, which we call *quantum stopwatch*, freezes the time evolution of a clock by storing its state into the state of the memory, as in figure 1. The quantum stopwatch protocol works with clocks made of many identical and independently prepared particles, a common setting when the clocks are identical atoms or ions [12]. The use of identical particles can also be thought as a simple repetition code for transmitting time information. As our protocol uses identically prepared particles, the optimal scaling with the memory size is robust to depolarization of the clocks and to particle loss.

Storing time coherently into a quantum memory is a useful primitive for many applications. As an illustration, we construct a quantum-enhanced protocol to measure the total duration of a sequence of events. The same protocol can be used to establish a shared frequency standard among the nodes of a network, and to generate quantum states with Heisenberg-limited sensitivity to time evolution and to phase shifts.

## 2. The size-accuracy trade-off

Suppose that a parameter *T* is encoded in the state of a quantum system, say *ρ*_{T}. The system can be either a quantum clock, where *T* is the time elapsed since the beginning of the evolution, or a quantum memory, where the dependence of *ρ*_{T} on *T* can be completely arbitrary. In general, the parameter *T* does not have to be time: it can be phase, frequency or any other real parameter.

When needed, one can extract information about the parameter *T* by measuring the system. The question is how accurate the measurement can be. The inaccuracy of a given measurement can be quantified by the size of the smallest interval, centred around the true value, in which the measurement outcome falls with a prescribed probability, for example, *P*=99%. Explicitly, the inaccuracy has the expression
*P*(*δ*,*T*) is the probability that the measurement outcome *δ* centred around the true value *T*.

Note that the inaccuracy can generally depend on the true value *T*, which is unknown to the experimenter. The dependence can be removed by fixing a fiducial interval *T* is in one-to-one correspondence with the state of the system [13]. We denote by *δ*(*P*) the worst-case value of the inaccuracy within the fiducial interval. In the Bayesian approach, *δ*(*P*) provides a lower bound on the probability that the true value falls within an interval of size *δ*(*P*) around the measured value *P* for every prior distribution on *T* and for all values of

We now derive a fundamental lower bound on the inaccuracy, expressed in terms of the size of the quantum system used to encode the parameter *T*. Let us denote by *D* the dimension of the smallest subspace containing the eigenvectors of the states *D* can be regarded as the *effective dimension* of the system used to encode the parameter *T*. In terms of the effective dimension, the inaccuracy satisfies the bound
*T* and for arbitrary quantum measurements. We call equation (2.2) the *size-accuracy bound*.

The size-accuracy bound follows from dividing the fiducial interval *ΔT* into *N*=⌊*ΔT*/*δ*(*P*)⌋ disjoint intervals of size *δ*(*P*). One can then encode the midpoint value of the *i*th interval into the state *ρ*_{Ti}. In this way, one obtains *N* quantum states, which can be distinguished with probability of success at least *P* (one has just to estimate *T* and to declare the state *ρ*_{Ti} if the estimate of *T* falls in the *i*th interval). On the other hand, the *N* states are contained in an *D*-dimensional subspace, and therefore the probability of success is upper bounded by *D*/*N* [14], leading to the bound *P*≤*D*/*N* and, in turn, to equation (2.2).

The size-accuracy bound (2.2) captures in a unified way the Heisenberg scaling of quantum clocks and the ultimate limits on the memory needed to store the parameter *T*. Let us first see how it implies the Heisenberg scaling of quantum clocks. Consider a clock made of *n* identical non-interacting particles, each evolving with the same periodic evolution *H* is the single-particle Hamiltonian. If the particles are initialized in the state |*Ψ*〉, then the quantum state at time *T* is *H* must be integer multiples of a given energy. This implies that the number of distinct eigenvalues of the *n*-particle Hamiltonian grows linearly with *n*, and, therefore, all the states |*Ψ*_{T}〉 are contained in a subspace of dimension proportional to *n*. Hence, one obtains the bound *δ*(*P*)≥*c*/*n* for some suitable constant *c*>0. Note that this relation also holds for mixed states, because mixing can only increase the inaccuracy (see Material and method). Moreover, the relation *δ*(*P*)≥*c*/*n* holds for arbitrary measurements.

The bound *δ*(*P*)≥*c*/*n* implies the familiar Heisenberg bound on the standard deviation of the best-unbiased measurement. The argument is simple: by Chebyshev’s inequality, the standard deviation *σ* satisfies the bound *δ*(*P*)≥*c*/*n* implies the Heisenberg scaling of the standard deviation. It is important to stress that our ‘Heisenberg-like’ bound *δ*(*P*)≥*c*/*n* holds even when the measurement in question is not unbiased.

The size-accuracy bound (2.2) can also be applied to memories. Suppose that one wants to write down the parameter *T* with accuracy *δ*(*P*) into a quantum memory of *q* qubits. Then, equation (2.2) implies that, no matter what encoding is used, the number of memory qubits must be at least
*quantum memory bound*. In the following, we show that the bound is tight, meaning that there exist quantum states and quantum measurements for which equation (2.3) holds with the equality sign. Moreover, we show that these states can be generated from an ensemble of identically prepared quantum particles by applying a compression protocol that minimizes the memory size while preserving the accuracy.

## 3. Compressing time information: the noiseless scenario

Consider a quantum clock made of *n* identical particles oscillating between two energy levels. Restricting the attention to these levels, each particle can be modelled as a qubit. In the absence of noise, the evolution of each qubit is governed by the Hamiltonian *H*=*E*_{0}|0〉〈0|+*E*_{1}|1〉〈1|, where *E*_{0} and *E*_{1} are the energy levels and |0〉 and |1〉 are the corresponding eigenstates. For each individual qubit, the best clock state is the uniform superposition *T* is

It is well known that *n* qubits in an entangled state can achieve the Heisenberg scaling 1/*n* in terms of standard deviation, from which it is immediate that the same scaling can be achieved in terms of inaccuracy.^{1} However, here we consider *n* qubits in a product state—specifically, the product state |*ψ*_{T}〉^{⊗n} obtained by preparing each qubit in the optimal single-copy state. The state |*ψ*_{T}〉^{⊗n} has the standard scaling

To understand how the compression works, it is useful to expand the clock state as
*B*_{k,n,1/2} is the binomial distribution with probability 1/2, and |*n*,*k*〉 is the state obtained by symmetrizing the state |1〉^{⊗k}⊗|0〉^{⊗n−k} over the *n* qubits. The key observation is that, for large *n*, the binomial distribution is concentrated in an interval of size *k*〉=⌊*n*/2⌋. This means that the state |*ψ*_{T}〉^{⊗n} can be compressed into a typical subspace of dimension *n* increases. After the clock state has been projected in the typical subspace, it can be encoded into *P*, and the number of memory qubits, equal to

The original state |*ψ*_{T}〉^{⊗n} can be retrieved from the compressed state, up to an error that vanishes exponentially fast with *n*. Owing to the exponential decay of the error, a good compression performance can obtained already for small clocks: for example, *n*=16 is already in the asymptotic regime for all practical purposes. A compression from 16 clock qubits to four memory qubits can be done with a compression error of 5.5×10^{−3}, in terms of the trace distance, or 3.0×10^{−5}, in terms of the infidelity. Relatively high-quality compression can be obtained also for smaller number of qubits: for example, four clock qubits can be encoded into two memory qubits with fidelity 87.9%.

Summarizing, the state of *n* identically prepared clock qubits can be compressed into *frequency projection*.

## 4. Extension to mixed states and noisy evolution

We have seen that the optimal trade-off between inaccuracy and memory size can be achieved for pure states with unitary time evolution. A similar result can be obtained also in the noisy case. Let us consider first the case where noise affects the state preparation, while the evolution itself is still unitary. In this model, each clock qubit starts off in the mixed state *ρ*_{0,p}=*p*|*ψ*_{0}〉〈*ψ*_{0}|+(1−*p*)*I*/2 and evolves to the state *ρ*_{T,p}=*p*|*ψ*_{T}〉〈*ψ*_{T}|+(1−*p*)*I*/2. Physically, we can think of these mixed states as the result of dephasing noise on the pure states |*ψ*_{T}〉.

In general, the amount of dephasing may vary from one qubit to another. However, as long as the variations are random and affect all qubits equally and independently, the state of the clock can be described as *p*. Even more generally, one could consider some types of correlated noise, where the errors acting on different qubits are part of an (ideally infinite) exchangeable sequence [15]. Physically, this means that each qubit undergoes a random phase kick, possibly correlated with the phase kicks received by the others, but without any systematic bias that makes one qubit more prone to noise than the others. The model of exchangeable dephasing noise includes the correlated errors due to an overall uncertainty on the initial time of the evolution. In general, de Finetti’s theorem [15] implies that exchangeable dephasing errors lead to a mixture of states of the form *T*_{0} is a random shift of the time origin and *p* is a random single-qubit dephasing parameter. Owing to this fact, we can focus first on the compression of the clock states *p* to vary.

The clock state *n* qubit system into a tripartite system, consisting of a spin register, a rotation register and a permutation register, as in the following equation:
*J* is the quantum number of the total spin, *q*_{J,p} is a probability distribution, *ρ*_{T,p,J} is a state of the rotation register and *ω*_{J} is a fixed state of the permutation register, independent of *T* and *p*.

As the states of the spin register are orthogonal, the value of *J* can be read out without disturbing the state. The problem is then to store the state *ρ*_{T,p,J}⊗*ω*_{J} in the minimum amount of memory. Note also that the permutation register can be discarded, for it contains no information about *T*. Hence, the problem is actually to store the state *ρ*_{T,p,J}. This can be done through the technique of frequency projection, which is realized here by projecting the state into the subspace spanned by eigenstates of the total Hamiltonian in an interval of size J around the mean.

It turns out that the error introduced by frequency projection is negligible for large *J*. Specifically, we showed that the trace distance between the original state and the frequency-projected state is upper bounded as
*J* is small, but fortunately the probability that *J* is small tends to zero exponentially fast as *n* grows: indeed, the probability distribution *q*_{J} is the product of a polynomial in *J* times a Gaussian with variance of order n centred around the value *J*_{0}=*p*(*n*+1)/2 [17].

Frequency projection squeezes the density matrix into a subspace of size J, which can then be encoded into a quantum memory of approximately *J* in the high probability region. The argument is based on two observations: First, the inaccuracy for the state *f*(*P*) is a suitable function (see Material and methods). Second, the state *ρ*_{J,T} in an approximately reversible fashion, with an error that vanishes in the large *n* limit [18]. As the inaccuracy is a continuous function of the state (see Material and methods), we obtain that the minimum inaccuracy for the typical state *ρ*_{T,J} is *n*. Now, recall that the typical values of *J* are equal to *J*_{0}=*p*(*n*+1)/2, up to a correction of size at most n. Hence, one has
*J*.

Note that the compression protocol does not require any knowledge of the time parameter *T*, nor it requires knowledge of the dephasing parameter *p*. Owing to this feature, the protocol applies even in the presence of randomly fluctuating and/or correlated dephasing, as long as dephasing errors on different qubits arise from an exchangeable sequence of random variables.

The protocol can also be applied when the evolution is noisy. Dephasing during the time evolution is described by the master equation [9],
*σ*_{z} is the Pauli matrix *σ*_{z}=|0〉〈0|−|1〉〈1| and *γ*≥0 is the decay rate, corresponding to the inverse of the relaxation time *T*_{2} in NMR. The state at time *T* is
*p*_{T,γ}=(1−*e*^{−γ(T+τ0)})/2, and *τ*_{0} accounts for dephasing noise in the state preparation. The only difference with the case of unitary evolution is that now the amount of dephasing depends on *T*. However, since our compression protocol does not require knowledge of the dephasing parameter *p*, all the results shown before are still valid.

## 5. Applications

### (a) Measuring the duration of a sequence of events

An important feature of the compression protocol is that it is approximately reversible, meaning that the original *n*-qubit state can be retrieved from the memory, up to a small error that vanishes in the large *n* limit (see electronic supplementary material, Note S2). Owing to this feature, one can engineer a set-up that pauses the time evolution and resumes it on demand.

The set-up, illustrated in figure 2, uses a quantum clock made of *n* identical qubits. At time *t*_{0}, each qubit is initialized in the state *ρ*_{t0,γ}. The qubit evolves until time *t*_{0}′=*t*_{0}+*T*_{0} under the noisy dynamics (4.4). The state of the *n* clock qubits is then stored in the quantum memory, where it remains until time *t*_{1}. For simplicity, we assume that the memory is ideal, meaning that the state of the memory qubits does not change during the lag time between *t*_{0}′ and *t*_{1}. Physically, this means that the decoherence time of the memory is long compared to the lag time between one event and the next. At time *t*_{1}, the state of the memory is transferred back to the clock, which resumes its evolution until *t*_{1}′=*t*_{1}+*T*_{1}. The procedure is iterated for *k* times, so that in the end the state of the clock records the total duration *T*=*T*_{0}+*T*_{1}+⋯+*T*_{k−1}.

Our coherent set-up offers an advantage over incoherent protocols where the duration of each time interval *T*_{j} is measured individually. In the noiseless scenario, the comparison is straightforward. The probability distribution for the optimal time measurement on the state |*Ψ*_{Tj}〉^{⊗n} is approximately Gaussian, and the inaccuracy for the sum of *k* Gaussian variables grows as k. Instead, the inaccuracy of the coherent protocol is approximately constant in *k*, up to higher-order terms arising from the compression error (see Material and methods). Hence, performing a single measurement reduces the inaccuracy by a factor k.

The advantage of the coherent protocols persists even after taking into account the error of frequency projection, and even for relatively small *n*. As a simple example let us consider the case of *n*=8 and *P*=0.9. For a sequence of *k*=4 events, a coherent protocol using three qubits of memory has an inaccuracy 0.787 times that of the incoherent protocol.

The benefits of the coherent protocol are not limited to the noiseless scenario. A performance comparison between coherent and incoherent protocols is presented in figure 3, for the task of measuring the duration of a time interval *T*, divided into *k* subintervals of equal length. The figure shows the advantage of the coherent approach for *γ*=0.2 and for every *k* larger than 2. In electronic supplementary material, Note S3, we provide a necessary and sufficient condition for the coherent protocol to have better performance than the incoherent protocol. The condition shows that the total duration is better computed coherently whenever the length of each subinterval *T*/*k* is small compared to the decoherence time 1/*γ*. Note that the advantage persists even when the total time *T* is large, although the performance of both the coherent and the incoherent protocol worsen as *T* grows.

In the above discussion, we assumed that the memory is free from noise and that the compression protocol is implemented without errors. Of course, realistic implementations will also involve errors. One way to take into account the noise in the memory is to introduce an effective dephasing rate *γ*_{j}, which models independent and symmetric errors occurring in the lag time between *t*_{j}′ and *t*_{j+1}. Overall, the result of this extra dephasing is to reduce the size of the parameter region where the coherent storage of time information offers an advantage over the incoherent strategy. Now, let us consider the errors in the implementation of the compression protocol. Owing to the continuity of the inaccuracy (see Material and methods), the error of the circuit implementation can be analysed independently of estimation inaccuracy. Assuming an independent error model, the errors in the implementation of the encoding and decoding operations will introduce an error *ϵ*_{circuit}, which is bounded by the error probability of each elementary gate, denoted by *ϵ*_{1}, times the gate complexity of the whole circuit. The overhead of the gate complexity is the complexity of the Schur transform [16], which was recently reduced to *k* iterations, the overall error *ϵ*_{circuit} scales as *ϵ*_{1} is small compared to *n*. In fact, the desired rate of *ϵ*_{1} can be achieved by using physical gates with error below a constant threshold value, by recursively increasing the number of layers of error correction [20–22].

### (b) Stabilizing quantum clocks in a network

Networks of clocks are important in many areas, such as GPS technology and distributed computing. Recently, Kómár *et al* proposed a quantum protocol, allowing multiple nodes in a network to jointly stabilize their clocks with higher accuracy [23,24]. The protocol involves a network of *k* nodes, each node with a local oscillator used as a time-keeping device. The goal is to guarantee that all local oscillators have approximately the same frequency. To this purpose, a central station distributes a GHZ state *k* nodes. The entanglement is then transferred to *k* atomic clocks. By interacting with the clock qubits, the *k* nodes adjust the frequencies of their local oscillators, obtaining a shared time standard with accuracy *n* is the number of repetitions of the whole procedure. The key ingredient in this last step is a protocol for estimating the sum of the frequencies of the local oscillators.

Overall, the above protocol requires the communication of *kn* qubits. Using our stopwatch protocol, we can reduce the amount of quantum communication to *T*_{0}, and then encodes the state of the clock into a memory, which is sent to the second node. The second node performs the same operations, and passes the memory to the third node, and so on until the *k*th node. In the end of the protocol, the memory will contain information about the total phase *φ*_{tot}=(*ω*_{1}+*ω*_{2}+⋯+*ω*_{k})*T*_{0}, where *ω*_{j} is the frequency of the *j*th local oscillator. In this way, the sum of the frequencies can be read out with an error *k*, meaning that the average frequency has Heisenberg limited error of size

An alternative to the sequential protocol is to use a parallel protocol, where the central station distributes entangled states to the *k* nodes, as in [23,24]. Using our frequency projection technique, it is possible to reduce the amount of quantum communication also in this case, from *kn* qubits to *n* copies of the GHZ state into a multipartite entangled state where each node has an exponentially smaller clock of

## 6. Conclusion

The compression of clock states is a versatile technique. In the addition to advantages in measuring the total duration and in stabilizing quantum clocks in a network, it offers the opportunity to transform product states of *n* qubits into entangled states of n qubits, allowing one to reversibly switch from an encoding where the information can be accessed locally to a more compact encoding where the information is available globally. Quite interestingly, this approach works also for mixed states and in the presence of noise, thus defining a new set of mixed states achieving the Heisenberg limit.

In view of the applications, it is natural to ask what ingredients would be needed to implement our compression protocol experimentally. The protocol requires a quantum computer capable of implementing the encoding and decoding operations. The question is how large the computer should be and how many elementary operations it should perform. In terms of size, we have seen that the large *n* regime can be already probed for values around *n*=16, a number that is likely to be within reach in the near future. In terms of complexity, one can break down the compression protocol in two parts: the Schur transform and the frequency projection. The Schur transform can be efficiently realized by a quantum circuit of at most polynomially many gates [16], scaling as *n*=10. The frequency projection can be efficiently implemented with a technique introduced in [25], whereby the spin eigenstate |*J*,*m*〉 is encoded into a (2*J*+1)-qubit state, with the *m*th qubit is in the state |1〉 and all the other qubits are in the state |0〉. In this encoding, projecting on a restricted range of values of *m* is the same as throwing away some of the qubits. The encoding operations and their inverses can be implemented using *O*(*J*^{2}) elementary gates [25]. In summary, all the components of the quantum stopwatch can be implemented with a moderate amount of elementary gates. The main challenge for the experimental realization of our protocol is the required accuracy in the encoding and decoding operations, whose fault tolerant realization requires additional layers of error corrections. Our protocol provides an additional motivation to the realization of fault-tolerant quantum computers, showing an example of application where the aid of a quantum computer could significantly enhance the precision of time measurements.

## 7. Material and methods

### (a) Properties of the inaccuracy

The results of this paper take advantage of three basic properties of the inaccuracy, presented in the following:

(i) *Continuity*. Suppose that the states *ρ*_{T} and *ρ*_{T}′ are close in trace distance for every value of the parameter *T*. Operationally, this means that the outcome probabilities for every measurement performed on *ρ*_{T} are close to the outcome probabilities for the same measurement on *ρ*_{T}′. If the trace distance is smaller than *ϵ*, then one has
*P*(*δ*,*T*) (respectively, *P*′(*δ*,*T*)) is the probability that the estimate falls within an interval of size *δ* around the true value *T*. In turn, equation (7.1) implies the following bound in the inaccuracy:
*δ*(*P*,*T*) and *δ*′(*P*,*T*) are the inaccuracies for the states *ρ*_{T} and *ρ*_{T}′, respectively. When the probability distribution is sufficiently regular, these bounds guarantee that the inaccuracy is a continuous function of the state. For example, we will see that the accuracy for an *n*-qubit quantum clock in the state *f*(*P*) is an analytical function of *P*. A state that is *ϵ*-close to *n*, meaning that the correction does not affect the leading order.

(ii) *Data-processing inequality*. Suppose that the system *S*, used to encode the parameter *T*, is transformed into another system *S*′ by some physical process. Let *ρ*_{T}′ be the state generated by the process acting on the state *ρ*_{T}. Then, every measurement *M*′ on the output system *S*′ defines a measurement *M* on the input system *S*, obtained by first transforming *S* into *S*′ and then performing the measurement *M*′. By construction, the two measurements have the same statistics and, in particular, the same inaccuracy. By minimizing the inaccuracy over all measurements, one obtains the inequality

(iii) *Symmetry.* For quantum clocks, the accuracy is maximized by covariant measurements [7], that is, measurements described by positive operator valued measures of the form

### Proposition 7.1

*For a convex mixture* *, one has the inequality* *, where* *(respectively,* *) is the minimum inaccuracy for the state ρ (respectively, ρ*_{i}*).*

The argument is simple. For every covariant measurement, the probability to find the estimate in an interval of size *δ* around the true value *T* is independent of *T* and will be denoted simply as *P*(*δ*). For measurements on the state *P*_{i}(*δ*) is the corresponding probability for the state *ρ*_{i}. Setting *ρ* cannot be smaller than *δ*.

### (b) Accuracy of time measurements on identically prepared clock states

We now construct a measurement strategy that estimates the parameter *T* from the state *j*th qubit on the eigenstates of the observable *τ*_{j} is an angle, chosen uniformly at random between 0 and 2*π*. The measurement has two possible outcomes, +1 and −1. If the outcome of the measurement is +1, one records the value *τ*_{j}, if the outcome is −1, one records the value *τ*_{j}+*π*. Mathematically, the measurement strategy is described by the positive operator-valued measure {*M*_{τ}}_{τ∈[0,2π)} with
*τ* when the input state is *ρ*_{T,p} is *P*(*τ*|*T*,*p*)=Tr[*M*_{τ}*ρ*_{T,p}]. Explicit calculation shows that the classical Fisher information of this probability distribution is
*n* qubits are measured independently, one can collect the results of the measurements and use classical statistics to generate an estimate of *T*. Using the maximum-likelihood estimator, one obtains an estimate that is approximately Gaussian-distributed with average *T* and standard deviation *δ*/2 is [27]
*x*) is the error function. Hence, the inaccuracy can be expressed as

The same argument applies to the states *ρ*_{T,γ}, generated by the noisy time evolution. The only difference is that, in this case, the classical Fisher information is
*γ* is known, and
*γ* is unknown (see electronic supplementary material, Note S4 for the derivation).

## Data accessibility

All electronic supplementary material notes were submitted together with the article.

## Authors' contributions

All the authors contributed to the entire work and helped drafting the manuscript. All the authors gave their final approval for publication.

## Competing interests

We declare we have no competing interests.

## Funding

This work is supported by the National Natural Science Foundation of China through grant no. 11675136, by the Hong Kong Research Grant Council through grant no. 17326616, by the Croucher Foundation, by the Canadian Institute for Advanced Research (CIFAR), and by the HKU Seed Funding for Basic Research. Y.Y. is supported by a Microsoft Research Asia Fellowship and a Hong Kong and China Gas Scholarship. M.H. was supported in part by a MEXT Grant-in-Aid for Scientific Research (B) no. 16KT0017, a MEXT Grant in-Aid for Scientific Research (A) no. 23246071, the Okawa Research Grant and Kayamori Foundation of Informational Science Advancement. Centre for Quantum Technologies is a Research Centre of Excellence funded by the Ministry of Education and the National Research Foundation of Singapore. This publication was made possible through the support of a grant from the John Templeton Foundation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

## Acknowledgements

We thank Lorenzo Maccone for useful comments and Xinhui Yang for drawing the figures.

## Footnotes

Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.4093169.

↵1 This can be shown by applying Markov inequality to the squared deviation from the mean.

- Received November 4, 2017.
- Accepted April 24, 2018.

- © 2018 The Author(s)

Published by the Royal Society. All rights reserved.