## Abstract

Traditional quantum state tomography requires a number of measurements that grows exponentially with the number of qubits *n*. But using ideas from computational learning theory, we show that one can do exponentially better in a statistical setting. In particular, to predict the outcomes of *most* measurements drawn from an arbitrary probability distribution, one needs only a number of sample measurements that grows linearly with *n*. This theorem has the conceptual implication that quantum states, despite being exponentially long vectors, are nevertheless ‘reasonable’ in a learning theory sense. The theorem also has two applications to quantum computing: first, a new simulation of quantum one-way communication protocols and second, the use of trusted classical advice to verify untrusted quantum advice.

## Footnotes

Electronic supplementary material is available at http://dx.doi.org/10.1098/rspa.2007.0113 or via http://www.journals.royalsoc.ac.uk.

This work was done while the author was a postdoc at the University of Waterloo, supported by CIAR through the Institute for Quantum Computing.

The distribution over measurements can also be a continuous probability measure; this will not affect any of our results.

At this point we should remind the reader that the distribution over measurements need only exist; it does not have to be known. All of our learning algorithms will be ‘distribution-free’, in the sense that a single algorithm will work for any choice of distribution.

Interestingly, in the ‘representation-independent’ setting (where the output hypothesis can be an arbitrary Boolean circuit), this problem is not known to be NP-complete.

Note that any sample complexity bound must have at least a linear dependence on *k*. Here is a proof sketch: given a subset *S* ⊆ {1, …, *k*} with |*S*| = *k*/2, let |*S*〉 be a uniform superposition over the elements of *S*. Now consider simulating a measurement of |*S*〉 in the computational basis {|1〉, …, |*k*〉}. It is clear that *Ω*(*k*) sample measurements are needed to do this even approximately.

According to Google Scholar, Valiant's original paper on the subject (Valiant 1984) has been cited 1918 times as of this writing, with a large fraction of the citations coming from fields other than theoretical computer science.
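The proof sketch above can be made concrete with a small simulation. This is an illustrative sketch only (the function names and the choice *k* = 64 are ours, not the paper's): measuring |*S*〉 in the computational basis just yields a uniformly random element of *S*, so even seeing every element of *S* once, let alone reproducing the outcome distribution, requires at least |*S*| = *k*/2 samples.

```python
import random

def measure_uniform_superposition(S):
    """Simulate one computational-basis measurement of |S>: for a uniform
    superposition over S, each element of S occurs with probability 1/|S|."""
    return random.choice(sorted(S))

def samples_until_all_seen(S, seed=0):
    """Count simulated measurements until every element of S has appeared at
    least once -- a prerequisite for even approximately reproducing the
    outcome distribution, and always at least |S| = k/2 samples."""
    random.seed(seed)
    seen, count = set(), 0
    while seen != set(S):
        seen.add(measure_uniform_superposition(S))
        count += 1
    return count

k = 64
S = set(range(1, k // 2 + 1))            # |S| = k/2 = 32
print(samples_until_all_seen(S) >= len(S))  # → True
```

By the coupon-collector argument the expected count is in fact about (*k*/2) ln(*k*/2), comfortably within the *Ω*(*k*) claimed.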

Here the superscript ‘1’ denotes one-way communication.

The result we state is a special case of Bartlett & Long's (1998) theorem 20, in which the function *F* to be learned is itself a member of the hypothesis class.

If we care about optimizing the constant under the *Ω*(*k*), then we are better off avoiding amplification and instead proving theorem 2.6 directly using the techniques of Ambainis *et al*. (2002). Doing so, we obtain *n*/*γ*^{2} ≥ 2*k*/ln 2.

We can assume without loss of generality that Bob is deterministic, i.e. that his output is a function of *a* and *y*.

This *f* is the equality function: *f*(*x*, *y*) = 1 if *x* = *y*, and *f*(*x*, *y*) = 0 otherwise.

Indeed, it suffices to give a deterministic protocol that errs with probability at most 1/3 over *y* drawn from the distribution, a fact we will not need.

This is the function *f* : {0, 1}^{*N*} × {1, …, *N*} → {0, 1} defined by *f*(*x*_{1}, …, *x*_{*N*}, *i*) = *x*_{*i*}.

The protocol is as follows: first Alice sends her (log *N*)-qubit quantum message. Then Bob measures in a basis corresponding to (*i*_{1}, *j*_{1}), …, (*i*_{*αN*}, *j*_{*αN*}). With probability 2*α*, Bob will learn the answer for some edge (*i*_{ℓ}, *j*_{ℓ}). So it suffices to amplify the protocol *O*(1/*α*) times.

See Aaronson (2005) for a detailed definition, as well as a proof that PostBQP coincides with the classical complexity class PP.
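The amplification step in the protocol above is just independent repetition. A minimal numerical sketch (the value of *α* and the repetition count are illustrative assumptions, not figures from the paper): if each run succeeds with probability 2*α*, then after *r* independent runs the failure probability is (1 − 2*α*)^{*r*}, so *r* = *O*(1/*α*) runs suffice for constant success probability.

```python
def detection_probability(alpha, reps):
    """Probability that at least one of `reps` independent runs of the
    protocol succeeds, when each run succeeds with probability 2*alpha."""
    return 1 - (1 - 2 * alpha) ** reps

# Illustrative numbers: alpha = 0.05, and reps = 2/alpha = 40 repetitions
# already push the success probability above 98%.
print(detection_probability(0.05, 40))
```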

Here Yoda, from Star Wars, is intended to evoke a sage whose messages are highly generic (“Do or do not... there is no try”). One motivation for the name YP is that, to our knowledge, there had previously been no complexity class starting with a ‘Y’.

Closely related to heuristic complexity is the better-known average-case complexity. In average-case complexity one considers algorithms that can never err, but that are allowed to output ‘don't know’ on some fraction of inputs.

Note that we can only hope to learn such states efficiently for restricted classes of measurements. Otherwise, even if the state to be learned were a classical basis state |*x*〉, a ‘measurement’ of |*x*〉 might be an arbitrary polynomial-time computation that fed *x* as input to a pseudorandom function.

Here is a proof sketch: let *U* be an *n*-qubit unitary that maps |*x*〉|*b*〉 to |*x*〉|*b* ⊕ *f*(*x*)〉, for some Boolean function *f* : {0, 1}^{*n*−1} → {0, 1}. Then to predict *U* on a 1 − *ϵ* fraction of basis states, we need to know (1 − *ϵ*)2^{*n*−1} bits of the truth table of *f*. But Holevo's theorem implies that by examining *U*|*ψ*_{*i*}〉 for *T* input states |*ψ*_{1}〉, …, |*ψ*_{*T*}〉, we can learn at most *Tn* bits about *f*.

- Received July 2, 2007.
- Accepted August 16, 2007.
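The Holevo counting argument in the final footnote can be checked numerically. This is a sketch of the arithmetic only (the function name and the sample values of *n* and *ϵ* are ours): since *T* input states reveal at most *Tn* bits while (1 − *ϵ*)2^{*n*−1} bits are needed, the number of required input states grows exponentially in *n*.

```python
def min_input_states(n, eps=0.25):
    """Lower bound on the number T of input states needed to predict U on a
    1-eps fraction of basis states: T*n >= (1-eps)*2^(n-1), so
    T >= (1-eps)*2^(n-1)/n bits-of-f per n-bit quantum output."""
    bits_needed = (1 - eps) * 2 ** (n - 1)
    return bits_needed / n

for n in (10, 20, 40):
    print(n, min_input_states(n))
```

Already at *n* = 40 the bound exceeds 10^{10}, illustrating why such unitaries cannot be learned from polynomially many examples.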

- © 2007 The Royal Society
