## Abstract

One of the crown jewels of complexity theory is Valiant's theorem that computing the permanent of an *n*×*n* matrix is #**P**-hard. Here we show that, by using the model of *linear-optical quantum computing*—and in particular, a universality theorem owing to Knill, Laflamme and Milburn—one can give a different and arguably more intuitive proof of this theorem.

## 1. Introduction

Given an *n*×*n* matrix *A*=(*a*_{i,j}), the *permanent* of *A* is defined as
A seminal result of Valiant (1979) says that computing Per(*A*) is #**P**-hard, if *A* is a matrix over (say) the integers, the non-negative integers, or the set {0,1}.^{1} Here, #**P** means (informally) the class of *counting problems*—problems that involve summing exponentially many non-negative integers—and #**P**-hard means ‘at least as hard as any #**P** problem’.^{2,3}

More concretely, Valiant gave a polynomial-time algorithm that takes as input an instance *φ*(*x*_{1},…,*x*_{n}) of the Boolean satisfiability problem, and that outputs a matrix *A*_{φ} such that Per(*A*_{φ}) encodes the number of satisfying assignments of *φ*. This means that computing the permanent is at least as hard as counting satisfying assignments.

Unfortunately, the standard proof that the permanent is #**P**-hard is notoriously opaque; it relies on a set of gadgets that seem to exist for ‘accidental’ reasons. Could there be an alternative proof that gave more, or at least different, insight? In this paper, we try to answer that question by giving a new, quantum computing-based proof that the permanent is #**P**-hard. In particular, we will derive the permanent's #**P**-hardness as a consequence of the following three facts:

—

*Postselected linear optics*is capable of universal quantum computation, as shown in a celebrated 2001 paper of Knill*et al.*(2001) (henceforth referred to as KLM).^{4}— Quantum computations can encode #

**P**-hard quantities in their amplitudes.— Amplitudes in

*n*-photon linear-optics circuits can be expressed as the permanents of*n*×*n*matrices.

Even though our proof is based on quantum computing, we stress that we have made it *entirely self-contained*: all of the results we need (including the KLM theorem (Knill *et al.* 2001), and even the construction of the Toffoli gate from 1-qubit and CSIGN gates) are proved in this paper for completeness. We assume some familiarity with quantum computing notation (e.g. kets and quantum circuit diagrams), but not with linear optics.

### (a) Motivation

If one counts the complexity of all the individual pieces we use—especially the universality results for quantum gates—then our reduction from #**P** to the permanent ends up being at least as complicated as Valiant's, and probably more so. In our view, however, this is similar to how writing a program in C++ tends to produce a longer, more complicated executable file than writing the same program in assembly language. Normally, one also cares about the length and readability of the *source code*! Our purpose in this paper is to illustrate how quantum computing provides a powerful ‘high-level programming language’, in which one can, among other things, easily re-derive the most celebrated result in the theory of #**P**-hardness.

But why does the world need a new proof that the permanent is #**P**-hard—especially a proof invoking what some might consider to be exotic concepts? Let us offer several answers:

— Any theorem as basic as the #

**P**-hardness of the permanent deserves several independent proofs. And our proof really is ‘independent’ of the standard one: rather than composing variable and clause gadgets,^{5}we*multiply matrices*corresponding to quantum gates, and use ideas from linear optics to keep track of how such multiplications affect the permanent. One way to see the difference is that our proof never uses the notion of a cycle cover.— While our proof, like the standard one, requires ‘gadgets’ (one to simulate a Toffoli gate using CSIGN gates, another to simulate a CSIGN gate using postselected linear optics), the connection with quantum computing gives those gadgets a natural

*semantics*. In other words, the gadgets were introduced for ‘practical’ reasons having nothing to do with proving the permanent #**P**-hard, and can be motivated independently of that goal.*If*one already knows the quantum universality gadgets,*then*we offer what seems like a major advance in complexity-theoretic pedagogy: a proof that the permanent is #**P**-hard that can be reproduced on-the-spot from memory!— As Kuperberg (2009) pointed out, by their nature, any #

**P**-hardness proofs (including ours) that are based on ‘quantum postselection’ almost*immediately*yield hardness of approximation results as well.— We expect that the quantum postselection approach used here could lead to #

**P**-hardness proofs for many other problems—including problems not already known to be #**P**-hard by other means. In this direction, one natural place to look would be special cases of the permanent.

### (b) Related work

By now, there are many examples where quantum computing has been used to give new or simpler proofs of classical complexity theorems; see Drucker & de Wolf (2011) for an excellent survey. Within the area of counting complexity, Aaronson (2005) showed that the class **PP** is equal to **PostBQP** (quantum polynomial-time with postselection), and then used that theorem to give a simpler proof of the landmark result of Beigel *et al.* (1995) that **PP** is closed under intersection. Later, also using the **PostBQP**=**PP** theorem, Kuperberg (2009) gave a ‘quantum proof’ of the result of Jaeger *et al.* (1990) that computing the Jones polynomial is #**P**-hard, and even showed that a certain approximate version is #**P**-hard (which had not been shown previously). Kuperberg's argument for the Jones polynomial is conceptually similar to our argument for the permanent.

There is also precedent for using linear optics as a tool to prove theorems about the permanent. Scheel (2004) observed that the unitarity of linear-optical quantum computing (LOQC) implies the interesting fact that |Per(*U*)|≤1 for all unitary matrices *U*.

Rudolph (2009) showed how to encode quantum amplitudes directly as matrix permanents, and in the process, gave a ‘quantum-computing proof’ that the permanent is #**P**-hard. However, a crucial difference is that Rudolph starts with Valiant's proof based on cycle covers, then recasts it in quantum terms (with the goal of making Valiant's proof more accessible to a physics audience). By contrast, our proof is independent of Valiant's; the tools we use were invented for separate reasons in the quantum computing literature.

There has been a great deal of work on LOQC, beyond the seminal KLM theorem (Knill *et al.* 2001) on which this paper relies. Recently, Aaronson & Arkhipov (2011) studied the complexity of sampling from a linear-optical computer's output distribution, assuming that no adaptive measurements are available. By using the #**P**-hardness of the permanent as an ‘input axiom’, they showed that this sampling problem is classically intractable unless **P**^{#P}=**BPP**^{NP}. More relevant to this paper is an alternative *proof* that Aaronson and Arkhipov gave for their result. Inspired by the work of Bremner *et al.* (2010), the alternative proof combines Aaronson's **PostBQP**=**PP** theorem (Aaronson 2005) with the fact that postselected linear optics is universal for **PostBQP**, and thereby avoids any direct appeal to the #**P**-hardness of the permanent. In retrospect, that proof was already much of the way towards a linear-optical proof that the permanent is #**P**-hard; this paper simply makes the connection explicit.

## 2. Background

Not by accident, this section constitutes the bulk of the paper. First, in §2*a*, we fix some facts and notation about standard (qubit-based) quantum computing. Then, in §2*b*, we give a short overview of those aspects of LOQC that are relevant to us, and (for completeness) prove the KLM theorem in the specific form we need.

### (a) Quantum circuits

Abusing notation, we will often identify a quantum circuit *Q* with the unitary transformation that it induces: for example, 〈0⋯0|*Q*|0⋯0〉 represents the amplitude with which *Q* maps its initial state to itself. We use |*Q*| to denote the number of gates in *Q*.

The first ingredient we need for our proof is a convenient set of quantum gates (in the standard qubit model). Thus, let be the set of gates consisting of (i) all 1-qubit gates and (ii) the 2-qubit *controlled-sign* gate
which flips the amplitude if and only if both qubits are |1〉.^{6} Then, Barenco *et al.* (1995) showed that is a *universal* set of quantum gates, in the sense that generates *any* unitary transformation on any number of qubits (without error). For our purposes, however, the following weaker result suffices.

### Lemma 2.1

*generates the Toffoli gate, the 3-qubit gate that maps each basis state |x,y,z〉 to |x,y,z ⊕ xy〉.*

### Proof.

The circuit can be found in Nielsen & Chuang (2000) for example, but we reproduce it in figure 1 for completeness. In the diagram, is the Hadamard gate, is another 1-qubit gate, and the six vertical bars represent CSIGN gates. ■

### (b) Linear-optical quantum computing

We now give a brief overview of LOQC, an alternative quantum computing model based on identical photons rather than qubits. See Aaronson & Arkhipov (2011) for a detailed introduction to LOQC from a computer science perspective.

In LOQC, each basis state of our quantum computer has the form |*S*〉=|*s*_{1},…,*s*_{m}〉, where *s*_{1},…,*s*_{m} are non-negative integers summing to *n*. Here *s*_{i} represents the number of photons in the *i*th location or ‘mode’, and the fact that *s*_{1}+⋯+*s*_{m}=*n* means that photons are never created or destroyed. One should think of *m* and *n* as both polynomially bounded. For this paper, it will be convenient to assume that *m* is even, that *n*=*m*/2, and that the initial state has the form |*I*〉=|0,1,0,1,…,0,1〉: that is, one photon in each even-numbered mode, and no photons in the odd-numbered modes.

Let *Φ*_{m,n} be the set of non-negative integer tuples *S*=(*s*_{1},…,*s*_{m}), such that *s*_{1}+⋯+*s*_{m}=*n*, and let be the Hilbert space spanned by basis states |*S*〉 with *S*∈*Φ*_{m,n}. Then a general state in LOQC is just a unit vector in :
with .

To transform |*ψ*〉, one can select any *m*×*m* unitary transformation *U*=(*u*_{ij}). This *U* then induces a larger unitary transformation *φ*(*U*) on the Hilbert space of *n*-photon states. There are several ways to define *φ*(*U*), but perhaps the simplest is the following formula:
2.1for all tuples *S*=(*s*_{1},…,*s*_{m}) and *T*=(*t*_{1},…,*t*_{m}) in *Φ*_{m,n}. Here, *U*_{S,T} is the *n*×*n* matrix obtained from *U* by taking *s*_{i} copies of the *i*th row of *U* and *t*_{j} copies of the *j*th column, for all *i*,*j*∈[*m*]. To illustrate, if
and |*S*〉=|*T*〉=|2,1〉, then

Intuitively, the reason the permanent arises in formula (2.1) is that there are *n*! ways of mapping the *n* photons in basis state |*S*〉 onto the *n* photons in basis state |*T*〉. Since the photons are *identical bosons*, quantum mechanics says that each of those *n*! ways contributes a term to the total 〈*S*|*φ*(*U*)|*T*〉, with the contribution given by the product of the transition amplitudes *u*_{ij} for each of the *n* photons individually.

It turns out that *φ*(*U*) is always unitary and that *φ* is a homomorphism. Both facts seem surprising when viewed purely as algebraic consequences of formula (2.1), but of course they have natural physical interpretations: *φ*(*U*) is unitary because it represents an actual physical transformation that can be applied, and *φ* is a homomorphism because generalizing from one to *n* photons must commute with composing beamsplitters. In this paper, we will not need that *φ*(*U*) is unitary; see Aaronson & Arkhipov (2011) for a proof of that fact. Below we prove that *φ* is a homomorphism.

### Lemma 2.2

*φ is a homomorphism.*

### Proof.

We want to show that for all tuples *S*,*T*∈*Φ*_{m,n} and all *m*×*m* unitaries *U*,*V* ,
By equation (2.1), the above is equivalent (after multiplying both sides by ) to the identity
2.2We will prove identity (2.2) in the special case *n* = *m* and *S* = *T* = *I* = (1,1,…,1), as the general case is analogous. We have
In the second line above, we decomposed the sum by thinking about each permutation *σ*∈*S*_{n} as a product of *two* permutations: one, *τ*, that maps *n* particles in the initial configuration |*I*〉 to *n* particles in the intermediate configuration |*R*〉 when *U* is applied, and another, *ξ*, that maps *n* particles in the intermediate configuration |*R*〉 to *n* particles in the final configuration |*I*〉 when *V* is applied. This yields the same result, as long as we remember to sum over all possible intermediate configurations *R*∈*Φ*_{n,n}, and also to divide each summand by *r*_{1}!…*r*_{n}!, which is the size of *R*'s automorphism group (i.e. the number of ways to permute the *n* particles within |*R*〉 that leave |*R*〉 unchanged). ■

In the standard qubit model, every unitary transformation can be decomposed as a product of *gates*, each of which acts non-trivially on only one or two qubits. Similarly, in LOQC, every unitary transformation can be decomposed as a product of *linear-optics gates*, each of which acts non-trivially on only one or two modes. Then a *linear-optics circuit* is simply a list of linear-optics gates applied to specified modes (or pairs of modes) starting from the initial state |*I*〉=|0,1,…,0,1〉.^{7}

The last notion we need is that of *postselected LOQC*. In our context, postselection simply means measuring the number of photons in a given mode *i*, and conditioning on a particular result (e.g., 0 photons or 1 photon). After we postselect on the number of photons in some mode, we will never use that mode for further computation.^{8} For this reason, without loss of generality, we can defer all postselected measurements until the end of the computation.

Our #**P**-hardness proof will fall out as a corollary of the following universality theorem, which is implicit in the work of KLM (Knill *et al.* 2001). Indeed, we *could* just appeal to the KLM construction as a ‘black box’, but we choose not to do so, as the properties of the construction that we want are slightly different from the properties KLM want, and we wish to verify in detail that the desired properties hold.

### Theorem 2.3 (following KLM (Knill *et al.* 2001))

*Postselected linear optics can simulate universal quantum computation. More concretely: there exists a polynomial-time classical algorithm that converts a quantum circuit Q over the gate set* *into a linear-optics circuit L*, *so that*
*where Γ is the number of CSIGN gates in Q and* |*I*〉=|0,1,…,0,1〉 *is the standard initial state*.

### Proof.

To encode a (qubit-based) quantum circuit by a postselected linear-optics circuit, KLM use the so-called *dual-rail representation* of a qubit using two optical modes. In this representation, the qubit |0〉 is represented as |0,1〉, while the qubit |1〉 is represented as |1,0〉. Thus, to simulate a quantum circuit that acts on *k* qubits, we need 2*k* optical modes. (We will also need additional modes to handle postselection, but we can ignore those for now.) Let the modes corresponding to qubit *i* be labelled (*i*,0) and (*i*,1), respectively. Notice that the initial state |0⋯0〉 in the qubit model maps onto the initial state |*I*〉 in the optical model.

As *φ* is a homomorphism by lemma 2.2, to prove the theorem it suffices to show how to simulate the gates in . Simulating a 1-qubit gate is easy: simply apply the appropriate 2×2 unitary transformation to the Hilbert space spanned by |0,1〉 and |1,0〉. The interesting part is how to simulate a CSIGN gate. To do so, KLM use another gate that they call NS_{1}, which applies the following unitary transformation to a single mode:
(We do not care how NS_{1} acts on |3〉, |4〉, and so on, as those basis states will never arise in our simulation.) Using NS_{1}, it is not hard to simulate CSIGN on two qubits *i* and *j*. The procedure, shown in figure 2, is this: first apply a Hadamard transformation to modes (*i*,0) and (*j*,0). One can check that this induces the following transformation on the state of (*i*,0) and (*j*,0):
The key point is that we get a state involving *two* photons in the same mode, *if and only if* both the modes (*i*,0) and (*j*,0) contained a photon. Next, apply NS_{1} gates to both (*i*,0) and (*j*,0). This flips the amplitude if and only if we started with |1,1〉. Finally, apply a second Hadamard transformation to (*i*,0) and (*j*,0), to complete the implementation of CSIGN.

We now explain how to implement NS_{1} on a given mode *i*, using postselection. To do so, we need two additional modes *j* and *k*, which are initialized to the states |0〉 and |1〉, respectively. First we apply the following 3×3 unitary transformation to *i*,*j*,*k*:
Then we postselect on *j* and *k* being returned to the state |0,1〉. As shown in Knill *et al.* (2001), this postselection always succeeds with amplitude 1/2 (corresponding to probability 1/4); and that conditioned on it succeeding, the effect is to apply NS_{1} in mode *i*. To prove this, observe that as the number of photons is conserved, the effect of *W* on mode *i* must have the form
for some *λ*_{0},*λ*_{1},*λ*_{2}. Using formula (2.1), we then calculate
and
This implies that the CSIGN circuit shown in figure 2 succeeds with amplitude 1/4 (corresponding to probability 1/16), and furthermore, we know when it succeeds. ■

In the proof of theorem 2.3, the main reason the matrix *W* looks complicated is simply that it needs to be unitary. However, notice that unitarity is irrelevant for our #**P**-hardness application—and if we drop the unitarity requirement, then we can replace *W* by a simpler 2×2 matrix, such as
To implement NS_{1} on a given mode *i*, we would apply *Y* to *i* as well as another mode *j* that initially contains one photon, then postselect on *j* still containing one photon after *Y* is applied. One can verify by calculation that the effect on mode *i* is
where *λ*_{0}=*λ*_{1}=1 and *λ*_{2}=−1.

## 3. Main result

In this section, we deduce the following theorem, as a straightforward consequence of theorem 2.3.

### Theorem 3.1

*The problem of computing* Per(*A*), *given a matrix* *of* poly(*N*)-*bit integers written in binary, is* #**P**-*hard under many-one reductions*.

In classical complexity theory, one is often more interested in various corollaries of theorem 3.1: for example, that computing Per(*A*) remains #**P**-hard even if *A* is a *non-negative* integer matrix, or a {−1,0,1}-valued matrix, or a {0,1}-valued matrix. Valiant (1979) gave simple reductions by which one can deduce all of these corollaries from theorem 3.1. We do not know how to use the linear-optics perspective to get any additional insight into the corollaries.

Let *C* be a classical circuit that computes a Boolean function , and let . Then computing Δ_{C}, given *C* as input, is a #**P**-hard problem essentially by definition. On the other hand, it is easy to encode Δ_{C} as an amplitude in a quantum circuit:

### Lemma 3.2

*There exists a classical algorithm that takes a circuit C as input, runs in poly(n,|C|) time and outputs a (qubit-based) quantum circuit Q, consisting of gates from* *, such that
*

### Proof.

Let *D*_{C} be a 2^{n}×2^{n} diagonal unitary matrix whose (*x*,*x*) entry is *C*(*x*). Then, as the Toffoli gate is universal for classical computation, a quantum circuit consisting of 1-qubit gates and Toffoli gates can easily apply *D*_{C}. To do so, one uses the standard ‘uncomputing’ trick:
where *h*_{C}(*x*) is the complete history of a computation using Toffoli gates that produces *C*(*x*). Now let be the quantum Fourier transform over (i.e. the Hadamard gate applied to each of *n* qubits), and let *Q*=*FD*_{C}*F*. Then
Finally, by lemma 2.1, we can simulate each of the Toffoli gates in *Q* using gates from the set . ■

Let *Q* be the quantum circuit from lemma 3.2, and assume that *Q* uses *k*=poly(*n*,|*C*|) qubits. By theorem 2.3, we can simulate *Q* by a linear-optics circuit *L* such that
where *Γ*=poly(*n*,|*C*|) is the number of CSIGN gates in *Q*. Furthermore, the circuit *L* uses *m*:=2*k*+4*Γ* optical modes. Let *U* be the *m*×*m* unitary matrix induced by *L*, and let *V* be the (*m*/2)×(*m*/2) submatrix of *U* obtained by taking only the even-numbered rows and columns. Then, we have
where the first line follows from formula (2.1) and the third from lemma 3.2. As *V* can be produced in polynomial time given *C*, this already shows that computing Per(*V*) to sufficient precision is #**P**-hard.

However, we still need to deal with the issue that the entries of *V* are real numbers.^{9} Let . Then notice that truncating the entries of *V* to *b* bits of precision produces a matrix such that
for sufficiently large *n*, and hence
For this reason, we can assume that each entry of *V* has the form *k*/2^{b} for some integer *k*∈[−2^{b},2^{b}]. Now set *A*:=2^{b}*V* . Then *A* is an integer matrix satisfying Per(*A*)=2^{bn}Per(*V*), whose entries can be specified using *b*+*O*(1)=poly(*n*,|*C*|) bits each. This completes the proof of theorem 3.1.

We conclude by noticing that our proof yields not only theorem 3.1, but also the following corollary:

### Corollary 3.3

*The problem of computing* sgn(Per(*A*)):=Per(*A*)/|Per(*A*)|, *given a matrix* *of* poly(*N*)-*bit integers written in binary, is* #**P**-*hard under Turing reductions*.

### Proof.

By the above equivalences, it suffices to show that computing sgn(Δ_{C}) is #**P**-hard. This is true because, given the ability to compute sgn(Δ_{C}), we can determine Δ_{C} *exactly* using binary search. In more detail, given a positive integer *k*, let *C*[*k*] denote the circuit *C* modified to contain *k* additional inputs *x* such that *C*(*x*)=1, and let *C*[−*k*] denote *C* modified to contain *k* additional *x*'s such that *C*(*x*)=−1. Then clearly
and
Thus, we can use the following strategy: compute the signs of Δ_{C[1]},Δ_{C[−1]},Δ_{C[2]},Δ_{C[−2]},Δ_{C[4]},Δ_{C[−4]}, and so on, increasing *k* by successive factors of 2, until a *k* is found such that sgn(Δ_{C[k]})≠sgn(Δ_{C[2k]}). At that point, we know that Δ_{C} must be between *k* and 2*k*. Then by computing sgn(Δ_{C[3k/2]}), we can decide whether Δ_{C} is between *k* and 3*k*/2 or between 3*k*/2 and 2*k*, and so on recursively until Δ_{C} has been determined exactly. ■

Corollary 3.3 implies, in particular, that approximating Per(*A*) to within any multiplicative factor is #**P**-hard—because to output a multiplicative approximation, at the least we would need to know whether Per(*A*) is positive or negative.

Using a more involved binary search strategy (which we omit), one can show that, for any *β*(*N*)∈[1,poly(*N*)], even approximating |Δ_{C}| or to within a multiplicative factor of *β*(*N*) would let one compute Δ_{C} exactly, and is therefore #**P**-hard under Turing reductions. It follows from this that approximating |Per(*A*)| or Per(*A*)^{2} to within a multiplicative factor of *β*(*N*) is #**P**-hard as well. (Aaronson & Arkhipov (2011) gave a related but more complicated proof of the #**P**-hardness of approximating |Per(*A*)| and Per(*A*)^{2}, which did not first replace Per(*A*) with Δ_{C}.)

## Acknowledgements

I am grateful to Alex Arkhipov and Michael Forbes for helpful discussions, and to Andy Drucker, Greg Kuperberg, Avi Wigderson, Ronald de Wolf and the anonymous reviewers for their comments. This material is based upon work supported by the National Science Foundation under grant no. 0844626. Also supported by a DARPA YFA grant and a Sloan Fellowship.

## Footnotes

↵1 See Hrubes

*et al.*(2010) for a recent, ‘modular’ presentation of Valiant's proof (which also generalizes the proof to the non-commutative and non-associative case).↵2 See the Complexity Zoo (www.complexityzoo.com) for the definitions of #

**P**and other complexity classes used in this paper.↵3 If

*A*is a non-negative integer matrix, then Per(*A*) is*itself*a #**P**function, which implies that it is #**P**-*complete*(the term for functions that are both #**P**-hard and in #**P**). If*A*can have negative or fractional entries, then strictly speaking, Per(*A*) is no longer #**P**-complete, but it is still #**P**-hard and computable in the class**FP**^{#P}.↵4 KLM actually prove the stronger (and more practically relevant) result that linear optics with

*adaptive measurements*is capable of universal quantum computation. For our purposes, however, we only need the weaker fact that*postselected*measurements suffice for universal quantum computation, which KLM prove as a lemma along the way to their main result.↵5 Indeed, our proof does not even go through the Cook–Levin theorem: it reduces a #

**P**computation directly to the permanent, without first reducing #**P**to #3*SAT*.↵6 A more common 2-qubit gate than CSIGN is the

*controlled-NOT*(CNOT) gate, which maps each basis state |*x*,*y*〉 to |*x*,*y*⊕*x*〉. However, CSIGN is more convenient for linear-optics purposes, and is equivalent to CNOT by conjugating the second qubit with a Hadamard gate.↵7 A crucial difference between standard quantum circuits and linear-optics circuits is that, whereas a standard quantum gate is the

*tensor product*of a small (say 4×4) unitary matrix with an exponentially-large (say 2^{n−2}×2^{n−2}) identity matrix, a linear-optics gate is the*direct sum*of a small (say 2×2) unitary matrix with a*polynomially*-large (say (*m*−2)×(*m*−2)) identity matrix. It is only the homomorphism that produces exponentially large matrices. One consequence, pointed out by Reck*et al.*(1994), is that, whereas most*n*-qubit unitary transformations require*Ω*(2^{2n}) gates to implement (as follows from an easy dimension argument), every*m*-mode unitary transformation*U*can be implemented using only*O*(*m*^{2}) linear-optics gates.↵8 In physics language, all photon number measurements are assumed to be ‘demolition’ measurements.

↵9 Indeed, the matrices that we multiply to obtain

*U*can be*complex*matrices, but*U*itself (and hence the submatrix*V*) will always be real.

- Received April 13, 2011.
- Accepted June 7, 2011.

- This journal is © 2011 The Royal Society