The dynamics of a spatial quantum formulation of the iterated prisoner's dilemma game is studied in this work. The game is played in the cellular automata manner, i.e. with local and synchronous interaction. The evolution is driven by the imitation of the best neighbour. It is shown how spatial structure enables in fully quantum contests a dramatically rapid emergence of mutual cooperation. In unfair contests, such as quantum versus classic players, the velocity in which the quantum players take advantage of its privileged status is also very notable.
1. Introduction: the classic context
The prisoner's dilemma (PD) is a game played by two players, who may choose either to cooperate (C) or to defect (D). Mutual cooperator each scores the reward R, mutual defectors score the punishment P; D scores the temptation against C, who scores S (sucker's pay-off) in such an encounter. In the PD, it is: . The game is symmetric, i.e. the pay-off matrices of both players are coincident after transposition: .
Using uncorrelated probabilistic strategies x=(x,1−x)′ and y=(y,1−y)′, the expected pay-offs (p) in the PD game are 1.1
The pair of strategies (x,y) are in Nash equilibrium if x is the best response to y and y is the best response to x. Mutual defection, i.e. x=y=(0,1) is the only pair of strategies in Nash equilibrium in the PD game. Both players would get the same pay-off R with x=y=1, but mutual cooperation is not of a pair of strategies in equilibrium.
In a broader game scenario, a probability distribution Π=(πij) assigns probability to every combination of player choices, so in 2×2 games . Thus, the expected pay-offs in the PD are and
2. Quantum games
In the quantization scheme introduced by Eisert et al. , the classical pure strategies C and D are assigned to two basic vectors |0〉 and |1〉, respectively, in a Hilbert space of a two-level system. The state of the game is a vector in the tensor product space spanned by the basis vectors |00〉, |01〉, |10〉, |11〉, where the first and the second entries in the ket refers to the players A and B, respectively.
The quantum game protocol starts with an initial entangled state , where is a symmetric unitary operator that entangles the players qubits and that is known to both players. To ensure that the classical game is a subset of its quantum version, it is necessary that , where γ∈[0,π/2].
The players perform their quantum strategies as local unitary operators (), and . After the application of these strategies, that the players chose independently, the state of the game evolves to . Prior to measurement, the gate is applied and the state of the game becomes 2.1
This follows a pair of Stern–Gerlach-type detectors for measurement. As a result, . Consequently, the expected pay-offs become and
Following the seminal article by Eisert et al.  (and our study ), we will give here consideration to the two-parameter subset of the SU(2) space of strategies 2.2With classical strategies, and .
If αA=αB=0 or if γ=0,1 it turns out (noting ω≡θ/2) Thus, the joint probabilities factorize as in the classical game employing independent strategies (1.1) with and . So to say, the θ parameters are the classical ones.
By contrast, with maximal entangling (γ=π/2), if θA=θB=0, it is: , a no factorizable joint probability distribution.
At variance with what happens in the classical context, the pair of pure strategies are in Nash Equilibrium in the Eisert et al.  quantum implementation.
3. The spatialized quantum prisoner's dilemma
In the spatial version of the PD, we deal with, each player occupies a site (i,j) in a two-dimensional N×N lattice. In order to compare different types of players, e.g. quantum versus classic, two types of players, termed A and B, are to be considered. A and B players alternate in the site occupation in a chessboard form so that every player is surrounded by four partners (A–B, B–A) and four mates (A–A, B–B). Incidentally, in asymmetric games, such as the battle of the sexes , the distinction between two types of players is mandatory, not the only artefact to allow comparisons of players with different capabilities.
In a cellular automata2 (CA)-like implementation, in each generation (T) every player plays with his four adjacent partners so that the pay-off of a given individual is the sum over these four interactions. Following the imitation-of-the best3 way followed by Nowak & May [6,7] with regards to spatial PD game, in the next generation, every player will adopt the parameter choice (), of his nearest-neighbour mate (including himself) that received the highest pay-off. In the case of a tie, i.e. several mate neighbours with the same maximum pay-off, the average of the (θ,α) parameter values corresponding to the best mate neighbours will be adopted.
This is the case of the () example of table 1 intended in the classical context, i.e. with α=0 for every player. In the initial scenario of table 1, every player fully cooperates θA=0 ≡x=1, θB=0≡y=0), except an A player located in the central part of the lattice that chooses θ at π level, i.e. fully defects. As a result, at time-step T=1 the general income for every player is 16=4×4, with the exceptions arising from the initial defector, which gets a 20=4×5 pay-off, whereas its partners achieve the 13=3×4+1 pay-off. The change in pay-offs from 16 to 20 units fires the change to θA=π of the four A players connected with the initial defector as indicated under T=2 in table 1. The change θA=0 into θA=π advances in this way at every time-step, so that in this simple example every A player will defect, i.e. θ=π in the long term. In an n=6 small lattice as that of table 1, the spread of defection is fully achieved as soon as at T=4, and from this time step every A player will receive the mentioned 20=4×5 pay-off, whereas every B player will receive a 4=4×1 pay-off.
All the simulations in this work are run in an N=200 lattice with periodic boundary conditions and initial random assignment of the parameter values. As a result of the random assignment of the parameter values it is initially: and . In the classic γ=0 context, exactly middle levels of the θ and α parameters lead to so that both players get the arithmetic mean of the pay-off values: , p=3.0, in the context of table 1. With full entanglement, with the equal middle-level election of the parameters by both players, the probability distribution degenerates to:4 , which leads to pA=pB=1×P (=2.0 in the context of table 1). Thus, the Eisert et al.  model is, so to say, biased towards defection.
Figure 1 shows a simulation of a quantum (1,2,4,5)-PD CA with the entanglement factor γ increasing from γ=0.0, i.e. the classic context, partial entanglement γ=0.511, to γ=π/2, i.e. maximum entangling. In the classic scenario (γ=0.0), the high temptation value () drives the process to general defection by leading the θ values to the π landmark. By contrast, with full entanglement (γ=π/2), the imitation of the best choice implemented with perfect spatial structure allows for the contrary dynamics: the θ values tend to 0 and the α ones to the π/2 landmark, i.e. full cooperation. Please note that these tendencies are achieved from the very beginning and culminate dramatically soon, despite the mentioned bias towards defection of the Eisert et al. model. Please, recall that in contrast to what happens in the classical context, the pair of pure strategies CC are in Nash Equilibrium in the γ=π/2 quantum implementation. It has been here just described how it easily emerges in a CA-like spatial dynamics.
With partial entanglement (γ=0.511), the dynamic tends initially to the increase in θ accompanied with the decrease in α, reflected in a notable decreasing in the mean pay-offs (as in the classic model), but by T=9, approximately, these tendencies are reverted, and consequently the pay-offs increase. Nevertheless, these tendencies are not maintained in the posterior evolution on the described γ=0.511 simulation, on the contrary, from T=40 (not shown in figure 1) the θ values increase and the α ones decrease, with the pay-offs slowly tending to the penalization P=2.0. It can be argued that, at least in the described simulation, γ=0.511 is not high enough to enable the emergency of mutual cooperation.
The curves labelled p* in figure 1 show the theoretical (or mean-field) pay-offs of both players, i.e. the pay-offs achieved in a single hypothetical two-person game with players adopting the mean parameters appearing in the spatial dynamic simulation, namely,
The p* values coincide with the actual mean pay-offs when there is no (spatial) heterogeneity in the parameter values, as it is the case of both the classic and fully entangled simulations shown in figure 1. On the contrary, in the partially entangled simulation in figure 1 there is some degree of spatial heterogeneity in the parameter values, which leads to theoretical mean-field estimations different from the actual ones, in this particular simulation with p* below .
A detailed study on the transition from mutual defection to mutual cooperation as γ increases is currently conducted . But let us note here that the landmark γ=π/45 and γ=π/8 have been γ-values checked in the initial configuration of figure 1 so that the former leads to mutual cooperation and the latter to mutual defection. The parameter value of the central frame in figure 1, presented as γ=0.511, is really established as γ=1.3π/8 in the Fortran code programmed to perform computations.
4. Unfair contest
Let us assume the unfair situation: A type of players is restricted to classical strategies , whereas the other type of players may use quantum ones .
Figure 2 shows a simulation of a quantum (1,2,4,5)-PD cellular automaton, where the B players are restricted to classical strategies, i.e. αB=0. The far-left panel of the figure shows the evolution up to T=200 of the mean values across the lattice of θ, α as well of the actual and theoretical mean pay-offs. Figure 2 also shows the snapshots of the parameter and pay-off patterns at T=200, both for the full lattice and the zoomed 11×11 central part. As a result of the unfair scenario, the left panel of the figure shows how the A player rapidly gets a mean pay-off pA≃3.4, not far to the reward R=4, whereas the B player is induced to pB≃1.5 not far to P=1.
Rich maze-like structures predominate for both θ and α spatial parameter patterns in figure 2. These structures are also shown in the pay-off spatial pattern, though in a much fuzzier manner. As a result of the rich spatial structure, the theoretical pay-offs (p*) are far distant to its corresponding actual ones (p) and invert the order. Geometry plays a key role in this scenario so that the mean-field approach does not properly operate. The scenario depicted in figure 2 is very much reminiscent (including the maze-like patterns) of the (5,1)-Quantum Battle of the Sexes described in , fig. 2.
As long as only the results from the last round are taken into account and the outcomes of previous rounds are neglected, the model considered up to now may be termed ahistoric, although it is not fully memoryless as there is chain (or Markovian) mechanism inherent in it so that previous results affect further outcomes.
In the historic model we consider in this section, after the generic time-step T, and for every cell (i,j), both the pay-offs (p) and the (θ,α) parameter values coming from the previous rounds are summarized by means of a geometric mechanism This geometric mechanism is accumulative in its demand for knowledge of past history. Thus, the numerators of the just above-stated-averaged means are sequentially generated as: , , , with Ω(T)=1+δΩ(T−1).
The choice of the memory factor 0≤δ≤1 simulates the remnant memory effect: the limit case δ=1 corresponds to equally weighted records (full memory model), whereas δ≪1 intensifies the contribution of the most recent iterations and diminishes the contribution of the past ones (short-term memory); the choice δ=0 leads to the ahistoric model.
The imitation of the best rule will remain unaltered, but it will operate on the trait pay-offs and parameters of every player, constructed from the previous rounds as described earlier. We have studied the effect of this kind of embedded memory in the classical spatial PD  and the Battle of the Sexes (BOS) [13,14] games. As an overall rule, memory boosts cooperation in the PD and coordination in the BOS.
Figure 3 shows a simulation of a quantum (1,2,4,5)-PD CA, with full (δ=1.0) memory. This figure demonstrates that memory of past iterations slows the rapid convergence to cooperation found in the ahistoric dynamic in such a way that by T=200, the parameter values appear fairly stabilized in levels distant of the extreme zero and π and, as a result, the mean pay-offs seem to difficultly grow beyond 3.5<4.0=R. An almost perfect agreement turns out apparent in figure 3 between the theoretical and the actual pay-offs.
Figure 4 shows a simulation of a quantum (1,2,4,5)-PD cellular automaton, where the B players are restricted to classical strategies and the quantum A players are endowed with memory of factor δ. As a consequence of the retarder effect of memory, the classic players overcome the quantum ones provided that memory is not full, i.e. δ<1. In the full memory scenario (δ=1), the evolution appears very soon frozen so that both kinds of players get low pay-offs. In these simulations, the theoretical pay-offs p* correspond fairly well with the actual ones (), even though the rich parameter pattern structure shown in the figure at T=200.
6. Conclusion and future work
The quantum cellular automata formulation of the PD evolves, via the imitation of the best neighbour, in a dramatically rapid form to the selection of mutual cooperation. This is so even with high temptation values, which contrasts with the, also rapid, evolution to mutual defection in the classic formulation of the PD. In the event of an unfair contest such as quantum versus classic players, the velocity in which the quantum players take advantage of its privileged status is also very notable.
Memory of past iterations slows the rapid convergence to cooperation in the just-mentioned ahistoric dynamic regarding quantum players. As a consequence, if in the quantum versus classic players contest the quantum ones are endowed with memory, the classic ones may get better pay-offs than the quantum ones.
Three-parameter strategies  and other quantization schemes [16,17] deserve particular studies in the spatial context. In particular, the scheme introduced by Marinatto & Weber  in which the bias to defection is avoided by stating the initial state of the game |ψi〉 as a linear combination of the vectors of the base (|00〉,|01〉,|10〉,|11〉). The Marinatto and Weber model also differs from the earlier proposed model of Eisert et al.  by the absence of the reverse gate J†.
Further study is due on structurally dynamic quantum games, in games with asynchronous and probabilistic updating, as well as on the effect of increasing degrees of spatial dismantling. These are deviations from the canonical cellular automata paradigm which may lead to more realistic models. Particularly, with embedded tuneable memory .
This work was carried out during a residence in UNCOMP research group, based in the University of the West of England (Bristol).
↵2 Cellular automata are spatially extended dynamical systems that are discrete in all their constitutional components: space, time and state-variable. Uniform, local and synchronous interactions, as assumed here, are landmark features of CA .
↵3 Li et al.  deals with an investigation related to that performed in this study. However, the details in the manner in which both studies are implemented vary greatly. Let us mention here only one of them: At variance with the purely deterministic imitation of the best updating mechanism implemented here, a probabilistic mechanism of imitation is implemented in Li et al. .
- Received November 27, 2013.
- Accepted January 17, 2014.
- © 2014 The Author(s) Published by the Royal Society. All rights reserved.