## Abstract

A central question for causal inference is to decide whether a set of correlations fits a given causal structure. In general, this decision problem is computationally infeasible and hence several approaches have emerged that look for certificates of compatibility. Here, we review several such approaches based on entropy. We bring together the key aspects of these entropic techniques with unified terminology, filling several gaps and establishing new connections, all illustrated with examples. We consider cases where unobserved causes are classical, quantum and post-quantum, and discuss what entropic analyses tell us about the difference. This difference has applications to quantum cryptography, where it can be crucial to eliminate the possibility of classical causes. We discuss the achievements and limitations of the entropic approach in comparison to other techniques and point out the main open problems.

## 1. Introduction

Deciding whether a causal explanation is compatible with given statistical data and exploring whether it is the most suitable explanation for the data at hand are central scientific tasks. Sometimes the most reasonable explanation of a set of observations involves unobserved common causes. In the case where the common causes are classical, the well-developed machinery of Bayesian networks can be used [1,2]. In principle, such networks are well understood and it is known how to check whether observed correlations are compatible with a given network [3]. In practice, however, testing compatibility for networks that involve unobserved systems is only computationally tractable for small cases [4,5]. Furthermore, the methodology has to be adapted whenever non-classical common causes are permitted.

Finding good heuristics to help identify correlations that are (in)compatible with a causal structure is currently an active area of research [6–20] and the use of entropy measures is common to many of these [6–13,17,18,20]. Such methods are important in the quantum context, where recent cryptographic protocols rely on the lack of a classical causal explanation for certain quantum correlations in specified causal structures [21–29], an idea that lies behind Bell’s theorem [30] (see also [31]).

In §2 of this article, we review the entropic characterization of the correlations compatible with causal structures in classical, quantum and more general non-signalling theories. We detail refinements of the approach based on post-selection in §3. Together, these sections show the current capabilities of entropic techniques, also establishing and clarifying connections between different contributions. Our review is illustrated with several examples to assist its understanding and to make it easily accessible for applications. In §4 we outline and compare further approaches to the problem, before concluding in §5 with some open questions. Where lemmas or propositions appear without citations, we are not aware of a reference where they are stated explicitly (although several have been implicitly used in the existing literature).

## 2. Entropy vector approach

Characterizing the joint distributions of a set of random variables or alternatively considering a multiparty quantum state in terms of its entropy (and of those of its marginals) has a tradition in information theory, dating back to Shannon [32–34]. However, only recently has this approach been extended to account for causal structure [6,8]. In §2a and 2b, respectively, we review this approach with and without imposing causal constraints. All our considerations are concerned with discrete random variables; for extensions of the approach to continuous random variables (and its limitations) we refer to [8,35].

### (a) Classical entropy cone

The entropy cone for a joint distribution of *n* random variables was introduced in [33]. It is defined in terms of the *Shannon entropy* [32], which, for a discrete random variable *X* distributed according to *P*_{X}, is defined by *H*(*X*):=−∑_{x}*P*_{X}(*x*) log_{2} *P*_{X}(*x*), where the sum runs over the values *x* with *P*_{X}(*x*)>0.

For a set of *n*≥2 jointly distributed random variables, *Ω*:={*X*_{1},*X*_{2},…,*X*_{n}}, we denote their joint probability distribution by *P*. The Shannon entropy maps any subset of *Ω* to a non-negative real value, *X*_{S}⊆*Ω* ↦ *H*(*X*_{S}), with *H*({})=0. The entropy of the joint distribution of the random variables in *Ω* and of all its marginals can thus be expressed as the components of a vector **H**(*P*)∈ℝ^{2^{n}−1}. The set of all such entropy vectors is denoted *Γ**_{n}:={*v*∈ℝ^{2^{n}−1} | ∃*P* s.t. *v*=**H**(*P*)}. Its topological closure, which in addition contains any *v* for which there exists a sequence of distributions {*P*_{k}}_{k} such that **H**(*P*_{k}) tends to *v* as *k*→∞, is a convex cone, called the *entropy cone*.
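
To make the entropy-vector construction concrete, here is a minimal sketch in Python; all helper names (`shannon_entropy`, `marginal`, `entropy_vector`) are ours and purely illustrative. It computes all 2^{n}−1 marginal entropies of a finite joint distribution given as a dictionary.

```python
from itertools import chain, combinations
import math

def shannon_entropy(dist):
    """Shannon entropy (in bits) of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution {tuple: probability} onto coordinates idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy_vector(joint, n):
    """All 2^n - 1 marginal entropies, indexed by non-empty subsets of {0,...,n-1}."""
    subsets = chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))
    return {s: shannon_entropy(marginal(joint, s)) for s in subsets}

# Two independent uniform bits: H(X1) = H(X2) = 1 and H(X1 X2) = 2.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
v = entropy_vector(joint, 2)
```

For *n*=2 the resulting vector has the three components (*H*(*X*_{1}), *H*(*X*_{2}), *H*(*X*_{1}*X*_{2})).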

#### (i) Outer approximation: the Shannon cone

The standard outer approximation to the entropy cone is the polyhedral cone constrained by the *Shannon inequalities* listed in the following:

— *Monotonicity*. For all *X*_{S},*X*_{T}⊆*Ω*, *H*(*X*_{S}∖*X*_{T})≤*H*(*X*_{S}).

— *Submodularity*. For all *X*_{S},*X*_{T}⊆*Ω*, *H*(*X*_{S}∩*X*_{T})+*H*(*X*_{S}∪*X*_{T})≤*H*(*X*_{S})+*H*(*X*_{T}).

It is a matter of convention whether *H*({})=0 is included as a Shannon inequality; we keep this implicit.

These inequalities are always obeyed by the entropies of a set of jointly distributed random variables. They may be concisely rewritten in terms of the following information measures: the *conditional entropy* of two jointly distributed random variables *X* and *Y*, *H*(*X* | *Y*):=*H*(*XY*)−*H*(*Y*), their *mutual information*, *I*(*X*:*Y*):=*H*(*X*)+*H*(*Y*)−*H*(*XY*), and the *conditional mutual information* between two jointly distributed random variables *X* and *Y* given a third, *Z*, defined by *I*(*X*:*Y* | *Z*):=*H*(*XZ*)+*H*(*YZ*)−*H*(*Z*)−*H*(*XYZ*). Hence, the monotonicity constraints correspond to positivity of conditional entropy, *H*(*X*_{S}∩*X*_{T} | *X*_{S}∖*X*_{T})≥0, and submodularity is equivalent to positivity of the conditional mutual information, *I*(*X*_{S}∖*X*_{T}:*X*_{T}∖*X*_{S} | *X*_{S}∩*X*_{T})≥0. The monotonicity and submodularity constraints can all be generated from a minimal set of *n*+*n*(*n*−1)2^{n−3} inequalities [33]: for the monotonicity constraints it is sufficient to consider the *n* constraints with *X*_{S}=*Ω* and *X*_{T}=*X*_{i} for some *X*_{i}∈*Ω*; for the submodularity constraints it is sufficient to consider those with *X*_{S}∖*X*_{T}=*X*_{i} and *X*_{T}∖*X*_{S}=*X*_{j} with *i*<*j* and where *X*_{U}:=*X*_{S}∩*X*_{T} is any subset of *Ω* not containing *X*_{i} or *X*_{j}, i.e. submodularity constraints of the form *I*(*X*_{i}:*X*_{j} | *X*_{U})≥0.
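
The minimal generating set just described can be enumerated directly. The following sketch (function and variable names are ours, chosen for illustration) lists the *n* monotonicity and *n*(*n*−1)2^{n−3} submodularity constraints as coefficient dictionaries:

```python
from itertools import combinations

def minimal_shannon_inequalities(n):
    """Enumerate the minimal generating set of Shannon inequalities for n variables.

    Each inequality is a dict {frozenset_of_variable_indices: coefficient},
    asserting that sum(coeff * H(subset)) >= 0; terms with H({}) = 0 are dropped.
    """
    ineqs = []
    full = frozenset(range(n))
    # Monotonicity: H(Omega) - H(Omega minus {X_i}) >= 0.
    for i in range(n):
        ineqs.append({full: 1, full - {i}: -1})
    # Submodularity: I(X_i : X_j | X_U) >= 0 with i < j and U avoiding i, j.
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for u in combinations(rest, r):
                U = frozenset(u)
                ineq = {}
                for s, c in ((U | {i}, 1), (U | {j}, 1), (U | {i, j}, -1), (U, -1)):
                    if s:  # skip the empty set, whose entropy is zero
                        ineq[s] = ineq.get(s, 0) + c
                ineqs.append(ineq)
    return ineqs

# The counts match n + n(n-1)2^(n-3): 9 inequalities for n = 3, 28 for n = 4.
count3 = len(minimal_shannon_inequalities(3))
count4 = len(minimal_shannon_inequalities(4))
```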

These *n*+*n*(*n*−1)2^{n−3} independent Shannon inequalities can be expressed in terms of a (*n*+*n*(*n*−1)2^{n−3})×(2^{n}−1)-dimensional matrix, which we call *M*_{SH}, such that *M*_{SH}⋅*v*≥0 holds for any entropy vector *v*=**H**(*P*). It follows that the *Shannon cone*, *Γ*_{n}:={*v*∈ℝ^{2^{n}−1} | *M*_{SH}⋅*v*≥0}, is an outer approximation to the entropy cone.

### Example 2.1.

The three-variable Shannon cone is *Γ*_{3}={*v*∈ℝ^{7} | *M*_{SH}⋅*v*≥0}, where *M*_{SH} encodes the 3+3⋅2⋅2^{0}=9 minimal Shannon inequalities: the three monotonicity constraints *H*(*X*_{j}*X*_{k})≤*H*(*X*_{1}*X*_{2}*X*_{3}) and the six submodularity constraints *I*(*X*_{i}:*X*_{j} | *X*_{U})≥0 with *i*<*j* and *X*_{U}⊆*Ω*∖{*X*_{i},*X*_{j}}.

#### (ii) Beyond the Shannon cone

For two and three variables the Shannon cone coincides with the closure of the actual entropy cone. For *n*≥4, further independent constraints on the set of entropy vectors are needed to fully characterize it; such constraints are known as *non-Shannon inequalities*, the first of which was found by Zhang & Yeung.

### Proposition 2.2 (Zhang & Yeung).

*For any four discrete random variables T, U, V and W the following inequality holds*: *I*(*T*:*U*)+*I*(*T*:*VW*)+3*I*(*V*:*W* | *T*)+*I*(*V*:*W* | *U*)≥2*I*(*V*:*W*).

For *n*≥4, the convex cone of achievable entropy vectors is still not fully characterized. Systematic searches for additional inequalities for *n*=4 have been conducted [42,43], which recover most of the previously known inequalities; in particular, the inequality of proposition 2.2 is rederived and shown to be implied by tighter ones [43]. The systematic search in [43] is based on considering additional random variables that obey certain constraints and then deriving four-variable inequalities from the known identities for five or more random variables (see also [38,39]), an idea that is captured by a so-called copy lemma [38,43,44]. In the same article, several rules to generate families of inequalities have been suggested, in the style of techniques introduced by Matúš [39].

For more than four variables, a few additional inequalities are known [38,40]. Curiously, to our knowledge, all known relevant non-Shannon inequalities (i.e. the ones found in [38–43] that are not yet superseded by tighter ones) can be written as a positive linear combination of the *Ingleton quantity*, *I*(*T*:*U* | *V*)+*I*(*T*:*U* | *W*)+*I*(*V*:*W*)−*I*(*T*:*U*), and conditional mutual information terms (see also [43]).

#### (iii) Inner approximations

Inner approximations are constructed from a set of conditions that are sufficient for an entropy vector to be realized by a distribution. Such conditions can be stated in terms of so-called linear rank inequalities [45–48]. They can be useful for establishing tightness of outer approximations.

For the four-variable entropy cone, an inner approximation is given by the *Ingleton inequality* [45], *I*(*T*:*U* | *V*)+*I*(*T*:*U* | *W*)+*I*(*V*:*W*)−*I*(*T*:*U*)≥0, for random variables *T*, *U*, *V* and *W*, together with its permutations. These inequalities can be concisely written as a matrix inequality *M*_{I}⋅*v*≥0, which, together with the Shannon inequalities, defines the *Ingleton cone*, *Γ*^{I}:={*v*∈ℝ^{15} | *M*_{SH}⋅*v*≥0, *M*_{I}⋅*v*≥0}; *v*∈*Γ*^{I} implies *v*∈*Γ**_{4} [46]. By contrast, there are entropy vectors that violate the Ingleton inequalities, as the following example shows.

### Example 2.3.

Let *T*, *U*, *V* and *W* be four jointly distributed random variables. Let *V* and *W* be independent uniform random bits and let *T*=AND(¬*V*,¬*W*) and *U*=AND(*V*,*W*). This distribution [39] leads to the entropy vector *v*≈(0.81,0.81,1,1,1.50,1.50,1.50,1.50,1.50,2,2,2,2,2,2), for which *I*(*T*:*U* | *V*)+*I*(*T*:*U* | *W*)+*I*(*V*:*W*)−*I*(*T*:*U*)≈−0.12 in violation of the Ingleton inequality.
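
The entropy vector and Ingleton violation of this example can be reproduced numerically. The sketch below (helper names ours) recomputes the Ingleton quantity for that distribution:

```python
import math

def H(joint, idx):
    """Shannon entropy (bits) of the marginal of `joint` on coordinates idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def I(joint, a, b, c=()):
    """Conditional mutual information I(a:b|c) from marginal entropies."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)

# Coordinates ordered (T, U, V, W): V, W independent uniform bits,
# T = AND(NOT V, NOT W) and U = AND(V, W).
joint = {}
for vv in (0, 1):
    for ww in (0, 1):
        t, u = int(not vv and not ww), int(vv and ww)
        joint[(t, u, vv, ww)] = 0.25

T, U, V, W = (0,), (1,), (2,), (3,)
ingleton = I(joint, T, U, V) + I(joint, T, U, W) + I(joint, V, W) - I(joint, T, U)
```

The computed value is ≈ −0.12, matching the example.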

For five random variables an inner approximation in terms of Shannon, Ingleton and 24 additional inequalities and their permutations is known (including partial extensions to more variables) [47,48].

### (b) Entropy vectors for causal structures

Causal relations among a set of variables impose constraints on their possible joint distributions, which can be conveniently represented with a causal structure.

### Definition 2.4.

A *causal structure* is a set of variables arranged in a directed acyclic graph (DAG), in which a subset of the nodes is assigned as observed.

The directed edges of the graph are intended to represent causation, perhaps by propagation of some influence, and cycles are excluded to avoid the well-known paradoxes associated with causal loops. We will interpret causal structures in different ways, depending on the supposed physics of whatever is mediating the causal influence.

One of the simplest causal structures that leads to interesting insights and one of the most thoroughly analysed ones is Pearl’s instrumental causal structure, *IC* [49]. It is displayed in figure 1*a* and will be used as an example throughout this review.

#### (i) Classical causal structures

In the classical case, the causal relations among a set of random variables can be explored by means of the theory of Bayesian networks (see, for instance, [1,2] for a complete presentation of this theory).

### Definition 2.5.

A *classical causal structure*, *C*^{C}, is a causal structure in which each node of the DAG has an associated random variable.

It is common to use the same label for the node and its associated random variable. The DAG encodes which joint distributions of the involved variables are allowed in a causal structure *C*^{C}. To explain this, we need a little more terminology.

### Definition 2.6.

Let *X*_{S}, *X*_{T}, *X*_{U} be three disjoint sets of jointly distributed random variables. Then *X*_{S} and *X*_{T} are said to be *conditionally independent* given *X*_{U} if and only if their joint distribution *P*_{XSXTXU} can be written as *P*_{XSXTXU}= *P*_{XS | XU}*P*_{XT | XU}*P*_{XU}. Conditional independence of *X*_{S} and *X*_{T} given *X*_{U} is denoted as *X*_{S} ⫫ *X*_{T} | *X*_{U}.

Two variables *X*_{S} and *X*_{T} are (unconditionally) independent if *P*_{XSXT}=*P*_{XS}*P*_{XT}, concisely written *X*_{S} ⫫ *X*_{T}. With reference to a DAG with a subset of nodes, *X*, we will use *X*^{↓} to denote the ancestors of *X* and *X*^{↑} to denote the descendants of *X*. The parents of *X* are represented by *X*^{↓1} and the non-descendants are *X*^{⤉} .

### Definition 2.7.

Let *C*^{C} be a classical causal structure with nodes {*X*_{1},*X*_{2},…,*X*_{n}}. A probability distribution *P* over these variables is *compatible* with *C*^{C} if it can be decomposed as *P*_{X1X2⋯Xn}=∏_{i=1}^{n}*P*_{Xi | Xi↓1}.

The compatibility constraint encodes all conditional independences of the random variables in the causal structure *C*^{C}. Nonetheless, whether a particular set of variables is conditionally independent of another is more easily read from the DAG, as explained in the following.

### Definition 2.8.

Let *X*, *Y* and *Z* be three pairwise disjoint sets of nodes in a DAG *G*. The sets *X* and *Y* are said to be *d-separated* by *Z*, if *Z* blocks any path from any node in *X* to any node in *Y*. A path is *blocked* by *Z* if the path contains one of the following: a chain *i*→*z*→*j* or a fork *i*←*z*→*j* with a node *z*∈*Z* in that path, or if the path contains a collider *i*→*k*←*j* such that neither the collision node *k* nor any of its descendants is in *Z*, i.e. ({*k*}∪*k*^{↑})∩*Z*={}.

The d-separation of the nodes in a causal structure is directly related to the conditional independence of its variables. The following proposition corresponds to theorem 1.2.5 from [1], previously introduced in [51,52]. It justifies the application of d-separation as a means to identify independent variables.

### Proposition 2.9 (Verma & Pearl).

*Let C*^{C} *be a classical causal structure and let X*, *Y and Z be pairwise disjoint subsets of nodes in C*^{C}. *If a probability distribution P is compatible with C*^{C}, *then the d-separation of X and Y by Z implies the conditional independence X* ⫫ *Y* | *Z. Conversely, if, for every distribution P compatible with C*^{C}, *the conditional independence X* ⫫ *Y* | *Z holds, then X is d-separated from Y by Z*.

The compatibility of probability distributions with a classical causal structure is conveniently determined with the following proposition, which has also been called the parental or local Markov condition before (theorem 1.2.7 in [1]).

### Proposition 2.10 (Pearl).

*Let C*^{C} *be a classical causal structure. A probability distribution P is compatible with C*^{C} *if and only if every variable in C*^{C} *is independent of its non-descendants, conditioned on its parents*.

Hence, to establish whether a probability distribution is compatible with a certain classical causal structure, it is enough to check that every variable *X* is independent of its non-descendants *X*^{⤉} given its parents *X*^{↓1}, concisely written as *X* ⫫ *X*^{⤉} | *X*^{↓1}, i.e. to check one constraint for each variable. In particular, it is not necessary to explicitly check for all possible sets of nodes whether they obey the independence relations implied by d-separation. Each such constraint can be conveniently expressed as
*I*(*X*:*X*^{⤉} | *X*^{↓1})=0, (2.1)
which is equivalent to *X* ⫫ *X*^{⤉} | *X*^{↓1} because the relative entropy obeys *D*(*P* ∥ *Q*)=0 ⇔ *P*=*Q*, and because *I*(*X*:*X*^{⤉} | *X*^{↓1})=*D*(*P*_{XX⤉X↓1} ∥ *P*_{X | X↓1}*P*_{X⤉ | X↓1}*P*_{X↓1}).
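
As a numerical sanity check of the condition *I*(*X*:*X*^{⤉} | *X*^{↓1})=0, the sketch below (all names ours) builds a distribution compatible with the chain *X*→*Z*→*Y*, so that *Z* is the only parent of *Y* and *X* is a non-descendant of *Y*, and verifies that *I*(*X*:*Y* | *Z*) vanishes:

```python
import math
import random

def H(joint, idx):
    """Shannon entropy (bits) of the marginal of `joint` on coordinates idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

random.seed(0)

def rand_dist(n):
    """A random probability distribution over n outcomes."""
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]

# Chain X -> Z -> Y: the joint factorizes as P(x) P(z|x) P(y|z).
px = rand_dist(2)
pz_x = [rand_dist(2) for _ in range(2)]
py_z = [rand_dist(2) for _ in range(2)]
joint = {}
for x in range(2):
    for z in range(2):
        for y in range(2):
            joint[(x, z, y)] = px[x] * pz_x[x][z] * py_z[z][y]

# I(X:Y|Z) = H(XZ) + H(YZ) - H(Z) - H(XYZ); coordinates are (X, Z, Y).
cmi = H(joint, (0, 1)) + H(joint, (1, 2)) - H(joint, (1,)) - H(joint, (0, 1, 2))
```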

While the conditional independence relations capture some features of the causal structure, they are insufficient to completely capture the causal relations between variables, as illustrated in figure 2. In this case, the probability distributions themselves are unable to capture the difference between these causal structures: correlations are insufficient to determine causal links between random variables. External interventions allow for the exploration of causal links beyond the conditional independences [1]. However, we do not consider these here.

Let *C*^{C} be a classical causal structure involving *n* random variables {*X*_{1},*X*_{2},…,*X*_{n}}. The restricted set of distributions that are compatible with the causal structure *C*^{C} is denoted *P*(*C*^{C}).

### Example 2.11 (Allowed distributions in the instrumental scenario).

The classical instrumental scenario of figure 1*a* allows for any four-variable distribution in the set of distributions of the form *P*_{XAZY}=*P*_{X}*P*_{A}*P*_{Z | XA}*P*_{Y | AZ}.
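
Such compatible distributions can be generated numerically. The following sketch (all names ours) samples the conditional distributions of the instrumental DAG and verifies the two conditional independences *I*(*A*:*X*)=0 and *I*(*Y*:*X* | *AZ*)=0 implied by the causal structure:

```python
import math
import random

def H(joint, idx):
    """Shannon entropy (bits) of the marginal of `joint` on coordinates idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def I(joint, a, b, c=()):
    """Conditional mutual information I(a:b|c) from marginal entropies."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)

random.seed(1)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]

# Coordinates (X, A, Z, Y); Z has parents {X, A} and Y has parents {A, Z}.
px, pa = rand_dist(2), rand_dist(2)
pz = {(x, a): rand_dist(2) for x in range(2) for a in range(2)}
py = {(a, z): rand_dist(2) for a in range(2) for z in range(2)}
joint = {}
for x in range(2):
    for a in range(2):
        for z in range(2):
            for y in range(2):
                joint[(x, a, z, y)] = px[x] * pa[a] * pz[(x, a)][z] * py[(a, z)][y]

ci_ax = I(joint, (1,), (0,))             # I(A:X), should vanish
ci_yx_az = I(joint, (3,), (0,), (1, 2))  # I(Y:X|AZ), should vanish
```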

The restrictions on the allowed distributions also restrict the corresponding entropy cones. Owing to proposition 2.10 there are at most *n* independent conditional independence equalities (2.1) in a causal structure *C*^{C}. Their coefficients can be concisely written in terms of a matrix *M*_{CI}(*C*^{C}), where CI stands for conditional independence. For a causal structure *C*^{C}, we define the two sets *Γ**(*C*^{C}):={*v*∈*Γ**_{n} | *M*_{CI}(*C*^{C})⋅*v*=0} and *Γ*(*C*^{C}):={*v*∈*Γ*_{n} | *M*_{CI}(*C*^{C})⋅*v*=0}, where *Γ**(*C*^{C})⊆*Γ*(*C*^{C}). The following lemma justifies the notation we use for *Γ**(*C*^{C}); it is the set of achievable entropy vectors in *C*^{C}.

### Lemma 2.12.

*For a causal structure C*^{C}, *Γ**(*C*^{C}) *is the set of entropy vectors achievable by distributions compatible with C*^{C}. *Furthermore, its topological closure is a convex cone.*

### Proof.

For the causal structure *C*^{C}, let *E*(*C*^{C}):={**H**(*P*) | *P* compatible with *C*^{C}} denote the set of achievable entropy vectors. By proposition 2.10, a distribution *P* is compatible with *C*^{C} if and only if *I*(*X*_{i}:*X*_{i}^{⤉} | *X*_{i}^{↓1})=0 holds for every variable *X*_{i}, i.e. if and only if *M*_{CI}(*C*^{C})⋅**H**(*P*)=0, which yields *E*(*C*^{C})=*Γ**(*C*^{C}). Now, let us consider the set *F*(*C*^{C}) obtained by intersecting the closure of *Γ**_{n} with the subspace on which *M*_{CI}(*C*^{C})⋅*v*=0. The closure of *Γ**_{n} is closed and convex, and the subspace defined by *M*_{CI}(*C*^{C})⋅*v*=0 is also closed and convex. Being the intersection of two closed convex sets, the set *F*(*C*^{C}) is also closed and convex. From this we conclude that the closure of *Γ**(*C*^{C}) equals *F*(*C*^{C}). (Because *F*(*C*^{C}) is closed, any element *w*∈*F*(*C*^{C}), in particular any element on its boundary, is the limit of a sequence of elements {*w*_{k}}_{k} for which the *w*_{k} lie in the interior of *F*(*C*^{C}) for all *k*. Hence *w* lies in the closure of *Γ**(*C*^{C}).)

The convexity of the closure of *Γ**(*C*^{C}) follows from that of *F*(*C*^{C}).

### Example 2.13 (Entropic outer approximation for the instrumental scenario).

The instrumental scenario has at most four independent conditional independence equalities (2.1). We find that there are only two, *I*(*A*:*X*)=0 and *I*(*Y*:*X* | *AZ*)=0. This yields the matrix *M*_{CI}(*IC*^{C}), whose two rows express these equalities with respect to entropy vectors whose components are ordered as (*H*(*A*), *H*(*X*), *H*(*Y*), *H*(*Z*), *H*(*AX*), *H*(*AY*), *H*(*AZ*), *H*(*XY*), *H*(*XZ*), *H*(*YZ*), *H*(*AXY*), *H*(*AXZ*), *H*(*AYZ*), *H*(*XYZ*), *H*(*AXYZ*)). An outer approximation is given by *Γ*(*IC*^{C})={*v*∈*Γ*_{4} | *M*_{CI}(*IC*^{C})⋅*v*=0}.

In general, the outer approximation can be further tightened by including non-Shannon inequalities (an example being the triangle causal structure of figure 1*c*). For the instrumental scenario, however, such additional inequalities are irrelevant. This can, for instance, be seen by constructing the following inner approximation to the cone.

### Example 2.14 (Entropic inner approximation for the instrumental scenario [50]).

For the instrumental scenario an inner approximation is given in terms of the Ingleton cone and the conditional independence constraints from the previous example, *Γ*^{I}(*IC*^{C})={*v*∈*Γ*^{I} | *M*_{CI}(*IC*^{C})⋅*v*=0}. For this causal structure the Ingleton constraints are implied by the Shannon inequalities and the conditional independence constraints and, hence, inner and outer approximations coincide. Consequently, they also coincide with the actual entropy cone, i.e. with the closure of the set of achievable entropy vectors.

Inner approximations have been considered in [50]. They are particularly useful in cases where identical inner and outer approximations are found, where they identify the actual boundary of the entropy cone. In other cases, they can allow parts of the actual boundary to be identified or give clues on how to find better outer approximations.

Arguably all interesting scenarios (such as the previous example) involve unobserved variables that are suspected to cause some of the correlations between the variables we observe. These unobserved variables may yield constraints on the possible joint distributions of the observed variables, a well-known example being a Bell inequality [30] (for a detailed discussion of the significance of Bell inequality violation on classical causal structures see [31]). More generally, we would like to infer constraints on the observed variables that follow from the presence of unobserved variables.

For a causal structure on *n* random variables {*X*_{1},*X*_{2},…,*X*_{n}}, the restriction to the set of observed variables is called its *marginal scenario*. Suppose that the first *k*≤*n* variables are observed and the remaining *n*−*k* are not. We are thus interested in the correlations among the first *k* variables that can be obtained as the marginal of some distribution over all *n* variables. Without any causal restrictions, the set of all probability distributions of the *k* observed variables is obtained in this way. For a classical causal structure *C*^{C} on the set of variables {*X*_{1},*X*_{2},…,*X*_{n}}, marginalizing all compatible distributions over the *n*−*k* unobserved variables leads to a restricted set of distributions over the *k* observed random variables, as can be seen in the following example.

### Example 2.15 (Observed distributions in the instrumental scenario).

For the instrumental scenario, the observed variables are *X*, *Y* and *Z* and their joint distribution is of the form *P*_{XYZ}=∑_{a}*P*_{X}*P*_{A}(*a*)*P*_{Z | X,A=a}*P*_{Y | Z,A=a}.

The first entropic inequalities for a marginal scenario were derived in [12], where certificates for the existence of common ancestors of a subset of the observed random variables of at least a certain size were given. One such scenario is the triangle causal structure of figure 1*c*. The systematic entropy vector approach was devised for classical causal structures in [6,8,10]. An outer approximation to the entropic cones of a variety of causal structures was given in [11]. In the following we give the details of this approach.

In the entropic picture, marginalization is performed by eliminating from the vectors the coordinates that represent entropies of sets of variables containing at least one unobserved variable. This corresponds to a projection of a cone in ℝ^{2^{n}−1} to one in ℝ^{2^{k}−1}. A vector *w* of entropies of the observed sets of variables, i.e. of the marginal scenario, is achievable if and only if there is an achievable entropy vector *v* in the original scenario with matching entropies on the observed variables.

Starting from the set of all entropy vectors, *Γ**(*C*^{C}), those relevant for the marginal scenario can be obtained by discarding the appropriate components. For a finitely generated cone such as *Γ*(*C*^{C}), its projection can be more efficiently determined from the projection of its extremal rays. In the dual description of the entropic cone in terms of its facets (i.e. its inequality description), the transition to the marginal scenario can be made computationally by eliminating all entropies of sets of variables not contained in the marginal scenario, for instance with a Fourier–Motzkin elimination algorithm.
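
A single Fourier–Motzkin elimination step can be sketched in a few lines. The toy below (names ours) projects the two-variable Shannon cone, with coordinates ordered (*H*(*X*), *H*(*Y*), *H*(*XY*)), onto the first two coordinates by eliminating *H*(*XY*); the projection is described by the positivity constraints *H*(*X*)≥0 and *H*(*Y*)≥0.

```python
def fourier_motzkin(ineqs, j):
    """Eliminate coordinate j from a homogeneous system of inequalities a.v >= 0.

    Each inequality is a coefficient list; the projection keeps all inequalities
    not involving coordinate j and combines positive/negative pairs so that the
    j-th coefficient cancels.
    """
    zero, pos, neg = [], [], []
    for a in ineqs:
        (zero if a[j] == 0 else pos if a[j] > 0 else neg).append(a)
    out = [a[:j] + a[j + 1:] for a in zero]
    for p in pos:
        for q in neg:
            comb = [p[j] * q[k] - q[j] * p[k] for k in range(len(p))]
            out.append(comb[:j] + comb[j + 1:])
    return out

# Toy: project the two-variable Shannon cone with coordinates
# (H(X), H(Y), H(XY)) onto (H(X), H(Y)) by eliminating H(XY).
shannon2 = [
    [-1, 0, 1],  # monotonicity: H(XY) >= H(X)
    [0, -1, 1],  # monotonicity: H(XY) >= H(Y)
    [1, 1, -1],  # submodularity: H(X) + H(Y) >= H(XY)
]
projected = fourier_motzkin(shannon2, 2)
```

In practice redundant inequalities accumulate quickly under repeated elimination, which is why working with extremal rays can be more efficient, as noted above.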

Without any causal restrictions, marginalizing the entropy cone for *n* variables over the last *n*−*k* variables recovers the entropy cone for *k* random variables; in particular, the *n*-variable Shannon cone *Γ*_{n} is projected to the *k*-variable Shannon cone *Γ*_{k} with this mapping. This holds because the *n*-variable Shannon constraints contain the corresponding *k*-variable constraints as a subset, and because any vector in *Γ*_{k} can be extended to a vector in *Γ*_{n}, for instance, by taking *H*(*X*_{k+1})=*H*(*X*_{k+2})=⋯=*H*(*X*_{n})=0 and *H*(*X*_{S}∪*X*_{T})=*H*(*X*_{S}) for any *X*_{T}⊆{*X*_{k+1},*X*_{k+2},…,*X*_{n}}.

For a classical causal structure *C*^{C}, we will be interested in the set of projections of vectors in *Γ**(*C*^{C}) onto the components corresponding to the observed variables.

### Lemma 2.16.

*This set of projections is equal to the set of entropy vectors compatible with the marginal scenario of the classical causal structure C*^{C}, *i.e. it consists exactly of the vectors* **H**(*P*′) *where P*′ *is the marginal on the observed variables of some distribution P compatible with C*^{C}.

### Proof.

Let *v*∈*Γ**(*C*^{C}), s.t. *v*=**H**(*P*) for some distribution *P* compatible with *C*^{C}. The marginal *P*′ of *P* on the observed variables obeys *w*=**H**(*P*′), where *w* is the projection of *v* onto the marginal scenario, and hence every projected vector is the entropy vector of a compatible marginal distribution.

Conversely, if *w*=**H**(*P*′) for the marginal *P*′ of some distribution *P* compatible with *C*^{C}, then there exists *v*=**H**(*P*)∈*Γ**(*C*^{C}) whose projection onto the marginal scenario is *w*.

An outer approximation to this marginal cone is obtained by projecting the polyhedral cone of vectors for which *M*_{CI}(*C*^{C})⋅*v*=0 and *M*^{n}_{SH}⋅*v*≥0 hold onto the components of the marginal scenario. No *k*-variable Shannon constraints need to be added, as they are already included in the *n*-variable ones.

### Example 2.17 (Entropic outer approximation for the marginal cone of the instrumental scenario [10,11]).

For the instrumental scenario, the outer approximation to its marginal cone is found by projecting *Γ*(*IC*^{C}) to its three-variable marginal scenario and yields, in addition to the three-variable Shannon constraints, the inequality *I*(*X*:*YZ*)≤*H*(*Z*) from [10,11]. As the inner and outer approximations of example 2.14 coincide, this projection characterizes the actual marginal cone.
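
The inequality *I*(*X*:*YZ*)≤*H*(*Z*) can be probed numerically against randomly sampled distributions compatible with the classical instrumental scenario; as expected from the result above, the sketch below (names ours) finds no violations:

```python
import math
import random

def H(joint, idx):
    """Shannon entropy (bits) of the marginal of `joint` on coordinates idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

random.seed(2)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]

def random_instrumental(d=3):
    """A random observed distribution over (X, Z, Y) from the classical IC."""
    px, pa = rand_dist(d), rand_dist(d)
    pz = {(x, a): rand_dist(d) for x in range(d) for a in range(d)}
    py = {(a, z): rand_dist(d) for a in range(d) for z in range(d)}
    return {(x, z, y): sum(px[x] * pa[a] * pz[(x, a)][z] * py[(a, z)][y]
                           for a in range(d))
            for x in range(d) for z in range(d) for y in range(d)}

violations = 0
for _ in range(50):
    joint = random_instrumental()
    # Coordinates are (X, Z, Y): I(X:YZ) = H(X) + H(YZ) - H(XYZ).
    i_x_yz = H(joint, (0,)) + H(joint, (1, 2)) - H(joint, (0, 1, 2))
    if i_x_yz > H(joint, (1,)) + 1e-9:
        violations += 1
```

Of course, such sampling cannot prove the inequality; it is merely a consistency check on the projected cone.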

As mentioned previously, non-Shannon inequalities cannot give any new entropic constraints for *IC*, as the Shannon approximation is already tight. However, in many causal structures they do. For instance in the triangle scenario of figure 1*c*, non-Shannon inequalities still lead to new entropic constraints, even after marginalization to the three observed variables [50].

#### (ii) Causal structures with unobserved quantum systems

A quantum causal structure differs from its classical counterpart in that unobserved systems correspond to shared quantum states.

### Definition 2.18.

A *quantum causal structure*, *C*^{Q}, is a causal structure where each observed node has a corresponding random variable, and each unobserved node has an associated quantum system.

In a classical causal structure, the edges of the DAG represent the propagation of classical information, and, at a node with incoming edges, the random variable there can be generated by applying an arbitrary function to its parents. We are hence implicitly assuming that all the information about the parents is transmitted to its children (otherwise the set of allowed functions would be restricted). This does not pose a problem because classical information can be copied. In the quantum case, on the other hand, the no-cloning theorem means that the children of a node cannot (in general) all have access to the same information as is present at that node. Furthermore, the analogue of performing arbitrary functions in the classical case is replaced by arbitrary quantum operations. Such a quantum framework that allows for an analysis with entropy vectors was introduced in [13]. In the following we outline this approach. However, for unity of description, our account of quantum causal structures is based upon the viewpoint that is taken for generalized causal structures in [11], which we review in the next section. (The difference is as follows: In [13] nodes correspond to quantum systems. All outgoing edges of a node together define a completely positive trace-preserving (CPTP) map with output states corresponding to the joint state associated with its child nodes. Similarly, the CPTP map associated to the input edges of a node must map the states of the parent nodes to the node in question. In [11], on the other hand, edges correspond to states, whereas the transformations occur at the nodes.)

Let *C*^{Q} be a quantum causal structure. Nodes without input edges correspond to the preparation of a quantum state described by a density operator on a Hilbert space, e.g. *ρ*_{A} for the node *A*, where for observed nodes this state is required to be classical. Each outgoing edge of a node is associated with a subsystem of that node's state: if *Y* and *Z* are the only children of *A*, then there are associated spaces for subsystems *A*_{Y} and *A*_{Z}, whose tensor product forms the space of *A*.

A distribution, *P*, over the observed nodes of a causal structure *C*^{Q} is compatible with *C*^{Q} if there exists a quantum state labelling each unobserved node (with subsystems for each unobserved edge) and transformations, i.e. preparations and CPTP maps for each unobserved node as well as POVMs for each observed node, that allow for the generation of *P* by means of the Born rule. We denote the set of all compatible distributions *P*(*C*^{Q}).

### Example 2.19 (Compatible distributions in the quantum instrumental scenario).

For the quantum instrumental scenario (figure 1*a*), a quantum state *ρ*_{A} is prepared at the unobserved node *A*. Depending on the observed input *X*, a POVM is applied to the subsystem *A*_{Z}, producing the outcome *Z*. Depending on the latter, another POVM is applied to the subsystem *A*_{Y}, producing the outcome *Y*.

The set of entropy vectors of compatible probability distributions over the observed nodes is analysed by means of the *von Neumann entropy* of a density operator *ρ*, *H*(*ρ*):=−tr(*ρ* log_{2} *ρ*), which reduces to the Shannon entropy on classical states.

Because of the impossibility of cloning, the outcomes and the quantum systems that led to them do not exist simultaneously. Therefore, there is in general no joint multiparty quantum state for all subsystems and it does not make sense to talk about the joint entropy of the states and outcomes. More concretely, if a system *A* is measured to produce *Z*, then *ρ*_{AZ} is not defined and hence neither is *H*(*AZ*) (for attempts to circumvent this, see, for example, [54]).

### Definition 2.20.

Two subsystems in a quantum causal structure *C*^{Q} *coexist* if neither of them is a quantum ancestor of the other. A set of subsystems that mutually coexist is termed *coexisting*.

A quantum causal structure may have several maximal coexisting subsets. Only within such subsets is there a well-defined joint quantum state and joint entropy.

### Example 2.21 (Coexisting sets in the quantum instrumental scenario).

Consider the quantum version of the instrumental scenario, as illustrated in figure 1*a*. There are three observed variables as well as two edges originating at unobserved (quantum) nodes, hence five variables to consider. More precisely, the quantum node *A* has two associated subsystems *A*_{Z} and *A*_{Y}. The correlations seen at the two observed nodes *Z* and *Y* are formed by measurement on the respective subsystems *A*_{Z} and *A*_{Y}. The coexisting sets in this causal structure are {*A*_{Y},*A*_{Z},*X*}, {*A*_{Y},*X*,*Z*} and {*X*,*Y*,*Z*} and their (non-empty) proper subsets.

Note that, without loss of generality, we can assume that any initial, i.e. parentless quantum states, such as *ρ*_{A} above, are pure. This is because any mixed state can be purified, and if the transformations and measurement operators are then taken to act trivially on the purifying systems, the same statistics are observed. In the causal structure of example 2.21, this implies that *ρ*_{A} can be considered to be pure and thus *H*(*A*_{Y}*A*_{Z})=0. The Schmidt decomposition then implies that *H*(*A*_{Y})=*H*(*A*_{Z}). This is computationally useful as it reduces the number of free parameters in the entropic description of the scenario. Furthermore, by Stinespring’s theorem [55], whenever a CPTP map is applied at a node that has at least one quantum child, then one can instead consider an isometry to a larger output system. The additional system that is required for this can be taken to be part of the unobserved quantum output (or one of them in case of several quantum output nodes). Each such case allows for the reduction of the number of variables by one, because the joint entropy of all inputs to such a node must be equal to that of all its outputs.
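
The purity argument can be checked numerically: for a random pure state on *A*_{Y}⊗*A*_{Z}, the reduced states have equal von Neumann entropy while the joint entropy vanishes. A sketch using NumPy (helper names ours):

```python
import numpy as np

def von_neumann_entropy(rho):
    """Von Neumann entropy (in bits) of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

rng = np.random.default_rng(0)
d_y, d_z = 2, 3  # dimensions of the subsystems A_Y and A_Z

# A random pure state of A = A_Y (x) A_Z.
psi = rng.normal(size=d_y * d_z) + 1j * rng.normal(size=d_y * d_z)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Reduced states of the two subsystems.
rho4 = rho.reshape(d_y, d_z, d_y, d_z)
rho_y = np.trace(rho4, axis1=1, axis2=3)  # trace out A_Z
rho_z = np.trace(rho4, axis1=0, axis2=2)  # trace out A_Y

h_joint = von_neumann_entropy(rho)  # 0 for a pure state
h_y = von_neumann_entropy(rho_y)
h_z = von_neumann_entropy(rho_z)    # equals h_y by the Schmidt decomposition
```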

Quantum states are known to obey submodularity [56] and also obey the following condition:

— Weak monotonicity [56]:

*H*(*X*_{S}∖*X*_{T})+*H*(*X*_{T}∖*X*_{S})≤*H*(*X*_{S})+*H*(*X*_{T}), for all *X*_{S},*X*_{T}⊆*Ω* (recall *H*({})=0).

This is the dual of submodularity in the sense that the two inequalities can be derived from each other by considering purifications of the corresponding quantum states [57].
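
Weak monotonicity can likewise be probed numerically on random mixed states. The sketch below (partial-trace helper ours) checks *H*(*A*)+*H*(*B*)≤*H*(*AC*)+*H*(*BC*), an instance of weak monotonicity with *X*_{S}={*A*,*C*} and *X*_{T}={*B*,*C*}:

```python
import numpy as np

def S(rho):
    """Von Neumann entropy (bits)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def partial_trace(rho, dims, keep):
    """Reduced state of `rho` (subsystem dimensions `dims`) on subsystems `keep`."""
    n = len(dims)
    rho = rho.reshape(tuple(dims) * 2)
    for ax in sorted((i for i in range(n) if i not in keep), reverse=True):
        m = rho.ndim // 2
        rho = np.trace(rho, axis1=ax, axis2=ax + m)
    d = int(np.prod([dims[i] for i in sorted(keep)]))
    return rho.reshape(d, d)

rng = np.random.default_rng(1)
dims = [2, 2, 2]  # subsystems A, B, C
D = 8
M = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = M @ M.conj().T
rho /= np.trace(rho).real  # a random full-rank mixed state

h_a = S(partial_trace(rho, dims, {0}))
h_b = S(partial_trace(rho, dims, {1}))
h_ac = S(partial_trace(rho, dims, {0, 2}))
h_bc = S(partial_trace(rho, dims, {1, 2}))
# Weak monotonicity: H(A) + H(B) <= H(AC) + H(BC).
gap = h_ac + h_bc - h_a - h_b
```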

Within the context of causal structures, these relations can always be applied between variables in the same coexisting set. In addition, whenever it is impossible for there to be entanglement between the subsystems *X*_{S}∩*X*_{T} and *X*_{S}∖*X*_{T}—for instance, if these subsystems are in a cq-state—the monotonicity constraint *H*(*X*_{S}∖*X*_{T})≤*H*(*X*_{S}) holds. If it is also impossible for there to be entanglement between *X*_{S}∩*X*_{T} and *X*_{T}∖*X*_{S}, then the monotonicity relation *H*(*X*_{T}∖*X*_{S})≤*H*(*X*_{T}) holds, rendering the weak monotonicity relation stated above redundant.

Altogether, these considerations lead to a set of *basic* inequalities containing some Shannon and some weak-monotonicity inequalities, which are conveniently expressed in a matrix *M*_{B}(*C*^{Q}). This way of approximating the entropic cone in the quantum case is inspired by work on the entropic cone for multiparty quantum states [34]. Note also that there are no further inequalities for the von Neumann entropy known to date (contrary to the classical case where a variety of non-Shannon inequalities is known), except under additional constraints [58–63].

The conditional independence constraints in *C*^{Q} cannot be identified by proposition 2.10, because variables do not coexist with any quantum parents and hence conditioning a variable on a quantum parent is not meaningful. Nonetheless, among the variables in a coexisting set the conditional independences that are valid for *C*^{C} also hold in *C*^{Q}. This can be seen as follows. First, any constraints that involve only observed variables (which are always part of a coexisting set) hold by proposition 2.27 below. Secondly, for unobserved systems only their classical ancestors and none of their descendants can be part of the same coexisting set. An unobserved system is hence independent of any subset of the same coexisting set with which it shares no ancestors. Note that each of the subsystems associated with a quantum node is considered to be a parent of all of the node’s children (see figure 1 for an example).

In addition, suppose that *X*_{S} and *X*_{T} are disjoint subsets of a coexisting set, *Ξ*, and that the unobserved system *A* is also in *Ξ*. Then *I*(*A*:*X*_{S} | *X*_{T})=0 if *X*_{T} d-separates *A* from *X*_{S} (in the full graph including quantum nodes). This follows because any quantum states generated from the classical separating variables may be obtained by first producing random variables from the latter (for which the usual d-separation rules hold) and then using these to generate the quantum states in question (potentially after generating other variables in the network), hence retaining conditional independence. The same considerations can be made for sets of unobserved systems. These independence constraints may be assembled in a matrix *M*_{QCI}(*C*^{Q}).

Among the variables that do not coexist, some are obtained from others by means of quantum operations. These variables are thus related by data processing inequalities (DPIs) [64].

### Proposition 2.22 (DPI).

*Let ρ*_{XSXT} *be a bipartite quantum state and let* ℰ *be a completely positive trace-preserving* (*CPTP*) *map acting on X*_{T}, *leading to a state ρ*′_{XSXT}. *Then* *I*(*X*_{S}:*X*_{T})_{ρ′XSXT}≤*I*(*X*_{S}:*X*_{T})_{ρXSXT}.

Remarks: (1) the map from a quantum state to the diagonal state with entries equal to the outcome probabilities of a measurement is a CPTP map and hence also obeys the DPI. (2) In general, *I*(*A*:*B* | *C*)_{ρ′ABC}≤*I*(*A*:*B* | *C*)_{ρABC} for any state *ρ*′_{ABC} obtained from *ρ*_{ABC} by applying CPTP maps locally on *A* and on *B*.

The DPI provide an additional set of entropic constraints, which can be expressed in terms of a matrix inequality *M*_{DPI}(*C*^{Q})⋅*v*≥0. In general, there are a large number of variables for which DPI hold. It is thus beneficial to derive rules that specify which of the inequalities are needed. First, note that whenever a concatenation of two CPTP maps is applied, the DPIs for the concatenated map are implied by those for the two individual maps (if *ρ*″ is obtained from *ρ*′, which is in turn obtained from *ρ*, then *I*(*X*_{S}:*X*_{T})_{ρ}≥*I*(*X*_{S}:*X*_{T})_{ρ′}≥*I*(*X*_{S}:*X*_{T})_{ρ″}); it therefore suffices to include DPIs for the individual maps.

Secondly, whenever a state satisfies *ρ*_{XSXTXR}=*ρ*_{XSXT}⊗*ρ*_{XR} and a CPTP map acts only on *X*_{S}*X*_{T}, the DPIs for *ρ*_{XSXTXR} are implied by the DPIs for *ρ*_{XSXT}. This follows from *I*(*X*_{S}:*X*_{T}*X*_{R})=*I*(*X*_{S}:*X*_{T}), *I*(*X*_{S}*X*_{R}:*X*_{T})=*I*(*X*_{S}:*X*_{T}) and *I*(*X*_{S}*X*_{T}:*X*_{R})=0.
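The three identities invoked here have a direct classical analogue that is easy to verify numerically. The sketch below (illustrative only, with hypothetical helper names) checks them for a small joint distribution in which *X*_{R} is independent of the correlated pair (*X*_{S}, *X*_{T}):

```python
from math import log2

def H(pmf, coords):
    """Shannon entropy of the marginal on the given coordinate indices."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(pmf, A, B):
    """Mutual information I(A:B) between two groups of coordinates."""
    return H(pmf, A) + H(pmf, B) - H(pmf, A + B)

# coordinates (s, t, r): X_R independent of the correlated pair (X_S, X_T)
p_st = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
pmf = {(s, t, r): p * 0.5 for (s, t), p in p_st.items() for r in (0, 1)}

S, T, R = [0], [1], [2]
assert abs(I(pmf, S, T + R) - I(pmf, S, T)) < 1e-12  # I(X_S : X_T X_R) = I(X_S : X_T)
assert abs(I(pmf, S + R, T) - I(pmf, S, T)) < 1e-12  # I(X_S X_R : X_T) = I(X_S : X_T)
assert abs(I(pmf, S + T, R)) < 1e-12                 # I(X_S X_T : X_R) = 0
```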

Furthermore, whenever a node has classical and quantum inputs, there is not only a CPTP map generating its output state: this map can also be extended to a CPTP map that simultaneously retains the classical inputs. This is the content of the following lemma, which also shows that retaining a copy of the classical inputs leads to tighter entropic inequalities.

### Lemma 2.23.

*Let Y be a node with classical and quantum inputs X*_{C} *and X*_{Q}, *and let* ℰ *be the CPTP map that acts at this node, i.e. a map from X*_{C}*X*_{Q} *to Y. Then* ℰ *can be extended to a map* ℰ′ *from X*_{C}*X*_{Q} *to X*_{C}*Y with the property that ρ′*_{XCY}=ℰ′(*ρ*_{XCXQ}) *is classical on X*_{C} *and ρ′*_{XC}=*ρ*_{XC}. *Furthermore, the DPIs for* ℰ′ *imply those for* ℰ.

### Proof.

The first part of the lemma follows because classical information can be copied: ℰ′ can be realized by first copying *X*_{C} and then applying ℰ to the original inputs. (We write *ρ*′ for the output state of both maps in the argument below.)

Suppose *I*(*X*_{C}*X*_{Q}*X*_{S}:*X*_{T})_{ρ}≥*I*(*Y* *X*_{S}:*X*_{T})_{ρ′} is a valid DPI for ℰ. Then *I*(*X*_{C}*X*_{Q}*X*_{S}:*X*_{T})_{ρ}≥*I*(*X*_{C}*Y* *X*_{S}:*X*_{T})_{ρ′} is valid for ℰ′, and the former inequality follows from the latter because *I*(*X*_{C}*Y* *X*_{S}:*X*_{T})_{ρ′}≥ *I*(*Y* *X*_{S}:*X*_{T})_{ρ′}. ▪

All the above (in)equalities are necessary conditions for a vector to be an entropy vector compatible with the causal structure *C*^{Q}. They constrain a polyhedral cone, *Γ*(*C*^{Q}), in the space whose coordinates are the entropies of all non-empty subsets of each of the *m* coexisting sets, where *m* is the total number of coexisting sets of *C*^{Q}.

### Example 2.24 (Entropic constraints for the quantum instrumental scenario).

The basic (in)equalities for *IC*^{Q} are collected in a matrix *M*_{B}(*IC*^{Q}) that features 29 (independent) inequalities (note that the only weak-monotonicity relations that are not made redundant by other basic inequalities are *H*(*A*_{Y}|*A*_{Z}*X*)+*H*(*A*_{Y})≥0, *H*(*A*_{Z}|*A*_{Y}*X*)+*H*(*A*_{Z})≥0, *H*(*A*_{Y}|*A*_{Z})+*H*(*A*_{Y}|*X*)≥0 and *H*(*A*_{Z}|*A*_{Y})+*H*(*A*_{Z}|*X*)≥0). In this case, a single independence constraint, *I*(*X*:*A*_{Y}*A*_{Z})=0, encodes that *X* is independent of *A*_{Y}*A*_{Z}; it forms the matrix *M*_{QCI}(*IC*^{Q}). The relevant DPIs are *I*(*A*_{Z}*X*:*A*_{Y})≥*I*(*XZ*:*A*_{Y}) and *I*(*A*_{Y}*Z*:*X*)≥*I*(*Y* *Z*:*X*), which yield a matrix *M*_{DPI}(*IC*^{Q}). These matrices act on entropy vectors of the form *v*=(*H*(*A*_{Y}), *H*(*A*_{Z}), *H*(*X*), *H*(*Y*), *H*(*Z*), *H*(*A*_{Y}*A*_{Z}), *H*(*A*_{Y}*X*), *H*(*A*_{Y}*Z*), *H*(*A*_{Z}*X*), *H*(*XY*), *H*(*XZ*), *H*(*Y* *Z*), *H*(*A*_{Y}*A*_{Z}*X*), *H*(*A*_{Y}*XZ*), *H*(*XY* *Z*)). Although the notation suppresses the different states, there is no ambiguity because, e.g., the entropy of *X* is the same for all states with subsystem *X*. The full list of inequalities is provided in the electronic supplementary material.

From *Γ*(*C*^{Q}), an outer approximation to the set of entropy vectors of the observed variables compatible with *C*^{Q} can be obtained by eliminating the entropies of the unobserved systems using Fourier-Motzkin elimination. The cone *Γ*(*C*^{Q}) is characterized by *M*_{B}(*C*^{Q})⋅*v*≥0, *M*_{QCI}(*C*^{Q})⋅*v*=0 and *M*_{DPI}(*C*^{Q})⋅*v*≥0 (except for those DPIs that coincide with Shannon inequalities, which are already included in *M*_{B}(*C*^{Q})).
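Fourier-Motzkin elimination itself is elementary to implement. The following is a minimal sketch (hypothetical function name) for homogeneous systems rows⋅*v*≥0 with exact rational arithmetic; it performs no redundancy removal, which is why dedicated software is used in practice, since the number of inequalities can grow quickly under repeated elimination:

```python
from fractions import Fraction

def fourier_motzkin(rows, col):
    """Eliminate variable `col` from the system rows . v >= 0, where each
    row is a list of exact coefficients. Returns rows over the remaining
    variables; redundant inequalities are NOT removed."""
    pos = [r for r in rows if r[col] > 0]
    neg = [r for r in rows if r[col] < 0]
    zero = [r for r in rows if r[col] == 0]
    out = [r[:col] + r[col + 1:] for r in zero]
    for p in pos:
        for n in neg:
            # positive combination cancelling the eliminated coefficient
            comb = [-n[col] * a + p[col] * b for a, b in zip(p, n)]
            comb = comb[:col] + comb[col + 1:]
            if any(c != 0 for c in comb):
                out.append(comb)
    return out

# toy example: from {x - y >= 0, y >= 0} eliminate y, leaving x >= 0
rows = [[Fraction(1), Fraction(-1)], [Fraction(0), Fraction(1)]]
assert fourier_motzkin(rows, 1) == [[Fraction(1)]]
```

Projecting an entropy cone amounts to eliminating, one by one, every coordinate that involves an unobserved system.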

### Example 2.25 (Entropic outer approximation for the quantum instrumental scenario).

The projection of *Γ*(*IC*^{Q}) leads to an entropic cone characterized by the Shannon inequalities for the observed variables together with the additional inequality *I*(*X*:*Y* *Z*)≤*H*(*Z*). Hence, this outer approximation coincides with the one obtained in the classical case.

This method has been applied to find an outer approximation to the entropy cone of the triangle causal structure in the quantum case (cf. figure 1*c*) [13]. This approximation did not coincide with the outer approximation to the classical triangle scenario obtained from Shannon inequalities and independence constraints. Whether there are further, as yet unknown, inequalities in the quantum case remains an open question (in contrast to the classical case, where even better outer approximations have already been found [50]). In [13], the method was furthermore combined with the approach reviewed in §3b, where it was applied to a scenario related to *IC* (cf. example 3.8 below).

#### (iii) Causal structures with unobserved systems in other non-signalling theories

The concept of a generalized causal structure was introduced in [11], the idea being to have one framework in which classical, quantum and even more general systems, for instance non-local boxes [65,66], can be shared by unobserved nodes and where theory-independent features of networks and corresponding bounds on our observations may be identified.

### Definition 2.26.

A *generalized causal structure* *C*^{G} is a causal structure which, for each observed node, has an associated random variable and, for each unobserved node, has a corresponding non-signalling resource allowed by a generalized probabilistic theory.

Classical and quantum causal structures can be viewed as special cases of generalized causal structures [11,67]. Generalized probabilistic theories may be conveniently described in the operational–probabilistic framework of [68]. Circuit elements correspond to so-called tests that are connected by wires, which represent propagating systems. In general, such a test has an input system, and two outputs: an output system and an outcome. In the case of a system with trivial input, this corresponds to a preparation test, and in the case of trivial output this is an observation test. In the causal structure framework, a test is associated to each node. However, each such test has only one output: for unobserved nodes this is a general resource state; for observed nodes it is a random variable. Furthermore, resource states do not allow for signalling from the future to the past, i.e. we are considering so-called causal operational–probabilistic theories. This is important for the interpretation of generalized causal structures.

A distribution *P* over the observed nodes of a generalized causal structure *C*^{G} is compatible with *C*^{G} if there exists a causal operational–probabilistic theory, a resource in that theory for each unobserved node and transformations for each node that allow for the generation of *P*. The set of all compatible distributions is denoted analogously to the classical and quantum cases.

### Proposition 2.27 (Henson, Lal & Pusey).

*Let C*^{G} *be a generalized causal structure, and let X, Y and Z be pairwise disjoint subsets of observed nodes in C*^{G}. *If a probability distribution P is compatible with C*^{G}, *then the d-separation of X and Y by Z implies the conditional independence X* ⫫ *Y* | *Z. Conversely, if, for every distribution P compatible with C*^{G}, *the conditional independence X* ⫫ *Y* | *Z holds, then X is d-separated from Y by Z in C*^{G}.

This allows for the derivation of conditional independence relations among observed variables that hold in any generalized probabilistic theory, which hence restrict a general entropic cone. Furthermore, it rigorously justifies retaining the independence constraints among the (observed) variables in coexisting sets in quantum causal structures (cf. §2bii), which can be seen as special cases of generalized causal structures.

In [11], sufficient conditions were derived for identifying causal structures *C* for which, in the classical case *C*^{C}, there are no restrictions on the distribution over observed variables other than those that follow from the d-separation of these variables. As, by proposition 2.27, the same observed conditional independences characterize *C*^{Q} and *C*^{G}, the sets of compatible observed distributions for *C*^{C}, *C*^{Q} and *C*^{G} coincide for such causal structures.

Outer approximations to the entropic cones for causal structures, *C*^{G}, based on the observed variables and their independences only were derived in [11]. Moreover, a few additional constraints for certain generalized causal structures were derived there. For example, the entropic constraint *I*(*X*:*Y*)+*I*(*X*:*Z*)≤*H*(*X*) for the triangle causal structure of figure 1*c* (which had previously been established in the classical case [69]) was found. This constraint does not follow from the observed independences, but nonetheless holds for the triangle causal structure in generalized probabilistic theories.
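The constraint *I*(*X*:*Y*)+*I*(*X*:*Z*)≤*H*(*X*) can be checked on concrete triangle-compatible distributions. In the sketch below (illustrative only, hypothetical helper names) each observed node simply copies the two independent hidden bits on its adjacent edges, in which case the inequality is saturated:

```python
from math import log2
from itertools import product

def H(pmf, coords):
    """Shannon entropy of the marginal on the given coordinate indices."""
    marg = {}
    for o, p in pmf.items():
        k = tuple(o[i] for i in coords)
        marg[k] = marg.get(k, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(pmf, A, B):
    return H(pmf, A) + H(pmf, B) - H(pmf, A + B)

# hidden bits r_xy, r_yz, r_zx on the three edges of the triangle;
# each observed node copies the two hidden bits on its adjacent edges
pmf = {}
for r_xy, r_yz, r_zx in product((0, 1), repeat=3):
    X, Y, Z = (r_xy, r_zx), (r_xy, r_yz), (r_yz, r_zx)
    pmf[(X, Y, Z)] = 1 / 8

Xc, Yc, Zc = [0], [1], [2]
lhs = I(pmf, Xc, Yc) + I(pmf, Xc, Zc)
assert abs(lhs - 2.0) < 1e-12 and abs(H(pmf, Xc) - 2.0) < 1e-12
assert lhs <= H(pmf, Xc) + 1e-12       # I(X:Y) + I(X:Z) <= H(X), saturated
```

Since each of *I*(*X*:*Y*) and *I*(*X*:*Z*) equals 1 bit here while *H*(*X*)=2 bits, the example shows the bound is tight.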

In spite of this, a systematic entropic procedure, in which the unobserved variables are explicitly modelled and then eliminated from the description, is not available for generalized causal structures. The issue is that we lack a generalization of the Shannon and von Neumann entropy to generalized probabilistic theories that obeys submodularity and for which the conditional entropy can be written as a difference of unconditional entropies [70,71].

One possible generalized entropy is the *measurement entropy*, which is positive and obeys some of the submodularity constraints (those with *X*_{S}∩*X*_{T}={}) but not all [70,71]. Using this, [72] considered the set of possible entropy vectors for a bipartite state in *box world*, a generalized probabilistic theory that permits all bipartite correlations that are non-signalling [73]. They found no further constraints on the set of possible entropy vectors in this setting (hence, contrary to the quantum case, measurement entropy vectors of separable states in box world can violate monotonicity). Other generalized probabilistic theories and multiparty states have, to our knowledge, not been similarly analysed.

#### (iv) Other directions for exploring quantum and generalized causal structures

The approaches to quantum and generalized causal structures above are based on adaptations of the theory of Bayesian networks to the respective settings and on retaining the features that remain valid, for instance, the relation between d-separation and independence for observed variables [11] (cf. §2biii). Other approaches to generalizing classical networks to the quantum realm have also been pursued; e.g. in [54] a definition of conditional quantum states, analogous to conditional probability distributions, was formulated.

Recent articles have proposed generalizations of Reichenbach’s principle [74] to the quantum realm [16,75,76]. In [16], a graph separation rule, q-separation, was introduced, whereas [75,76] rely on a formulation of quantum networks in terms of quantum channels and their Choi states.

An active area of research is the exploration of frameworks that allow for indefinite causal structures [77–79]. There are several approaches achieving this, such as the process matrix formalism [80], which has led to the derivation of so-called causal inequalities and the identification of signalling correlations that are achievable in this framework but not with any predefined causal structure [80,81]. Another framework able to describe such scenarios is the theory of quantum combs [82], illustrated by the quantum switch, in which a quantum bit controls the circuit structure of a quantum computation. A recent framework aimed at modelling cryptographic protocols is also available [83]. Some initial results on the analysis of indefinite causal structures with entropy have recently appeared [84].

In the classical, quantum and generalized causal structures considered above only the observed classical information can be transmitted via a link between two observed variables and, in particular, no additional unobserved system. This understanding of the causal links encodes a Markov condition. In other situations, it can be convenient for the links in the graph to represent a notion of future instead of direct causation, see e.g. [85,86].

## 3. Entropy vector approach with post-selection

A technique that leads to additional, more fine-grained inequalities is based on post-selecting on the values of parentless classical variables. This technique was pioneered by Braunstein & Caves [87] and has been used to systematically derive numerous entropic inequalities [6–8,17,18].

### (a) Post-selection in classical causal structures

In the following, we denote a random variable *X* post-selected on the event of another random variable, *Y* , taking a particular value, *Y* =*y*, as *X*_{|Y =y}. The same notation is used for a set of random variables *S*={*X*_{1},*X*_{2},…,*X*_{n}}, whose joint distribution is conditioned on *Y* =*y*, *S*_{|Y =y}={*X*_{1|Y =y},*X*_{2|Y =y},…,*X*_{n|Y =y}}. The following lemma can be understood as a generalization of (a part of) Fine’s theorem [88,89].

### Lemma 3.1.

*Let C*^{C} *be a classical causal structure with a parentless observed node X that takes values x=1,2,…,n, and let P be a joint distribution over all random variables Ω=X∪X*^{↑}*∪X*^{⤉} *in C*^{C} *(with P compatible with C*^{C}*). Then there exists a joint distribution Q over the n⋅|X*^{↑}*|+|X*^{⤉}*| random variables* *Ω*_{|X}=*X*^{↑}_{|X=1}∪⋯∪*X*^{↑}_{|X=n}∪*X*^{⤉} *such that* *Q*(*X*^{↑}_{|X=x} *X*^{⤉})=*P*(*X*^{↑} *X*^{⤉} | *X*=*x*) *for all x∈{1,…,n}.*

### Proof.

A suitable distribution is *Q*(*X*^{↑}_{|X=1}⋯*X*^{↑}_{|X=n} *X*^{⤉}):=∏_{x=1}^{n} *P*(*X*^{↑}_{|X=x} | *X*^{⤉}, *X*=*x*)⋅*P*(*X*^{⤉}). As required, this distribution has marginals *Q*(*X*^{↑}_{|X=x} *X*^{⤉})=*P*(*X*^{↑} *X*^{⤉} | *X*=*x*), where we use that *X* is parentless and hence independent of *X*^{⤉}. ▪

It is perhaps easiest to think about this lemma in terms of a new causal structure *Ω*_{|X} that is related to the original. Roughly speaking, the new causal structure is formed by removing *X* and replacing the descendants of *X* with several copies, each of which has the same causal relations as in the original causal structure (with no mixing between copies). More precisely, if *X* is a parentless node in *C*^{C}, we can form a *post-selected causal structure* *C*^{C}_{X} on the nodes *Ω*_{|X} (post-selecting on *X*) as follows. (i) For each pair of nodes *A*, *B*∈*X*^{⤉}, make *A* a parent of *B* in *C*^{C}_{X} if and only if *A* is a parent of *B* in *C*^{C}. (ii) For each node *B*∈*X*^{⤉} and each node *A*_{|X=x}, make *B* a parent of *A*_{|X=x} in *C*^{C}_{X} if and only if *B* is a parent of *A* in *C*^{C}. (iii) For each pair of nodes *A*_{|X=x} and *B*_{|X=x}, make *B*_{|X=x} a parent of *A*_{|X=x} in *C*^{C}_{X} if and only if *B* is a parent of *A* in *C*^{C}. (Note that there is no mixing between different values of *X*=*x*.) See figures 3 and 5 and example 3.3 for illustrations. This view gives us the following corollary of lemma 3.1, which is an alternative generalization of Fine’s theorem.
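Rules (i)–(iii) are mechanical and can be implemented directly. The sketch below (hypothetical helper names, nodes encoded as a parent map) constructs the post-selected causal structure and reproduces the instrumental example of figure 3*a*:

```python
def descendants(dag, x):
    """dag: node -> set of parents. All strict descendants of x."""
    children = {}
    for v, ps in dag.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    out, stack = set(), [x]
    while stack:
        for c in children.get(stack.pop(), set()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def post_select(dag, x, values):
    """Post-selected causal structure on the parentless node x: one copy
    A|x=v of each descendant A per value v; non-descendants kept once."""
    desc = descendants(dag, x)
    nondesc = set(dag) - desc - {x}
    new = {b: set(dag[b]) & nondesc for b in nondesc}        # rule (i)
    for v in values:
        for a in desc:
            copy = f"{a}|{x}={v}"
            new[copy] = set()
            for p in dag[a]:
                if p in nondesc:                             # rule (ii)
                    new[copy].add(p)
                elif p in desc:                              # rule (iii)
                    new[copy].add(f"{p}|{x}={v}")
    return new

# instrumental scenario: X -> Z <- A, Z -> Y <- A, post-selected on X
dag = {"X": set(), "A": set(), "Z": {"X", "A"}, "Y": {"Z", "A"}}
g = post_select(dag, "X", [0, 1])
assert set(g) == {"A", "Z|X=0", "Y|X=0", "Z|X=1", "Y|X=1"}
assert g["Z|X=0"] == {"A"} and g["Y|X=1"] == {"Z|X=1", "A"}
```

The node *X* itself disappears, and each copy inherits its parents within its own value of *x* only, matching the "no mixing" remark above.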

### Lemma 3.2.

*Let C*^{C} *be a classical causal structure with a parentless observed node X that takes values x=1,2,…,n, and let P be a joint distribution over all random variables X∪X*^{↑}*∪X*^{⤉} *in C*^{C} *(with P compatible with C*^{C}*). Then there exists a joint distribution Q compatible with the post-selected causal structure C*^{C}_{X} *such that* *Q*(*X*^{↑}_{|X=x} *X*^{⤉})=*P*(*X*^{↑} *X*^{⤉} | *X*=*x*) *for all x∈{1,…,n}.*

The distributions that are of interest in this new causal structure are the marginals *Q*(*X*^{↑}_{|X=x} *X*^{⤉}) for each *x* (and their interrelations), as they correspond to distributions in the original scenario. Any constraints on these distributions derived in the post-selected scenario are by construction valid for the (post-selected) distributions compatible with the original causal structure.

### Example 3.3 (Post-selection in the instrumental scenario).

Consider the causal structure *IC* where the parentless variable *X* takes values 0 or 1. For any *P* compatible with *IC*^{C}, there exists a distribution *Q* compatible with the post-selected causal structure (figure 3*a*) such that *Q*(*Z*_{|X=0}*Y* _{|X=0}*A*)= *P*(*ZY* |*AX*=0)*P*(*A*) and *Q*(*Z*_{|X=1}*Y* _{|X=1}*A*)=*P*(*ZY* |*AX*=1)*P*(*A*). These marginals and their relations are of interest for the original scenario.

Note that the above reasoning may be applied recursively. Indeed, the causal structure with variables *Ω*_{|X} may be post-selected on the values of one of its parentless nodes. The joint distributions of the nodes *Ω*_{|X} and the associated causal structure may be analysed in terms of entropies, as illustrated with the following example.

### Example 3.4 (Entropic constraints for the post-selected Bell scenario [85]).

In the Bell scenario with binary inputs *A* and *B* (figure 1*b*), lemma 3.2 may be applied first to post-select on the values of *A* and then of *B*. This leads to a distribution *Q* compatible with the post-selected causal structure (on *A* and *B*) shown in figure 3*b*, for which *Q*(*X*_{|A=a}*Y* _{|B=b})=*P*(*XY* |*A*=*a*,*B*=*b*) for *a*,*b*∈{0,1} (in this case the joint distribution is already known to exist by Fine’s theorem [88,89]). Applying the entropy vector method to the post-selected causal structure and marginalizing to vectors of form (*H*(*X*_{|A=0}), *H*(*X*_{|A=1}), *H*(*Y* _{|B=0}), *H*(*Y* _{|B=1}), *H*(*X*_{|A=0}*Y* _{|B=0}), *H*(*X*_{|A=0}*Y* _{|B=1}), *H*(*X*_{|A=1}*Y* _{|B=0}), *H*(*X*_{|A=1}*Y* _{|B=1})) yields the inequality *H*(*Y* _{1} | *X*_{1})+*H*(*X*_{1} | *Y* _{0})+ *H*(*X*_{0} | *Y* _{1})−*H*(*X*_{0} | *Y* _{0})≥0 (writing *X*_{a}=*X*_{|A=a} and *Y*_{b}=*Y*_{|B=b}) and its permutations [6,87].
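This entropic Bell expression can be evaluated numerically from the four post-selected joint distributions. The following sketch (hypothetical function names) computes it and checks the case of independent uniform bits, for which each conditional entropy equals 1:

```python
from math import log2

def H(pmf):
    """Shannon entropy of a pmf given as a dict of probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def Hcond(pmf, cond_idx):
    """H(other coordinate | coordinate cond_idx) = H(joint) - H(marginal)."""
    marg = {}
    for o, p in pmf.items():
        marg[o[cond_idx]] = marg.get(o[cond_idx], 0.0) + p
    return H(pmf) - H(marg)

def braunstein_caves(P):
    """P[(a, b)] is the joint pmf of (X_{|A=a}, Y_{|B=b}) over pairs (x, y).
    Returns H(Y1|X1) + H(X1|Y0) + H(X0|Y1) - H(X0|Y0), which is
    non-negative for classically explainable correlations."""
    return (Hcond(P[(1, 1)], 0) + Hcond(P[(1, 0)], 1)
            + Hcond(P[(0, 1)], 1) - Hcond(P[(0, 0)], 1))

# independent uniform bits: each conditional entropy is 1, so the sum is 2
unif = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
P = {ab: unif for ab in [(0, 0), (0, 1), (1, 0), (1, 1)]}
assert abs(braunstein_caves(P) - 2.0) < 1e-12
```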

Whenever the input nodes take more than two values, the latter may be partitioned into two sets, guaranteeing applicability of these inequalities. Furthermore, Chaves [7] showed that these inequalities are sufficient for detecting any behaviour that is not classically reproducible in the Bell scenario where the two parties perform measurements with binary outputs.

The extension of Fine’s theorem to more general Bell scenarios [90,91], i.e. to scenarios involving a number of space-like separated parties that each choose input values and produce some output random variable (and scenarios that can be reduced to the latter), has been combined with the entropy vector method in [6,8].

Entropic constraints that are derived in this way provide novel and non-trivial entropic inequalities for the distributions compatible with the original classical causal structure. This idea was used in [8] to analyse the so-called *n*-cycle scenario, which is of particular interest in the context of non-contextuality and includes the Bell scenario (with binary inputs and outputs) as a special case. (A full probabilistic characterization of the *n*-cycle scenario was given in [92].)

In [6], new entropic inequalities for the bilocality scenario, which is relevant for entanglement swapping [93,94], as well as quantum violations of the classical constraints on the 4- and 5-cycle scenarios were derived. For the *n*-cycle scenario, the (polynomial number of) entropic inequalities are sufficient for the detection of any non-local distributions [7] (just as the exponential number of inequalities in the probabilistic case [92]). In the following we illustrate the method of [6,8] with a continuation of example 3.3.

### Example 3.5 (Entropic approximation for the post-selected instrumental scenario).

The entropy vector method from §2 is applied to the 5-variable causal structure of figure 3*a*. The marginalization is performed to retain all marginals that correspond to distributions in the original causal structure (figure 1*a*), i.e. any marginals of *P*(*Y* *Z*|*X*=0) and *P*(*Y* *Z*|*X*=1). Hence, the five-variable entropic cone is projected to a cone that restricts vectors of the form (*H*(*Y* _{|X=0}),*H*(*Y* _{|X=1}),*H*(*Z*_{|X=0}),*H*(*Z*_{|X=1}),*H*(*Y* _{|X=0}*Z*_{|X=0}),*H*(*Y* _{|X=1}*Z*_{|X=1})). Note that entropies of unobserved marginals such as *H*(*Y* _{|X=0}*Z*_{|X=1}) are not included. With this technique, the Shannon constraints for the three components (*H*(*Y* _{|X=0}),*H*(*Z*_{|X=0}),*H*(*Y* _{|X=0}*Z*_{|X=0})) are recovered (the same holds for *X*=1); no additional constraints arise here.

It is interesting to compare this to the Bell scenario considered in example 3.4. In both causal structures any 4-variable distributions, *P*_{Z|X=0Z|X=1Y|X=0Y|X=1} and *P*_{X|A=0X|A=1Y|B=0Y|B=1}, respectively, are achievable (the additional causal links in figure 3*b* do not affect the set of compatible distributions). However, the marginal entropy vector in the Bell scenario has more components, leading to additional constraints on the observed variables [6,87].

In some cases, two different causal structures, *C*_{1} and *C*_{2}, can yield the same set of distributions after marginalizing, a fact that has been further explored in [95]. When this occurs, either causal structure can be imposed when identifying the set of achievable marginal distributions in either scenario. If the constraints implied by the causal structure *C*_{1} are a subset of those implied by *C*_{2}, then those of *C*_{2} can be used to compute improved outer approximations on the entropic cone for *C*_{1}. Furthermore, valid independence constraints may speed up computations even if they do not lead to any new relations for the observed variables. (Note that some care has to be taken when identifying valid constraints in marginal scenarios derived from a causal structure [95].) Similar considerations also yield a criterion for indistinguishability of causal structures in certain marginal scenarios—if *C*_{1} and *C*_{2} yield the same set of distributions after marginalizing, then they cannot be distinguished in that marginal scenario.

In examples like the above, where no new constraints follow from post-selection, it may be possible to introduce additional input variables in order to certify the presence of quantum nodes in a network. The new parentless nodes can then be used to apply lemma 3.1 and the above entropic techniques. Mathematically, introducing further nodes to a causal structure is always possible. However, this is only interesting if experimentally feasible, e.g. if an experimenter has control over certain observed nodes and is able to devise an experiment in which their inputs can be varied. In the instrumental scenario, this may be of interest.

### Example 3.6 (Variations of the instrumental scenario).

In this scenario (figure 1*a*), a measurement on system *A*_{Z} is performed depending on *X* (where, in the classical case, *A*_{Z} can w.l.o.g. be taken to be a copy of the unobserved random variable *A*). Its outcome *Z* (in the classical case a function of *A*) is used to choose another measurement to be performed on *A*_{Y} to generate *Y* (classically, another copy of *A*). It may often be straightforward for an experimenter to choose between several measurements. In the causal structure, this corresponds to introducing an additional observed input *S* to the second measurement (with the values of *S* corresponding to different measurements on *A*_{Y}). Such an adaptation is displayed in figure 4*a*. (Note that, for ternary *S*, the outer approximation of the post-selected causal structure of figure 4*d* with Shannon inequalities does not lead to any interesting constraints (as opposed to the structure of figure 4*e*, which is analysed further in example 3.8).)

Alternatively, it may be possible that the first measurement (on *A*_{Z}) is chosen depending on a combination of different independent factors, which each correspond to a random variable *X*_{i}. For two variables *X*_{1} and *X*_{2} the corresponding causal structure is displayed in figure 4*b*. This is an example of a causal structure where non-Shannon inequalities among classical variables lead to a strictly tighter outer approximation in the classical and quantum case than the approximations derived using only Shannon and weak-monotonicity constraints (also if there is a causal link from *X*_{1} to *X*_{2}) [50].

Taken together, these two adaptations yield the causal structure of figure 4*c*, relevant in the context of the principle of information causality [96] (see also example 3.8 below).

A second approach that relies on very similar ideas (also justified by lemma 3.1) is taken in [18]. For a causal structure *C*^{C} with nodes *Ω*=*X*∪*X*^{↑}∪*X*^{⤉} , where *X* is a parentless node, conditioning the joint distribution over all nodes on a particular *X*=*x* retains the independences of *C*^{C}. In particular, the conditioning does not affect the distribution of the *X*^{⤉} , i.e. *P*(*X*^{⤉} |*X*=*x*)=*P*(*X*^{⤉}) for all *x*. The corresponding entropic constraints can be used to derive entropic inequalities without the detour over computing large entropic cones, which may be useful where the latter computations are infeasible. The constraints that are used in [18] are, however, a (diligently but somewhat arbitrarily chosen) subset of the constraints that would go into the entropic technique detailed earlier in this section for the full causal structure. Indeed, when the computations are feasible, applying the full entropy vector method to the corresponding post-selected causal structure gives a systematic way to derive constraints, which are in general strictly tighter (cf. example 3.7).

So far, the restricted technique has been used in [18] to derive the entropic inequality (3.1) for the causal structure of figure 5*a*.

### Example 3.7.

Applying the post-selection technique for a binary random variable *C* to the causal structure from figure 5*a* yields the effective causal structure 5*d*. The latter can be analysed with the above entropy vector method, which leads to a cone that is characterized by 14 extremal rays or, equivalently, 22 inequalities, both available in the electronic supplementary material. The inequalities *I*(*Z*: *X*_{|C=1})≥0, *I*(*Z*:*Y* _{|C=0})≥0, *I*(*X*_{|C=1}:*Y* _{|C=1}|*Z*)≥0 and *H*(*Z*|*X*_{|C=0})≥*I*(*X*_{|C=1}*Z*: *Y* _{|C=1}), which are part of this description, imply (3.1) above. We are not aware of any quantum violations of these inequalities.

Structures 5*b* and 5*c* both lead to the causal structure 5*e* upon post-selecting on a binary *C*. The latter causal structure turns out to be computationally harder to analyse with the entropy vector method and (working with existing variable elimination software on a desktop computer) we have not been able to perform the corresponding marginalization when taking all Shannon and independence constraints into account. Hence, the method outlined in [18] is a useful alternative here.

### (b) Post-selection in quantum and general non-signalling causal structures

In causal structures with quantum and more general non-signalling nodes, lemma 3.1 is not valid. For instance, Bell’s theorem can be recast as the statement that there are distributions compatible with the quantum Bell scenario for which there is no joint distribution of *X*_{|A=0}, *X*_{|A=1}, *Y* _{|B=0} and *Y* _{|B=1} in the post-selected causal structure (on *A* and *B*) that has the required marginals (in the sense of lemma 3.2).

Nonetheless, the post-selection technique has been generalized to such scenarios [13,17], i.e. it is still possible to post-select on parentless observed (and therefore classical) nodes taking specific values. In such scenarios, the observed variables can be thought of as obtained from the unobserved resources by means of measurements or tests. If a descendant of the variable that is post-selected on has quantum or general non-signalling nodes as parents, then the different instances of the latter node and of all its descendants do not coexist (even if they are observed, hence classical). This is because such observed variables are generated by measuring a quantum or other non-signalling system. Such a system is altered (or destroyed) in a measurement, hence does not allow for the simultaneous generation of different instances of its children due to the impossibility of cloning.

In the quantum case, this is reflected in the identification of the coexisting sets in the post-selected causal structure, as is illustrated with the following example. (Note that different instances of a variable after post-selection have to be seen as alternatives and not as simultaneous descendants of their parent node as the representation of the post-selected causal structure might suggest.)

### Example 3.8 (Information causality scenario in the quantum case [13]).

The communication scenario used to derive the principle of information causality [96] is based on the variation of the instrumental scenario displayed in figure 4*c*. It has been analysed with the entropy vector method in [13]; that analysis is presented in the following.

Conditioning on values of the variable *S* is possible in the classical and quantum cases. However, whereas in the classical case the variables *Y* _{|S=s} for different *s* share a joint distribution (cf. lemma 3.1), they do not coexist in the quantum case. For binary *S*, the coexisting sets are {*X*_{1},*X*_{2},*A*_{Z},*A*_{Y}}, {*X*_{1},*X*_{2},*Z*,*A*_{Y}}, {*X*_{1},*X*_{2},*Z*,*Y* _{|S=1}} and {*X*_{1},*X*_{2},*Z*,*Y* _{|S=2}}. The only independence constraint in the quantum case is that *X*_{1}, *X*_{2} and *A*_{Y}*A*_{Z} are mutually independent. Marginalizing until only entropies of {*X*_{1},*Y* _{|S=1}}, {*X*_{2},*Y* _{|S=2}}, {*Z*} and their subsets remain yields only one non-trivial inequality, *I*(*X*_{1}:*Y* _{|S=1})+*I*(*X*_{2}:*Y* _{|S=2})≤*H*(*Z*), i.e. the information causality inequality for *n*=2. (Note that in [13] the more general inequality *I*(*X*_{1}:*Y* _{|S=1})+*I*(*X*_{2}:*Y* _{|S=2})≤*H*(*Z*)+*I*(*X*_{1}:*X*_{2}) was derived, where *X*_{1} and *X*_{2} are not assumed independent. Furthermore, this is also the only inequality found in the classical case when restricting to the same marginal scenario [17].) The same inequality was previously derived by Pawłowski *et al*. [96] for general *n*, where the choice of marginals was inspired by the communication task considered. Subsequently, another marginal scenario was considered in [13]—the one with coexisting sets {*X*_{1},*X*_{2},*Z*,*Y* _{|S=1}}, {*X*_{1},*X*_{2},*Z*,*Y* _{|S=2}} and all of their subsets—which led to additional inequalities.

Similar considerations were applied to causal structures allowing for general non-signalling resources, *C*^{G}, in [17]. Let *X* be a parentless observed node, and let *X*^{↑}_{O} be the observed descendants and *X*^{⤉} _{O} the observed non-descendants of *X*. If the variable *X* takes values *x*∈{1,2,…,*n*}, post-selection leads to a joint distribution of *X*^{↑}_{O|X=x}∪*X*^{⤉} _{O} for each *X*=*x*, denoted *P*_{|X=x}. Because post-selecting on *X* does not affect the distribution of the independent variables *X*^{⤉} _{O} , the distributions for different values of *x* must agree on *X*^{⤉} _{O} , i.e. ∑_{s} *P*_{|X=x}(*X*^{↑}_{O}=*s*, *X*^{⤉} _{O}) is the same for all *x*, where *s* runs over the alphabet of *X*^{↑}_{O}. This encodes no-signalling constraints. There may be other constraints that arise from no-signalling; for instance, example 3.9 below suggests further constraints of this type.

In terms of entropy, there are *n* entropic cones, one for each value *X*=*x*; these are combined by equating, across the different values of *x*, the entropies of *X*^{⤉} _{O} and of all of its subsets. These constraints define a convex polyhedral cone that is an outer approximation to the set of all entropy vectors achievable in the causal structure. Whenever further constraints relating the distributions for different *x* are known (as in example 3.9 below), the corresponding entropic constraints can be added.

Several examples of the use of this technique can be found in [17], including the original information causality scenario (which we discuss in example 3.9) and an entropic analogue of monogamy relations for Bell inequality violations [97,98].

### Example 3.9 (Information causality scenario in general non-signalling theories).

This is related to example 3.8 above and reproduces an analysis from [17]. In this marginal scenario, we consider the Shannon cones for the three sets {*X*_{1},*Y* _{|S=1}}, {*X*_{2},*Y* _{|S=2}} and {*Z*} as well as the constraints *I*(*X*_{1}:*Y* _{|S=1})≤*H*(*Z*) and *I*(*X*_{2}:*Y* _{|S=2})≤*H*(*Z*) which are conjectured to hold [17]. (This conjecture is based on an argument in [99] that covers a special case; we are not aware of a general proof.)

These conditions constrain a polyhedral cone of vectors (*H*(*X*_{1}),*H*(*X*_{2}),*H*(*Z*),*H*(*Y* _{|S=1}), *H*(*Y* _{|S=2}),*H*(*X*_{1}*Y* _{|S=1}),*H*(*X*_{2}*Y* _{|S=2})), with 8 extremal rays that are all achievable using PR-boxes [65,66]. Importantly, the stronger constraint *I*(*X*_{1}:*Y* _{|S=1})+*I*(*X*_{2}:*Y* _{|S=2})≤*H*(*Z*), which holds in the quantum case (cf. example 3.8), does not hold here.
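The gap between the box-world and quantum bounds can be made concrete with the standard PR-box protocol (due to van Dam), in which Alice inputs *a*=*X*_{1}⊕*X*_{2} into the box and sends *Z*=*X*_{1}⊕*A*, while Bob inputs *b*=*s*−1 and outputs *Y*=*Z*⊕*B*. The sketch below (hypothetical function names) verifies that each individual bound *I*(*X*_{s}:*Y* _{|S=s})≤*H*(*Z*) is saturated while the quantum bound on the sum is violated:

```python
from math import log2

def H(pmf):
    """Shannon entropy of a pmf given as a dict of probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def I2(pmf):
    """Mutual information of a pmf over pairs (u, v)."""
    mu, mv = {}, {}
    for (u, v), p in pmf.items():
        mu[u] = mu.get(u, 0.0) + p
        mv[v] = mv.get(v, 0.0) + p
    return H(mu) + H(mv) - H(pmf)

def run(s):
    """Post-selected pmf of (X_s, Y_{|S=s}) and the marginal of Z for the
    PR-box protocol: a = X1^X2, box output A uniform with B = A^(a*b),
    Z = X1^A, Bob inputs b = s-1 and outputs Y = Z^B."""
    pxy, pz = {}, {}
    b = s - 1
    for x1 in (0, 1):
        for x2 in (0, 1):
            a = x1 ^ x2
            for A in (0, 1):
                B = A ^ (a & b)
                z = x1 ^ A
                y = z ^ B
                x = x1 if s == 1 else x2
                p = 1 / 8            # uniform inputs (1/4) times box output (1/2)
                pxy[(x, y)] = pxy.get((x, y), 0.0) + p
                pz[z] = pz.get(z, 0.0) + p
    return pxy, pz

pxy1, pz = run(1)
pxy2, _ = run(2)
assert abs(I2(pxy1) - 1.0) < 1e-12 and abs(I2(pxy2) - 1.0) < 1e-12
assert abs(H(pz) - 1.0) < 1e-12
# each individual bound I <= H(Z) holds with equality, but the sum
# violates the quantum bound I(X1:Y|S=1) + I(X2:Y|S=2) <= H(Z)
assert I2(pxy1) + I2(pxy2) > H(pz)
```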

## 4. Alternative techniques

Instead of relaxing the problem of characterizing the set of probability distributions compatible with a causal structure by considering entropy vectors, other computational techniques are currently being developed. In the following, we give a brief overview of these methods.

In this context, note also that there are methods that allow certification that the only restrictions implied by a causal structure are the conditional independence constraints among the observed variables [11], as well as procedures to show that the opposite is the case [100,101]. Such methods may (when applicable) indicate whether a causal structure should be analysed further (corresponding techniques are reviewed in [18]).

### (a) Entropy vectors for other entropy measures

Entropy vectors may be computed in terms of other entropy measures, for instance in terms of the *α*-Rényi entropies [102]. For a quantum state *ρ*_{X}, the *α*-Rényi entropy is *H*_{α}(*X*):=(1/(1−*α*)) log tr(*ρ*_{X}^{α}) for *α*∈(0,1)∪(1,∞), and the cases *α*=0,1,∞ are defined by taking limits; in particular, *H*_{1}(*X*)=*H*(*X*), the von Neumann entropy. Classical *α*-Rényi entropies are included in this definition when considering diagonal states.

One may expect that useful constraints on the compatible distributions can be derived from such entropy vectors. For 0<*α*<1 and *α*>1, such constraints were analysed in [103]. In the classical case, positivity and monotonicity are the only linear constraints on the corresponding entropy vectors for any *α*≠0,1. For multiparty quantum states, monotonicity does not hold for any *α*, as is the case for the von Neumann entropy. For 0<*α*<1, there are no constraints on the allowed entropy vectors except for positivity, whereas for *α*>1 there are constraints, but these are nonlinear. The lack of further linear inequalities that hold in general limits the usefulness of *α*-Rényi entropy vectors for analysing causal structures, and, to our knowledge, it is not known how or whether nonlinear inequalities for Rényi entropies may be employed for this task.
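The failure of monotonicity for multiparty quantum states can be seen on a maximally entangled pair of qubits, for which *H*_{α}(*AB*)=0<*H*_{α}(*A*)=1 for every *α*. A minimal numerical check, assuming the definition above with logarithms base 2:

```python
import numpy as np

def renyi(rho, alpha):
    """alpha-Renyi entropy H_alpha = (1/(1-alpha)) * log2 tr(rho^alpha)."""
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]
    if abs(alpha - 1) < 1e-9:                 # von Neumann limit alpha -> 1
        return float(-np.sum(eigs * np.log2(eigs)))
    return float(np.log2(np.sum(eigs ** alpha)) / (1 - alpha))

# Maximally entangled two-qubit state |phi+><phi+|.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho_ab = np.outer(phi, phi)
rho_a = np.array([[0.5, 0.0], [0.0, 0.5]])    # partial trace over B

for alpha in (0.5, 1.0, 2.0):
    # Monotonicity H(AB) >= H(A) fails for every alpha on this state.
    assert renyi(rho_ab, alpha) < renyi(rho_a, alpha)
```

Diagonal states reduce this to the classical Rényi entropy, where monotonicity does hold.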

The above considerations do not mention conditional entropy and hence could be taken with the definition *H*_{α}(*X*|*Y*):=*H*_{α}(*XY*)−*H*_{α}(*Y*). Alternatively, one may consider a definition of the conditional Rényi entropy for which *H*_{α}(*X* | *Y* *Z*)≤*H*_{α}(*X* | *Y*) holds [104–108]. With the latter definition, the conditional Rényi entropy cannot be expressed as a difference of unconditional entropies, so to use entropy vectors we would need to treat the conditional entropies as separate components. Along these lines, one may also think about combining Rényi entropies for different values of *α* and using appropriate chain rules [109]. However, because of the large increase in the number of variables compared to the number of constraints, it is not clear that this will yield useful new conditions.

A second family of entropy measures, related to Rényi entropies, are the Tsallis entropies [110,111], which can be defined by *H*_{T,α}(*X*):=(1/(1−*α*))(2^{(1−α)Hα(X)}−1). Little work has been done on these in the context of causal structures, but some numerical work [112] suggests that they have advantages for detecting non-classicality in the post-selected Bell scenario (see also [113]).
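The stated relation between Tsallis and Rényi entropies can be verified directly against the standard form *H*_{T,α}(*X*)=(1−∑_{x}*p*(*x*)^{α})/(*α*−1). A quick consistency check (our own sketch, logarithms base 2 throughout):

```python
from math import log2

def renyi(p, alpha):
    """Classical alpha-Renyi entropy, log base 2, alpha != 1."""
    return log2(sum(q ** alpha for q in p if q > 0)) / (1 - alpha)

def tsallis_direct(p, alpha):
    """Standard Tsallis entropy (1 - sum p^alpha) / (alpha - 1)."""
    return (1 - sum(q ** alpha for q in p if q > 0)) / (alpha - 1)

def tsallis_from_renyi(p, alpha):
    """The relation in the text: (1/(1-alpha)) (2^{(1-alpha) H_alpha} - 1)."""
    return (2 ** ((1 - alpha) * renyi(p, alpha)) - 1) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.5, 2.0, 3.0):
    assert abs(tsallis_direct(p, alpha) - tsallis_from_renyi(p, alpha)) < 1e-12
```

The agreement is exact, since 2^{(1−α)H_α(X)} is just ∑_{x}*p*(*x*)^{α} when logarithms are taken base 2.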

### (b) Non-entropic techniques

#### (i) Polynomial restrictions on compatible distributions

The probabilistic characterization of causal structures depends (in general) on the dimensionality of the observed variables, and computational hardness results suggest that a full characterization is unlikely to be feasible, except in small cases [114,115]. Recent progress has been made with the development of procedures to construct polynomial Bell inequalities. A method that resorts to linear programming techniques [15] has led to the derivation of new inequalities for the bilocality scenario (as well as a related four-party scenario). Another, iterative procedure allows for enlarging networks by adding a party in a particular way: here, adding a party means adding one observed input and one observed output node, as well as an unobserved parent for the output, which may causally influence one other output variable in the network. This allows for the construction of nonlinear inequalities for the enlarged network from inequalities that are valid for the original one [14].

#### (ii) Inflations of causal structures

Furthermore, a recent approach relies on considering enlarged networks, so-called inflations, and inferring causal constraints from those [19,116]. Inflated networks may contain several copies of a variable that each have the same dependencies on ancestors (the latter may also exist in several instances) and that share the same distributions with their originals. Such inflations allow for the derivation of probabilistic inequalities that restrict the set of compatible distributions. These ideas bear some resemblance to the procedures in [20], in the sense that both employ the idea that certain marginal distributions may be obtained from different networks; they are, however, much more focused on causal structures featuring interesting independence constraints. Inflations allowed the authors of [19] to prove that certain distributions are incompatible with the triangle causal structure of figure 1*c*, in particular the so-called W-distribution, which could not be proved incompatible either entropically or with the covariance matrix approach below.

#### (iii) Semidefinite tests relying on covariance matrices

Beyond considering entropies, one may look for other mappings of the distribution of a set of observed variables that encode causal structure. For causal structures with two generations, i.e. one generation of unobserved variables as ancestors of one generation of observed nodes, a technique using covariance matrices has been found [20]. Each observed variable is mapped to a vector-valued random variable and the covariance matrix of the direct sum of these variables is considered. Owing to the law of total expectation, this matrix allows for a certain decomposition depending on the causal structure. For a particular observed distribution and its covariance matrix, the existence of such a decomposition may be tested via semidefinite programming. The relation of this technique to the entropy vector method is not yet well understood; a partial analysis considering several examples is given in Section X of [20].
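The decomposition rests on the law of total covariance, Cov(*X*,*Y*)=E[Cov(*X*,*Y*|Λ)]+Cov(E[*X*|Λ],E[*Y*|Λ]). The following numerical sketch (a toy model of our own, not the semidefinite test of [20]) illustrates how, in a two-generation structure, a single unobserved cause fixes the off-diagonal block of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two-generation toy structure: one unobserved cause L feeding observed X, Y.
L = rng.normal(size=n)
X = 2.0 * L + rng.normal(size=n)      # X = 2L + independent noise
Y = -1.0 * L + rng.normal(size=n)     # Y = -L + independent noise

cov = np.cov(np.stack([X, Y]))

# Law of total covariance: Cov(X,Y) = E[Cov(X,Y|L)] + Cov(E[X|L], E[Y|L]).
# The noises are independent given L, so Cov(X,Y|L) = 0 and the off-diagonal
# entry is entirely Cov(2L, -L) = -2 Var(L) = -2.
assert abs(cov[0, 1] - (-2.0)) < 0.1
```

The semidefinite test of [20] asks, in effect, whether an observed covariance matrix admits such a decomposition consistent with the hypothesized causal structure.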

## 5. Open problems

The entropy vector approach has led to many certificates for the incompatibility of correlations with causal structures. However, we still lack a general understanding of how well entropic relations can approximate the set of achievable correlations. Firstly, the non-injective mapping from probabilities to entropies is not sufficiently understood and, secondly, the current methods employ further approximations, e.g. by restricting the number of non-Shannon inequalities that can be considered at a time. It is as yet unknown whether the entropy vector method (without post-selection) can ever distinguish correlations that arise from classical, quantum and more general non-signalling resources. Such insights may also inform the question of whether there exist novel inequalities for the von Neumann entropy of multiparty quantum states.

The post-selection technique allows for the derivation of additional constraints that may distinguish quantum from classically achievable correlations in the Bell scenario and possibly in other examples. However, the method relies on the causal structure featuring parentless observed nodes, hence it is not always applicable (see e.g. the triangle scenario). In such situations, one may try to combine the entropic techniques reviewed here with the inflation method [19], which might allow for further entropic analysis of several causal structures, e.g. of the triangle scenario.

Criteria to certify whether a set of entropic constraints is able to detect non-classical correlations are currently not available. For many of the established entropic constraints on classical causal structures it is unknown whether or not they are also valid for the corresponding quantum structure. In the case of the Bell scenario, this problem has been overcome. It has been shown that the known entropic constraints are even sufficient for detecting any non-classical correlations [7]. However, as the proof is specific to the scenario, finding a systematic tool to analyse the scope of the entropic techniques remains open.

## Data accessibility

Additional accompanying data can be found in the electronic supplementary material.

## Authors' contributions

Both authors contributed to the ideas present in the article. M.W. performed the computations and wrote the first draft. Both authors discussed and contributed to the final version of the manuscript.

## Competing interests

We have no competing interests.

## Funding

R.C. is supported by the EPSRC’s Quantum Communications Hub (grant number EP/M013472/1) and by an EPSRC First Grant (grant no. EP/P016588/1).

## Acknowledgements

We thank Rafael Chaves and Costantino Budroni for confirming details of [17].

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3906112.

- Received July 12, 2017.
- Accepted September 28, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.