## Abstract

In this paper, we study the steady-state behaviour of a reaction network of interacting molecules using the chemical master equation (CME). The model considers a set of base species from which further compounds are created via binary reactions, as well as by monomolecular and dissipation reactions. The model includes external arrivals of molecules into the reaction volume and assumes that the reaction rates are proportional to the number of molecules of the reactants that are present. We obtain an explicit expression for the solution of the CME in equilibrium under the assumption that the system obeys a mass conservation law for the overall rate of incoming and outgoing molecules. This closed-form solution shows that the joint probability distribution of the number of molecules of each species is in ‘product form’, i.e. it is the product of the marginal distributions for the number of molecules of each species. We also show that the steady-state distribution of the number of molecules of each base and synthesized species follows a Poisson distribution.

## 1. Introduction

This paper considers chemical reactions that start from a set of base molecular species and can allow the composition of viable compounds from these species. The reactions are assumed to result either between molecules of any two compounds or by transformation of some compounds into another one. Such reactions are of particular interest in biochemistry. Within this context, we derive a chemical master equation (CME), and then obtain its steady-state solution under the physically motivated assumption that the overall arrival rate of molecules into the system is identical to the rate at which molecules are removed from the system.

This steady-state solution has the remarkable property that the joint probability distribution of the number of molecules is the product of the marginal distributions for the number of molecules of each species. The marginal probability distribution for the number of molecules of each compound is shown to follow a Poisson law.

The models that we present are a generalization, restricted to the equilibrium solution, of the work by Jahnke & Huisinga (2007) where monomolecular reactions are studied and the explicit time-dependent solution is derived. The equilibrium solution for monomolecular systems is itself a special case of a result known within the computer system performance and operations research community and in operations research as Jackson's theorem (e.g. Gelenbe & Pujolle 1999).

In early work (Bartholomay 1964), the analogy between stochastic models of queues and chemical reactions was pointed out, while fairly complex queuing models have long been used to analyse the performance of information processing systems (Gelenbe 1991, 1993; Gelenbe & Pujolle 1999; Gelenbe & Fourneau 2002). Related approaches with a similar scientific approach arise in the study of gene regulatory networks using their probabilistic master equations (Gelenbe 2007*a*). Interestingly enough, recent work (Cardelli 2007) attempts to formalize chemical reactions in a manner that allows their detailed computer simulation from first principles. Thus, the scientific links between chemistry and models of computer systems and programs seem to arise from different angles.

All of these approaches complement the work of Gillespie (1976, 2001), which has offered effective solution techniques for continuous time and discrete state-space Markov models of the molecular representation for the manner in which chemical reactions take place. This probabilistic framework parallels the deterministic rate equations (Segel 1984; Cho *et al*. 2003), based on the premise that reactions will have a significant noise content (McAdams & Arkin 1997; Rao *et al*. 2002) due to the random encounter of the constituent molecules. Probabilistic considerations can be of even greater interest in biochemistry when intracellular dynamics are considered (Fell 1997) owing to the smaller physical dimensions, and greater variability that may be involved.

Much work has also been devoted to relating the parameters of the deterministic generalized mass action models (Wolkenhauer *et al*. 2004) of reactions to the stochastic counterpart that is known as the CME (Gillespie 1992). Recent work (Fayolle *et al*. 2004) has related the stochastic model solutions to their deterministic large populations limits on the assumption that the steady-state solution of the CME has ‘product form’ (Gelenbe 1993; Gelenbe & Pujolle 1999), meaning that the joint probability distribution for the number of molecules of different species is given as the product of the marginal distribution of the number of molecules in each species. However, the analytical solution of the CME presents significant difficulties, and only some special cases have been solved explicitly (Schnell & Mendoza 1997; Jahnke & Huisinga 2007). Related work in computer science (Gelenbe 2007*b*) deals with models of computer infection.

## 2. Reaction channels

Consider a reaction network that includes an arbitrary set of compounds that would be denoted by capital letters A, B, C, D, etc., where these letters are place holders (i.e. variables) that denote arbitrary compounds. The reaction network that we model consists of three types of reaction channels:

binary reaction channels of the form (1): A+B→C;

unary reaction channels of the form (2): C→D; and

dissipation reaction channels of the type (3): C→*.

Again, note that A, B, C, D are just placeholder symbols for the general types of molecule being represented.

Furthermore, the number of reaction channels and distinct compounds that may result from reactions is not bounded *a priori*, since we use combinations of existing compounds to create new compounds. In practice, many of these combinations may be unstable or unlikely to occur, but the potential number of viable compounds considered is in principle unlimited.

## 3. The mathematical assumptions

The state of the reaction system is the set *K*(*t*) that represents the number of molecules of each compound at time *t*≥0. A particular value taken by *K*(*t*) is the set of pairs {(C, *c*)}, where *c* is the number of molecules of compound (or species) C in the system. Thus, if *Z*^{+} is the set of non-negative integers and *Ω* is the set of all compounds that may appear in this reaction system, then the state of the reaction system, which keeps track of the number of molecules of each compound present in the reaction system, will be a subset of *Ω X Z*^{+}, namely *K*(*t*)⊂*Ω X Z*^{+}.

We denote by *k* a particular value taken by the set *K*(*t*), so that the CME is given in terms of the probability distribution *p*(*k*,*t*)=Prob[*K*(*t*)=*k*], based on the assumption that {*K*(*t*):*t*≥0} is a continuous time and discrete state-space Markov chain, and that *p*(*k*, *t*) satisfies a Chapman–Kolmogorov equation (Medhi 1991). In this case, the state space is composed of denumerable subsets of *Ω X Z*^{+}. The specific assumptions used to construct the Markov process {*K*(*t*):*t*≥0} are given below.

Molecules of a generic compound C enter the reaction volume from an external source, according to a Poisson process of rate

*λ*_{C}≥0. If*λ*_{C}=0, this simply means that molecules of compound C can only be generated internally in the reaction volume as a result of reactions among other molecules.Molecules of the different compounds have a propensity either to engage in a binary reaction with each other, or to engage in a monomolecular reaction, and this happens in the infinitesimal time interval [

*t*,*t*+Δ*t*[. We denote by*μ*_{…}≥0 the propensities of the reactions, which are being defined below.A molecule of (some) species C may engage in a monomolecular reaction yielding a molecule of some species A with probability

*μ*_{CA}Δ*t*+*o*(Δ*t*). If there are*c*molecules of species C present in the system before the reaction occurs, then the probability of this monomolecular reaction will be*c.μ*_{CA}Δ*t*+*o*(Δ*t*), i.e. proportional to the number of molecules of species C that are present.A molecule of some species C may engage in a reaction with a molecule of some species A to produce a molecule of species D with probability

*c.a.μ*_{CAD}Δ*t*+*o*(Δ*t*). Note that the propensity*μ*_{CAD}of this reaction will be zero if either such a reaction cannot take place, or if the reaction does take place but it produces a molecule of a species that is different from D. Obviously the effective reaction rate is once again proportional to*a.c*, i.e. to the respective number of molecules of A and C, which are present before the reaction, and also proportional to the propensity*μ*_{ACD}of this reaction.This model includes the possibility for a molecule of some species to react with a molecule of the same species.

Note that there may be reasons (e.g. related to energy levels) to distinguish the case where a molecule of type C initiates the reaction with a molecule of type A from the case where a molecule of type A initiates the reaction with a molecule of type C. Since a binary reaction, when initiated by a molecule of type C, requires that a molecule of the other type A to be present in the system, if the number

*a*=0, then obviously the reaction cannot take place. In cases where one would*not*wish to distinguish between the ‘directionality’ of the reaction (initiated by one species of molecule or by the other species), one would simply take*μ*_{ACD}=*μ*_{CAD}, which is likely to be the common case in chemical reality.Finally, with probability

*μ*_{Cl}Δ*t*+*o*(Δ*t*), a molecule of some species C is depleted or ‘lost’ from the reaction system; hence the symbol*μ*_{Cl}≥0.

Note that the parameter

*μ*_{CA}Δ*t*can be viewed as the probability that a molecule of type C the energy level needed to initiate a monomolecular reaction that transforms it into a molecule of type A in some time interval [*t*,*t*+Δ*t*[. Similarly,*μ*_{CAD}Δ*t*is the probability that a molecule of type C undergoes a collision with a molecule of type A, that it has the right size and geometry, and has the energy level needed to initiate a binary reaction with a molecule of type A in [*t*,*t*+Δ*t*[, resulting in a molecule of type D.

### (a) The chemical master equation

State transitions in this model will be defined in terms of set operations. For the sets *X* and *Y*, where *Y* is a subset of *X*, let *X*-*Y* denote the set of elements of *X*, which are not contained in *Y*.

Now for the set of pairs *k*={(C, *c*)}, we define(3.1)and(3.2)and *k*−*e*_{A} is not defined if (A,*a*)∉*k*.

For *mathematical convenience*, in order to reduce the number of distinct terms that are given in the equations below, define the rate parameter *R*_{C}, which is the sum of all the propensities relating to some arbitrary compound C, namely(3.3)

With the assumptions that are made, the probability distribution of the system state at time *t*+Δ*t* is derived from that at time *t* using the following expression:(3.4)where *o*(Δ*t*) denotes all terms that have coefficients of order (Δ*t*)^{2} or higher; whereas the terms on the r.h.s. of (3.4) correspond to the following events:

the first term corresponds to the case where nothing happens: no external arrival of molecules and no molecules are able to initiate a reaction. This follows from the fact that the probability that no external arrivals of molecules occur, that none of the molecular species react, and that no molecules depart from the reaction system, in the time interval [

*t*,*t*+Δ*t*[, is(3.5)The second term corresponds to the possible arrival into the reaction system of one molecule of any type C, hence the term

*p*(*k*−*e*_{C},*t*) that corresponds to the state*before*the arrival. However, because any state*k*can only have non-negative values (these are counts of molecules), we have the term 1[*c*>0] that is 1 if*c*>0, and is 0 otherwise.The third term covers monomolecular reactions from type C to type A, where the term

*p*(*k*+*e*_{C}−*e*_{A},*t*) corresponds to the previous state where there is one extra molecule of compound C and one less of species A, and obviously we must have*a*>0 since if a new molecule of compound A is created in this reaction, the resulting number of such molecules must be positive.The fourth term represents the decay or loss of a molecule of type C.

The fifth term corresponds to binary reactions leading to molecules of type D. Note that the term

*μ*_{CAD}is the propensity of molecular species C to react with molecular species*S*_{A}, in order to produce a molecule of compound D, and the multiplying factor (*c*+1)(*a*+1) corresponds to the fact that the reaction rate is assumed to be proportional to the size of each of the two reacting populations. The term*p*(*k*+*e*_{C}+*e*_{A}−*e*_{D},*t*) corresponds to the state*before*the reaction, since the reaction will replace one molecule of each of compound C and A to create a molecule of the compound D.

Given the term *p*(*k*,*t*+Δ*t*)−*p*(*k*,*t*) and moving this to the l.h.s., then dividing both sides by Δ*t*, and taking the limit Δ*t*→0, we obtain the CME:(3.6)

### (b) Steady-state solution

The steady-state solution that we obtain is summarized in the following result, and in order to simplify the expressions that are given, we will first introduce some notation for all the compounds A, C, D:(3.7)(3.8)(3.9)and define *β*_{C}*, the total incoming rate of molecules of species C into the reaction system for the compound C*, both from the external source and as a result of reactions, as(3.10)where (3.10) is a system of equations that relates the rate of flow of all the compounds with each other and with the externally arriving flow rates *λ*_{C}. The existence and uniqueness of the solution to these equations is discussed in the appendix A.

*Consider β*_{C} *given by the system of equations* (*3.10*) *and assume that the following flow conservation equation is satisfied over all the compounds in the system*:(3.11)*where l*_{C} *is given in* (*3.7*)*. Note that this equation simply states that the total rate of externally arriving molecules to the reaction system must be identical to the total rate of ‘departing’ molecules due to the dissipation reactions*.

*Furthermore, let G⊂Ω be any finite subset of the set of all compounds, such that for each compound A∈G, β*_{C}*>*0*, and let* .

*Then the steady-state solution to the CME* (*3.7*) *is given by*(3.12)

Note that (3.12) is the solution for the *joint* probability distribution of the number of molecules of each of the compounds in *G*, and is expressed as the *product* of the marginal distribution of the number of molecules of each species. This is a remarkable property, since it allows us to compute directly the marginal probability distribution of the number of molecules for each molecular species separately once the *β*_{A} have been obtained; furthermore, the joint probability distribution can then be obtained by just multiplying the known marginal probabilities. This property is sometimes known as a ‘product form solution’; it is seldom found in stochastic systems, but has been shown to exist, for instance, in a number of queuing network models (Gelenbe 1991, 1993; Gelenbe & Pujolle 1999; Gelenbe & Fourneau 2002).

From the expression for *p*(*k*) in (3.12), we easily deduce that *Φ*_{A} is the average number of molecules of compound A that are present in the reaction channel in steady state.

Given (3.6) in steady state by setting the derivative of the probability distribution *p*(*k*,*t*) to zero yields the following equation for the equilibrium probability distribution *p*(*k*):(3.13)Note that if (3.12) is true, then(3.14)(3.15)(3.16)(3.17)Now dividing both sides of (3.14) by *p*(*k*) from (3.12), and using the above expressions, we obtain(3.18)which after using (3.10), becomes(3.19)After cancelling identical terms, we getwhich is the flow conservation relation (3.11) that has been assumed to hold. ▪

The steady-state solution is obtained under the assumption (3.11), which is physically based. It states that the total arrival rate of molecules is identical to the rate at which they are removed from the system as a result of the loss of molecules through depletion or through attempted reactions that fail because the co-reactant species is depleted.

## 4. Conclusions

In this paper, we have studied a probability model of chemical reactions, which allows the combination of any two compounds that exist in the reaction system to create another compound, as well as monomolecular reactions that transform any of the compounds (or initial molecular species) into another one. We also allow dissipation reactions that eliminate certain compounds or species from the system. The effective reaction rates between species or compounds are proportional to the number of molecules of the participating compounds that are present in the system. Furthermore, each of the reactions has specific reaction rates that depend on the compounds that are involved. The model allows us to model chains of reactions that may lead to arbitrarily large compounds that contain any number of base molecules of each species.

We derive a CME, and obtain a closed-form solution for the steady-state probability distribution of the number of molecules of each original or resulting species in the reaction volume provided that a conservation law on the flow of molecules is satisfied. The solution obtained has the remarkable property that the joint probability distribution of the number of molecules of each species is in ‘product form’, i.e. it is the product of the marginal distributions of the number of molecules of each species. Our results also show that the steady-state distribution of the number of molecules of each species follows a Poisson distribution.

Thus, this work provides a compact analytical solution to the CME in steady state, offering the advantage of avoiding time-consuming human programming of the model, or of time-consuming Monte Carlo computer simulations. It is hoped that future work will include a finer representation of the events in chemical reactions such as the role of energy.

## Footnotes

One contribution of 6 to a Special Feature ‘Stochastic networks: from theory to practice’.

- Received November 20, 2007.
- Accepted March 5, 2008.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

- Copyright © 2008 The Royal Society