## Abstract

We consider an evolving network of a fixed number of nodes. The allocation of edges is a dynamical stochastic process inspired by biological reproduction dynamics, namely by deleting and duplicating existing nodes and their edges. The properties of the degree distribution in the stationary state is analysed by use of the Fokker–Planck equation. For a broad range of parameters, exponential degree distributions are observed. The mechanism responsible for this behaviour is illuminated by use of a simple mean field equation and reproduced by the Fokker–Planck equation. The latter is treated exactly, except for an approximate treatment of the degree–degree correlations. In the limit of 0 mutations, the degree distribution becomes a power law with exponent 1.

## 1. Introduction: networks and evolutionary dynamics

Whenever a phenomenon can be thought of in terms of components and relations between components, the mathematical language of graph theory or networks may be helpful to the description, analysis and the understanding of the relevant problem of interest. A large amount of work is currently being carried out with the aim to understand the structure and statistical properties of networks in the hope that certain aspects of the general mathematical characterization of network structure may be related to common functional properties, e.g. vulnerability to the breakdown of part of the network (Biggs 1994; Albert & Barabási 2002; Newman 2003).

Our aim in the present paper is to discuss an example of a persistently evolving network of fixed size. The dynamics is driven by a Moran process (Moran 1962) inspired by evolutionary dynamics. A time step consists of two events: (i) a randomly selected node is removed with all its attached edges and (ii) a randomly selected node is selected for duplication. When a node is removed, all its edges disappear obviously as well. The duplication event involves the creation of new edges. The daughter node inherits edges with tunable probabilities. Mutations are represented in two ways. Firstly, the daughter inherits edges to neighbours of the parent with a probability that can be smaller than 1. Secondly, edges to nodes not connected to the parent are added to the daughter with a probability that may be larger than 0. Both these processes tend to make the daughter differ from the parent. Finally, the parent–daughter relationship suggests that an edge should be established between daughter and parent with a certain probability.

Other models in the literature have considered aspects of the network dynamics described above. For persistently growing networks, the process consisting solely of the second (duplication) step has been considered (e.g. Vázquez 2003; Ipolatov *et al*. 2005; Krapivsky & Redner 2005). One finds typical power-law degree distributions. A model considering a stochastic combination of rewiring, addition of new links and creation of new nodes was studied by Albert & Barabási (2000). They found power-law degree distributions with exponents above 2. When the probability for rewiring of edges is above a certain limit, the degree distribution becomes exponential. A model consisting of adding and removing edges to a fixed set of nodes was studied by Epstein & Wang (2002). The model generates power-law degree distributions. Power-law degree distributions with exponents above 2 were also found in other models of fixed node number in which preferential attachment is an explicit part of the dynamics (see Cheng & Tang 2004; Sarshar & Roychowdhury 2004; Salathe *et al*. 2005). Our model, consisting of a fixed number of nodes, produces typically exponential degree distributions, except in the limit of perfect inheritance where a degree distribution is obtained, i.e. a power-law distribution with an exponent 1.

The paper is organized as follows. In the next section, we describe the dynamical algorithm of the model and, to develop some intuition, discuss the degree distribution by a simple mean field argument. Next, we derive the Fokker–Planck equations for the degree distribution and discuss approximations involved in these equations. In the summary and discussion section, we relate the simple node-and-edge model to emerging network structures in the individual-based tangled nature model (Anderson & Jensen 2005; Laird & Jensen 2006*b*, 2007) of evolutionary ecology and also discuss more broadly the relevance of the simple node-and-edge dynamics and the results derived.

## 2. Simple node model

Let us consider the following simple node-and-edge model of a network of *N* nodes and associated edges. The dynamics conserve the number of nodes. A time step consists of choosing a node at random and removing it, together with all its connected edges. Next, another node, a parent, is randomly selected from the remaining *N*−1 nodes and is duplicated in the form of a daughter. All nodes connected to the parent are now given connections to the daughter with probability *p*_{e}. All nodes unconnected to the parent are given connections to the daughter with probability *p*_{n}. An edge between the daughter and the parent is placed with probability *p*_{p} (for a similar model with , see Farid & Christensen 2006). These probabilities represent the degree of similarity or correlations between the daughter and the parent. The daughter will be a complete copy of the parent if and . It seems natural to allow for a possible ‘interaction’ between the parent and the offspring, which is represented by the possibility of establishing an edge between the parent and the offspring with probability *p*_{p}. It is straightforward to check in the mean field (Laird & Jensen 2006*a*) that the described edge and node dynamics converge towards a steady-state network with a time-averaged connectance(2.1)The dynamics is simple to simulate. The results are independent of initial configuration. To make the transient very short, one may start the simulation from a binomial network of *N* nodes where edges between any two nodes are established with a probability equal to the mean field connectance given in equation (2.1). After a short transient, a steady state is established. The time-averaged degree distribution behaves exponentially for all values of the control parameters *p*_{e}, *p*_{n} and *p*_{p} (see figure 1 and Laird & Jensen 2006*a*), except in the limit and vanishing *p*_{n} and *p*_{p}, where the distribution falls off like one over the degree. Inspired by the relation between the node-and-edge model and the tangled nature model (see §4 below for details), we choose *p*_{p} equal to the connectance in equation (2.1); i.e. we link *p*_{e}, *p*_{n} and *p*_{p} together by solving the equation and obtain(2.2)We will, in a moment, write down the complete Fokker–Planck equation for the degree distribution of a network evolving according to the process described above. The full equation is, however, rather involved and can only be solved by numerical iteration. It is therefore illuminating to make the following simplistic and heuristic considerations. Let *n*_{k}(*t*) denote the number of nodes of degree *k* after *t* time steps. Let us focus solely on the following aspects of the dynamics.

*Removal*. A node of degree*k*is selected for annihilation that occurs with probability*n*_{k}(*t*)/*N*. Nodes sharing edges with the removed node decrease their degree by 1. The probability that a node of degree*k*ends up as a degree*k*−1 node through this process is , where the degree*k*_{r}of the removed node is replaced by the average degree.*Duplication*. The process of attaching edges to the new daughter node will increase the degree of existing nodes with probabilities that depend on whether these share edges with the parent node or not. A node receives an edge because it is selected to become a neighbour of the daughter (of a degree*k*_{p}parent) with probability

(2.3)As we are seeking a qualitative self-consistent mean field equation, we substituted the average degree 〈*k*〉 in the last expression. A parent node of degree *k* receives an edge to the daughter, which occurs with probability . It is, in general, rather complicated to estimate the probability with which the new daughter node ends up with a specific degree (see §3 below for details). However, in the limit , for fixed *p*_{p} so according to equation (2.2), the probability that the daughter is allocated *k* edges can be estimated as(2.4)The first term corresponds to the daughter connecting to the parent and inheriting *k*−1 edges from the parent. The second term corresponds to no edge between the parent and the daughter and *k* edges inherited. Here, and in the following few equations, we denote by those terms of order *p*_{n} arising from the allocation of edges between the daughter and the nodes not connected to the parent node. We combine these events to obtain the mean field equation for the evolution of *n*_{k}(*t*)(2.5)In the stationary limit , we obtain the following solution:(2.6)where(2.7)and . Using the normalization and the self-consistent equations(2.8)we obtain the exponential solution with and(2.9)where the approximation refers to the limit for fixed *N*. The divergence of the exponential cut-off in the limit and is in qualitative agreement with the change in the exponential part of the degree distribution obtained in simulations of the network (figure 1). However, this simplistic mean field discussion is only of heuristic value. We now present the full Fokker–Planck-like equation for the process.

## 3. Fokker–Planck equation

Some care has to be taken when we develop the Fokker–Planck equation for the ensemble-averaged time-dependent number of nodes of degree *k*, *n*_{k}(*t*), constrained by the condition . Firstly, it is worth mentioning that the equations are concerned with the ensemble-averaged quantity *n*_{k}(*t*) and accordingly neglect ‘microscopic’ fluctuations in the number of nodes of degree. Secondly, to make a closed set of equations, one needs to perform some kind of truncation of a hierarchy of equations, which couples the degree distribution to the degree–degree correlation functions, which, in turn, is coupled to triple correlation functions, etc. This is usually the case. We will first write down the equations formally, including the needed degree–degree correlation functions, and then make clear the nature of the heuristic approximation we have used to estimate this correlation function.

We structure the analysis in the following way. Removal (R): the effect of removing, from a population of *N*, a node and its edges described by a rate term . Duplication (D): the effect of introducing, into a population of *N*−1, a new daughter node and attaching edges described by a rate term . Our equation has accordingly the form(3.1)Removal of a node affects the network in two ways: , the node being removed from the network and , the effect on the nodes being adjacent, i.e. sharing edges, to the node being removed. Therefore,(3.2)

The effect of the duplication process is conveniently broken up into three sub-effects: , the effect on the parent; , the effect on the daughter; and , the effect on the adjacent nodes, i.e. those that will receive an extra edge as a result of the duplication. Hence,(3.3)We have suppressed the time step, *t*, and network size, *N*, for notational ease.

Next, we derive detailed expressions for each of the terms above. The direct effect on *n*_{k} of removing a node of degree *k* is to decrement *n*_{k}. The probability of selecting a node of degree *k* is *n*_{k}/*N* and therefore,(3.4)After the removal, the degree of the nodes connected to the removed node, i.e. the adjacent nodes, will decrease by 1. For this we need the *Edge* probability,(3.5)In general we do not have a closed analytic expression for , but below we give approximate forms neglecting, or treating non-rigorously, degree–degree correlations. Here, we note(3.6)The first sum is over the degree of the removed node and the second sum is over the number, *q*, of nodes of degree the removed node is connected to.

A node of degree *k* is selected for duplication with probability . The daughter of this node receives an edge to the parent with probability *p*_{p}. Thus, the parent increases its degree by 1 with probability(3.7)The new daughter node can add to *n*_{k} by an amount determined by the probability of finishing with *k* edges,(3.8)To keep track of the bookkeeping, we have introduced a new probability(3.9)which is given by(3.10)The r.h.s. adds up the probabilities associated with the process where the daughter inherits *k*_{1} edges to nodes already connected to the parent. Each of these edges is inherited by the daughter with probability *p*_{e}. The daughter may receive an additional edges to nodes not connected to the parent. Each of these edges are attached to the daughter with probability *p*_{n}. The factor denotes the probability of edges allocated to the daughter, *k*_{1} of the edges are inherited, i.e. these edges connect to some of the *k*_{p} nodes connected to the parent. In addition, the daughter receives *k*_{2} edges, which connect to nodes not connected to the parent. The probability for this event is(3.11)Next, we consider the effect of the duplication on the adjacent nodes, and we need to distinguish between the nodes sharing an edge with the parent (Ed) and nodes not connected to the parent (nEd). Let us first consider the Ed nodes. We introduced previously as the probability that a node, here the parent, of degree *k*_{p} is connected to *q*_{E} nodes of degree *k*. The duplication process will, with probability *p*_{e}, attach a new edge from the daughter to each of these nodes and thereby increase their degree from *k* to *k*+1. Let us now turn to the nEd nodes. There are nodes that share no edge with the parent. With probability , a total of *q*_{nE} of these nodes are of degree *k* and will receive a new edge to the daughter. Here, is equivalent to introduced in equation (3.5), though is concerned with the nodes that a node of degree *k*_{p} (in a set of *N*−1 nodes) is *not* connected to. Among these nodes, *q*_{nE} have degree *k* with probability . Therefore, we have(3.12)

Degree–degree correlations induced by the evolutionary dynamics makes it difficult to write an explicit form for and . Numerical simulations show that the network is disassortative (Newman 2003), i.e. nodes with a high degree tend to attach to nodes with a low degree, and that the Pearson correlation coefficient decreases rapidly with size of the network and increased connectance. This numerical finding suggests that it makes sense to analytically treat the degree correlations approximately. One can choose to neglect the correlations altogether and try to estimate *p*_{Ed} and *p*_{nEd} by purely binomial arguments in the following way. First, we deal with . The *k*_{1} edges emerging from the degree *k*_{1} node connects (in this approximation) to nodes of degree *k*_{2} with probability (remember there are *N*−1 nodes when the duplication takes place), hence, we use(3.13)When we treat in the same approximation, we obtain since we now pick *q* nodes among the nodes not connected to the degree *k*_{1} node under consideration. It appears to be better to treat the correlations by a somewhat different argument that focuses on the edge dynamics. This approach leads to better numerical convergence towards the results obtained by direct simulation (figures 1 and 2). We use the following urn argument. We place edges in an urn. The edges are of two types. Type edges correspond to the edges connecting nodes of degree *k*_{2}. In addition, we have type edges connecting nodes of degree different from *k*_{2}. The probability that among *k*_{1} randomly picked edges *q* are of type and *k*_{1}−*q* are of type is given by(3.14)Again, we assume . In general, it is not simple to find analytic solutions to this somewhat involved set of equations. The result of iterating the Fokker–Planck equation (3.1) using these estimates is shown in figure 2 for diversity , which makes the numerical iteration manageable. We note good qualitative agreement with the behaviour of simulation results presented in figure 1.

Let us finally mention that direct simulations (Laird & Jensen 2006*a*) of the simple model described in §2 show that in the limit , for fixed , so according to equation (2.2), the degree distribution *n*_{k} behaves like 1/*k*. The Fokker–Planck equation (3.1) confirms this result. In this limit, the Fokker–Planck equation reduces to(3.15)Including only the leading terms from and , one obtains(3.16)In the limit and , this equation has the stationary solution . For detailed numerical study of the 1/*k* behaviour, see Laird & Jensen (2006*a*).

## 4. Summary and discussion

Let us now briefly address the relevance of the simplistic network model discussed above. The inspiration to the model came from a study of emergent networks in the individual-based tangled nature (Christensen *et al*. 2002; Laird & Jensen 2006*b*, 2007). The basics of the tangled nature model (Christensen *et al*. 2002; Laird & Jensen 2006*b*, 2007) are as follows. Individuals are described by type vectors . The number of individuals of type ** T** at time

*t*is denoted by

*n*(

**,**

*T**t*). Different types influence each other through an interaction matrix (

**-matrix) that accounts for all possible interactions between any possible set of types. Once the matrix**

*J***is defined it never changes. The dynamics consists of the configuration of occupied types changing around in the fixed space given by the positions**

*J***and the coupling matrix**

*T***. Selection leads to only a small fraction of types being occupied, and their interactions will be described by a small subset of the elements of this complete matrix. Species are defined as emergent structures in the type space in the following way. At time**

*J**t*the local maxima,

*T*_{max}, of the occupancy

*n*(

**,**

*T**t*) are identified. All occupied types within a distance from a given

*T*_{max}smaller than the correlation length of the matrix

**are considered to be belonging to the species defined by**

*J*

*T*_{max}.

The structure of the interaction network between extant species is found to depend on the statistical properties of ** J**. A proportion,

*θ*, of the elements of the

**-matrix, is assigned non-zero (and non-symmetric) values; all other elements are zero. The interactions assigned in the type space can either be uncorrelated (Hall**

*J**et al*. 2002; Anderson & Jensen 2005) or correlated (Laird & Jensen 2006

*b*, 2007). If no correlations are present in the type space, the evolved networks of interactions between extant species exhibit a binomial degree distribution as does the underlying network of non-zero

**-matrix elements (Anderson & Jensen 2005). The correlated case is more interesting and is relevant to this paper. Correlations are made to decay exponentially with the separation in type space. This implies that offspring will see a set of interactions that are very similar to the interactions of the parent, even when mutations make the offspring differ slightly from the parent. When correlations are present in**

*J***, the evolutionary dynamics is able to generate a network of interactions, between extant species, described by an exponential degree distribution. This is very different from the binomial distributions exhibited by a network constructed by randomly selecting positions in the same type space. This is interesting since Stumpf & Wiuf (2005) have studied the properties of sub-networks obtained by random sampling nodes in a larger network. They showed that the binomial (or Poisson) networks are invariant under decimation. If the large network has a binomial degree distribution, a randomly sampled subset of the network will also exhibit a binomial degree distribution. The results derived by Stumpf and Wiuf are concerned with the statistical properties of sub-networks obtained by**

*J**random*sampling. What we have found is that evolutionary dynamics is able to generate sub-networks typically characterized by the exponential degree distributions, even when the full network has a binomial degree distribution.

The simple node-and-edge model discussed in §2 can be seen as the explanation of how the evolutionary dynamics is able to produce sub-networks with a degree distribution of a functional form totally different from the one describing the entire network, from which they are sampled. A qualitative link between the tangled nature model and the simple node-and-edge model can readily be established. To understand this phenomenology of the tangled nature model, we now neglect the fluctuations present at the level of individual-based dynamics and assume a more coarse-grained viewpoint in which we consider species as either occupied or not. That is, we turn the coarse-grained measure *n*(** T**,

*t*) into a binary equal to 1 when and 0 when . We consider the dynamics at the level of species, which implies that creation events correspond to one species splitting into two species (a speciation event) and annihilation events correspond to a species going extinct. We elevate the dynamics of the individual-based tangled nature model to the level of species and describe this high-level dynamics by the node-and-edge model. The more correlated the interaction matrix

**of the tangled nature model is, the more similar will the offspring species be to the parent, implying that the edge probabilities**

*J**p*

_{e}should be large and

*p*

_{n}small, respectively. If the connectance of the interaction matrix

**is large it is probable that the offspring will end up with a link to the parent species, i.e. the probability**

*J**p*

_{p}should be high. An exact link between the set of probabilities (

*p*

_{e},

*p*

_{n}and

*p*

_{p}) of the node-and-edge model and the parameters defining the tangled nature model is not possible, nor is it needed, since the obtained results are robust for a broad range of control parameters.

The exponential degree distributions found in the tangled nature model and node-and-edge model may also be of relevance to naturally occurring food webs (see Dunne *et al*. 2002) and of interest to protein–protein interaction networks (Mering *et al*. 2002; Ipolatov *et al*. 2005).

## Acknowledgments

I am grateful for very fruitful collaboration with Dr Simon Laird.

## Footnotes

One contribution of 6 to a Special Feature ‘Stochastic networks: from theory to practice’.

- Received November 20, 2007.
- Accepted February 7, 2008.

- © 2008 The Royal Society