## Abstract

Complex systems techniques are used to analyse X-ray micro-CT measurements of grain kinematics in Hostun sand under triaxial compression. Network nodes with the *least* mean shortest path length to all other nodes, or highest relative closeness centrality, reside in the region where the persistent shear band ultimately develops. This trend, whereby a group of grains distinguishes themselves from the rest in the sample, remarkably manifests from the onset of loading. The shear band's boundaries and thickness, evident from the network communities' borders and essentially constant mean size, provide corroborating evidence of early detection of strain localization. Our findings raise the possibility that the formation and the location of the persistent shear band may be decided in the nascent stages of loading, well before peak shear stress. Grain-scale digital image correlation strain measurements and statistical tests confirm the results are robust. Moreover, the trends are unambiguously reproduced in a discrete element simulation of plane strain compression.

## 1. Introduction

The development of a robust theory for granular materials (e.g. soil, powders, grains, etc.) is arguably one of the most formidable challenges in mechanics. Terzaghi [1] reflected on the long history of this problem in reference to soil mechanics: ‘(Coulomb) purposely ignored the fact that sand consists of individual grains. Coulomb's idea proved very useful as a working hypothesis, but it developed into an obstacle against further progress as soon as its hypothetical character came to be forgotten by Coulomb's successors. […] The way out of the difficulty lies in dropping the old fundamental principles and starting again from the elementary fact that sand consists of individual grains’. In some respects, Terzaghi's words did not go unheeded. The consideration of the behaviour of geomaterials from the most basic units, i.e. the grains, has in fact been the *raison d'être* of micromechanics of granular materials. This discipline owes much of its success to the discrete element method (DEM), introduced in the 1970s by Cundall & Strack [2]. DEM's dominance in micromechanical analysis remains unsurpassed, with all but a few datasets at the particle scale arising from DEM simulations [3–9]. As aptly enunciated by Sibille & Froiio [10]: ‘This has led to the paradox of micromechanics of granular materials as a science based almost entirely on “virtual evidence”’.

Recent developments from laboratory experiments, however, herald a new era in micromechanics of granular materials. Measurements of contact forces, contacts and grain kinematics in two-dimensional idealized assemblies of photoelastic discs have been achieved and analysed [11–14]. The past few years have also witnessed advances in the measurement of basic properties for natural geomaterials at ever-increasing spatial resolution: existing datasets include information on kinematics at the grain-scale [15–20]. With the measurement of individual grain properties now within experimental reach, researchers are facing an unprecedented opportunity to integrate or complement these measurements with data from DEM simulations [12,21] to probe the rheology of geomaterials, just as Terzaghi envisioned—from observations of behaviour of individual grains.

Experimental access to information at the grain-scale allows us to answer existing questions as well as to discover new mechanisms operating across the spatial scales, from the grain to the bulk. Of course, this new capability throws down the gauntlet to mathematics and statistics to mine these copious data (kinematics of tens of thousands of grains in different stages of a loading test) for new and useful insights, so that granular behaviour can be better understood, predicted and ultimately controlled. Our objective is to revisit the phenomenon of strain localization (shear banding) using a complex systems analysis of kinematical data for Hostun sand from X-ray micro-CT measurements and for an analogue from DEM simulations. We employ concepts and techniques from complex networks inspired in part by progress in the study of structure and function of brain networks from neuronal data, obtained using magnetic resonance imaging (MRI) [22]. The central question we seek to answer in this study is: can measures of complex networks, suitably constructed from individual grain kinematics, detect strain localization? If so, do such network properties provide early detection of failure and offer new and useful insights on material behaviour?

This paper is organized as follows. Our proposed three-phase strategy for material characterization is presented in §2. In §3, we discuss phase 1 for the two systems under study: the triaxial compression test on Hostun sand and a kinematical analogue from a DEM simulation of plane strain compression of a polydisperse assembly of spherical particles. Section 4 presents the method for constructing the functional complex networks from the kinematical data (the initial part of phase 2). To extract as much physical insight from the network analysis and, to the extent possible, render transparent the physical meaning of the network properties, we combine in §5: the network analysis in the abstract state space forged around the question posed above (the latter part of phase 2); and the mapping of network properties from state space to the physical domain, along with the battery of statistical and randomization tests to establish whether or not the findings are robust (phase 3). We conclude in §6.

## 2. Strategy for material characterization

Our strategy for material characterization is to analyse material behaviour in two distinct spaces: the physical and the abstract ‘state space’ of graphs or complex networks. Analysis proceeds in three phases. Phase 1 is undertaken in the physical space: we use X-ray micro-tomography to establish the position and kinematics (displacements and rotations) of individual grains in a deforming Hostun sand sample, over many stages of a triaxial compression test. Phase 2 is undertaken in the abstract state space: we map the kinematical data from phase 1 to a state space where the deforming sample is represented by an evolving complex network, before examining the evolution of key properties of this network. In this study, the network is constructed solely from individual grain kinematics (i.e. grain positions do not enter the analysis in state space). Finally, in phase 3, we map the information gained from the state space into the physical domain of the sample, using grain positions, to extract spatial–temporal information on material rheology. This study focuses mainly on phases 2 and 3; phase 1 is reported elsewhere but is summarized here for completeness. Several key elements of phases 2 and 3 warrant attention. Phase 2 has been deliberately designed to enable an additional layer of exploration of the physical system within a pure mathematical framework—without appeal to physical laws or need for the network properties to bear an explicit physical meaning. This abstraction to a state space in phase 2 enables us to: (i) fully exploit key developments in the related disciplines of graph theory and complex networks and (ii) forge a method of network construction as well as judiciously select network properties based on the questions motivating the study. It is not until phase 3, when information from the state space is mapped back to the physical domain, that the interpretation of the physical significance of the results from phase 2 is established. A critical element of phase 3 is the series of tests devised using modern statistics to probe the veracity of the findings from phase 2.

In support of the above characterization of Hostun sand rheology, two more independent studies are performed to establish comprehensively the robustness of the results. Specifically, we test the reproducibility of trends uncovered for Hostun sand from: (i) calculations of continuum strain at grain-scale resolution using digital image correlation (DIC) and (ii) implementation of the proposed method of characterization for a kinematical analogue from DEM simulations.

## 3. Hostun sand and baseline system: current state of knowledge

In brief, we describe here the two datasets for phase 1. The first is from a triaxial compression test on Hostun sand, undertaken with *in situ* X-ray micro-tomography scanning. The second, referred to hereafter as ‘baseline’, comes from a discrete element simulation of a plane strain compression test. This baseline system should not be considered as a model for the real Hostun sample; however, it can be thought of as a kinematical analogue of Hostun in two dimensions, since the samples' response to loading exhibits broadly similar rheological features, including that of a single persistent shear band. Salient aspects of each system's rheology are summarized here for completeness, with further details reserved in the electronic supplementary material. This information on rheology serves as a reference point for phase 3, against which the consistency and novelty of the findings from phase 2 can be objectively judged.

### (a) Kinematics of Hostun sand under triaxial compression

The experimental dataset is from a triaxial compression test on a dense sample of dry Hostun sand, comprising 54 889 grains. The key novelty of this test is that it is performed in an X-ray tomograph. The sample, measuring 11 mm in diameter by 22 mm in height, is prepared by pluviation into a latex membrane stretched in a mould. The cell pressure is kept constant, while the so-called stress deviator (i.e. the difference between axial stress and cell pressure) is applied by advancing a piston which compresses the sample axially, at a constant speed. Further details concerning the material and the experimental technique are given in Andò *et al.* [15]. The response is typical of dense sand as shown in the electronic supplementary material: there is a peak stress deviator at around 5.5 per cent imposed axial strain, followed by a reduction in stress, tending towards a ‘residual’ value at the end of the test. The volumetric response is slightly contractive then dilative throughout, as expected. Note that these global measurements are the only ones typically possible in a standard triaxial test. However, in this test, X-ray tomography scans were carried out at 16 levels of imposed axial strain. This is evident in the drops of axial stress seen in figure 1*a* which correspond to stress relaxations at constant imposed axial strains. For each scan, a three-dimensional image of the sample is reconstructed. Coming from the 16 scans, kinematical data for 12 strain intervals are analysed in the present study: 01–02, 02–04, 04–05, 05–08, 08–09, 09–10, 10–11, 11–12, 12–13, 13–14, 14–15, 15–16. The peak stress deviator coincides with scan 09.

A recently developed technique (called ID-Track) is then employed in tracking the grains in four dimensions, i.e. three-dimensional space and time [15]. Each grain is assigned a unique label (an ID), and measurable grain features (e.g. volume) enable its identification from one scan to the next. This process employs a backward scheme analysis of pairs of consecutive ‘previous-and-current’ three-dimensional images: individual grains are identified in each image based on information established from the previous image. The quality of the results of ID-Track depends critically on the quality of the process of grain separation and labelling (the segmentation process). Errors in segmentation do occur with some of these natural grain shapes, and may prevent the tracking of those grains for the remainder of the test.

Figure 1*c*–*e* shows the displacements and rotations of all the tracked grains in a vertical slice across four strain intervals, from the start to the end of loading history. At the first strain interval of the test, 01–02, the macroscopic strain field is *almost* homogeneous. The very small deviation of the strain field from a perfectly uniform (or affine) field with respect to the macroscopic (sample) scale is due to a slight misalignment of the sample from the vertical. The vertical displacement in the vertical slice (figure 1*d*) in 01–02 increases relatively smoothly from zero, imposed at the top of the sample, to 0.1 mm at the bottom of the sample. This displacement field is ‘inclined’ relative to the direction of loading (evident in the tilted gradation of the colour map) because the sample axis is initially tilted by 1.4^{°}. The key aspect to note is that the change in the displacement field is gradual in the initial stages of loading (01–02, 02–04); as loading proceeds, particularly from around the peak stress ratio, the displacement field portrays to an increasing extent a ‘shear band’ that is some seven to eight grains wide by the last interval 15–16. Using the benefit of hindsight, a retrospective analysis of the displacement field from vertical and cross-sectional slices of the sample, starting from the last interval 15–16 back to 01–02, reveals a *contrasting* trend: relative to the shear band boundaries marked by the lines of distinct step change in gradient in 15–16, the displacement field in 01–02 and 02–04 changes smoothly at a near constant rate in the direction perpendicular to the shear band. This trend is evident for grains: along a cut normal to the shear band (figure 1*c*), throughout the vertical slice (figure 1*d*), and in all cross-sectional slices spanning the band from top to bottom of the sample (see the electronic supplementary material).

Up until the peak, the relatively noisy measurement of rotations appears to be randomly distributed in space. During the intervals 08–09 and 09–10, straddling the peak stress ratio, we observe the beginning of a localization in the rotations: see 08–09 in figure 1*e* (cf. the porosity maps in figure 1*b*). In the stages after the peak, e.g. 15–16 in figure 1*e*, when the band is fully developed, the rotations are manifestly concentrated in the shear band. The rotation and displacement fields present the shear band in the same location, although the band appears to be wider in the rotation field, i.e. 10–12 grains thick (figure 1*e* far right). Since the technique provides the axis of rotation for each grain, the component of rotation in the plane of the cross-sectional slices can and is also computed. From this measure, it appears that the grains inside the band do not show any more or less spatial concentration of rotation. Thus, while grain rotations clearly concentrate in the band, there is no evidence of bias in favour of rotations in the plane of the shear band.

### (b) Kinematics of baseline system under plane strain compression

The baseline model is a discrete element simulation of plane strain biaxial compression under constant confining pressure of an assembly of 5098 spherical particles [8]. A contact moment is introduced to account for the effect of particle shape angularity, e.g. resistance to relative rotations due to particle interlocking [23], thus providing a closer representation of the grain kinematics encountered in Hostun sand. As stated earlier, the baseline system is not to be considered as a model of the Hostun sample but simply a kinematical analogue. Like Hostun, the baseline sample fails via strain localization with a single persistent shear band governing its rheological behaviour in the post-peak stress or large strain regime. Details of this simulation such as the contact models used, particle size, sample preparation and loading protocol are provided in the electronic supplementary material. We have the usual discrete element output data for every one of these particles over 299 simulation time-steps or strain states, spanning all regimes of the loading history: strain-hardening to peak stress ratio at axial strain *ϵ*_{yy}=0.034, then strain-softening to *ϵ*_{yy}=0.04 and, finally, a levelling off (modulo fluctuations) of the stress ratio to a residual stress state.

Prior to the onset of force chain buckling, just before the peak stress ratio, the baseline model deforms in an essentially affine or uniform manner—from the length scale of a small particle cluster (i.e. a particle and its first ring of neighbours) to the macroscopic scale. This is true if we ignore rattlers and microbands which contribute a small amount of global non-affine deformation and dissipation in the strain hardening regime. Apart from these spatio-temporally non-persistent structures, no localization of any material property is evident in the baseline sample, from either the contact or the kinematical data. The *essential* inelastic structural rearrangements, *persistent in space and time*, develop only after the first force chain buckling event. These progressively spread but remain concentrated in the band. Once the shear band is fully developed, the sample exhibits a pseudo steady-state behaviour. The temporal evolution of the *global* properties of the baseline system fluctuates about a near constant non-zero value, with contact network properties (e.g. average values of degree, clustering and subgraph centrality) inversely tracking the measures of particle kinematics (e.g. average non-affine deformation) [24,25]. A burst in kinematical activity in the shear band (unjamming) is followed by a relatively quiescient period (jamming). As the system unjams from the collective buckling of force chains, the global kinetic energy, dissipation and average non-affine deformation all rise to a sharp peak, while particle connectivity inside the sample degrades. As the system jams in the ensuing interval, there is a brief and partial recovery in the system connectivity resulting in the growth of new force chains, until the next cycle of unjamming–jamming commences.

## 4. Development of functional complex networks

We now embark on phase 2. Here our task is to construct complex networks representing the deforming Hostun sample in state space, before examining these using appropriate network properties. As stated earlier, the method of network construction and the network properties depend crucially on the questions we are seeking to answer in this study. These questions are forged around patterns of interaction between constituent grains which manifest through the grain kinematics: i.e. collective motion of particle groups which are indicative of the evolution of instabilities and their spatial length scales. Our overall approach is depicted in figure 2. This approach is based on similar concepts adopted in studies of functional brain networks which are constructed from neuronal data obtained using magnetic resonance imaging [22]. The construction of a functional network from such data rests on a statistical analysis of the degree of similarity of pairs of time series.

In the case of Hostun sand and the baseline sample, the temporal (strain) evolution of the grain kinematics of each system is tracked throughout loading history. As previously discussed, the displacement and total rotation of individual constituent grains in Hostun sand are measured at a non-uniform sequence of 13 strain states (or twelve strain intervals). The number of sampled strain states is too few to form a long enough time series for reliable analysis; the same can be said of the baseline system, even though this has many more strain states than Hostun. Hence our strategy here is to construct a complex network at each strain interval, and then analyse not only the properties of each network but also their evolution throughout loading.

Over each strain interval, the position, displacement and rotation of each grain in the physical space are given by ID-track. With this information, analysis in phase 2 proceeds in three steps, as depicted in figure 2. First, we map the measured kinematical data for each grain to a state space: this is a four-(three-) dimensional space comprising scalar components of the displacement vector and one total rotation for the Hostun (baseline) system. Each point in the state space corresponds to a node in the network or a grain in the physical space. Second, we assign a link that connects two nodes in the network if the kinematics of the corresponding grains match each other: i.e. near-coincident points in the state space with respect to Euclidean distance. This second step is crucial, as the connections in the network determine the properties of the network. Here, we connect each node to its *k*-nearest neighbours in state space, following similar concepts proposed in time-series-based networks [26], where *k* cannot be arbitrary. The arbitrary selection of too low a *k*-value may result in a disconnected network with many components (subgraphs), thereby reducing the effectiveness of global scale network information extracted by examining the shortest path lengths. In this case, a pair of nodes from two distinct components cannot be connected through a finite number of links, i.e. the length of the path connecting these nodes is infinite. On the other hand, the arbitrary selection of too high a *k*-value may result in a very densely connected network in which every node is linked to almost all the other nodes in the network. In this case, results based on shortest path lengths are invariably rendered meaningless, and all other network measures are smeared to the extent that only outliers in the data are trivially detected. *Our choice of k is not arbitrary: we solve an optimization problem to find the minimum k value (or minimum number of links for each node) that is needed to form a single connected graph.* This method of selecting a suitable

*k*-value strikes a balance between the two extremes of low and high

*k*and ensures that a finite number of non-repeating links connects any two nodes in the network [27].

In our method of connecting nodes, a different Euclidean distance threshold in the displacement–rotation state-space applies to each network node. An alternative method would be to use the same fixed Euclidean distance threshold for each node and connect all nodes within this distance. As we have shown in Walker *et al.* [27], this leads to a different optimization problem if one wants the resulting network to be a single connected graph; also this network would reveal different aspects of, in response to a different set of questions on, material behaviour. For the 12 complex networks representing Hostun sand behaviour over the 12 strain intervals from the onset of loading to failure, the minimum *k* value ranges from 4 to 7; those for the baseline sample range from 3 to 6. This method of network construction and the criterion for choosing *k* has been recently tested and found to capture the essential global features of the displacement fields from DIC measurements [27]. Before proceeding to the third and final step, it is worth noting that when we consider the extended four-dimensional coordinate space comprising the three scalar components of the displacement vector and one total rotation, each component is normalized by the standard deviation across all measured values for that component. In this way, the ‘dynamic range’ of each coordinate is the same and we achieve a fairer comparison of relative Euclidean distances in the assessment of the similarity of grain kinematics in the second step. This strategy ensures that network connectivity is not biased in favour of a dominant difference in one coordinate: for instance, information contained in the differences of grain displacements may be lost if the total distance in four-dimensional coordinate space is dominated by relative rotations and vice versa.

In the third and final step of phase 2, we examine the defining properties of the functional *k*-networks to reveal organizational structure and associated kinematics within the deforming material, including: the network communities, their length scales from shortest paths and the closeness centrality index of individual grains. Bear in mind that a useful abstract network property may not always lend itself to an obvious physical interpretation. This makes the task of developing satisfactory so-called correspondence rules [28] all the more challenging. Our strategy for material characterization has been deliberately designed to allow an additional exploration of the system in a pure mathematical context, enabling us to fully exploit developments in graph theory and complex networks. That said, to the extent possible, we will relate the physical system and the abstract network to render transparent the physical meaning of the network construction and analysis so as to help future formalisms of meaningful correspondence rules. Abstract representations of the *k*-networks, conceived using individual grain displacement and rotation for the baseline system and Hostun sand system, are shown in figure 3. Connections are all that matter in these graph representations. As such, the shape of these graphs are not normally useful, since there are an infinite number of ways to draw and visualize a graph (collection of dots and connections) on a plane. However, a visually appealing and often quite informative method is to position the nodes using a spring-electrical embedding algorithm [29]. As is evident in figure 3, we find that the network shape using this particular network representation seems to reflect the formation of the persistent shear band and is suggestive of the splitting of the sample into two parts separated by the band. In particular, the ‘neck’ seen in the visualizations of the networks post-peak stress corresponds to grains physically located within the shear band of the sample.

## 5. Rheology from network characterization

We now complete phase 2, before mapping the information from state space to physical space for phase 3. Our characterization of the functional *k*-networks from grain kinematics, as part of phase 2, is centred around the concept of shortest path lengths [30]. Shortest paths play a crucial role in network representations of many phenomena, since information often travels along the shortest and most efficient available paths, notably in communication, transport and organizational systems [31,32]. As there are clearly multiple paths connecting any pair of distinct nodes *i* and *j* in the networks established for the Hostun sand and baseline sample (recall figure 3), the shortest path *d*_{ij} is simply that path with the minimum path length or number of links. Hence, the range of values for *d*_{ij} is , where shortest path length is infinite if nodes *i* and *j* lie in two different disconnected components (or subgraphs) of the global network. In this section, we will motivate and establish measures at multiple resolutions of the abstract networks comprising the mesoscale and the macroscale, before mapping this information back into the physical space in phase 3. Recall the central question we seek to answer in this study concerns the earliest stage in the loading history at which these measures can detect the initiation of the shear band.

### (a) Community structures and length scales

Granular materials exhibit collective dynamics from the interactions among their constituent grains. These interactions are not confined to physical, pair-wise contacts. In the lead up to and during failure, long-range interactions (i.e. collective behaviour) become dominant, leading to instability patterns forming at multiple spatial scales (e.g. microbands, vortices, shear bands, buckling force chains). At the onset of shear banding, just before peak stress ratio, non-contacting as well as contacting particles in the sample begin to move collectively, exhibiting very similar displacements and rotations. This collective motion progressively continues until the shear band is fully developed, at which point the two parts of the sample on either side of the band can be observed to move in an almost rigid-body motion. We are interested in uncovering the details of such organization in grain interactions (i.e. spatial borders and length scales), especially of groups of grains or mesoscopic regions in the sample following a common kinematical rule—not just when the shear band manifests post-peak stress but at *all* stages of loading. This organizational structure may be determined from the so-called ‘network communities’ [30].

The communities of a complex network are groups of nodes with dense intraconnections but relatively sparse interconnections. A method for identifying communities is by optimization of a modularity function [33] for a candidate partition of nodes expressed as
5.1where *e*_{ii} is the percentage of the number of links with both ends (nodes) being in the community, *a*_{i} is the percentage of links that start from a community [29] and the number of distinct communities (*M*) is indexed by *i*. To initialize the process, we start by treating each node as a community. We then successively amalgamate groups in pairs, choosing at each step that pair which leads to the highest increase in modularity. By maximizing this function, one obtains the ‘best’ partition of the network into communities.

Figures 3 and 4 present the communities of the *k*-networks. To understand their meaning, recall that the links connecting pairs of nodes are decided according to how well the kinematics of the associated pairs of grains match. Therefore, a community is a group of grains exhibiting very similar motions in the physical sample—to the extent that a grain in a community moves in a manner more ‘synchronized’ with that of another grain from within its community, compared with that of a grain from outside its community. Because grains belonging to a community exhibit essentially synchronized motion, a community can also be thought of as a group of grains in the sample following a common kinematical or evolution rule. (Our use of the term synchronized is purely in the dictionary sense and should not be confused with the process of synchronization studied in dynamical systems research.)

The communities in state space in figure 3, when mapped back to the physical sample, are shown in figure 4. The numbers of communities for the Hostun sand sample, comprising 54 889 grains or 329 334 degrees of freedom are: 32 and 36 (44 and 30) for the initial 01–02 and final 15–16 strain intervals of the displacement network (displacement and rotation network), respectively. The baseline sample comprises 5098 grains or 15 294 degrees of freedom: at the initial state, the network constructed from the grain displacements has 32 communities while that network constructed from grain displacements and rotations has 26 communities. At the axial strain of *ϵ*_{yy}=0.0388, when the shear band is fully developed, we observed a rise to 49 and 48 communities in the baseline sample, respectively. The increase in communities at *ϵ*_{yy}=0.0388 relative to the initial state is mainly in the shear band and reflect the significant non-affine grain kinematics occurring in this region during an unjamming period (sharp peaks in figure 5; electronic supplementary material). This makes sense as the non-affine measure, i.e. the deviation of a particle's motion to that dictated by the local uniform strain, quantifies the level of heterogeneity in the grain kinematics. Also the most significant unjamming event in the loading history of this sample occurs at the strain state *ϵ*_{yy}=0.0388, where the highest peak in the global mean non-affine curvature and strain is attained. The significant spatial variation in grain kinematics during the unjamming stages in the baseline sample seem less pronounced in the three-dimensional Hostun sample with many more grains, judging from the slightly less numbers of communities developing post peak stress: for the Hostun sample, compare 34–56 communities pre-peak stress to 29–46 communities post-peak stress.

We next explore whether network subdivisions by communities are revealing of the evolution of instabilities in the lead up to and during bulk failure. Such communities, or groups of grains exhibiting common kinematics, give rise to an intermediate or mesoscopic *length scale* of the topological organization of the network based only on its connectivity. To extract this length scale, we consider each of the communities and examine their respective subnetworks involving only the intracommunity links. We then calculate the shortest path length between all pairs of member nodes in each community to establish the *community size* at each strain state: this is defined to be the mean of the distribution of shortest path lengths in a community. The distribution of these community sizes is informative, in particular, of prevailing length scales at each strain state as well as throughout loading history. As shown in figure 5, these distributions are relatively narrow and ranges from one to around 21 links for Hostun (figure 5*a*), for which *k*-networks typically embody links in the hundreds of thousands. Also, while the community sizes 〈*l*_{Hostun}〉 in the *k*-networks constructed using grain displacements (displacements and rotations) ranged from around 6–13 (5–7) links, we find that the temporal or strain evolution of the mean value of these community sizes for the whole sample remains essentially invariant throughout loading, i.e. around 〈*l*_{Hostun}〉=10 (〈*l*_{Hostun}〉=6) links. A similar trend applies to the community sizes of baseline *k*-networks comprising tens of thousands of links. The temporal (strain) evolution of the mean community size maintains an essentially constant value at around 〈*l*_{baseline}〉=6–7 links throughout loading (figure 5*b*,*c*). In general, the *k*-networks which take into account rotation leads to lower community sizes. This almost invariant mean community size, derived from a relatively narrow range of shortest path length distributions in the *k*-networks, is identifying and extracting a *natural* mesoscopic length scale of the grain kinematics.

In general, community length scales are a compelling attribute to examine in complex systems because they arise as a natural outcome as opposed to a prescribed input to the analysis. In the specific cases studied here, the significance of the persistent length scales identified above for the physical domain warrants further scrutiny. We investigated the relation between this length scale and its counterpart in the contact network for the baseline wherein connections represent physical contacts. Therein communities represent true clusters of contiguous grains, and a link is arguably of the order of the mean grain diameter. We found the mean community size of the contact network to be similarly invariant throughout loading and is around 10 links: this is not only of the same order as the community sizes in the *k*-networks, but is also consistent with the thickness of the shear band (i.e. eight times the mean grain diameter). As we do not currently have information on contacts for the Hostun sample, the computation for the mean community size for its contact network will be left for future work. That said, the 6–10 links—observed from the kinematic networks from the very first strain interval 01–02—is certainly consistent with the shear band thickness for Hostun which is around eight grain diameters.

### (b) Closeness centrality

One of the big questions in the study of localized failure in granular materials is the ‘predictability’, not of the occurrence, but of the *location* of the shear band. From the standpoint of material characterization, this question on predictability translates to: is the process of strain localization encoded in the grain kinematics and, if so, can a consummate detector of this process be devised from the kinematical data for Hostun *without need to specify a spatial scale of observation*, and how early in the loading regime can this process be detected? Recall the experimental observations of the Hostun sample summarized briefly in §2: no distinct shear band manifest prior to peak stress ratio. To explore this issue in the abstract state space, we turn our attention to the graph-theoretic distance of a given node to all other nodes in the entire network, and quantitatively characterize how efficiently information travels to and from that node. In many real-world network systems, information travels along the shortest, i.e. most efficient, available paths. Thus, a metric of great practical interest is that which quantifies the *efficiency of information flow through the network*. One such metric is the closeness centrality of a node *i* which is the reciprocal of the sum of shortest path lengths to all other nodes *j* in the network, i.e.
5.2where *d*_{ij} is the shortest path length between node *i* and node *j*. To deal with disconnected networks, it is common to multiply each *C*(*i*) by the number of nodes in the connected component to which node *i* belongs. However, we have dropped this pre-factor, since our method of network construction (i.e. choose the minimum *k* so that a node's connection to its *k*-nearest neighbours in state space leads to a single connected component) ensures that all nodes in the network are reachable by design: all *k*-networks here have one component. Furthermore, we present our closeness centrality findings relative to the maximum *C*(*i*), i.e. we normalize so that the maximum value is one and the pre-factor cancels. Mathematically, equation (5.2) shows that the closeness centrality of a node is a measure of a node's proximity to all the other nodes in the network. The nodes with the highest relative closeness centrality are closest to all the other nodes in the network: hence, speed of information flow from these to the rest of the network is fastest. In a diffusion process, a node that has high closeness centrality is likely to receive information more quickly than others. Also, those nodes with high closeness centrality typically lie in the ‘centre’ of the network and on the boundaries of network communities, acting as nexus or bridging points in a network.

To some extent, we can relate the network measure of closeness centrality in state space and the physical motion of the grains in the Hostun sample. For the *k*-networks constructed here, closeness centrality can be interpreted as a measure of how close or similar a grain's motion is to the motion of *all* the other grains in the sample. In other words, the grains corresponding to those nodes with the highest closeness centrality values are those which move in a manner most similar to or representative of the motions of *all* the other grains in the entire sample. Keep in mind that in state space where the network analysis is undertaken, nodes with the highest closeness centrality values correspond to those which can access *all* other nodes in the network through *least number of links*. Now recall that in the Hostun test, the sample was compressed from the bottom: the X-ray micro-CT data show that the grain displacements in the initial stages of loading increase from an imposed zero value at the top, to some finite value at the bottom of the sample. Following the test's reference frame (done solely for convenience as, of course, this is ultimately irrelevant due to frame invariance), such a displacement distribution would imply that the nodes with the highest closeness centrality are in fact symmetrically located—all along the mid-height of the sample. To visualize this, imagine a single column of grains such that the grain displacements increase gradually from zero for the top grain through to some value for the bottom grain. In state space, this would lead to a linear chain of nodes: those nodes in the middle of the chain will have the highest closeness centrality. A sample deforming under a perfectly uniform or affine strain at the macroscopic scale would therefore exhibit a band of highest closeness centrality grains situated symmetrically and at mid-height of the sample. We show in figures 6*a* and 7*a*,*b* the resulting spatial distributions of closeness centrality when mapped back into the physical domain of the Hostun sand sample. The most striking aspect of these patterns, similar to those observed earlier in the community boundaries in figure 4*a*, is the strong bias or asymmetry from as early as the second strain interval 02–04. Specifically, the location of the nodes (grains) with the highest relative closeness centrality—is situated in the location of the persistent shear band that has yet to develop. The side-on views of the sample in figure 6*a*, selected to match the perspective of the slices in figure 1*d*, highlight the developing deformation and identify the location of those grains whose centres correspond to the nodes with the highest closeness centrality in the *k*-networks. It can be seen that such grains localize in the region of the shear band—*and that such localization is evident at the early stages of loading*. Furthermore, as the horizontal slices show in figure 7*a*, this ‘band of grains’ propagates across the width of the sample from the bottom to the top of the confining membrane. Observe also that while the high relative closeness centrality grains appear to capture strain localization extremely well and follows the band as it traverses the height of the sample, the localization at the earlier strain interval (02–04) is more heavily biased towards the lower part of the sample (figure 7*a*). Indeed, the pattern in the first strain interval (01–02) in figure 7*a* already shows a greater concentration of high relative closeness centrality grains from the lower left towards the upper right corner of the sample. Given that closeness centrality is a measure of information propagation through a network, is it possible then that this metric is identifying the potential sites of *nucleation of failure*? To answer this question for sand, we must wait for further developments in experimental methodology whereby force propagation or proxies thereof within three-dimensional deforming sand samples can be observed so that the evolution of force chains can be tracked, and force chain failure by buckling detected.

As mentioned in §2, reproducibility of results on Hostun was tested not just for baseline but through independent calculations of continuum strain taken at high resolution, i.e. grain-scale (measurement points being around 0.4 mm apart), using DIC [15]. The spatial distributions of the maximum shear strain corresponding to those in figure 6*a* for the four selected strain intervals discussed earlier are shown in figure 6*b*. Strain localization is evident, albeit faint, from as early as 02–04. Interestingly, however, the continuum strain field for 08–09 portrays a region of localization in the lower part of the sample that is more widely spread across the sample width, than the shear band trending upwards to the top part of the confining membrane: this may possibly be a competing shear band emanating from the lower but opposite side of the confining membrane inclined perpendicular to the ‘winning’ band that ultimately develops after peak stress ratio. The cross-sectional (*xy*-plane) views for 08–09 of the maximum shear strain field confirms the absence of a localization to one distinct contiguous zone in these lower regions of the sample (see the electronic supplementary material). These trends are unlike those evident in the corresponding closeness centrality measurements (figure 6*a*), which consistently show that the region of the sample with high closeness centrality is as concentrated as in the preceding strain interval. Closeness centrality seems able to filter out secondary factors (e.g. competing but temporary shear bands) and are manifestly unambiguous in its identification of the shear band that will ultimately prevail in the final stages of loading. These findings provide a direction for future experimental design and application of our network methods. Recent simulations [34] suggest that for samples with a large spatial variation in density, the detection of early strain localization with DIC measures can give misleading results (i.e. the ultimate shear band forms elsewhere). The experimental Hostun sand sample considered here may be regarded as relatively homogeneous, thus the future testing of the generality of our methods and conclusions to more heterogeneous sand mixtures is warranted.

Having confirmed the presence of strain localization in 02–04 for Hostun, through an independent grain-scale strain measure using DIC, we next turned our attention to the closeness centrality of nodes for the baseline system. Figure 8 presents the spatial distributions of the closeness centrality when mapped back into the physical domain of the baseline sample. The location of the nodes (grains) with the highest relative closeness centrality—manifestly concentrates in the region of the persistent shear band—well before the nucleation of the band, as detected from the onset of force chain buckling. As the distributions are similar for both classes of *k*-networks, we only show here results from the class of networks constructed using grain displacement and rotation (a movie of the evolution of these distributions for many more strain states from this class of networks and those from grain displacements only is given as an electronic supplementary material). Again, a strong asymmetry is apparent from the onset of loading. Recall from §3*b*, we observed no such persistent localization in the region of the shear band prior to peak shear stress through any of the previous measures established for the baseline system. Furthermore, in contrast to the kinematical *k*-networks developed here, the contact networks of the baseline system at these strain states (and, for that matter, all stages) of the loading history exhibit a high degree of symmetry—a central core encapsulating a cluster of particles having high relative closeness centrality situated in the middle of the sample. This core is essentially invariant throughout loading history, even though there is a clear reorganization in the communities in the contact network (not shown).

A question that we must now address to complete phase 3 is how ‘real’ are the above results? Is it likely that the detection of the shear band so early in the loading history, through the *k*-networks—in both the Hostun sand and baseline samples—occurred simply by chance? That is, if we constructed surrogate networks by randomly connecting points in state space regardless of matching kinematics, would network properties of these surrogate random networks still exhibit localization when mapped back to the material domain? Randomization tests help to answer such questions.

### (c) Randomization tests

Randomization tests are statistical tests to help determine if the observed experimental data are *inconsistent* with a stated hypothesis. Typically, these tests begin with the generation of new datasets which are both consistent with the observed data *and* the stated hypothesis that the observations are the result of random chance. (We refer to such datasets as surrogates.) A statistic with the capability to distinguish whether data are inconsistent with the stated hypothesis is calculated for *both* the original observed dataset and its surrogates. If the calculated value of the statistic for the observed data is sufficiently different (e.g. to a certain significance level) to the distribution of statistical values calculated for the surrogate data, then one can conclude with a given uncertainty that the observed data is inconsistent with the stated hypothesis. It is vital to remember that failure to demonstrate inconsistency with the stated hypothesis does not imply consistency. Randomization tests and surrogate data methods are only capable of *rejecting* the null hypothesis, i.e. that the observed data did not occur by chance.

An example of the use of surrogate data methods can be found in nonlinear times-series analysis. A suite of randomization tests have been developed in an effort to detect determinism in observed experimental data as opposed to various linear hypotheses alternatives [35]. Surrogate datasets in complex networks can be generated by randomly rewiring the original network while preserving certain structural properties to produce a surrogate network: for example, rewiring to preserve node degree. Specific network quantities (e.g. shortest path length distributions) can be calculated for the original and rewired networks to test various hypotheses about network structure [36]. In this study, we consider randomized surrogate networks which preserve node degree (a local property). Hence, any measure based on shortest path lengths, intended to summarize global connectivity, should be a good test statistic to discriminate the original and surrogate networks. The procedure we followed for random rewiring of a network to preserve node degree is presented in detail in the electronic supplementary material.

To complete phase 3 of our material characterization, we performed the node degree preserving randomization procedure to generate 49 surrogate networks where 10 per cent of the links have been rewired, for each strain state of the loading programme. It is possible that the rewired surrogate networks become disconnected and so measures based directly on shortest path lengths will become infinite valued since, by definition, the shortest path length between two nodes in two different disconnected components of a network is infinity. This calls for a suitable shortest path length measure which remains finite even for networks with multiple components. One such measure is the *harmonic mean* given by the reciprocal of the efficiency, i.e. *h*=1/*E* where,
5.3and *d*_{ij} is the shortest path length between nodes *i* and *j* and *N* is the total number of nodes in the network [31]. Clearly, if for two nodes in two different disconnected parts of the random network, then its contribution to the efficiency is zero. Using the harmonic mean, we compare the *k*-networks of Hostun sand against their randomized surrogate networks, and demonstrate that this shortest path length based measure (and, by implication, all other measures derived from shortest paths) is indicative of a statistically significant, systematic pattern of connections in Hostun sand—that is not merely a product of chance or sampling variability.

For each strain stage of the loading history, we calculate the harmonic mean of each randomly rewired surrogate network to obtain a distribution. Next, we compare the harmonic mean value of the kinematical network of Hostun sand for the given strain stage. If this value is ‘well into the tails’ of the surrogate distribution, we can conclude that the connectivities in the kinematical network of Hostun sand are sufficiently different from those in the random surrogate networks. The results in figure 9*a* show that the null hypothesis is unambiguously rejected at all stages of the loading and that any network quantity which is based on shortest paths, e.g. closeness centrality, which localizes in the material is indeed of significance and indicative of a systematic pattern of connections.

We gathered additional evidence of veracity of these findings by performing the same tests on the baseline system. Again, for all strain states of the loading regime, we find that the value of the harmonic mean calculated for the original kinematic network lies ‘far’ from the range of harmonic mean values obtained for the rewired surrogate networks (figure 9*b*,*c*). Thus, with respect to this node degree-preserving random rewiring method, the patterns of connections in the original kinematical *k*-networks are not random—and the early and precise localization of high relative closeness centrality in *both* Hostun sand and the baseline system is unlikely to be a product of chance.

## 6. Conclusion

New experimental data on triaxial compression of Hostun sand, comprising X-ray micro-CT measurements of displacement and rotation of individual grains over many stages of loading, were analysed using complex networks. The central question we sought to answer was: can measures of complex networks, suitably constructed from individual grain kinematics, detect strain localization and, if so, do such network properties provide early detection of failure and offer new and useful insights on material behaviour? We discovered a remarkable behaviour in the kinematics of Hostun sand. Findings from network measures of closeness centrality suggest that the ultimate fate of initially dense and homogeneous granular materials may be decided and encoded in the grain kinematics in the initial stages of loading, well before peak stress ratio. The shortest path length scales and borders of network communities corroborate this discovery.

The network property of closeness centrality, in particular, the highest relative closeness centrality nodes, when mapped back to the material domain, correspond to the particles in the final persistent shear band. Incredibly, this same network measure also appears to detect the shear band proximity in the initial stages of loading—during which no localization from the observed particle kinematics alone is evident. To eliminate the possibility that closeness centrality was not just picking up a bias inherent in the imperfect initial conditions of the experiment, we repeated the study using kinematical data from a DEM simulation (baseline model). Our results there were the same as for the experimental system. Indeed no other measure from past studies of the baseline model (and of other DEM simulations of different material properties under monotonic compression in two and three dimensions) have detected a persistent localization at the initial stages of loading—from an isotropic state. As a further test of reproducibility of our findings, we examined the development of localization in the Hostun sample from independent calculations of continuum strain taken at grain-scale resolution using DIC. A faint hint of a localization region was detected at the same strain interval that closeness centrality first identified this region. Interestingly, however, the region of localized strain from the DIC measurements became somewhat dispersed just before peak stress ratio. This is unlike the manifestly unequivocal trend in the corresponding closeness centrality measurements, which show that the region of the sample with high closeness centrality is as concentrated as in the preceding increment. Closeness centrality measurements are apparently not ‘distracted’ by possible secondary factors (e.g. competing but temporary shear bands) in the grain kinematics; instead, closeness centrality appears to have the capability of highlighting the pattern of strain localization that will ultimately prevail in initially homogeneous samples.

We also examined the organization and length scales of the pattern of grain interactions in Hostun sand and the baseline system using network communities, i.e. groups of grains in each of which member grains exhibit more similar kinematics than with those from different groups. The impetus here lies in the well-established fact that, in common with other real-world complex systems, the complex behaviour observed in granular materials at the macroscopic level emerges from the multi-scale interactions among its constituent units—the grains. Network communities resolve themselves into sub-groups with an average size or shortest path length of around 6–10 links. This length scale, common to both systems, manifests in the initial stages of loading and remains essentially invariant for the rest of loading history; also, for the baseline system where grain contacts are known, this community size proves to be consistent with the observed thickness of the shear band. Furthermore, the boundaries of some of the communities, when mapped back into the physical domain, align themselves along the shear band—again from the initial stages of loading—well before peak stress ratio.

Soil mechanics has raised the bar on grain-scale measurements. For the first time, tens of thousands of individual grains in three-dimensional compression tests of sand (here, Hostun) can be individually tracked in space and time. This unprecedented development raises questions on data analysis. What is the best approach that maximizes the quality and quantity of information one can extract from such data? How can one ‘mine’ such data to gain hidden information, not otherwise accessible from conventional methods? What metrics can unearth deep insights, illuminating not only what dynamical processes emerged in the deforming sample, but also why? If the latter can be answered, then there is the potential to push the boundaries of data analysis in soil mechanics—beyond the retrospective and into one capable of delivering prospective information (i.e. identify future trends). This study demonstrates that the strategy of abstracting kinematical information of tens of thousands of moving grains into a complex network and then mapping information from the network analysis back to the physical domain shows considerable promise, and appears capable of complementing and expanding on the information obtained from more traditional data analysis, e.g. porosity maps, fabric tensors. In particular, we draw attention to the implications of these findings in two aspects of granular media research. For multi-scale material characterization, this study confirms that many of the kinematical phenomena observed and extracted from two-dimensional DEM simulations exists in reality. For continuum modelling, our findings show that the development of strain localization is clearly not a bifurcation phenomenon, albeit bifurcation theory may still be capable of capturing salient aspects of the fully developed ‘persistent shear localization’ region.

Experimental soil micromechanics has opened the door for research into soil characterization and, in turn, continuum modelling to pay heed to an echo of Terzaghi's reflections on Coulomb. The challenge for data analysis is immense and we have demonstrated here the potential of complex systems theory to contribute to this effort.

## Acknowledgements

A.T. thanks Dr John F. Peters for many insightful discussions on strain localization and dedicates this paper to him. We thank the editor and all our reviewers for constructive comments. A.T. and D.M.W. acknowledge the support of the Australian Research Council (Discovery grant nos DP0986876 and DP120104759), the US Army Research Office (Single Investigator grant no. W911NF1110175) and the Melbourne Energy Institute. G.V. and E.A. acknowledge Steve Hall, Jacques Desrues and Pierre Bésuelle to whom the experimental data on Hostun sand equally belongs. The X-ray scanner in Laboratoire 3SR was principally funded by a project of the French research agency ANR (MicroModEx, contract no. ANR-05-BLAN-0192).

- Received October 15, 2012.
- Accepted January 3, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.