## Abstract

A multi-physics formulation for data-driven prognosis (DDP) is developed. Unlike traditional predictive strategies that require controlled offline measurements or ‘training’ for determination of constitutive parameters to derive the transitional statistics, the proposed DDP algorithm relies solely on *in situ* measurements. It uses a deterministic mechanics framework, but the stochastic nature of the solution arises naturally from the underlying assumptions regarding the order of the conservation potential as well as the number of dimensions involved. The proposed DDP scheme is capable of predicting the onset of instabilities. Because the need for offline testing (or training) is obviated, it can be easily implemented for systems where such *a priori* testing is difficult or even impossible to conduct. The prognosis capability is demonstrated here via a balloon burst experiment where the instability is predicted using only online visual observations. The DDP scheme never failed to predict the incipient failure, and no false positives were issued. The DDP algorithm is applicable to other types of datasets. Time horizons of DDP predictions can be adjusted by using memory over different time windows. Thus, a big dataset can be parsed in time to make a range of predictions over varying time horizons.

## 1. Introduction

Predictive or prognostic capability represents a core competency required in many walks of life. Traditionally, analytical or numerical models based on conservation principles are used to make such predictions. These models require satisfaction of three principles: (i) equilibrium (which embodies satisfaction of a conservation principle), (ii) compatibility, and (iii) constitutive relations. In most cases, the constitutive relations depend on prior knowledge of constitutive or material parameters. Most often, a failure criterion and an associated critical or threshold value of a parameter (typically a scalar) are also required to predict the onset of instabilities. Such estimation protocols require controlled offline testing or ‘training’ for the prediction model. Further, applicability of such a prediction model is limited to situations spanned by the ‘column space’ of the training set.

Numerous applications of practical interest also suffer from two major drawbacks: (i) the explicit expression for the conservation principle or the utility function being conserved may be unknown, and (ii) the system may not be amenable to offline (in particular, destructive) testing. For instance, when trying to predict the onset of instability in a material (e.g. necking in a tension test), the material properties can be estimated before the actual experiment is done. The inferred constitutive parameters are then used for prognosis during the actual experiment. But if the part under consideration is already in service (e.g. in a structure or as part of a person's anatomy), it may not be accessible for such offline parameter evaluation. The same is true of a non-material or abstract system (e.g. a global economic system, genetics or a healthcare system): there is neither any way to know the specific system properties nor any method to assess the explicit nature of the conservation potential. In such cases, there is a need to rely on a predictive algorithm that can function without *a priori* knowledge of these parameters.

This work circumvents these two difficulties by devising a data-driven prognosis (DDP) algorithm. The proposed algorithm requires assumption of the existence of a conservation principle, but its exact form need not be specified *a priori*. Instead, the exact form of the conservation functional is only specified locally, in the neighbourhood of each observation point (and is assumed to be piecewise quadratic in the current work), whereas its global form remains unknown. The conservation principle is then defined as the minimization of system curvature at each of the observation points. Based on such minimization principle, a dimensionless length scale of the underlying phenomenon is estimated at each observation point in each individual dimension. The constitutive parameters of any system (physical or abstract) are contained (with correct combination rule) within the expression for dimensionless length scales. Finally, stability is defined as the ability of the system to minimize its local curvature below a threshold value (determined as inverse of length scale in the present work) at each individual observation point. Thus, the stability characteristic of a system at a given location depends only on the local curvature of the system and the instantaneous dimensionless length scale at that location, obviating the need for explicit estimation of constitutive parameters.

## 2. Literature review

The key to useful prognosis is consistent estimation of future trend based on past experience and current observations defining the current state of the system. Two key questions are: (i) Will the current trend continue over a pre-specified prediction horizon? and (ii) What is the remaining life or time-span over which current activities can be continued? A detailed literature review along these lines may be found in reference [1].

Prognostics and health management capabilities combine sensing and interpretation of data on the environmental-, operational- and performance-related parameters of the product to assess the health of the product and then predict remaining useful life (RUL). The data are often collected in real-time or near-real-time and used in conjunction with prediction models to provide an estimate of its state-of-health or degradation and the projection of remaining life. Traditionally, these prediction models use either a model-based approach or a data-driven approach.

Model-based approaches rely on an understanding of the physical processes and interrelationships among the different components or subsystems of a product [2], including system modelling and physics-of-failure (PoF) modelling approaches.

In system modelling approaches, mathematical functions or mappings, such as differential equations, are used to represent the product. The constitutive parameters or coefficients of the differential operators must be known *a priori* or estimated offline based on a controlled set of stimulations. Statistical estimation techniques based on residuals and parity relations are then used to detect, isolate and predict degradation [2,3]. Model-based prognostic methods have been developed for digital electronics components and systems such as lithium ion batteries [4], microprocessors in avionics [5], global positioning systems [6] and switched mode power supplies [7].

One of the examples of statistical estimation techniques and system modelling that is used for anomaly detection, prediction and diagnosis is the ‘divide-and-conquer’ [2] dynamic modelling paradigm. The system input–output operation space is partitioned into small regions using self-organizing maps and then a statistical model of the system expected behaviour within each region is constructed based on time–frequency distribution. The significant deviations from the trained normal behaviour are recognized as anomalies. Then, ‘diagnosers’ can be constructed for various known faults within each operational region to identify the types of faults. This divide-and-conquer approach leads to a localized decision-making scheme, where anomaly detection and fault diagnosis can be performed locally within each operational region.

Another system modelling prognostic method, used for diagnosis of lithium-ion batteries, is the Bayesian framework approach [4]. The Bayesian learning framework attempts to explicitly incorporate and propagate uncertainty in battery ageing models. The relevance vector machine (RVM)–particle filter (PF) approach provides a probability density function for the end-of-life of the battery. RVM is a Bayesian formulation of a generalized linear model with the same functional form as the support vector machine (SVM). Both of the above-mentioned methods use some form of supervised learning.

PoF-based prognostic methods use knowledge of a product's lifecycle loading conditions, geometry, material properties and failure mechanisms to estimate its RUL [8–11]. PoF methodology is based on the identification of potential failure mechanisms and failure sites of a product. A failure mechanism is described by the relationship between the *in situ* monitored stresses and variability at potential failure sites. PoF-based prognostics permit the assessment and prediction of a product's reliability under its actual application conditions. It integrates *in situ* monitored data from sensor systems with models that enable identification of the deviation or degradation of a product from an expected normal condition and the prediction of the future state of reliability. Such methodology requires establishment of ‘benchmark’ parameters or normal behaviour of the product. This requires controlled offline testing or controlled ‘training’ regimen.

The PoF-based approach has been applied to analyse the health of printed circuit boards (PCBs) under vibration loading in terms of bending curvature [8]. First, the components that are most likely to fail, and their locations, are identified at certain vibration loading levels. Sensors are placed at those areas to monitor the PCB response. Then, a database is built that reflects the relation between the PCB and its critical components. Similar approaches have also been applied to surface mount assemblies to extract the state of damage at an instant and predict the RUL based on accumulated damage [12].

The data-driven approach uses statistical pattern recognition and machine learning to detect changes in parameter data, isolate faults and estimate the RUL of a product [13–16]. Data-driven methods do not require product-specific knowledge of such things as material properties, construction and failure mechanisms. In data-driven approaches, *in situ* monitoring of environmental and operational parameters of the product is carried out, and the complex relationships and trends available in the data can be captured without the need for specific failure models. There are many data-driven approaches, such as neural networks, SVMs, decision tree classifiers, principal component analysis, PF and fuzzy logic [13]. However, such techniques also require a definition of normal operation, which is typically based on a training set or previously observed circumstances. Thus, current state-of-the-art predictions in data-driven approaches extend only to circumstances that can be spanned by the ‘column space’ of the training conditions. This severely limits their usefulness, particularly in unforeseen circumstances. Alternatively, developing a data-driven scheme for a specific purpose requires a very carefully articulated training regimen (capturing the purpose) for the prognostic system.

Neural networks are examples of traditional data-driven systems, where the data processing is carried out at a number of interconnected processing elements called neurons. The neurons are usually organized in a sequence of layers: an input layer, a set of intermediate layers and an output layer. During training, the network weights are adjusted depending on the type of learning process. In one application, a set of feedforward back propagation networks undergoes supervised learning to identify the current operating time of an operating bearing. Two classes of neural network models, single-bearing models and clustered-bearing models, have been developed [17]. Both classes of models use degradation information associated with the defective phase of bearing degradation. In the first class, a single bearing is used to train a single back propagation neural network. In the second class, the bearings are classified in groups (clusters) based on similarity in their failure and defect times. Each net is then trained using degradation information associated with bearings in the cluster [17].

Thus, current state-of-the-art in both system modelling and data-driven approaches is limited by the prior training requirement. Controlled offline or *a priori* testing is needed for accurate constitutive parameter estimation. Moreover, the applicability of a prognostic system is strictly limited to the vector space spanned by the basis vectors of its training regimen.

By contrast, the proposed DDP approach estimates the relevant constitutive parameters *in situ*. Instead of a carefully designed ‘training’ regimen, it can use any two time sequences of data. However, it focuses only on the situation at hand, rather than trying to master any general situation. Accordingly, only a relevant combination of material and geometric parameters is extracted as an instantaneous dimensionless length scale in each dimension, and used in real time. Such a DDP scheme is then capable of predicting the probability of the onset of instability over a prediction horizon. Alternatively, the prediction horizon needed for the probability of instability to exceed a threshold value can be considered the RUL.

## 3. Determination of length scale

Let us consider an observable body or a phenomenon containing a finite number of observation points. At each point, information is collected at multiple dimensions (that collectively satisfy work conjugacy requirements), and at discrete instants of time. Let us now consider two specific observation points A and B. At both these points, each dimension is denoted by *i*=1,*n*. The value recorded at these two points in some particular dimension *i*=*d* can be denoted as

In order to develop a model describing such a phenomenon, it is first assumed that the system under observation is conservative. As a first attempt, it is also assumed that a piecewise second-order potential is sufficient to describe the pairwise interactions in the system. Thus, the general potential function is assumed to be quadratic in the neighbourhood of each observation point. However, the nature of the quadratic potential function (the coefficients of the second-order polynomial) can vary from one point to another, representing a higher-order global relationship. The intrinsic uncertainty introduced in the model predictions owing to this approximation will be examined later.

Next, it is attempted to satisfy the three canonical requirements: (i) compatibility, (ii) equilibrium, and (iii) constitutive relation. It is further acknowledged that objectivity or frame invariance (with respect to the observer) is a requirement for describing the behaviour of such a system. Objectivity implies that the state of the observed system remains invariant with respect to different observers or variations in the observation procedure. In the present development, compatibility is enforced indirectly by requiring that the system be objective at every observable scale at each location [18]. Such an indirect approach satisfies compatibility as well as objectivity simultaneously. Using such an approach [18], the conservation of linear momentum in the neighbourhood of point A may be described as
*i* [18]. Such a rank satisfies both compatibility (and objectivity) as well as equilibrium at A. *k* during a time step. The parameter *ρ* is density and *E*_{ijkl} is tangent modulus. The parameter *L*_{l})
*i* and *j*) be interchangeable. Similarly, the interchangeability of (*k* and *l*) is mandated by the symmetry requirements on the definition of strain. The requirement of work conjugacy necessitates symmetry in potential function and this enforces interchangeability between (*i* ; *j*) *pair and* (*k* ; *l*) *pair*. After all such transformations, the above (3.4) can be written as
*L*_{i} is 2^{dim size}, 8 and 16 values of *L*_{i} are obtained when number of total dimensions are 3 and 4, respectively. The number of possible solutions is called the number of roots of the length scale. It is interesting to note that solution for a dimensionless form

A constant

### (a) Energy exchange rate and its manifestation as a curvature

It is assumed that equilibrium in the system is satisfied instantly, whereas satisfaction of compatibility (in an objective framework) only happens with time. Thus, at every instant, the system state adapts in an attempt to satisfy compatibility in addition to equilibrium. As a result, the Borda count at an observation point A changes over the associated length scale (which itself can also change as a result of this adaptation). The local curvature reflects this change, which is manifested as a local energy exchange (absorption or release) rate. We assume that loss of stability occurs locally when the curvature at a point A exceeds a threshold value (denoted by inverse of dimensionless length scale). It is further assumed that global instability occurs when such unstable points can form a ‘chain’ or an energy exchange pathway, and it meets two additional conditions: (i) the chain length exceeds a critical threshold, and (ii) the energy exchange rate along such a pathway exceeds a critical threshold [24].

## 4. Algorithms

The algorithms needed for evaluation of length scales, curvatures as well as instability prediction criteria are discussed in this section [24].

### (a) Length-scale estimation

This calculation follows a generic approach, so that the algorithm can be applied to any system in which the actual Poisson's ratio (or the mode mixture between dilatational and distortional modes) is not known *a priori*. The Poisson's ratio for pure volumetric deformation or dilatation is *ν*_{v}=−1, and that for pure shape change (at constant volume) or pure shear is *ν*_{s}=0.5. Using these two known values, we calculate the proportion of volumetric deformation and shear deformation in the phenomenon under observation by requiring that the combination minimizes the resulting curvature.

### (b) Nature of roots

At each point, there are a number of dimensions. So, for one point, there are 2^{dimsize}×dimsize roots of the length scale in total, that is, 2^{dimsize} roots in each dimension. Thus, only a stochastic estimate of the length scale at an observation point is possible. This introduces the stochastic characteristics of the resulting prognosis. In this work, four dimensions (three spatial dimensions and greyscale colour) are used. Thus, we obtain 16 roots in each dimension at each observation point at every time step. This multiplicity in the number of roots gives rise to the stochastic nature of the prognosis protocol, and a probability for instability (over the specified time horizon) can be estimated simply from the fraction of the roots that indicate incipient instability.
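The root count and the resulting probability estimate can be illustrated with a minimal Python sketch. The function names and the boolean per-root flags are hypothetical; how each root is judged to indicate incipient instability is taken as given here.

```python
def roots_per_dimension(n_dims: int) -> int:
    """Number of length-scale roots in one dimension: 2**n_dims."""
    return 2 ** n_dims


def total_roots(n_dims: int) -> int:
    """Roots over all dimensions at one observation point."""
    return roots_per_dimension(n_dims) * n_dims


def instability_probability(root_flags) -> float:
    """Fraction of roots flagged as indicating incipient instability.

    root_flags is a hypothetical sequence of booleans, one per root.
    """
    flags = list(root_flags)
    if not flags:
        raise ValueError("at least one root is required")
    return sum(bool(f) for f in flags) / len(flags)


print(roots_per_dimension(3))  # 8
print(roots_per_dimension(4))  # 16
print(total_roots(4))          # 64

# Assumed example: 4 of the 16 roots in one dimension flag instability.
print(instability_probability([True] * 4 + [False] * 12))  # 0.25
```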

### (c) Curvature calculation

This calculation also follows a generic approach. It is assumed that, in any combination, equilibrium is satisfied instantly, whereas satisfaction of compatibility in addition to equilibrium takes longer time. Thus, an original gradient of *R* is the objective rank [17] and *κ*_{i} is calculated as *i* and a specific root of length scale, and applied to all points.

### (d) Post processing: categorization of observation points at an instant

Following the curvature and associated length-scale calculations, a path dependency index (PDI) is calculated at each observation point, and the points are categorized according to the PDI. There are nine possible categories that characterize the nature of instabilities that can arise at each point. Assignment of the first seven categories is based on the PDI, and will be explained in this section. The eighth and ninth categories are assigned after further processing, based on the global transcendation index (GTI), which is explained in §4f. These categories are summarized in table 1.

### (e) Chain length calculation

The chain length determination is done at every point for each dimension and each root [24]. This is done to check how far a defect continues in either direction from one point to another in order of their ranks. Proximity is defined by difference in rank. The curvature combination rules of Saari [25] are used, and it is assumed that sufficient time is allowed to reach steady state after each combination. In addition to material systems (e.g. the balloon burst phenomenon), this facilitates chain length computation in abstract systems (e.g. economic systems) without specific proximity definitions. These chains constitute energy exchange pathways in the system, and to be on an energy exchange pathway, a point must have a PDI≥5. A chain length that reaches the critical threshold is assumed to transcend to the next aggregated scale in the hierarchy, and is ‘promoted’ to show up as a ‘local’ instability at the next level of aggregation.
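As a rough illustration of rank-based chains, the following Python sketch assigns to every point the length of the run of rank-contiguous points with PDI≥5 that contains it. It assumes that the points are already ordered by objective rank and that contiguity means a rank difference of one; it does not implement the curvature combination rules of [25], and the function name is hypothetical.

```python
def chain_lengths(pdi_by_rank, pdi_threshold=5):
    """Chain length at each point, for one dimension and one root.

    pdi_by_rank: PDI values listed in order of objective rank
    (an assumed input layout). A point with PDI below the threshold
    is not on any chain and gets length 0.
    """
    n = len(pdi_by_rank)
    lengths = [0] * n
    start = 0
    while start < n:
        if pdi_by_rank[start] < pdi_threshold:
            start += 1
            continue
        # Extend the run in the direction of increasing rank.
        end = start
        while end < n and pdi_by_rank[end] >= pdi_threshold:
            end += 1
        # Every point in the run shares the same chain length.
        for k in range(start, end):
            lengths[k] = end - start
        start = end
    return lengths


print(chain_lengths([1, 5, 6, 7, 2, 5, 5]))  # [0, 3, 3, 3, 0, 2, 2]
```

A chain would then be compared against the critical chain length obtained from the zoom-out procedure.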

### (f) Global transcendation index

After the system attains path dependency (at a local level), it may progress towards instability and failure if two additional criteria are also met: (i) the locally path-dependent points link to form a chain whose length exceeds a threshold value, and (ii) the aggregated or ‘residual’ curvature of the entire system exceeds a critical threshold. The GTI turns non-zero and positive if both of these conditions are met. A category of 8 is assigned to an observation point if PDI>5 and GTI>0 in one dimension, and a category of 9 is assigned if both of these conditions are met in multiple dimensions.

This section describes the ‘zoom-out’ or aggregation procedure for calculating the critical chain length and the residual (or aggregated) curvature for the system at a time instant. The residual curvature provides a measure of the energy exchange rate of the system as a whole with its environment. Further details may be found in [24].

The ‘zoom-out’ procedure uses the fact that, for a truly conservative and isolated system, the observed curvature should be zero when the ‘stand-off’ distance of the observer is either zero or infinity. The *x*-axis in figure 1 represents this stand-off distance or ‘zoom-out’ level, whereas the *y*-axis represents the system curvature estimated at each level. A log-scale in *x* is used for convenience. It is assumed that at an *x*-value of zero, the system curvature drops to zero. However, owing to resolution limitations of the imaging system, it was not possible to image at that level. Thus, the level of aggregation of the most detailed acquired image was arbitrarily assigned a value of unity, and only ‘zoom-out’ was carried out with increasing levels of aggregation. The level at which further zoom-out could not be carried out is assumed to correspond to an *x*-value of infinity. In practice, it represented the level at which the entire video frame was aggregated to 9 (3×3) points. The curvature value at this aggregated extremum is called the ‘residual curvature’ of the system. A non-zero value of residual curvature represents a measure of the energy exchange rate through the system. Owing to the nature of our calculation, only the magnitude of the residual curvature is obtained; its sign is simultaneously positive and negative. This is due to the conservation assumption in our analysis, which requires any energy absorbed (by the system as a whole) to be released within the time step under consideration, and vice versa.

Because we do not have any physical data more detailed than the captured pixel level in our image, we arbitrarily assume that the plot is symmetric to the left and right of *x*=1 and drops to zero at *x*=0.

We calculate *κ*,

The aggregation level represents the number of points that were considered together as a unit for the specific calculation. From figure 1, it is attempted to extrapolate the kappa graph backwards for lower values of *x* extending to zero by using the assumed symmetry condition. After this, the *y*-axis) of the calculated kappa line. The existence of a process zone associated with local instability is assumed. It is further assumed (figure 1) that the process zone size instantly jumps to point B if it reaches point A, provided the *y*-value (or curvature) at B is lower than A. The log(*x*) value at intersection point A is converted to fractional number of points relevant to the aggregated frame under consideration, and is scaled by the actual frame under consideration to reach an estimate for the critical chain length. The intersection of

## 5. Balloon burst experiment

### (a) Data collection

The balloon burst experiment consisted of mechanically inflating a balloon until it ‘popped’ or failed by bursting. The XYZM files, which provide length, breadth, depth and colour information of the balloon as a three-dimensional object, were collected as a sequence of greyscale video frames over the entire duration of the experiment.

### (b) Experimental set-up

The three-dimensional balloon video (figure 2 shows sample video frames) is taken as a sequence of high-speed real-time three-dimensional-shape measurements based on rapid phase-shifting technique [26]. The experimental system takes full advantage of the single-chip digital light processing (DLP) technology for rapid switching of three coded fringe patterns. A colour fringe pattern with its red, green and blue channels coded with three different patterns is created by a personal computer. When this pattern is sent to a single-chip DLP projector, the projector projects the three-colour channels in sequence repeatedly and rapidly. To eliminate the effect of colour, the colour filters on the colour wheel of the projector are removed. As a result, the projected fringe patterns are all in greyscale. A properly synchronized high-speed black-and-white (B/W) CCD camera is used to capture the images of each colour channel from which three-dimensional information of the object surface is retrieved. A colour CCD camera, which is synchronized with the projector and aligned with the B/W camera, is also used to take two-dimensional colour pictures of the object at a frame rate of 26.7 frames s^{−1} for texture mapping. Along with this system, a fast three-dimensional reconstruction algorithm and parallel processing software [26] is also used to realize high-resolution, real-time three-dimensional shape measurement at a frame rate of up to 40 frames s^{−1} and a resolution of 532×500 points per frame. The XYZM files are just the XYZ points that get triangulated (one for every pixel of the two-dimensional image), and then the BMP file is the texture which is simply the three captured images averaged together [26].

### (c) Collected data files

The input files are in ‘xyzm’ format. Each observation point represents a pixel of the video captured of the balloon being inflated. Each observation point has four dimensions or four types of information (*x*, *y*, *z* and mask information). From these four types of information, the *x*-coordinate, *y*-coordinate, *z*-coordinate and colour information for each observation point are obtained. These four kinds of information form the four dimensions that the prediction algorithm uses. These four dimensions together form a conservative system. The value of the colour information decreases as air is pumped into the balloon, while the values of the *x*, *y* and *z* information increase. Because it is a video file, the collected data can grow very quickly in size. A 5 min video (at 40 Hz) will typically result in almost 3200 GB of data for XYZM type of files. The proposed DDP algorithm reduces memory requirements by parsing such data, and using only frames at *nΔt* (*n*=1, 2, 3, etc.) intervals. Thus, past memory over only *nΔt* is needed at any time instant. The proposed DDP algorithm can function very well with *n*=1, whereas higher values of *n* extend the prediction horizon. Of course, the implicit assumption in the prognosis scheme remains that the dimensionless length scales calculated over the previous *nΔt* remain valid over the future *nΔt* steps. This limits the upper bound of *n*, or the prediction horizon, that can be used at any time.
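The parsing of the frame stream at *nΔt* intervals can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `frame_pairs` and `frames_in_video` are hypothetical, and a frame may be any object (e.g. an array loaded from one XYZM file). Only one past frame is held in memory at a time.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def frames_in_video(duration_s: float, rate_hz: float) -> int:
    """Number of frames recorded over a given duration."""
    return int(duration_s * rate_hz)


def frame_pairs(frames: Iterable[Frame], n: int = 1) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (earlier, current) frame pairs separated by n time steps.

    Frames between sampling instants are skipped, so memory use does
    not grow with the length of the video.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    previous = None
    have_previous = False
    for index, frame in enumerate(frames):
        if index % n != 0:
            continue  # not a sampling instant for this value of n
        if have_previous:
            yield previous, frame
        previous = frame
        have_previous = True


print(frames_in_video(5 * 60, 40))         # 12000
print(list(frame_pairs(range(7), n=3)))    # [(0, 3), (3, 6)]
```

With `n=1` every consecutive pair of frames is used; larger `n` widens the spacing between the two frames and hence the prediction horizon.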

## 6. Model-based prognosis and verification

### (a) Path dependency index

Because the data size for the video file grows very rapidly, the first objective of our analysis is to make a prognosis using current observation, and hold a minimal number of frames in memory. As a first attempt, only two frames (current frame and its immediate predecessor) are used at each time instant.

First, the local Borda count and objective rank are calculated at each observation point, at each instant of time. Using two consecutive time instants, the dimensionless length scales are calculated at each observation point. This allows calculation of the local curvature at each observation point. The local curvature at a time step is compared with the threshold value or critical curvature (denoted by the inverse of the length scale), and a PDI is calculated (table 1). A PDI greater than or equal to 5 indicates a local instability with the potential to transcend to the global scale.

Next, we follow an objective aggregation procedure consistent with the assumption of piecewise and pairwise second-order potential [24]. This is called the ‘zoom-out’ procedure that identifies the critical ‘chain length’ or length of the energy exchange pathway needed for local instabilities to transcend to global scales.

However, the existence of a chain length greater than the minimum or critical length only constitutes a necessary condition for such transcendence. The energy exchange rate through such a pathway must also exceed a threshold value to meet the sufficiency condition for transcendence of local instabilities to a global scale. The critical energy release rate is a constitutive property that can be normally determined accurately only with careful offline testing. Instead of such offline testing, we use the fact that whenever local instabilities transition to global scales, almost all of the energy stored along such a pathway gets released (transformed to another form). Thus, the energy exchange rate rises and then falls very rapidly to near zero during such transcendent phenomena. We use this rapid change in energy release rate in our prognosis scheme, and a trigger is initiated whenever the dimensionless energy release rate drops by more than 80% within a single time step. The choice of 80% is arbitrary in this case, based on inherent noise floor in our experimental and computational procedures.

Together, the existence of: (i) greater than critical chain length, and (ii) greater than 80% drop in energy release rate over a single time step, constitute a positive reading for the GTI, and GTI is set to be greater than 0.
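The two conditions can be combined as in the following Python sketch. The function names and argument layout are hypothetical, the energy release rate is assumed to be a non-negative dimensionless magnitude, and the numerical values in the example are illustrative only.

```python
def relative_drop(previous: float, current: float) -> float:
    """Fractional drop from the previous to the current value.

    Returns 0.0 when the previous value is not positive, so that a
    zero baseline never triggers (an assumption of this sketch).
    """
    if previous <= 0:
        return 0.0
    return (previous - current) / previous


def gti_positive(chain_length: float,
                 critical_chain_length: float,
                 rate_previous: float,
                 rate_current: float,
                 drop_threshold: float = 0.8) -> bool:
    """True when both GTI conditions hold within one time step."""
    long_enough = chain_length > critical_chain_length
    rapid_drop = relative_drop(rate_previous, rate_current) > drop_threshold
    return long_enough and rapid_drop


# Chain longer than critical and a 90% drop: trigger.
print(gti_positive(30, 27, 1.0, 0.1))  # True
# Same chain, but only a 50% drop: no trigger.
print(gti_positive(30, 27, 1.0, 0.5))  # False
# Large drop, but chain shorter than critical: no trigger.
print(gti_positive(20, 27, 1.0, 0.1))  # False
```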

We continue the calculation of PDI and GTI for every time step of acquired data to prognosticate about the balloon burst phenomenon. When PDI>5 and GTI>0, imminent bursting of the balloon is indicated. The model-based prognosis results are finally compared with the actual time of balloon burst that was experimentally observed.

Progression of PDI for the colour dimension across all times is shown in figure 3. It shows the percentage of points that have reached different PDI markers as the balloon inflates with progression of time. The different PDI categories are explained in table 1. Figure 4 shows a plot that signifies the instants when the system shows local instabilities (PDI≥5), without regard to the chain length information. For a point to be counted as path-dependent, at least 8 of the 16 roots must show a PDI≥5. If at least one point is path-dependent, the system is considered path-dependent. So for the balloon, path dependency started at time index 4, and the balloon is also path-dependent at time indices 13, 16 and 22.
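The counting rule for path dependency can be written as a short Python sketch. The function names and the nested-list input layout (one list of 16 root PDI values per observation point) are assumptions made for illustration.

```python
def point_is_path_dependent(root_pdis, pdi_threshold=5, min_roots=8):
    """A point is path-dependent if enough of its roots reach the PDI threshold."""
    hits = sum(1 for pdi in root_pdis if pdi >= pdi_threshold)
    return hits >= min_roots


def system_is_path_dependent(points):
    """The system is path-dependent if at least one point is."""
    return any(point_is_path_dependent(root_pdis) for root_pdis in points)


stable_point = [5] * 7 + [1] * 9     # only 7 of 16 roots reach PDI 5
unstable_point = [5] * 8 + [1] * 8   # 8 of 16 roots reach PDI 5

print(point_is_path_dependent(stable_point))                     # False
print(point_is_path_dependent(unstable_point))                   # True
print(system_is_path_dependent([stable_point, unstable_point]))  # True
```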

Figure 5 shows the number of roots (of 16) in colour dimension that had PDI≥5, and also crossed the critical chain length threshold (27 for the balloon experiment in colour dimension [24]). From figure 5, the balloon has crossed the critical chain length at time indices 20, 21, 23, 24 and 25.

### (b) Residual curvature

Figure 4 indicates when the system is path-dependent, whereas figure 5 indicates when a local instability could transcend to global scales by forming energy exchange channels longer than the critical threshold. Figure 6 shows residual curvature for the system at different time indices. Residual curvature provides a dimensionless aggregated measure of the magnitude of the energy exchange rate for the entire system, and a local instability can transcend to global scale if the residual curvature also exceeds a critical threshold. However, such a threshold can only be measured via carefully controlled offline testing. Here, we instead use the fact that the stored energy gets rapidly depleted during any such global transcendence. Hence, to detect imminent balloon burst, we monitor the rate of change of the residual curvature, particularly a rapid drop from an elevated value. A trigger is initiated whenever the residual curvature drops by more than 80% within a single time step. Figure 6 shows the percentage of the total number of roots in all four dimensions of the balloon (64 roots) that had a drop in residual curvature greater than 80%. The time instants at which this percentage exceeded 20% (approx. 13 of 64 roots) are taken to be the instants at which the balloon is dissipating a large amount of energy. These time indices are 2, 17 and 22.
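A minimal sketch of this aggregation step is given below. It assumes, hypothetically, that the residual curvature is available as one list of 64 root values per time index; the function names and the handling of non-positive prior values are assumptions made for illustration.

```python
from typing import List, Sequence


def fraction_of_roots_dropping(prev: Sequence[float], curr: Sequence[float],
                               drop_threshold: float = 0.80) -> float:
    """Fraction of roots whose residual curvature fell by more than
    `drop_threshold` between two consecutive time steps."""
    if not prev:
        return 0.0
    dropped = 0
    for before, after in zip(prev, curr):
        # Assumption: only roots with a positive prior value can register a drop.
        if before > 0.0 and (before - after) / before > drop_threshold:
            dropped += 1
    return dropped / len(prev)


def dissipation_indices(curvature: Sequence[Sequence[float]],
                        root_fraction: float = 0.20) -> List[int]:
    """Time indices at which more than `root_fraction` of all roots dropped."""
    return [t for t in range(1, len(curvature))
            if fraction_of_roots_dropping(curvature[t - 1], curvature[t]) > root_fraction]


# Hypothetical data: 64 roots over four time steps.
step0 = [1.0] * 64
step1 = [0.1] * 13 + [1.0] * 51   # 13 of 64 roots drop by 90%
step2 = [1.0] * 64
step3 = [0.1] * 12 + [1.0] * 52   # 12 of 64 roots drop by 90%
print(dissipation_indices([step0, step1, step2, step3]))
```

With 64 roots, 13 dropping roots (about 20.3%) exceed the 20% level, whereas 12 (18.75%) do not.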

### (c) Composite failure prediction

The final failure prediction takes all of the above analysis into account. For a system-level failure to occur (figure 7), the system first needs to enter the path-dependent stage. A system can be stable even if it is path-dependent. While path-dependent, the system can nucleate ‘dislocations’ at multiple points. Only points that are contiguous to each other in rank can form a chain. If a chain formed in this manner exceeds the short-term or long-term critical chain length of the system, then the dislocations can join and form a line defect, which may ultimately give rise to failure. In addition, the system needs to have a greater than critical energy exchange rate. This is monitored via a large and rapid drop in residual curvature; a drop of greater than 80% is used as the trigger in our analysis. Once both of these have taken place (GTI>0) after the system has already entered the path dependency mode (PDI>5), failure can be predicted for that system.

In the case of the balloon, the system was under constant monitoring when it entered the path dependency stage at time index 4. After time index 4, it triggered long-term chain formations at time indices 20, 21, 23, 24 and 25, and residual curvature drops at time indices 17 and 22. The time index by which all of these phenomena had taken place is therefore 20. Hence, the balloon was deemed to be approaching failure at that time index, and the burst would be expected soon. Time index 20 corresponds to frame number 2099. In reality, the balloon burst at frame 2382. Hence, the DDP scheme predicts the failure (1−(2099/2382))×100=11.88% ahead of the actual time.
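The composite decision and the lead-time arithmetic can be reproduced with a short sketch. The function names are hypothetical, and the rule implemented here (the earliest time index by which both a chain-length trigger and a residual curvature trigger have occurred strictly after the onset of path dependency) is one reading of the procedure described above, stated as an assumption.

```python
from typing import Optional, Sequence


def first_after(indices: Sequence[int], start: int) -> Optional[int]:
    """Earliest index strictly after `start`, or None if there is none."""
    later = [i for i in indices if i > start]
    return min(later) if later else None


def failure_warning_index(pd_start: int,
                          chain_indices: Sequence[int],
                          drop_indices: Sequence[int]) -> Optional[int]:
    """Earliest time index by which both triggers have occurred after the
    onset of path dependency (assumed interpretation of the composite rule)."""
    chain = first_after(chain_indices, pd_start)
    drop = first_after(drop_indices, pd_start)
    if chain is None or drop is None:
        return None
    return max(chain, drop)


def lead_percentage(predicted_frame: int, actual_frame: int) -> float:
    """Percentage of the total time by which the prediction precedes failure."""
    return (1 - predicted_frame / actual_frame) * 100


# Time indices and frame numbers reported for the balloon experiment.
warning = failure_warning_index(4, [20, 21, 23, 24, 25], [2, 17, 22])
print(warning)                                  # 20
print(round(lead_percentage(2099, 2382), 2))    # 11.88
```

The residual curvature drop at time index 2 is ignored because it precedes the onset of path dependency at time index 4.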

### (d) Results of other datasets for balloon verification

Similar analyses were also carried out using: (i) randomly selected different viewing windows on the same balloon, as well as (ii) different balloons. Tables 2 and 3 record the video frame indices pertaining to significant transitions in these additional datasets.

## 7. Discussion and conclusion

A DDP algorithm suitable for handling multi-scale and multi-physics problems is developed here. The DDP algorithm is verified against balloon burst experiments using *in situ* greyscale video data only, and no offline testing. The DDP prognosticator never failed to predict an incipient instability, and every prediction preceded the observed burst. Thus, the failure predictions were conservative and always contained a safety margin.

Table 2 shows results for three datasets collected on a single balloon at different arbitrary locations. The DDP algorithm predicted occurrence of balloon burst approximately 5% ahead of actual occurrence observed experimentally. The worst case was a prediction about 12% ahead, and best-case prediction was about 2% ahead.

Additional tests were performed with completely different sets of balloons (different brand name and size). While there were still no false-positives, the DDP algorithm predicted failure much earlier for this additional set. The worst prediction is 25.5% ahead of actual failure, and the best prediction is 2.3% ahead. Noise levels in the laboratory were observed to be significantly higher during the experiments used to collect these additional data (table 3). Moreover, it is believed that the polymers of the additional balloons were different, and could have been toughening during the test as a result of the imposed deformation. In our DDP protocol, the critical chain length from the zoom-out calculation was estimated only once, near the beginning of the test. This was a direct consequence of attempts to reduce computational burden, because the zoom-out calculation was computationally intensive. Moreover, the critical chain length estimation procedure needed human intervention. This might have resulted in an estimate of critical chain length smaller than that appropriate to the toughened state of the underlying balloon polymer, and contributed to premature failure predictions. We believe a protocol that re-calculates the critical chain length from the zoom-out (at every time step, or at least as a periodic update) would have been better, and this will be attempted in future.

A further approximation occurs in the zoom-out procedure owing to the assumption that curvature is highest at the observed scale. This need not be strictly true, but can be rectified only by collecting data at several scales, and progressing with increasing levels of detail (or zooming in) until a maximum in curvature occurs.

The reliance of the DDP prognosis on only short-term memory (at least two data frames need to be held in memory at an instant) significantly reduces the requirement that long data sequences be held to infer transition statistics of the system. However, big datasets in spatial dimensions must still be handled. The proposed algorithm attempts to conquer such big datasets by organizing them in a hierarchy of scales. However, data can only be aggregated. This implies that the prognosticator can traverse only from local (or detailed) scale to aggregated or global scale, but not vice versa.

The proposed DDP algorithm obviates the need for *a priori* offline testing or ‘training’ to extract the transitional statistics of the system. Instead of attempting to develop a capability for predicting system response under general conditions, it uses only data available online and short-term memory (a minimum of two time frames is needed) to develop a prognosticator specializing in the ‘current’ situation or ‘problem at hand’, and makes a prediction regarding incipient instability over a relatively short prediction horizon. A greater number of time frames may be used to stretch the prediction horizon. For RUL estimates, the best type of data for the proposed algorithm is that collected over short time windows separated longitudinally over a much larger time span. The separation time between the windows can be varied relative to the span of the data collection window to facilitate prognosis over varied prediction horizons. However, such a scheme requires capability for both short-term and long-term memory.

The current implementation of the proposed DDP algorithm is much slower than real time owing to limitations of computational capability, in terms of both processor speed and the number of processors deployed in parallel. The need for human intervention in critical chain length estimation also represents an obstacle to achieving real-time implementation. Estimation of a theoretical error bound also remains unfinished at present. Work is in progress to improve these aspects and to approach real-time implementation of the proposed DDP prognosticator.

The DDP algorithm developed here is also applicable to any dataset representing a conservative system. A companion paper applies a similar framework for prognosis of osteoarthritis after reconstructive surgery following ACL injury [27].

## Data accessibility

The data used in this work are accessible at: https://iastate.box.com/chandra-kar.

## Authors' contributions

A.C. developed the DDP theory and the DDP algorithm used. O.K. helped with algorithm development, and was responsible for creating the computer code based on the DDP algorithm. O.K. also conducted the balloon burst experiments.

## Funding statement

This work is supported by the United States National Science Foundation through grant numbers: CMMI 0900093 and CMMI 1100066. The authors gratefully acknowledge this support. Any opinions, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the sponsoring agencies.

## Competing interests

The authors have no competing interests in relation to this work.

## Acknowledgements

The authors are thankful to Prof. Song Zhang of Purdue University and Mr Nik Karpinsky for their help with three-dimensional video imaging, and to Mr Kuan-Chuen Wu for help with data processing.

## Footnotes

A related article can be viewed at http://dx.doi.org/10.1098/rspa.2014.0526.

- Received July 10, 2014.
- Accepted February 26, 2015.

- © 2015 The Author(s) Published by the Royal Society. All rights reserved.