## Abstract

Assessment of the credibility of a mathematical or numerical model of a complex system must combine three components: (i) the fidelity of the model to test data, e.g. as quantified by a mean-squared error; (ii) the robustness, of model fidelity, to lack of understanding of the underlying processes; and (iii) the prediction-looseness of the model. ‘Prediction-looseness’ is the range of predictions of models that are equivalent in terms of fidelity. The main result of this paper asserts that fidelity, robustness and prediction-looseness are mutually antagonistic. A change in the model that enhances one of these attributes will cause deterioration of another. In particular, increasing the *fidelity* to test data will decrease the *robustness* to imperfect understanding of the process. Likewise, increasing the *robustness* will increase the *predictive looseness*. The conclusion is that focusing only on fidelity-to-data is not a sound decision-making strategy for model building and validation. A better strategy is to explore the trade-offs between robustness-to-uncertainty, fidelity to data and tightness of predictions. Our analysis is based on info-gap models of uncertainty, which can be applied to cases of severe uncertainty and lack of knowledge.

## 1. Introduction

In computational physics and engineering, biological conservation, economics, homeland security and other fields, numerical models are developed to predict the behaviour of a system whose response cannot be measured experimentally and whose behaviour is incompletely understood. A key aspect of science-based predictive modelling is to assess the credibility of predictions. Credibility expresses the extent to which the results of simulation reliably represent the phenomenon of interest with a degree of accuracy consistent with the intended use of the model (Doebling 2002). The literature is rich in examples of the need for credible system modelling (Friswell & Mottershead 1995; Natke & Cempel 1997; Banks & Castillo-Chavez 2003; Bårdsen *et al.* 2005; Burgman 2005).

The study of the credibility of models has a long history. Perhaps the earliest contribution was Laplace's proof of the central limit theorem and its use to provide a maximum-likelihood motivation for least-squares estimation. Hypothesis tests such as the *χ*^{2}-test or the Kolmogorov–Smirnov test provide statistical concepts and tools for assessing model credibility. Tests such as these assume asymptotic datasets, statistical independence of observations, and so on. Bayesian methods of model updating have also been extensively used. Kennedy & O'Hagan (2001) develop a Bayesian approach to model calibration which derives the posterior distribution of estimated model parameters and of subsequent model predictions by assuming that functions of the model error are described by a Gaussian process. The choice of an approach to assess model credibility depends on the analyst's judgement and understanding of the problem. As Kennedy & O'Hagan note regarding their assumption of normality: ‘It is, of course, important that normality … is a reasonable representation of prior knowledge or beliefs’ (2001, p. 432).

The main contributions of this paper employ info-gap decision theory, which is particularly suited to situations in which probabilistic information is deficient or lacking. Nonetheless, one element of our analysis can employ probabilistic or statistical tools.

This paper argues that assessment of the credibility of a mathematical or numerical model must combine three components: (i) The fidelity of the model to test data. Fidelity can be quantified in different ways as discussed in §3, for instance, with mean-squared deviation between test data and model predictions, or other methods. (ii) The robustness, of model fidelity, to lack of understanding of the underlying processes. (iii) The prediction-looseness of the model. ‘Prediction-looseness’ refers to the magnitude of the range of predictions expected from a family of models all of which have fidelity no worse than a specified value. Predictive focus is the complement of predictive looseness. The importance of prediction-looseness stems from the fact that, to predict with confidence, there should be little difference (or small looseness) between the predictions of models with the same fidelity. We stress that the credibility of a model depends on all three attributes: fidelity to data, robustness to ignorance and prediction-looseness. No one attribute alone can establish the credibility of a model.

The main results of this paper, theorems 3.1 and 4.1, assert that fidelity, robustness and prediction-looseness are mutually antagonistic. A change in the model which enhances one of these attributes will cause deterioration of another. In particular, increasing the *fidelity* to test data will decrease the *robustness* to imperfect understanding of the process. Fidelity to data adds warrant to the predictions by anchoring them in experience. But without robustness to imperfect understanding of the underlying process, the fidelity is not credible. Likewise, increasing the *robustness* will decrease the *predictive focus*. Robustness is needed in order to add warrant to the fidelity, but lack of predictive focus vitiates the main purpose of the model. A model can be highly warranted by virtue of high fidelity to data and high robustness to epistemic uncertainty, only by increasing its predictive looseness.

It is important to stress that our definition of ‘model’ is not restricted to first-principle partial differential equations. ‘Models’ may be physics-based models, phenomenological equations, back-of-the-envelope calculations, statistical regression models obtained by fitting test data, as well as expert judgement. These are all models in the broad sense that we use these various representations of information and knowledge to make predictions, and they are all based on assumptions of various sorts. Similarly, a ‘family of models’ is not restricted to a model whose coefficients can be changed. The family of models could include all of the above classes as alternatives.

In §2, we introduce our notation. In §3, we define the info-gap robustness function and re-iterate the well-known result that robustness and fidelity are antagonistic. The main contribution of the paper appears in §4, which discusses predictive looseness and its antagonistic relation to robust fidelity. Section 5 presents an heuristic example that illustrates the interaction between model structure and measurement error for the common scientific activity of identifying a mathematical law governing the behaviour of a system. We conclude the paper in §6 with a discussion of some philosophical questions. The scientific activities of validating models and warranting our understanding of the underlying complexity are questioned in light of the trade-offs exposed by the theorems.

## 2. Notation

Our basic notation is:

*y* is an observable real-valued vector which is predicted by a model. Examples include material deflections, natural vibration frequencies, population densities, economic variables, etc. The dimension of *y* is *J*.

*p* is a vector of control parameters that characterize the configuration of the system and which may, in some situations, be chosen by the analyst. Examples include material dimensions and properties, loading forces, temperatures, time, location, habitat, economic regime, etc. Specification of the configuration will be important when we wish to distinguish between the configuration *p* at which a model is updated and the configuration *p*^{′} (which may or may not differ from *p*) at which forecast or prediction is made.

*q* is a vector of parameters which specify the structure and coefficients of the model for predicting *y*. These parameters are not necessarily real-valued coefficients. The elements of *q* can represent discrete or linguistic variables that select a model structure, a functional form expressing the relation between variables, etc.

The vectors *p* and *q* differ from one another. Vector *p* specifies the regime or configuration of a physical system, either when it was measured or a regime in which its behaviour will be predicted, whereas *q* specifies a mathematical model for describing the physical system. The model depends on the configuration of the system, *p*, and the model is specified by *q* which, for instance, may be the values of coefficients.

Our analysis entails two types of mathematical models. The *physical models* *M̃*(*p*,*q*) and *M*(*p*,*q*) predict values of *y*. The *uncertainty model* represents the epistemic uncertainty in the physical models.

*M̃*(*p*,*q*) is a physical model which will be updated and used for predicting the value of *y* in configuration *p*. Different realizations of *M̃*(*p*,*q*) are specified by the analyst's choice of *q*. *M̃*(*p*,*q*) is a real *J*-vector-valued function. We will sometimes refer to *M̃*(*p*,*q*) by its specifying vector *q*.

*M*(*p*,*q*) is an alternative possible physical model for predicting the value of *y* in configuration *p*, for instance, a model proposed by a competing theory or a different expert. In the absence of epistemic uncertainty, no alternatives to *M̃*(*p*,*q*) would be considered. As the horizon of uncertainty increases, more and more alternative models become candidates. Like *M̃*(*p*,*q*), *M*(*p*,*q*)∈ℜ^{J}.

*U*(*α*,*q*) is an info-gap model for the uncertainty in *M*(*p*,*q*). The info-gap model is an unbounded family of nested sets of physical models *M*(*p*,*q*), centred on *M̃*(*p*,*q*). We will encounter an example of an info-gap model in §5. The info-gap parameter for uncertainty in the physical model is *α*. We will specify generic properties of info-gap models of uncertainty in §3. See Ben-Haim (2006) for further discussion.

*N* measurements of *y*, denoted *y*_{1},… ,*y*_{N}, have been made in configuration *p*.

## 3. Robustness of the fidelity between model and measurement

Let *M*(*p*,*q*) be any physical model in the info-gap model *U*(*α*,*q*) at horizon of uncertainty *α*. Let *R*(*M*) represent the error of model *M* with respect to the test data *y*_{1},… ,*y*_{N}. We impose no restrictions on the function *R*(*M*) other than that it be real-valued, though typically it will be based on a vector norm. For instance:

$$R(M) = \frac{1}{N}\sum_{i=1}^{N} \left\| y_i - M(p, q) \right\| \tag{3.1}$$

where ∥⋅∥ is a vector norm. *R*(*M*) is the weighted least-squared error if ∥⋅∥ is the weighted Euclidean norm:

$$\|v\| = \sqrt{v^{\mathrm{T}} W v} \tag{3.2}$$

where *W* is a real, symmetric, positive definite matrix. Alternatively, one might use the absolute norm:

$$R(M) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{J} w_j \left| y_{ij} - M_j(p, q) \right| \tag{3.3}$$

where *w*=(*w*_{1},… ,*w*_{J})^{T} is a real vector and *y*_{ij} is the *j*th element of *y*_{i}. In this case, *R*(*M*) is the mean, weighted, absolute error.
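As a concrete illustration, the two error measures can be sketched in a few lines of Python. The data, model output and weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def weighted_ls_error(y, m, W):
    """Mean weighted error with the W-norm ||v|| = sqrt(v^T W v),
    as in equations (3.1)-(3.2)."""
    d = y - m                                 # residuals, shape (N, J)
    return np.mean([np.sqrt(v @ W @ v) for v in d])

def weighted_abs_error(y, m, w):
    """Mean weighted absolute error, as in equation (3.3)."""
    return np.mean(np.abs(y - m) @ w)

y = np.array([[1.0, 2.0], [1.2, 1.8]])   # N = 2 measurements, J = 2
m = np.array([1.0, 2.0])                 # model prediction M(p, q)
W = np.eye(2)                            # weighting matrix (identity here)
w = np.array([1.0, 1.0])                 # weights of the absolute norm

print(weighted_ls_error(y, m, W))
print(weighted_abs_error(y, m, w))
```

Either function qualifies as an error function *R*(*M*): both are real scalar-valued, which is the only constraint imposed below.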

Various statistical tools for assessing observational error and model fidelity can be employed, such as the level of significance of goodness-of-fit, or a Bayesian measure of risk, etc. The analyst chooses the error function *R*(*M*), perhaps in order to filter out or otherwise manage observation error. In our analysis, the only constraint is that *R*(*M*) be real scalar-valued.

‘Robustness’ has many meanings. Berger (1980), for instance, has developed Bayesian concepts of robustness. As we will use it, the concept of robustness derives from a prior concept of non-probabilistic uncertainty. Knight (1921) distinguished between ‘risk’ based on known probability distributions and ‘true uncertainty’ for which probability distributions are not known. Wald (1945) studied the problem of statistical hypothesis testing based on a random sample whose probability distribution is not known, but whose distribution is known to belong to a given class of distribution functions. Similarly, Ben-Tal & Nemirovski (1999) are concerned with uncertain data within a prescribed uncertainty set, without any probabilistic information. Likewise, Hites *et al.* (2006, p. 323) view ‘robustness as an aptitude to resist to “approximations” or “zones of ignorance”’. We are concerned with robustness against Knightian uncertainty.

*M̃*(*p*,*q*) is the physical model selected for predicting *y*. However, the model is surely imperfect. It entails approximations, both acknowledged and unknown. *M̃*(*p*,*q*) is robust to these uncertainties if it can err greatly and still reproduce the observed data with acceptable fidelity. Because *M̃*(*p*,*q*) is imperfect, the analyst cannot realistically aspire to perfect fidelity to the test data. Rather, *M̃*(*p*,*q*) should *satisfice* the fidelity: achieve an acceptable level of fidelity. Let *r*_{c} denote the greatest acceptable error between model and data. The *robustness* of physical model *M̃*(*p*,*q*), at configuration *p*, is the greatest horizon of possible alternative models up to which the error between the model and the data is no greater than *r*_{c}:

$$\hat{\alpha}(q, r_{\mathrm{c}}) = \max\left\{ \alpha : \max_{M \in U(\alpha, q)} R(M) \le r_{\mathrm{c}} \right\} \tag{3.4}$$

Model *q* is preferred to model *q*^{′} if *q* is more robust than *q*^{′}, at the same level of satisficed fidelity:

$$q \succ q' \quad \text{if} \quad \hat{\alpha}(q, r_{\mathrm{c}}) > \hat{\alpha}(q', r_{\mathrm{c}}) \tag{3.5}$$

where ‘≻’ means ‘is preferred to’. The robust-optimal model, q̂_{c}, maximizes the robustness and satisfices the mean-squared error at the level *r*_{c}:

$$\hat{q}_{\mathrm{c}} = \arg\max_{q} \hat{\alpha}(q, r_{\mathrm{c}}) \tag{3.6}$$

We will denote the maximal robustness, α̂(q̂_{c}, *r*_{c}), by α̂^{⋆}(*r*_{c}).
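The robustness function of equation (3.4) can be illustrated numerically. The sketch below assumes the simplest possible setting, constant scalar models, an interval uncertainty set [*q*−*α*, *q*+*α*] and mean-squared-error fidelity, none of which is prescribed by the paper; bisection is one straightforward way to locate the greatest acceptable horizon of uncertainty.

```python
import numpy as np

def fidelity(m, data):
    """Mean-squared error of the constant model m."""
    return np.mean((data - m) ** 2)

def robustness(q, r_c, data, alpha_max=100.0, tol=1e-10):
    """Greatest alpha such that every model in U(alpha, q) = [q - alpha,
    q + alpha] has fidelity no worse than r_c (found by bisection)."""
    def worst(alpha):        # worst-case error occurs at an interval endpoint
        return max(fidelity(q - alpha, data), fidelity(q + alpha, data))
    if worst(0.0) > r_c:     # even the nominal model fails: zero robustness
        return 0.0
    lo, hi = 0.0, alpha_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if worst(mid) <= r_c else (lo, mid)
    return lo

data = np.array([0.9, 1.1, 1.0, 1.2])
print(robustness(q=1.0, r_c=0.1, data=data))
```

The worst case over the interval is attained at an endpoint because the mean-squared error is convex in *m*, which is what makes the one-dimensional search valid.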

The following theorem establishes a basic trade-off between robustness-to-uncertainty, α̂(*q*, *r*_{c}), and fidelity to test data *r*_{c}: robustness diminishes (α̂ gets smaller) as greater fidelity is demanded (*r*_{c} is reduced). The same trade-off holds also for the maximal robustness α̂^{⋆}(*r*_{c}). For proof see Ben-Haim (2000).

For the formulation of this theorem, we need to specify two axioms of info-gap models. An info-gap model is a family of nested sets, *U*(*α*,*q*), *α*≥0. These set-valued functions have the following properties:

$$\text{Nesting:}\quad \alpha \le \alpha' \implies U(\alpha, q) \subseteq U(\alpha', q) \tag{3.7}$$

and

$$\text{Contraction:}\quad U(0, q) = \left\{ \tilde{M}(p, q) \right\} \tag{3.8}$$

We will employ an additional axiom later on:

$$\text{Translation:}\quad U(\alpha, q) = \tilde{M}(p, q) + U(\alpha, 0) \tag{3.9}$$

where *U*(*α*, 0) denotes the corresponding set of model deviations centred at zero. The addition operator is the Minkowski sum: adding every element of one set to every element of the other set.

These info-gap axioms are exceedingly unrestrictive and can be used to represent model uncertainties of many sorts. For instance, info-gap models can represent uncertain quadratic terms in nominally linear models, or uncertain slopes of monotonic functional forms. Additionally, info-gap models can represent uncertainty in probabilistic models. For instance, the hyper-parameters of probability densities are often chosen by fitting low-order moments, which determines the unobserved tails of the distribution. This is problematic unless the shape of the distribution is known from first principles. An info-gap model can represent the uncertain shape of the unobserved tails. Numerous examples of info-gap models can be found in Ben-Haim (1996, 2006).
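The three axioms can be checked mechanically for the simplest interval info-gap model; the choice U(α, q) = [q − α, q + α] below is an illustrative assumption, not the paper's general setting.

```python
def U(alpha, q):
    """Interval info-gap set, represented by its endpoints."""
    return (q - alpha, q + alpha)

def contains(outer, inner):
    """True if the interval `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def minkowski_sum(a, b):
    """Minkowski sum of two intervals."""
    return (a[0] + b[0], a[1] + b[1])

q = 2.0
# Nesting (3.7): alpha <= alpha' implies U(alpha, q) inside U(alpha', q)
assert contains(U(1.5, q), U(0.5, q))
# Contraction (3.8): at zero horizon of uncertainty only the nominal remains
assert U(0.0, q) == (q, q)
# Translation (3.9): U(alpha, q) is the nominal plus the zero-centred set
assert U(1.5, q) == minkowski_sum((q, q), U(1.5, 0.0))
print("all three axioms hold for the interval model")
```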

### Theorem 3.1

*Robustness improves as fidelity deteriorates.*

*Let U*(*α*,*q*) *be an info-gap model which obeys the axioms of nesting and contraction, and let* α̂(*q*, *r*_{c}) *and* α̂^{⋆}(*r*_{c}) *be its robustness function and maximal robustness function. Then r*_{c}≤*r*^{′}_{c} *implies:*

$$\hat{\alpha}(q, r_{\mathrm{c}}) \le \hat{\alpha}(q, r'_{\mathrm{c}}) \tag{3.10}$$

*and*

$$\hat{\alpha}^{\star}(r_{\mathrm{c}}) \le \hat{\alpha}^{\star}(r'_{\mathrm{c}}) \tag{3.11}$$

Equation (3.10) asserts that, for any given model, specified by *q*, the robustness to model-error increases as the required fidelity is relaxed. Equation (3.11) asserts that the most robust model at one level of fidelity, *r*^{′}_{c}, is more robust than the most robust model at better fidelity, *r*_{c}.

Theorem 3.1 does not imply that it is impossible to find a model that is simultaneously true to the test data (small *r*_{c}) and robust to the uncertainty (large α̂). For instance, if the noise in the data were very low then the fidelity could be high, regardless of our uncertainty about the model structure. Likewise, if there were little uncertainty about the model structure—if our understanding of the process were nearly complete—then our robustness would be large. What the theorem is stating is that, in any given epistemic state, the robustness to model-error trades off against fidelity to the data.
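The trade-off of theorem 3.1 can be observed numerically. The closed-form robustness below holds for a toy setting (constant models, interval uncertainty, mean-squared-error fidelity), an assumption of this sketch rather than a result of the paper:

```python
import numpy as np

def robustness(q, r_c, data):
    """Toy closed form: worst-case error var + (|mean - q| + alpha)^2
    stays below r_c up to alpha = sqrt(r_c - var) - |mean - q|."""
    slack = r_c - np.var(data)               # fidelity beyond the data spread
    if slack < 0.0:
        return 0.0                           # demanded fidelity is unattainable
    return max(0.0, np.sqrt(slack) - abs(np.mean(data) - q))

data = np.array([0.9, 1.1, 1.0, 1.2])
levels = [0.05, 0.1, 0.2, 0.4]               # progressively laxer fidelity
curve = [robustness(1.0, r, data) for r in levels]
print(curve)                                 # non-decreasing, as (3.10) asserts
```

Relaxing *r*_{c} never reduces the robustness, which is the monotonicity asserted by equation (3.10).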

## 4. ‘Looseness’ of model prediction

In this section, we explore the ‘looseness’ of model prediction: the magnitude of the range of predicted values deriving from models which all satisfy a specified fidelity requirement. We prove a theorem whose meaning is that a change in the model which enhances fidelity-robustness to modelling error, , also increases the looseness of the model prediction. In other words, under the conditions of the theorem, fidelity-robustness and prediction-looseness are antagonistic attributes of any model. An earlier and less general version of this theorem appears in the study of Ben-Haim & Hemez (2004).

### (a) The definition

We first need some definitions. Let *T* denote the space in which the elements of the info-gap model are defined. For any element *μ*∈*T* and any sets *U* and *V* in *T*, define *ρ*(⋅) as a ‘size’ function with the following two properties:
$$\text{Nesting:}\quad U \subseteq V \implies \rho(U) \le \rho(V) \tag{4.1}$$

and

$$\text{Translation invariance:}\quad \rho(\mu + U) = \rho(U) \tag{4.2}$$
‘Nesting’ means that if *U* is contained in *V* , then the size of *U* is no larger than the size of *V* . ‘Translation invariance’ means that the size of a set does not change as the set is translated in the space.

As an example, let *f*(*u*) be a real-valued affine function. The following function satisfies equations (4.1) and (4.2):

$$\rho(U) = \max_{u \in U} f(u) - \min_{u \in U} f(u) \tag{4.3}$$
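A minimal sketch of the size function of equation (4.3), with its two defining properties checked on finite sets; the affine function *f* is an arbitrary illustrative choice.

```python
def rho(U, f):
    """Size of a set U as the spread of an affine function f over it."""
    vals = [f(u) for u in U]
    return max(vals) - min(vals)

f = lambda u: 3.0 * u + 1.0              # an affine function (illustrative)
U = {0.0, 1.0, 2.0}
V = {0.0, 1.0, 2.0, 5.0}

assert rho(U, f) <= rho(V, f)            # nesting (4.1): U is a subset of V
shifted = {u + 7.0 for u in U}           # translate U by mu = 7
assert rho(shifted, f) == rho(U, f)      # translation invariance (4.2)
print(rho(U, f))
```

Translation invariance holds precisely because *f* is affine: shifting every element shifts the maximum and minimum of *f* equally, leaving their difference unchanged.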

We are now able to define prediction-looseness.

The robustness of model *M̃*(*p*,*q*) at configuration *p* is defined in equation (3.4). The robustness of fidelity-to-data can only be evaluated at a configuration, *p*, for which test data exist. However, *M̃* can be used to predict the behaviour at any configuration, *p*^{′}; the prediction is *M̃*(*p*^{′},*q*). Of course, *p* and *p*^{′} can be one and the same: predicting behaviour at a configuration for which data exist.

Let *Λ*(*p*,*p*^{′},*q*) denote a set of model predictions at configuration *p*^{′}, based on information obtained at configuration *p*. Specifically, based on the info-gap model *U*(*α*,*q*), *Λ*(*p*,*p*^{′},*q*) is the set of predictions at *p*^{′} of all models which, at configuration *p*, have fidelity no worse than the fidelity of *M̃*(*p*,*q*). We have no reason to reject any model *M* in *Λ*(*p*,*p*^{′},*q*) if fidelity-to-data is used as a measure of merit. The formal definition of *Λ*(*p*,*p*^{′},*q*) is

$$\Lambda(p, p', q) = \left\{ M(p', q) : M \in U\!\left( \hat{\alpha}(q, r_{\mathrm{c}}), q \right) \right\} \tag{4.4}$$

*Λ*(*p*,*p*^{′},*q*) is one particular set in the family of nested sets which constitute the info-gap model of uncertainty, *U*(*α*,*q*), *α*≥0. Namely, *Λ*(*p*,*p*^{′},*q*) is the uncertainty set which is centred on *M̃*(*p*^{′},*q*) and whose horizon of uncertainty equals the robustness of *M̃*(*p*,*q*). If *Λ*(*p*,*p*^{′},*q*) is a large set then the range of predictions of fidelity-equivalent models is large: the prediction-looseness is great. If *Λ*(*p*,*p*^{′},*q*) is a small set then the prediction-looseness is small.

Using a ‘size’ function, *ρ*(⋅), with the properties of equations (4.1) and (4.2), define *λ*(*p*,*p*^{′},*q*) as:

$$\lambda(p, p', q) = \rho\!\left( \Lambda(p, p', q) \right) \tag{4.5}$$

We refer to *λ*(*p*,*p*^{′},*q*) as the *predictive looseness* of the model specified by *q*.

When *p*^{′}=*p*, then *λ*(*p*,*p*^{′},*q*) is the looseness in predicting the outcomes, in response to inputs *p*, in the configuration at which the model was updated. This is the most common situation in system modelling. When *p*^{′}≠*p*, then *λ*(*p*,*p*^{′},*q*) is the looseness of what is properly called a forecast from one configuration to another.
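The prediction set and its looseness can be sketched in a toy setting (constant models, interval uncertainty, mean-squared-error fidelity, all assumptions of this illustration, not the paper's general case):

```python
import numpy as np

def robustness(q, r_c, data):
    """Toy closed form for the interval/constant-model setting."""
    slack = r_c - np.var(data)
    return 0.0 if slack < 0 else max(0.0, np.sqrt(slack) - abs(np.mean(data) - q))

def prediction_set(q, r_c, data):
    """Lambda of (4.4): predictions of all fidelity-equivalent models."""
    a = robustness(q, r_c, data)
    return (q - a, q + a)

def looseness(q, r_c, data):
    """lambda = rho(Lambda) of (4.5); here simply the interval width."""
    lo, hi = prediction_set(q, r_c, data)
    return hi - lo

data = np.array([0.9, 1.1, 1.0, 1.2])
print(prediction_set(1.0, 0.1, data), looseness(1.0, 0.1, data))
```

In this toy case the looseness equals twice the robustness, so any change that raises robustness raises looseness in proportion, a miniature instance of the antagonism formalized in theorem 4.1.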

### (b) The theorem

Both large fidelity-robustness, α̂(*q*, *r*_{c}), and small prediction-looseness, *λ*(*p*,*p*^{′},*q*), are desirable. We will say that robustness and looseness are *sympathetic* if a change in *p* or *q* improves them both; otherwise they are *antagonistic*. The following theorem shows that, under fairly weak conditions, robustness and looseness are always antagonistic. Examples are presented in §5.

### Theorem 4.1

*Fidelity-robustness and prediction-looseness are antagonistic.*

*Let U*(*α*,*q*) *be an info-gap model that obeys the axioms of nesting, contraction and translation, and let* α̂(*q*, *r*_{c}; *p*) *be its robustness function. For any models q*^{☆} *and q*^{′} *and any configurations p*^{☆}*, p*^{′} *and p:*

$$\hat{\alpha}(q^{\star}, r_{\mathrm{c}}; p^{\star}) \ge \hat{\alpha}(q', r_{\mathrm{c}}; p') \tag{4.6}$$

*implies*

$$\Lambda(p', p, q') \subseteq \tilde{M}(p, q') - \tilde{M}(p, q^{\star}) + \Lambda(p^{\star}, p, q^{\star}) \tag{4.7}$$

*and*

$$\lambda(p', p, q') \le \lambda(p^{\star}, p, q^{\star}) \tag{4.8}$$

*where λ and Λ are related through the ρ*(⋅) *function by equation (4.5).*

The supposition in relation (4.6) asserts that model *q*^{☆} is more robust (at configuration *p*^{☆}) than is model *q*^{′} (at configuration *p*^{′}, which may be the same as *p*^{☆}). Relations (4.7) and (4.8), which are implied by relation (4.6), assert that *q*^{☆} makes looser predictions than *q*^{′} at any configuration *p* (which may be the same as either *p*^{☆} or *p*^{′}, or may differ from both). In other words, the change from (*q*^{′},*p*^{′}) to (*q*^{☆},*p*^{☆}) enhanced the robustness but decreased the predictive focus at *p*.

### Proof.

Define the following concise notation: α̂^{☆}≡α̂(*q*^{☆}, *r*_{c}; *p*^{☆}), α̂^{′}≡α̂(*q*^{′}, *r*_{c}; *p*^{′}), *Λ*^{☆}≡*Λ*(*p*^{☆},*p*,*q*^{☆}) and *Λ*^{′}≡*Λ*(*p*^{′},*p*,*q*^{′}).

Supposition equation (4.6) and the axiom of nesting, relation (3.7), imply:

$$U(\hat{\alpha}', q) \subseteq U(\hat{\alpha}^{\star}, q) \quad \text{for any } q \tag{4.9}$$

The axiom of translation, relation (3.9), implies:

$$U(\alpha, q) = \tilde{M}(p, q) + U(\alpha, 0) \tag{4.10}$$

Combining relations (4.9) and (4.10), we obtain:

$$U(\hat{\alpha}', q') \subseteq \tilde{M}(p, q') - \tilde{M}(p, q^{\star}) + U(\hat{\alpha}^{\star}, q^{\star}) \tag{4.11}$$

From equation (4.4), we see that *Λ*^{′}=*U*(α̂^{′}, *q*^{′}) and *Λ*^{☆}=*U*(α̂^{☆}, *q*^{☆}), with the predictions evaluated at configuration *p*. Thus equation (4.11) is precisely equation (4.7).

Equation (4.11), together with the properties of translation invariance and nesting, equations (4.1) and (4.2), implies:

$$\rho(\Lambda') \le \rho\!\left( \tilde{M}(p, q') - \tilde{M}(p, q^{\star}) + \Lambda^{\star} \right) = \rho(\Lambda^{\star}) \tag{4.12}$$

With the definition of predictive looseness in equation (4.5), this implies equation (4.8). This completes the proof. ■

### (c) The dilemma

Three quantities are central to the info-gap analysis of modelling and forecasting: fidelity of the model to the test data, *r*_{c}; robustness, to model-uncertainty, of the fidelity, α̂(*q*, *r*_{c}); and prediction-looseness *λ*(*p*,*p*^{′},*q*). Two trade-offs relate these quantities.

— Robustness decreases as fidelity improves: α̂(*q*, *r*_{c}) gets smaller as *r*_{c} gets smaller (theorem 3.1).

— Robustness decreases as looseness improves: α̂(*q*, *r*_{c}) gets smaller as *λ*(*p*,*p*^{′},*q*) gets smaller (theorem 4.1).

These trade-offs imply that it is not possible to simultaneously increase the robustness, increase the fidelity and decrease the prediction-looseness. This is illustrated schematically in figure 1. The right quadrant shows the trade-off between robustness to model uncertainty α̂ and fidelity to test data *r*_{c}, as asserted by theorem 3.1: poor fidelity (large *r*_{c}) implies good robustness (large α̂). The left quadrant portrays theorem 4.1. The horizontal axis to the left portrays the predictive looseness *λ* and the curve shows that the predictive looseness and the robustness increase together. Poor fidelity *r*_{1} has high robustness and large predictive looseness *λ*_{1}. Good fidelity *r*_{2} has low robustness and small predictive looseness *λ*_{2}. Fidelity *r*_{c} and predictive looseness *λ* are sympathetic: they improve or deteriorate together. But they are both antagonistic to robustness α̂, so that good (small) values of *r*_{c} and *λ* are unreliable when associated with poor (small) values of α̂.

*High fidelity* (small *r*_{c}) of model *M̃*(*p*,*q*) implies that the model is true to the measurements, which adds warrant to the model. Low fidelity means that the model is not responsive to the measurements.

*Large robustness* (large α̂) of model *M̃*(*p*,*q*) means that a wide selection of models *M*(*p*,*q*) around the nominal model have fidelity to the test data no worse than *r*_{c}. That is, *M̃*(*p*,*q*) could be modified greatly without diminishing the fidelity. Hence, if *r*_{c} represents high fidelity, then large robustness strengthens belief in the validity of the model, *M̃*(*p*,*q*), because the fidelity of *M̃*(*p*,*q*) is immune to errors and imperfections in its formulation. A small value of robustness impugns the model because, even if *r*_{c} corresponds to high fidelity, this trueness to the data may be the accidental result of the specific erroneous structure of the model. In a perfect world, free of info-gaps, robustness would not be important. The importance of robustness, for warrant of model *M̃*(*p*,*q*), derives from the analyst's info-gaps which induce a lack of confidence that the structure of *M̃*(*p*,*q*) is fundamentally correct. If *M̃*(*p*,*q*) is robust, then its imperfections, whatever they may be, are only marginally important as they do not seriously diminish the model's fidelity to the data.

*Small predictive looseness* (small *λ*(*p*,*p*^{′},*q*)) implies that all the models that are equivalent to *M̃*(*p*,*q*) in terms of satisficing the fidelity to the test data also agree in their predictions of the system behaviour in configuration *p*^{′}. A large value of looseness means that fidelity-equivalent models strongly disagree in their predictions of the system behaviour.

*The dilemma.* ‘Truth’ is a difficult philosophical concept. Nonetheless, at least from a pragmatic point of view, *fidelity* to data is necessary (though not sufficient) for trueness of the model. *Robustness* to model uncertainty indicates trueness of fidelity. *Looseness* of model prediction increases as fidelity-robustness to model-uncertainty improves.

The dilemma results from the conflict between two uncertainties: spread of the data (calibration and measurement errors, and the experimental variability owing to lack of control of the experiment) and epistemic limitation (imperfect understanding of the process). The need for fidelity arises from spread of the data, while the need for robustness arises from epistemic uncertainty about the structure of the model. We explore this dilemma further in §6.

The dilemma may be resolved, in practical applications, by making value judgements of various sorts. The question ‘How robust is robust enough?’ calls for judgements like those elicited by the question ‘How safe is safe enough?’. Those value judgements can be based on formal tools such as analogical reasoning (Ben-Haim 2006, ch. 4), by appeal to guidelines, or with informal reasoning. Determination of acceptable fidelity may be assisted by probabilistic considerations if the fidelity relates to aleatory uncertainty of the measurements. Determining acceptable prediction-looseness will often be supported by the implications of prediction error. Finally, judgements must balance the three attributes. A vast array of tools are available for these judgements, including formal reasoning (Fagin *et al.* 1995), artificial intelligence (Lawry 2006) and expert elicitation (Meyer & Booker 2001). Further exploration of these issues is beyond the scope of this paper.

## 5. Example: a nonlinear system

In this section, we will consider a system which is modelled as a linear input–output relation in which uncertain quadratic terms are ignored. The model may represent a mechanical force–displacement relation, or an economic Phillips curve relating inflation to unemployment or the biological relation between the rate of flow of river water and the spawning success of fish, and so on. The example will demonstrate the trade-off asserted by theorem 4.1, and the violation of the trade-off when the conditions of the theorem are not satisfied.

### (a) Formulation

Consider a multi-input single-output system subject to a vector *p*=(*p*_{1},… ,*p*_{K})^{T} of inputs. The scalar output is *y*. Our current best estimate of the input–output relation is a linear model *M̃*(*p*,*q*), where *q* represents system properties. Specifically, *M̃*(*p*,*q*)=*q*^{T}*p*, where *q*=(*q*_{1},… ,*q*_{K})^{T} is a vector of model coefficients.

Let us consider unmodelled quadratic terms which may, in fact, be present in the input–output relation:

$$y = q^{\mathrm{T}} p + \pi^{\mathrm{T}} Q \tag{5.1}$$

where *π*=(*p*_{1}^{2},… ,*p*_{K}^{2})^{T} and *Q*=(*Q*_{1},… ,*Q*_{K})^{T}.

We have no direct data on *Q* for the system of interest, but evidence from other somewhat similar systems indicates that *Q* is typically small on average but can vary substantially. Let us define *S*^{−1} as a real, symmetric, positive definite matrix specifying a *K*-dimensional ellipsoid which approximates the cluster of the observed or suspected *Q* vectors of the related systems. For instance, *S*^{−1} may be the inverse of the population covariance matrix *S* of observed *Q* vectors. Alternatively, *S*^{−1} may be chosen to approximate experts’ opinions of the dispersion of the *Q* vectors. In any case, as the data are very scarce, and not derived from the system which is actually of interest, it is possible that values of *Q* well outside this ellipsoid can occur for the system being analysed. Furthermore, the evidence is insufficient to verify the choice of a probability distribution for *Q*.

In light of these considerations, we represent the uncertainty in the input–output relation, arising from the uncertain quadratic terms, by the following ellipsoid-bound info-gap model:

$$U(\alpha, q) = \left\{ M(p, q) = q^{\mathrm{T}} p + \pi^{\mathrm{T}} Q : Q^{\mathrm{T}} S^{-1} Q \le \alpha^{2} \right\}, \quad \alpha \ge 0 \tag{5.2}$$

This info-gap model is an unbounded family of nested sets *U*(*α*,*q*), *α*≥0, where each *U*(*α*,*q*) is a set of input–output models *M*(*p*,*q*). Two levels of uncertainty are entailed in this info-gap model. At any horizon of uncertainty, *α*, the specific realization *Q* is unknown. In addition, the horizon of uncertainty, *α*, is unknown.

The info-gap model in equation (5.2) obeys the axioms of nesting and contraction. Note that *π* depends on *p*. Thus, *U*(*α*,*q*) obeys the axiom of translation only if the configuration *p* is fixed. Consequently, we can apply theorem 4.1 only if we hold *p* constant. That is, for the current example, theorem 4.1 implies that, *if* model *q*^{☆} is more robust than model *q*^{′} in the same configuration *p*, *then* the prediction-looseness of model *q*^{☆} exceeds the prediction-looseness of model *q*^{′}. In this example, we will simplify our notation and define *λ*(*p*,*q*)=*λ*(*p*,*p*,*q*) and *Λ*(*p*,*q*)=*Λ*(*p*,*p*,*q*).

### (b) Prediction-looseness

In the present example, the set *Λ*(*p*,*q*), defined in equation (4.4), is the interval from the least to the greatest predicted values, as the model *M*(*p*,*q*) varies continuously on the uncertainty set whose horizon of uncertainty equals the robustness, α̂(*p*,*q*,*r*_{c}). As in equation (4.3), the following function satisfies equations (4.1) and (4.2):

$$\rho\!\left( \Lambda(p, q) \right) = \max_{m \in \Lambda(p, q)} m \;-\; \min_{m \in \Lambda(p, q)} m \tag{5.3}$$

We are thus entitled to adopt this function as the predictive looseness in this example. It is readily shown, using Lagrange optimization, that the looseness is:

$$\lambda(p, q) = 2\,\hat{\alpha}(p, q, r_{\mathrm{c}})\sqrt{\pi^{\mathrm{T}} S \pi} \tag{5.4}$$
As anticipated by theorem 4.1, the looseness and robustness are antagonistic (at fixed *p*): any change in the model coefficients *q* which improves the robustness (makes α̂ larger) causes the prediction-looseness to deteriorate (makes *λ* larger as well).

Equation (5.4) indicates that a plot of *λ* versus α̂ is a line of positive slope through the origin, generated by varying the model characteristics *q*, as shown in figure 2. The slope, 2√(*π*^{T}*Sπ*), can be thought of as the cost (in terms of lost predictive focus) of a unit increase in robustness. A large slope means that a small improvement in robustness to uncertainty entails a large increase in the range of predictions of models whose fidelity to the test data is no worse than *r*_{c}.
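The Lagrange result can be verified numerically by sampling the boundary of the ellipsoid *Q*^{T}*S*^{−1}*Q*=*α*^{2}, on which the extremes of *π*^{T}*Q* are ±*α*√(*π*^{T}*Sπ*); the matrix *S*, vector *π* and horizon *α* below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.array([[2.0, 0.5], [0.5, 1.0]])       # positive definite
pi = np.array([1.0, 3.0])                    # squared inputs p_k^2
alpha = 0.7                                  # horizon of uncertainty

analytic = alpha * np.sqrt(pi @ S @ pi)      # Lagrange-multiplier extremum

# Parametrize the boundary as Q = alpha * L u with ||u|| = 1 and
# S = L L^T, so that Q^T S^{-1} Q = alpha^2 exactly.
L = np.linalg.cholesky(S)
u = rng.normal(size=(100_000, 2))
u /= np.linalg.norm(u, axis=1, keepdims=True)
Q = alpha * u @ L.T
sampled = (Q @ pi).max()

print(analytic, sampled)    # the sampled maximum approaches the analytic value
```

The sampled maximum is bounded above by the analytic value and converges to it as the boundary is sampled more densely, confirming that the prediction interval has width 2α̂√(*π*^{T}*Sπ*).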

The slope in figure 2 is large if *π*, the squared inputs, are large, or if the population covariance matrix *S* is large. ‘Large *S*’ means, roughly, that the measurements are widely dispersed. Large measurement variability makes the attainment of robustness to epistemic uncertainty very costly in terms of predictive focus.

Also interesting is the interaction between *π* and *S* and its influence on the trade-off between looseness and robustness. Let *s*_{1} and *s*_{K} be the least and greatest eigenvalues of *S* (*s*_{1}>0 since *S* is positive definite). Then one finds that:

$$2\sqrt{s_1}\,\|\pi\| \;\le\; \frac{\lambda(p, q)}{\hat{\alpha}(p, q, r_{\mathrm{c}})} \;\le\; 2\sqrt{s_K}\,\|\pi\| \tag{5.5}$$
This demonstrates how the inputs *π* and model-uncertainty *S* combine to determine the trade-off between prediction-looseness and robustness. Squared-input vectors *π* that are oriented along eigenvectors of *S* with large eigenvalues (near *s*_{K}) will entail large slope of *λ* versus α̂. On the other hand, input vectors that ‘activate’ relatively certain modes of *S* (those with small eigenvalues near *s*_{1}) will induce lower looseness-costs for model-robustness.
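A quick numerical check of the eigenvalue bounds, assuming relation (5.5) takes the form 2√*s*_{1}‖*π*‖ ≤ *λ*/α̂ ≤ 2√*s*_{K}‖*π*‖ (a reconstruction consistent with equation (5.4)); *S* and *π* are illustrative.

```python
import numpy as np

S = np.array([[2.0, 0.5], [0.5, 1.0]])       # positive definite
pi = np.array([1.0, 3.0])                    # squared inputs p_k^2

s = np.linalg.eigvalsh(S)                    # eigenvalues, ascending
slope = 2.0 * np.sqrt(pi @ S @ pi)           # looseness per unit robustness
lower = 2.0 * np.sqrt(s[0]) * np.linalg.norm(pi)
upper = 2.0 * np.sqrt(s[-1]) * np.linalg.norm(pi)

assert lower <= slope <= upper               # the bounds of relation (5.5)
print(lower, slope, upper)
```

The bounds follow from the Rayleigh-quotient inequality *s*_{1}‖*π*‖^{2} ≤ *π*^{T}*Sπ* ≤ *s*_{K}‖*π*‖^{2}.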

Relation (5.5) has practical implications for modelling the behaviour of the system. Let *V*_{1} be the sub-space spanned by eigenvectors of *S* having small eigenvalues, and let *V*_{2} be the complementary sub-space. If the relevant inputs lie in *V*_{1}, then the model can be robustified more than if the relevant inputs lie in *V*_{2}, at the same level of prediction-looseness. This implies that the analyst's judgements regarding achievable robustness and its cost in prediction-looseness, will be influenced by prior judgements about the range of anticipated inputs. This in turn may influence the choice of the model, *q*.

### (c) Sympathetic robustness and looseness

As noted above, the info-gap model in equation (5.2) does not obey the axiom of translation if the input vector *p* varies. Consequently, theorem 4.1 does not apply to changes in the input vector. In fact, we will show that, in a simple realization of the current example, prediction-looseness and robustness can be sympathetic with respect to changes in the input: a change in *p* which improves the robustness (causes α̂ to increase) may also improve the prediction-looseness (causes *λ* to decrease).

For simplicity, let *p* and *q* be scalars, so a single input *p* is applied to a system with coefficient *q* (which we assume is positive). In the info-gap model of equation (5.2), *Q* is a scalar and *S*^{−1}=1/*s*^{2}, where *s* is an estimate of the variation of the *Q* values. Given measurements *y*_{1}, …, *y*_{N}, define *m* = (1/*N*)∑_{n}(*y*_{n}−*qp*)^{2}, which is the mean-squared error of the nominal model, *q*. Define *ē* = (1/*N*)∑_{n}(*y*_{n}−*qp*), which is the average error of *q*.

The robustness, defined in equation (3.4), is the greatest horizon of uncertainty *α* at which every coefficient *Q* allowed by the info-gap model has mean-squared error no greater than the critical value *ε*_{c}. It is found to be:

α̂(*q*) = max{0, [√(*ε*_{c}−*m*+*ē*^{2}) − |*ē*|]/(*s*|*p*|)}.
(5.6)

In the particularly simple case that only a single measurement, *y*, is available, so that *N*=1 and *m*=*ē*^{2}, the positive part of equation (5.6) becomes:

α̂(*q*) = [√*ε*_{c} − |*y*−*qp*|]/(*s*|*p*|).
(5.7)

Combining this with equation (5.4), we find the prediction-looseness *λ*(*q*), equation (5.8).

Comparing equations (5.7) and (5.8) we see that robustness α̂ and looseness *λ* are *antagonistic* with respect to the model parameter *q*: a change in *q* which augments the robustness also increases the looseness, as anticipated by theorem 4.1. This is illustrated in figure 3, where we see that the slopes of α̂ and *λ* always have the same sign. The difference in the magnitudes of these slopes determines the trade-off cost: how much robustness must be relinquished in order to reduce the prediction-looseness, or vice versa.

If the conditions of theorem 4.1 are not satisfied, then robustness and looseness can be *sympathetic* with respect to change in the input *p*. This means that there is a range of *p* values for which a change in *p* improves both the robustness and the looseness: it augments α̂ and diminishes *λ*. Figure 4 shows that the slopes of α̂ and *λ* have opposite signs for *p*>*y*/*q*. However, if *p*<*y*/*q*, then robustness and looseness are antagonistic with respect to *p*; the slopes of α̂ and *λ* have the same sign. Once again, the difference between the slopes determines the trade-off cost.
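The antagonism of fidelity and robustness in this scalar example can be checked numerically without any closed-form expression. The sketch below (our own illustrative values; the names `y`, `p`, `s` and `eps_c` are assumptions in the spirit of §5, not the paper's notation) computes the robustness by bisection directly from the scalar info-gap set |*Q*−*q*| ≤ *αs* of equation (5.2), taking the fidelity criterion to be a bound *ε*_{c} on the squared prediction error:

```python
# Sketch of the scalar example of section 5c (single measurement y at input p).
# Assumptions not fixed by the text: the fidelity criterion is a bound eps_c
# on the squared prediction error, and all numerical values are illustrative.
y, p, s = 1.0, 1.0, 0.5          # measurement, input, coefficient-variation estimate
eps_c = 0.25                     # critical (greatest acceptable) squared error

def worst_error(q, alpha):
    """Greatest squared error over the info-gap set |Q - q| <= alpha*s."""
    return (abs(y - q * p) + alpha * s * abs(p)) ** 2

def robustness(q, alpha_max=100.0, tol=1e-10):
    """Greatest alpha at which even the worst-case coefficient still satisfies
    the fidelity criterion; found by bisection (0 if the nominal model fails)."""
    if worst_error(q, 0.0) > eps_c:
        return 0.0
    lo, hi = 0.0, alpha_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if worst_error(q, mid) <= eps_c else (lo, mid)
    return lo

# Antagonism of fidelity and robustness: the zero-error model q = y/p has the
# greatest robustness, and robustness falls as the nominal error grows.
a_best = robustness(y / p)        # nominal error 0
a_off  = robustness(y / p + 0.3)  # nominal error 0.3
assert a_best > a_off > 0.0
print(round(a_best, 3), round(a_off, 3))  # prints: 1.0 0.4
```

Sweeping `q` (or `p`) through a grid of values and plotting `robustness` against the corresponding prediction range reproduces the kind of trade-off curves shown in figures 3 and 4.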

## 6. Conclusion: methodological issues

In this section, we will raise some methodological issues related to the problems of measurement, modelling and prediction which are dealt with in this paper. In §6*a*, we discuss the relation between our results and the legacy of scientific positivism. In §6*b*, we identify some questions which our results motivate regarding induction, warrant and prediction.

### (a) Positivism, prediction and uncertainty

In this paper, we have dealt with severe uncertainty and its implications for modelling and predicting complex behaviour. We have stressed that models err not only because the test data with which they are calibrated are noisy, but also because our choice of model structure is based on incomplete or even fallacious understanding of the processes being modelled.

Knight recognized uncertainty of this sort in economics, calling it ‘true uncertainty’ as opposed to probabilistic risk for which test data and understanding are more extensive (Knight 1921). Knightian uncertainty is characteristic of highly complex and dynamic systems.

The sort of complex dynamism which induces Knightian uncertainty was identified by Popper as causing ‘indeterminism’. For Popper, indeterminism arises in intelligent learning systems (Popper 1982, pp. 62–63). ‘Intelligence’ is meant as the ability to adapt behaviour to experience, regardless of whether this is done willfully or not. ‘Learning’ is meant as the discovery of facts or situations which previously were unknown. A discovery tomorrow cannot be known today. Consequently, the behaviour tomorrow of an intelligent learning system cannot be entirely predicted today because of the behavioural impact of tomorrow's discoveries. Hence, all intelligent learning systems are accompanied by an inherent indeterminism, an inherent recalcitrance to law-like or predictive modelling. Shackle writes similarly of the ‘dissolution of rational determinacy’ (Shackle 1972, ch. 22).

Hayek also focused on the limitations of human understanding of complex systems, especially highly dispersed systems such as, but not limited to, economic markets. It is possible, Hayek contended, to predict tendencies or trends, and to identify principles governing these patterns. However, it is impossible to consistently and reliably forecast or explain the detailed evolution of such systems (Caldwell 2004, pp. 339–340, 346). Poole concurs with Hayek's scepticism and with Shackle–Popper indeterminism:
[A]nyone interested in monetary policy should spend less time on economic forecasts and more time on implications of forecast surprises. … [P]olicy needs to be informed by the best guesses incorporated in forecasts and by knowledge of forecast errors. … And the surprises that create forecast errors also create the need for policy changes that cannot be anticipated in advance because the surprises cannot be anticipated.

(Poole 2004, pp. 1, 4)

Our example in §5 illustrates these limitations in forecasting power for a class of models used widely in engineering, economics and elsewhere.

The upshot of Hayek's position is a rejection of the strong scientific optimism which characterized much activity in social and technological disciplines in the nineteenth and twentieth centuries. This optimism, sometimes referred to as positivism, is the conviction that thorough application of scientific understanding will lead to better control and utilization of any system—technical or human—whatever the goals may be. Positivism underlies activities such as central economic planning over the full gamut from socialism to the modern welfare state, and deferential acceptance of technological and medical prognostication. Knight's uncertainty, Popper's indeterminism and Hayek's scepticism all stand in stark challenge to the positivists’ optimism about the predictive power of models.

There is no doubt that quantitative models sometimes do make impressively accurate and useful predictions concerning highly irregular occurrences. For instance, Bayesian models have been used to anticipate severe forest fires (Beckage & Platt 2004). Severe events are, by definition, rare and unusual, and the ability to use historical data to predict seasons or years in which severe fires will occur is a welcome success of advanced modelling technique. By identifying patterns in related meteorological events, the model predicts with admirable accuracy the propensity for greatly abnormal fires in subsequent periods of three to twelve months. However, in contrast to this successful prediction of a tendency, forecasting the precise acreage of fire devastation is far from reliable. As Hayek anticipated, identification of the main factors governing a complex phenomenon (e.g. forest fires) leads to useful predictions of trends, but not to successful detailed quantitative forecasts.

The theorems presented in this paper have a decidedly Hayekian flavour, and are based on info-gap quantification of Knightian uncertainty. Models can be faithful to experience and can indicate future trends. However, complex processes may be incompletely comprehended. When this is the case, theorem 3.1 states that model-fidelity is lost as we increase the robustness to our ignorance of that complexity. Furthermore, theorem 4.1 asserts that, as we enhance our robustness to ignorance, we lose predictive focus; models whose fidelity is reliable will tend to be poor predictors if info-gaps accompany our understanding. Combining these constraints we find that, for complex and imperfectly understood processes, we will face challenges when models are expected to do more than predict general trends. Hayekian scepticism about optimistic positivism seems well founded in light of our theoretical results. On the other hand, the robustness and looseness functions developed in this paper provide quantitative tools for assisting the analyst in identifying models which have adequate fidelity, robustness and predictive focus.

### (b) Induction, warrant and prediction

*Hume and the problem of induction.* The conflict between robustness, fidelity and prediction-looseness is reminiscent of Hume's critique of empirical induction (Hume 1777). Our analysis shows that past measurements, accompanied by incomplete understanding of the measured process, cannot unequivocally establish true predictions of the behaviour of the system. However, the issue is not inherently temporal. The problem is not that *past* measurements do not logically bind *future* behaviour. The problem is epistemic. We use test data to select a model of system behaviour. But data alone are insufficient for the selection process, which depends also on the analyst's understanding of the measured process. Experience shows that this understanding is incomplete. Also, like Hume, we claim that no amount of data can confirm the contention that no aspect of the process has been overlooked.

Hume's critique is clearly related to the antagonism between robustness and prediction-looseness. However, can this antagonism be reduced to the classical problem of scientific induction? To explore this question further, we consider the issue of warrant for a proposition.

*Haack, warrant and prediction.* Our main result, theorem 4.1, establishes general conditions under which a physical model can be highly warranted, by virtue of high fidelity to test data and high robustness to epistemic uncertainty, only by limiting its predictive power. The questions we wish to raise (but not answer) are these: Is ‘warrant’ overrated as a tool for model selection? Philosophers have been greatly interested in model selection in natural science. Is model selection in natural science fundamentally different from model selection in utilitarian disciplines, such as engineering, economics, medicine or law? Robust fidelity is anti-correlated with predictive power. Can concepts of warrant for a model be modified to correlate better with predictive power?

Our only contribution to these questions is to explain that robust fidelity can be understood at least partially as a type of warrant within the context of one of the most attractive of contemporary epistemological theories of warrant.

Susan Haack, in expounding her epistemological theory of ‘foundherentism’ (Haack 1993), explains that
how warranted an empirical claim is depends on how well it is supported by experiential evidence and background beliefs, how reasonable those background beliefs are, independent of the belief in question, and how much of the relevant evidence the evidence includes.

… Briefly and very roughly, how well evidence supports a claim depends on how well the claim is explanatorily integrated with the evidence.

(Haack 2001, p. 11)

The relation between foundherentism as a holistic theory of warrant, and info-gap robust-satisficing as a methodology for severely testing hypotheses, is not simple or unambiguous (Ben-Haim 2006, pp. 332–333). However, the following claims seem reasonable:

— If a physical model *q* has high fidelity to the test data, then it is ‘supported by experiential evidence’. All ‘relevant evidence’ has been included if any outliers in the data have been excluded as irrelevant only on substantive evidential grounds.

— The concept of ‘background beliefs’ can be interpreted in the present context to refer to the array of competing models which collectively represent the universe of potential and actual understandings of the processes involved. Consequently, if the robustness of *q* to epistemic uncertainty is large, meaning that *q* could be modified in many ways without significantly diminishing its fidelity, then *q* is ‘supported by … background beliefs’.

— The physical models *M* contained in the info-gap model at low *α* are quite similar to, and thus ‘dependent’ on, the nominal model *q*. However, at large horizon of uncertainty, the set contains an infinity of models which differ greatly from the ‘centrepoint’ model *q*. Thus, at high robustness, many models lie in the universe of discourse that are highly unrelated to, and thus ‘independent’ of, *q*.

When modelling imperfectly understood processes, the upshot of all this is that a model that is warranted by large robustness at high fidelity seems to satisfy Haack's criteria for epistemological warrant. But high fidelity at high robustness is anti-correlated with high predictive power (under the stipulations of theorem 4.1). What good is robust fidelity in particular, and epistemological warrant in general, if predictive power is not only not entailed, but in fact counter-indicated?

## Acknowledgements

The authors are indebted to Prof. Susan Haack for her stimulating correspondence.

- Received January 18, 2011.
- Accepted August 15, 2011.

- This journal is © 2011 The Royal Society