## Abstract

Integration of sampled data arises in many practical applications, where the integrand function is available from experimental measurements only. One extensive field of research is the problem of pest monitoring and control where an accurate evaluation of the population size from the spatial density distribution is required for a given pest species. High aggregation population density distributions (peak functions) are an important class of data that often appear in this problem. The main difficulty associated with the integration of such functions is that the function values are usually only available at a few locations; therefore, new techniques are required to evaluate the accuracy of integration as the standard approach based on convergence analysis does not work when the data are sparse. Thus, in this paper, we introduce the new concept of ultra-coarse grids for high aggregation density distributions. Integration of the density function on ultra-coarse grids cannot provide the prescribed accuracy because of insufficient information (uncertainty) about the integrand function. Instead, the results of the integration should be treated probabilistically by considering the integration error as a random variable, and we show how the corresponding probabilities can be calculated. Handling the integration error as a random variable allows us to evaluate the accuracy of integration on very coarse grids where asymptotic error estimates cannot be applied.

## 1. Introduction

Integration of sampled data arises in a wide class of problems, as in many practical applications, the integrand function is available from experimental measurements only and is therefore given by a discrete set of function values. Numerous examples include acoustics and signal processing [1,2], image reconstruction [3], microbiology [4], ecological applications [5], etc. A general problem of numerical integration has a long and successful history, and many accurate and efficient methods for integration of sampled data have been designed and documented in the literature [6–9]. Meanwhile the increasing complexity of practical problems has resulted in the recent need to revise existing algorithms as new problems have emerged. One such problem is that of insect pest monitoring, where an accurate evaluation of the population size from the spatial density distribution is required for a given biological species [10].

Let us consider a particular problem of pest insect monitoring in a single agricultural field as an instructive example. An accurate estimate of the total number of pest insects is crucial for making a reliable decision about the use of pesticides in the area where the crop is grown [11]. The data for evaluation of the pest population size is usually collected by trapping. Insect traps are installed at the nodes of a uniform Cartesian grid in the agricultural field, they are exposed for a certain time, and then the traps are emptied and the insects caught are counted [12]. Under the assumption that the number of insects in each trap gives us the true value of the population density obtained at the location of the trap [13], the methods of numerical integration on uniform grids can be employed to estimate the total number of pest insects from the discrete density distribution [14]. However, as we show below, the application of well-known methods, such as the Newton–Cotes integration rules on a uniform grid, is not straightforward at all.

The main difficulty associated with the estimation of the total number of pest insects from trap counts is that the number *N* of traps cannot be made large enough to ensure that the integral estimate is accurate. Installment of many traps per unit agricultural area would, in itself, bring considerable damage to the agricultural product. Also, trapping is costly and labour consuming, and it introduces a disturbance to agricultural procedures. Hence, the problem of numerical integration has to be solved for a small number *N* of traps.

From a computational viewpoint, the evaluation of the population size when the number *N* of traps is small presents the problem of numerical integration of the integrand function on a very coarse uniform Cartesian grid. Moreover, grid adaptation is not possible in the problem as the grid should be generated only once, and *N* cannot be further increased. This restriction appears because a repeated trapping with an increased number of traps is not available in ecological applications due to the impossibility of reproducing the initial conditions.

Under the restrictions outlined above, the two following questions arise.

— What is the minimum number

*N*_{t}of traps required to achieve desirable accuracy?— What can be an alternative measure of accuracy on a coarse grid of traps where

*N*<*N*_{t}?

Although the processing of sparse data has intensively been studied in various problems from physics and engineering [15–18], to the best of our knowledge, numerical integration of sparse data has not been discussed in the literature. An attempt to address the questions above has been made in our recent work [14,19,20]. In particular, it has been demonstrated in [14] that the answers to those questions depend strongly on the integrand function under consideration. Namely, the accuracy of integration remains acceptable, even on very coarse grids, if the pest population is distributed more or less over the whole area, no matter whether this distribution is close to homogeneous or has a complex heterogeneous structure. On the contrary, the accuracy is very poor when one has to integrate a high aggregation density distribution (i.e. the density function with a single maximum whose support subdomain is small in comparison with the domain area) on a coarse grid.

The high aggregation density distributions are an important class of data that may appear in ecosystems under various conditions. For instance, one common scenario of biological invasion is that the pest species starts spreading from a small localized area and invades the entire domain as time progresses [10]. As the entire pest population is confined to a single subregion within an agricultural field, and the pest population is zero outside that subregion at the beginning of the invasion process, the population density is described as a peak function from a mathematical viewpoint. Obviously the decision about the application of pesticides is best made before the patch of high density spreads over the whole agricultural field. Hence, timely and accurate evaluation of the total number of pest insects at earlier stages of biological invasion is very beneficial for the cultivation of the agricultural product. At the same time, the application of numerical integration methods is significantly hampered by the fact that the exact location of the peak is not known in the problem. Thus, instead of installing the traps locally (i.e. in the initially infested area) to increase the accuracy of integration, a uniform grid of traps has to be generated over the entire domain. That makes numerical integration of peak functions a very difficult task, as the information about the density function on such a sparse grid may be insufficient for reasonably accurate integration. Namely, it was discussed in [14,20] that the whole peak may be located between nodes of a sparse grid and will therefore be completely missed. This can in turn severely affect the accuracy of the pest population size evaluation. Hence, the questions highlighted above require careful attention when peak functions are considered on coarse grids, and in this paper, we develop a novel probabilistic approach to work with sparse data.

The paper is organized as follows. In §2, we briefly revisit the problem of numerical integration and introduce the concept of ultra-coarse grids for peak functions. An ultra-coarse grid is defined as a grid where the desirable accuracy of integration can only be achieved with probability smaller than 1. It will be shown that integration on ultra-coarse grids cannot guarantee the prescribed accuracy because of the insufficient information (uncertainty) about the integrand function. Instead, the results of the integration should be treated probabilistically by considering the integration error as a random variable with a high magnitude. We calculate the probability of an accurate integral estimate in §3, while in §4 we consider the transition from ultra-coarse grids to coarse grids where the desirable accuracy can be achieved with the probability equal to 1. Numerical examples are considered in §5. A discussion of our results is provided in §6. Our research of ultra-coarse grids will be focused on the one-dimensional case, but the approach we present in the paper can be extended to two-dimensional density distributions.

## 2. Numerical integration of sparse data

The invasion regime when a spreading pest population forms a strongly heterogeneous spatial distribution, has its one-dimensional counterpart when a peak density function appears somewhere at the unit interval *D*=[0,1] [19]. Thus, in the one-dimensional case, a high aggregation density distribution *u*(*x*) can be modelled by the following peak function with the support *D*_{u}=[*x*_{I},*x*_{II}] on the unit interval
2.1where we assume that the function *f*(*x*) has a single maximum at point *x**=0.5(*x*_{I}+*x*_{II}). At this stage, we are not interested in a more detailed definition of the function *f*(*x*), as particular examples of *u*(*x*) will be considered further in the text.

One computational problem related to the study of ecologically meaningful density distributions is that the integral is not available in closed form. Hence, before moving to the discussion of a more general case, we consider several examples where the exact value of the integral is available and the integration error can therefore be computed. A convenient example of a peak function is given by the normal distribution
2.2shown in figure 1*a* for the peak width *δ*=6*σ*=0.25. Let us numerically integrate (2.2) over the unit interval *D*=[0,1]. Consider a uniform grid generated in the domain *D* as *x*_{i+1}=*x*_{i}+*h*, *i*=1,…,*N*−1, where the grid step size is *h*=1/(*N*−1). We employ the midpoint rule of integration [21] as a baseline integration method in our problem. Once a computational grid has been generated, the integral *I* is computed by the compound midpoint rule as
2.3where *w*_{i}=*h* for the interior nodes *i*=2,…,*N*−1 and *w*_{i}=*h*/2 at boundary points *i*=1,*N*.

In our discussion of the accuracy of numerical integration on coarse grids, we will also compute the integral by the compound Simpson method where the number *N* is required to be an odd number, *N*=2*m*+1, and the weights in (2.3) are given by *w*_{i}=4*h*/3, *i*=2,4,…,2*m*, *w*_{i}=2*h*/3, *i*=3,5,…,2*m*−1 and *w*_{i}=*h*/3, *i*=1, *i*=*N*. Finally, the third method we employ for numerical integration is a so-called ‘statistical method’ widely used in ecological applications [22]. The method evaluates the integral as
2.4where *A* is the given area. Since *A*=1, when we integrate over the unit interval, the method (2.4) is reduced to the evaluation of the mean density of the pest population.

We define the integration error as
2.5where *I* is the exact integral and is the approximate integral computed by the chosen method of numerical integration. The accuracy of integration should be
2.6where *τ* is a specified tolerance. It is important to note here that in ecological applications, the accuracy requirements on coarse grids are essentially different from those arising in the conventional problem of numerical integration, as the tolerance 0.2<*τ*<0.5 is considered as acceptable [23,24]. However, we will see further in the text that even such relatively low accuracy of computations is not always achievable when the number of grid nodes is small.

Let us compute the integration error (2.5) for the density distribution (2.2). Consider first the midpoint rule (2.3) of integration. The error (2.5) of the midpoint rule is shown as a function of the number *N* of grid nodes in figure 1*b*. It can be seen from the figure that the integration error of the midpoint rule is not controlled by uniform grid refinement on very coarse grids that are in the focus of ecological research. Namely, the error *e*∼1 remains beyond the acceptable range *e*∼0.2–0.5 on grids with , where is the upper bound for the realistic number of traps to be considered in ecological applications [25,26].

As the uniform refinement does not decrease the error on coarse grids (see figure 1*b*), the obvious decision would be to apply a more accurate method of numerical integration (i.e. the Simpson method; see [21]) in order to improve the accuracy of integration on grids with small *N*. However, it can be seen from figure 1*b*, where the error of the Simpson method is shown on a sequence of uniformly refined grids, that the Simpson method is not more accurate than the midpoint rule on coarse grids. On the other hand, the method (2.4) is theoretically less accurate than (2.3) [21], but it cannot be said from figure 1*b* that the method (2.3) is better than (2.4) when coarse grids are considered.

Let us recall that the exact location of the ‘peak subdomain’ *D*_{u} is unknown to us. Meanwhile, the value of the approximate integral depends obviously on how many grid nodes are stationed inside the subdomain *D*_{u} on a coarse grid. In order to better understand how the integration error (2.5) depends on the location of the peak with respect to the position of the grid nodes, we now consider the peak location *x** in the function (2.2) as a uniformly distributed random variable. Let us fix the number of grid nodes as *N*=5 and randomly move the peak (2.2) over the domain *D* 10 times. We integrate the function (2.2) every time that we move the peak, i.e. for each of the 10 realizations *n*_{r} of the random variable *x**. The results of numerical integration are shown in figure 1*c*, where the integration error (2.5) is computed for each of the 10 locations of the peak on the coarse grid of five nodes. For instance, when *n*_{r}=3 the peak's location is *x**=0.7013, while for *n*_{r}=7 the same peak is located at *x**=0.4188, etc. It is readily seen from the graph of figure 1*c* that for the midpoint rule of integration, the error (2.5) depends essentially on the peak location as the error varies as 0.0173≤*e*≤1.379 on the grid with the fixed number of nodes. The same observation with regard to the integration error is true for the Simpson rule and the integration rule (2.4), where it can be seen from the figure that the error's magnitude varies significantly as we vary the peak location.

It follows from the results of the test case (2.2) that the accuracy of computation cannot be determined when the high aggregation density distributions are considered on grids with a small number of nodes. Since the integration error depends on a random location of the point *x**, the error itself becomes a random variable and a probabilistic approach should be used to evaluate the accuracy. We will refer to the grids, where the accuracy of numerical integration cannot be determined, as ultra-coarse grids. Correspondingly, the questions formulated in §1 are reformulated on ultra-coarse grids as follows.

— Given the number

*N*of grid nodes on a uniform grid (the grid step size*h*=const.), what is the probability of the event*e*≤*τ*_{0}, where the integration error*e*is given by (2.5) and*τ*_{0}is the chosen tolerance?— What is the threshold number

*N*_{t}of grid nodes when the probability*p*of achieving an integration error (2.5) smaller than the given tolerance*τ*_{0}becomes*p*=1?

In the next two sections of our paper, we answer the questions above when peak functions are integrated using the general midpoint rule (2.3).

## 3. Ultra-coarse grids

Let us expand the density function *u*(*x*) given by (2.1) at the maximum point *x** as

We assume that the remainder *R*(*x*) can be neglected in the vicinity of the peak, so that the integrand *u*(*x*) becomes
3.1where , *B*=*u*(*x**)>0. The integral *I* is then given by
3.2where the peak width *δ* is defined as
3.3

Consider a uniform grid of *N* nodes generated in the domain [0,1] as *x*_{i+1}=*x*_{i}+*h*, *i*=1,…,*N*−1, where the grid step size is *h*=1/(*N*−1). We begin our study of ultra-coarse grids with the case when the grid step size *h*>*δ*. In other words, we require that the grid is so coarse that either no grid nodes or one grid node fall within the peak subdomain. Since the absence of grid nodes within the peak support is a degenerate case, below we consider one grid node *x*_{i} located in the subdomain [*x*_{I},*x*_{II}] (see figure 2*a*).

Let the grid step size be
3.4where the parameter *α*>1. Since a quadratic function is symmetric, it is sufficient to consider the subinterval [*x**,*x*_{II}]. The location of the grid node *x*_{i} relative to the location of the peak *x** is then given by
3.5The midpoint rule approximation of the integral for the geometry *α*>1 is shown in figure 2*a*. The integral computed by the midpoint rule when we use the approximation (3.1) for *u*(*x*) is as follows:
3.6Hence, the integration error (2.5) is

Without loss of generality, let us choose the tolerance as . Taking into account (3.3) and (3.4) and solving the inequalities
3.7for the node location *γ*, we obtain the following condition:
3.8where
3.9for the tolerance .

For the sake of the discussion in §4, it is worth noting here that the inequality *γ*_{I}(*α*)≤*γ*(*α*) implies that . If *α*<*α*_{I}, then always holds, and the inequalities (3.8) should be replaced as
3.10Similarly, the inequality *γ*(*α*)≤*γ*_{II}(*α*) requires , the condition that always holds under our previous assumption *α*>1.

The curves *γ*_{I}(*α*) and *γ*_{II}(*α*) are shown in figure 3*a*. The conditions (3.9) define the parameter range where the integral is computed with the required accuracy *τ*_{0}. Consider a peak of width *δ* and let us fix , so that the grid step size becomes fixed as . Compute and on the grid of size . The inequalities (3.8) provide the error *e*≤*τ*_{0} for any (see figure 3*a*).

Let us assume that the location *x** of the peak maximum can be found at any point of the domain [0,1] with equal probability, that is the random variable *x** is uniformly distributed. It then readily follows from the above consideration that for the peak width *δ*, the probability *p*(*e*≤*τ*_{0},*α*) of achieving the desired accuracy *e*≤*τ*_{0} on a uniform grid with grid step size *h* is computed as
where the entire range of *γ* is given by *γ*_{min}=0, . Since the tolerance *τ*_{0} is always fixed as , we further omit it in the definition of the probability function and consider the probability *p*=*p*(*α*). Straightforward analysis of expressions (3.9) shows that the probability *p*(*α*) of achieving the integration error *e*≤*τ*_{0} remains *p*(*α*)<1 on any ultra-coarse grid where the condition *α*>1 (i.e. *h*>*δ*) holds.

We summarize the findings of this section as follows.

— We have shown that the integration error should be handled as a random variable on ultra-coarse grids where the data available for integration are sparse.

— We have considered a quadratic approximation of the integrand function and found the probability

*p*of an accurate answer when a quadratic polynomial is integrated on an ultra-coarse grid. The relative position of a grid node to the peak function was parametrized by the ratio*α*of the grid step size*h*and the width of the peak function*δ*. It was shown that the probability*p*of achieving the prescribed accuracy is*p*<1 for all*α*>1 (i.e. when either zero or one grid node lie within the support of the peak function).

Meanwhile, the important question that still remains is: if we gradually increase the number of grid nodes *N*, what is the threshold number *N*_{t} for which we have the probability *p*=1 of the accurate answer (2.6)? The answer to this question will be given in §4.

## 4. Transition from ultra-coarse grids to coarse grids

In this section, we investigate the transition from grids where the integration error is a random variable to grids where the condition *e*≤*τ*_{0} always holds for the given tolerance *τ*_{0}. Let us increase the number *N* of grid nodes in order to decrease the grid step size *h* as
4.1where *δ* again is the peak width (3.3). In other words, we now require that when the parametrization (3.4) is used. If the condition (4.1) holds, then either one or two grid nodes belong to the subdomain [*x*_{I},*x*_{II}] (see figure 2*b*).

Consider the location (3.5) of grid node *x*_{i}. The minimum value *γ*_{0} that provides the location of two grid nodes *x*_{i−1} and *x*_{i} in the subdomain [*x*_{I},*x*_{II}] is defined from the conditions *x*_{i−1}=*x**−*δ*/2 and *x*_{i−1}=*x*_{i}−*h*. We have
4.2where the parametrization (3.4) is taken into account.

For *γ*∈[0,*γ*_{0}), only one grid point belongs to the interval [*x*_{I},*x*_{II}], and we can use the result (3.9) to compute the probability of accurate integration. Hence, we now focus on the range when two grid points are captured by the peak, as shown in figure 2*b*.

Let us use again the quadratic approximation (3.1) of the integrand function. Since *g*(*x*_{i−1})=−*A*((*γ*−1)*h*)^{2}+*B* and *g*(*x*_{i})=−*A*(*γh*)^{2}+*B*, the integral is computed by the midpoint rule as follows:

For the sake of simplicity, we require again the tolerance to arrive at the inequalities (3.7). Consider first the inequality . Simple algebraic transformation results in where .

Consider the roots and of the equation *γ*^{2}−*γ*+*C*=0. The non-empty range *γ*∈[*γ*_{III},*γ*_{IV}] exists, if the inequality 4*C*(*α*)≤1 holds. Substituting the above expression for *C*(*α*) into this inequality, we obtain
4.3Numerical solution of equation (4.3) gives us the roots *α*_{1}≈−1.1072, *α*_{2}≈0.2696 and *α*_{3}≈0.8376. Hence, if , the range *γ*∈[*γ*_{III},*γ*_{IV}] will provide us with the integration error , where we should also take into account the restriction . It readily follows from the above computation that and for any .

Consider now the lower boundary *γ*_{0}. The curve *γ*_{III}(*α*) intersects the curve *γ*_{0}(*α*) at the point *α*_{t} (see figure 3*b*). We require *γ*_{0}(*α*_{t})=*γ*_{III}(*α*_{t}) to obtain the equation
4.4The roots of (4.4) in the subinterval are and *α*_{t}≈0.8090. Hence, the curve *γ*_{III}(*α*) lies under the curve *γ*_{0}(*α*) for , and it is above the curve *γ*_{0}(*α*) when *α*∈(*α*_{t},*α*_{3}] (see figure 3*b*). Let us also note that *α*_{t}<*α*_{I}<*α*_{3}.

Finally, we consider the inequality where after some algebraic transformations we arrive at
4.5with *D*(*α*) given by . The requirement 4*D*(*α*)≤1 results in the inequality
which does not have any real roots for *α*>0. Hence, the inequality (4.5) holds for any value of *γ*.

Let us now compute the probability *p*=*p*(*α*)_{theor} of the event that the error is *e*≤*τ*_{0} on a grid with fixed grid step size *h*=*αδ*, where . The entire domain is shown in figure 4*a*, where the curves *γ*(*α*) of figure 3*a*,*b* are now ‘glued’ together. In all cases considered below, it is possible that either one node (when *γ*<*γ*_{0}) or two nodes (when *γ*≥*γ*_{0}) fall within the peak subdomain depending on the value of the parameter *γ*. The probability *p* of accurate approximation is then *p*=*p*_{1}+*p*_{2}, where *p*_{1} is the probability of an accurate estimate when one node is located in the peak subdomain, and *p*_{2} is the probability of an accurate estimate computed when two nodes belong to the peak subdomain. The whole range of *γ* is , and the following cases should be considered for the length of the interval where numerical approximation is accurate.

— , where

*α*_{t}≈0.8090 has been obtained as a solution to equation (4.4) derived under the condition that the tolerance . Since*γ*_{III}(*α*)≤*γ*_{0}(*α*) and , then for any the conditions (3.7) hold. In other words, if there are two grid points in the subdomain [*x*_{I},*x*_{II}], then the integration error is*e*≤*τ*_{0}, no matter how those grid points are located with respect to the maximum point*x**. The probability of the accurate answer is .Consider now

*γ*∈[0,*γ*_{0}), so that just one grid point is located in the peak subdomain. The admissible range of*γ*is then given by the inequality (3.8). Let us investigate the position of the curve*γ*_{II}(*α*) with respect to the curve*γ*_{0}(*α*). Simplifying the equation*γ*_{0}(*α*)=*γ*_{II}(*α*), we obtain the same cubic equation for*α*as equation (4.4). Hence, the three curves*γ*_{0}(*α*),*γ*_{II}(*α*) and*γ*_{III}(*α*) intersect in a single point*α*_{t}, if we consider the semi-open subinterval (see figure 4*b*).As

*γ*_{0}(*α*)≤*γ*_{II}(*α*) for , the upper bound in the inequality (3.8) should be replaced with*γ*_{I}(*α*)≤*γ*(*α*)≤*γ*_{0}(*α*). At the same time, the lower bound in (3.8) requires the restriction*α*≥*α*_{I}, which does not hold for , and therefore the condition (3.10) should be considered. Hence, the inequality (3.8) is transformed as 0≤*γ*(*α*)≤*γ*_{0}(*α*), if there is one grid point in the subdomain [*x*_{I},*x*_{II}]. This means the integration error is*e*≤*τ*_{0}for the grid step size*h*<*α*_{t}*δ*. As the admissible range becomes 0≤*γ*(*α*)≤*γ*_{0}(*α*), the probability is*p*_{1}(*α*)=2*γ*_{0}(*α*). The resulting probability is*p*(*α*)_{theor}=*p*_{1}(*α*)+*p*_{2}(*α*)=1, and the condition*e*≤*τ*_{0}holds for any*γ*if the grid step size is*h*=*αδ*, where for the fixed peak width*δ*.—

*α*∈(*α*_{t},*α*_{I}], where for (see §3). From condition (3.10), we now have*p*_{1}(*α*)=2*γ*_{II}(*α*) for*γ*∈[0,*γ*_{0}) (one node within the peak subdomain) and*p*_{2}(*α*)=1−2*γ*_{III}(*α*) if (two nodes within the peak subdomain). The resulting probability is*p*(*α*)_{theor}=*p*_{1}(*α*)+*p*_{2}(*α*)<1.—

*α*∈(*α*_{I},*α*_{3}], where we have*α*_{3}≈0.8376 from (4.3) for . For this range of*α*, we have*p*_{1}(*α*)=2(*γ*_{II}(*α*)−*γ*_{I}(*α*)), as the inequality (3.8) now holds for any*γ*∈[0,*γ*_{0}) (one node within the peak subdomain). We also have*p*_{2}(*α*)=1−2*γ*_{III}(*α*), , if two nodes fall within the peak. The probability*p*(*α*)_{theor}=*p*_{1}(*α*)+*p*_{2}(*α*)<1.—

*α*>*α*_{3}. We compute the probability*p*as*p*(*α*)_{theor}=*p*_{1}(*α*)=2(*γ*_{II}(*α*)−*γ*_{I}(*α*)) (see also §3). The probability*p*_{2}=0 because of the restriction required for the inequality when .

The function *p*(*α*)_{theor} is shown in figure 4*b*. For the fixed width *δ* of the peak, the parameter *α*_{t} is the threshold value of the grid step size that provides the transition from ultra-coarse grids to coarse grids. On any grid with *α*≤*α*_{t} (domain *D*_{1} in figure 4*a*), the error (2.5) is deterministic in the sense that the condition *e*≤*τ*_{0} always holds. On ultra-coarse grids where *α*>*α*_{t} (domain *D*_{2} in figure 4*a*), the error (2.5) is a random variable as the probability of getting the accurate answer is *p*<1.

Below, we summarize the main results of §4.

— We have increased the number of nodes on a uniform grid, so that either one or two grid nodes are available for approximating the peak function. For the parametrization introduced in §3, this means the consideration of the case . We have then found the threshold number

*N*_{t}of grid nodes such that the integration error*e*is always*e*≤*τ*, if the number of grid nodes*N*≥*N*_{t}. In other words, the probability of obtaining the desirable accuracy of integration is always*p*=1 when we approach the threshold*N*_{t}.— It has been shown that the grid step size

*h*_{t}that corresponds to the threshold number*N*_{t}on a uniform grid is a linear function of the peak width*δ*, that is,*h*_{t}=*α*_{t}*δ*, where*α*_{t}depends on the chosen tolerance*τ*only. The coefficient*α*_{t}has been computed for the tolerance*τ*_{0}=0.25 as*α*_{t}≈0.8090.

Let us note that the introduction of ultra-coarse grids extends the existing grid classification. Namely, we consider fine grids as grids where one can rely upon asymptotic error estimates (usually in the form *e*=*O*(*h*^{k}), e.g. [21]). We can now refer to coarse grids as grids where the asymptotic error estimates may not be applied, but where the probability of obtaining the accuracy goal is *p*=1. Finally, ultra-coarse grids are grids where the error is a random variable with high magnitude and the probability of obtaining the desirable accuracy is *p*<1.

The results above have been obtained under the assumption of a quadratic approximation of the integrand function. Hence, numerical investigation of the problem is required when a peak function has a shape different from quadratic in order to verify our findings. This will be carried out in §5.

## 5. Numerical examples

In this section, we first consider several standard test cases to illustrate our approach to numerical integration on ultra-coarse grids. We then turn our attention to ecologically meaningful density distributions and discuss how the theoretical predictions of §4 work for them.

### (a) Standard test cases

We begin our consideration with a quadratic function (3.1), as our first test case is to verify the probability estimate *p*(*h*)_{theor} derived in §4. Let us fix the peak width *δ* and consider the peak location *x** as a random variable that is uniformly distributed over the interval [*δ*,1−*δ*], as we require that the entire peak is stationed within the unit interval [0,1]. In our test, we provide *n*_{r}=10^{4} realizations of the random variable *x** on a grid with the fixed number *N*_{l} of nodes and compute the integral error (2.5) where we integrate the function (3.1) by the midpoint rule for each realization of *x**. The probability *p*(*h*_{l})_{num} of accurate numerical integration is computed as
5.1where *h*_{l}=1/(*N*_{l}−1) is the grid step size on a grid of *N*_{l} nodes and is the number of realizations for which the integration error is *e*≤*τ*_{0}, . We then increase the number of grid nodes as *N*_{l+1}=*N*_{l}+1 and repeat computation (5.1). We stop when the number *N*_{L} of grid nodes results in a grid with the grid step size *h*_{L}≤*δ*/2.

The quadratic function (3.1) is shown in figure 5*a* for the peak width *δ*=0.06. The probability *p*(*h*)_{num} of accurate numerical integration (2.6) of the function (3.1) is shown in figure 5*b*. We start from the grid of *N*_{1}=5 nodes and end our computations on the grid of *N*_{18}=22 nodes where the condition (4.1) still holds. It can be seen from the figure that all values of the probability *p*(*h*_{l}), *l*=1,…,18, computed by direct evaluation (5.1) lie very close to the theoretical curve *p*(*h*)_{theor}, shown as a dashed curve in the figure.

The results of §§3 and 4 are further illustrated in figure 5*c* where the integration error (2.5) is computed for the function (3.1) on an ultra-coarse grid and a coarse grid. The computation resulting in the graphs shown in figure 5*c* is similar to the test case discussed in §2 (cf. figure 1*c*). Namely, we randomly move the peak (3.1) 10 times (*n*_{r}=10) on a grid with a fixed number of nodes and compute the integration error every time the peak is moved. It can be seen from the figure the error (2.5) depends on the peak location when we integrate the function (3.1) on an ultra-coarse grid with the number of grid nodes *N*=18 (i.e. the grid step size is ). The probability of achieving an accurate answer (2.6) is *p*≈0.3 (see the graph in figure 5*b*). Meanwhile, the error is deterministic on a grid of *N*=25 nodes () and the error is *e*≤*τ*_{0}, no matter where the peak is located.

We now consider several peak functions different from the quadratic function (3.1) in order to understand how the probability estimate obtained for (3.1) will work for them.

A cubic function
5.2presents an interesting test case because the peak is now asymmetric. The function (5.2) is shown in figure 6*a*, where we still keep the peak width *δ*=0.06 for *A*=3×10^{4}. The probability graph for the function (5.2) is shown in figure 6*b*. The graph *p*(*h*)_{num} is in a good agreement with the curve *p*(*h*)_{theor} on very coarse grids (*α*>*α*_{3}) where the probability of accurate integration is small. At the same time, the actual probability in the transition layer is smaller than the probability estimate based on the quadratic approximation. One important observation about the graph *p*(*h*)_{num} of figure 6*b* is that the grid step size *h*^{num}_{t} for which the error (2.5) becomes deterministic (i.e. *p*(*h*^{num}_{t})=1 and *p*(*h*)<1 for any *h*>*h*^{num}_{t}) is smaller than the theoretical estimate
5.3obtained for the quadratic approximation of the integrand function.

The normal distribution (2.2) already discussed in §2 gives us an example of a peak function that is different from zero everywhere in the domain *x*∈[0,1] (see figure 6*c*). However, when we integrate the density function (2.2), 99.7 per cent of the mass will be concentrated within the peak of width *δ*=6*σ*, and we can therefore expect a random integration error when we integrate the normal distribution on ultra-coarse grids. The graph *p*(*h*)_{num} computed for the function (2.2) with peak width *δ*=0.06 is shown in figure 6*d*. It can be seen from the figure that the entire probability graph *p*(*h*)_{num} is now shifted with respect to the curve *p*(*h*)_{theor}. The maximum deviation is *d*_{p}=0.8129, and the threshold value of the grid step size is *h*^{num}_{t}≈0.5*δ* instead of *h*_{t}≈0.8*δ* obtained for the quadratic function.

A smaller threshold grid step size *h*^{num}_{t} is a consequence of the interpolation error *e*_{int}=*O*(*δ*^{3}) that we introduce when we approximate the integrand function by a quadratic polynomial. As the interpolation error depends on the peak width *δ*, our probability estimate *p*(*h*)_{theor} should be accurate enough when *δ* is small (a narrow peak), and it will differ from the actual probability *p*(*h*)_{num} when we increase the peak width *δ*. Below, we study this dependence in more detail by considering the Lorentz distribution—another peak function that often appears in various fields of physics and interdisciplinary research [27],
5.4

Let us vary the peak width in (5.4) and compute the probability graph *p*(*h*)_{num} for each fixed value *δ* by employing the procedure (5.1). A family of probability curves is shown in figure 7 for the function (5.4) with the peak width *δ*=0.06 (a baseline peak), the peak width *δ*=0.1 (a wide peak) and the peak width *δ*=0.01 (a narrow peak). For the sake of convenience, we show the ‘transition layer’ in figure 7*a*, while the ‘tails’ of the probability curves are shown in figure 7*b*. It can be seen from the figure that the computed probability curve *p*(*h*)_{num} gets closer to the probability graph *p*(*h*)_{theor}^{1} when we decrease the peak width *δ*. From the ecological viewpoint, the most important conclusion we derive from the consideration of the probability graphs shown in figure 7 is that the difference between the predicted value (5.3) and the actual threshold value obtained by direct computation decreases when we decrease *δ*. In other words, we can rely upon the estimate (5.3) to determine how many grid nodes are required to guarantee the accuracy (2.6) when a narrow peak is integrated. Thus, our next test is to compute the function and to compare it with the theoretical estimate (5.3).

The results of the computation of *h*^{num}_{t} for the function (5.4) where the peak width *δ* varies are shown in figure 7*c*. The value *h*^{num}_{t} is computed for each fixed peak width *δ* from the condition that *p*(*h*^{num}_{t})=1 and *p*(*h*)<1 if *h*>*h*^{num}_{t}. The computed threshold grid step size is in very good agreement with the estimate (5.3) for narrow peaks (*δ*<0.1). At the same time, the theoretical curve (5.3) lies above the actual values when wide peaks are considered and therefore the estimate (5.3) can be used in ecological problems as the upper bound for the threshold value *h*_{t}.

The standard numerical test cases discussed in this section confirmed the following.

— Replacing the integrand function by a quadratic polynomial can be considered a reliable approximation when the probability of accurate integration is computed.

— The formula (5.3) gives us a good estimate of the threshold grid step size

*h*_{t}when a narrow peak is integrated.— The same formula (5.3) can be used as the upper bound for

*h*_{t}if a wide peak is considered.

Once the estimate (5.3) of the threshold grid step size *h*_{t} has been verified for standard numerical test cases, our next goal is to check how formula (5.3) will work when ecologically meaningful peak functions are considered. This will be carried out in §5*b*.

### (b) Ecological test cases

We use the pest density distributions *u*(*x*) generated from the spatially explicit predator–prey model describing spatiotemporal dynamics of a pest insect population [28,29],
5.5The model (5.5) presents a system of coupled diffusion–reaction equations written in dimensionless variables. The functions *u*(*x*,*t*) and *v*(*x*,*t*) are the densities of the pest insect (considered as the prey) and its consumer (the predator), respectively, at time *t*>0 and position *x*. Coefficient *d* describes species diffusivity due to the movement of the individuals, *p* is the half-saturation prey density, *k* is the food assimilation efficiency coefficient and *m* is the predator mortality.

Ideally, our results on the accuracy of integration on ultra-coarse grids should be checked against appropriate ecological data. However, one serious obstacle in the problem is that our further discussion will require the handling of data on a sequence of refined grids, and it is not possible to fulfil this requirement when field data are considered. Meanwhile the model (5.5) and its two-dimensional counterpart [28] have been validated against experimental data to demonstrate that the model is in good agreement with the results of field measurements [5]. Hence, we believe the model (5.5) can be used for the generation of ecologically realistic data.

Equations (5.5) provide a rich variety of spatiotemporal distributions of the pest density function *u*(*x*,*t*). In particular, it is well known [5] that the properties of the spatial distribution *u*(*x*) considered at a given time *t* are determined by the diffusion *d* in the problem and that the initial conditions *u*(*x*,0), *v*(*x*,0) can evolve into the one-peak spatial pattern if the diffusion is *d*≪1 (see [30] for a more detailed discussion on the pattern formation). Several examples of one-peak density distributions are shown in figure 8, where the functions *u*_{1}(*x*), *u*_{2}(*x*) and *u*_{3}(*x*) have been obtained from the numerical solution of equation (5.5) with the boundary conditions d*u*/d*x*=0 at *x*=0 and *x*=1 for the diffusion coefficient *d*=10^{−4}, *d*=10^{−5} and *d*=10^{−6}, respectively.^{2}

One important observation that can be made from figure 8 is that the peak width *δ* depends on the diffusion coefficient *d*. A simple estimate of the function *δ*(*d*) discussed in [20] can be written as
5.6where the coefficient *ω* depends on the system's parameters. An extensive numerical study performed in [30,31] revealed that, in the predator–prey system (5.5), the value *ω* is relatively robust to changes in the parameter values, and can typically be considered as *ω*≈25. Hence, we can evaluate the upper bound for the threshold grid step size *h*_{t} as
5.7where the coefficient *C*=*α*_{t}*ω*.

Consider the function *u*_{1}(*x*) shown in figure 8*a*. The diffusion coefficient is *d*=10^{−4}, hence the estimate (5.7) gives us the grid step size as *h*_{t}≈0.2. The corresponding number of grid nodes is *N*_{t}≈6.

Let us compute the integration error (2.5) by the midpoint rule on a sequence of refined grids, where we add just one node to the grid at each consecutive refinement step. The results of numerical integration of the function *u*_{1}(*x*) are shown in table 1.

It can be seen from the table that our estimate of the threshold grid step size is in surprisingly good agreement with the results of computation. The actual threshold number obtained in computations is *N*_{t}=6 and, while the integration error keeps oscillating on grids with *N*>*N*_{t} nodes, the condition (2.6) always holds on those grids.

Let us repeat the previous computational procedure for the function *u*_{2}(*x*) shown in figure 8*b*. The diffusion coefficient *d*=10^{−5}, for which the density distribution *u*_{2}(*x*) has been generated, gives us the lower bound for the threshold number of nodes estimated as *N*_{t}≈17. The results of numerical integration of the function *u*_{2}(*x*) are shown in table 2 where the threshold number obtained in computations is *N*_{t}=21. This number of traps is not realistic in real-life computations, and other factors should be taken into account when a decision is made on the number of traps to be installed. One such factor that will be considered in future work is that the integral value depends obviously on the peak width, and the true value of the integral can be small if a narrow peak is considered. A small value of the integral (i.e. a small number of pest insects) means, in turn, that the risk of making a wrong decision about pesticide application is low. Hence, we can increase the tolerance *τ*_{0} in the problem (e.g. *τ*_{0}=0.5 or even *τ*_{0}=1 in some cases) and use the technique discussed in §§3 and 4 in order to re-compute our estimate of the threshold number *N*_{t} for the new tolerance.

Finally, let us have a look at a very narrow peak *u*_{3}(*x*) shown in figure 8*c*. The diffusion coefficient *d*=10^{−6} gives us the estimate *N*_{t}≈51. Clearly, this number of traps is beyond any reasonable range and cannot be used in real-life applications. However, as the integral is very small (for the density distribution *u*_{3}(*x*), we have ), our recommendation to ecologists would probably be not to take any action until the time evolves and the peak function gets wider. On the other hand, an important conclusion derived from the extreme case of the distribution *u*_{3}(*x*) is that installing a reasonable number of traps (e.g. *N*∼10) with the aim of trying to evaluate the total number of insects at a very early stage of the biological invasion is senseless, as the probability of a correct answer is very small (see the discussion of the case *h*>*δ* in §3).

## 6. Conclusions

We have considered a problem of numerical integration for high aggregation density distributions (peak functions) on coarse uniform grids. Our study has originally been prompted by the needs of ecological monitoring and control where minimization of the number of measurements made in order to accurately reconstruct a density distribution remains an important problem.

The main results of the paper are as follows.

— We have introduced the concept of ultra-coarse grids. An ultra-coarse grid is defined as a grid where the integration error is a random variable with high magnitude because of the insufficient information (uncertainty) about the integrand function.

— The definition of an ultra-coarse grid implies that we have to evaluate the probability

*p*of achieving an integration error smaller than the given tolerance rather than to evaluate the error itself. In our paper, we have obtained a probability estimate based on a quadratic approximation of the integrand function. We then carried out a number of numerical test cases to demonstrate that our probability estimate gives us a reliable upper bound for the threshold grid step size*h*_{t}for which the probability*p*(*h*_{t})=1. Thus, the approach suggested in the paper can be used to evaluate the minimum number*N*_{t}, such that the desirable accuracy of integration is guaranteed on a grid of*N*_{t}nodes.— It has been shown in the paper that conventional methods of error control and analysis do not work on ultra-coarse grids. It has been discussed in §2 that the uniform refinement of an ultra-coarse grid does not reduce the integration error, as the integration error on a refined grid can be even bigger than the error on the original grid unless the number

*N*_{t}of nodes is reached. However, increasing the number*N*of grid nodes does increase the probability of an accurate integral estimate and is therefore desirable.

A general remark should be made here with regard to whether the information about the population density at a given time and location can be adequately obtained from trap counts at all, so that the approach developed in our paper can be applied. Indeed, as the traps are emptied only once in a while (e.g. weekly), the question is how much the population density distribution can possibly change over this time. The answer to this question depends, to a certain extent, on the biological and behavioural traits of the monitored species. However, we mention here that the spatial scale of variations in the population density distribution for walking insects, i.e. those that are usually sampled with pitfall traps, is known to be 30–40 m [32]. Meanwhile, typical dispersal distances for walking insects are estimated to be 1 m or less per day [33], which obviously corresponds to a diffusivity of the order of 1 m^{2} day^{−1}. Over 1 week of trap exposure, that results in a diffusion length of m, which is about one order of magnitude less than the spatial scale of inherent variation.

One important direction of future research is to investigate the transition from ultra-coarse to coarse grids for other classes of functions. Peak functions present just one class of density distributions, and a similar problem of accurate integral evaluation arises when other spatially heterogeneous functions are considered. In particular, we are interested in rapidly oscillating functions that often appear in ecological applications. Also, our future work will be to extend our results to the two-dimensional case, as we want to apply our approach to real-life problems. Since field measurements are carried out on a regular basis in ecological applications, heuristic estimates are available for many common pest species distributions and practical recommendations with regard to the number *N*_{t} of traps that have already been obtained based on these heuristic estimates. We intend to check those recommendations against a theoretical probability estimate of the threshold number *N*_{t} after our evaluation technique is extended to two-dimensional problems.

The examples of integration of ecologically meaningful distributions we discussed in the paper can only be considered as the very first steps on our way to design a robust procedure for risk evaluation in agricultural applications. At the same time, our results demonstrate that a conventional approach that ecologists use in their problems has to be revisited. A standard procedure of the risk evaluation in pest management is to compare an estimate of the total number of pest insects with a certain critical number and to make a decision about the use of pesticides based on that comparison. Ecologists readily admit that there is uncertainty surrounding the estimation of the pest abundance, which may become worse as the number of samples decreases [34]. However, to the best of our knowledge, the error in the pest abundance estimation has never been considered in the ecological literature as a random variable in order to take into account the risk factor related to the uncertainty in integral evaluation when the number *N* of traps is small. Taking this risk factor into account when designing an appropriate methodology for decision making in pest insects management should constitute a topic of our future work.

## Footnotes

↵1 We consider the theoretical curves

*p*(*h*)_{theor}instead of a single curve*p*(*α*)_{theor}in order to be able to compare them with the results of direct computation for each value of*δ*.↵2 The other parameters of the one-dimensional system of equations as well as various initial conditions applied in numerical solution are discussed in detail in [19].

- Received November 6, 2012.
- Accepted January 31, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.