## Abstract

Contrary to what is commonly thought, it is possible to obtain convergent results with negative (rather than positive) penalty functions. This has been shown and proven on various occasions for vibration analysis, but in this contribution it will also be shown and proven for systems of linear equations subjected to one or more constraints. As a key ingredient in the developed arguments, a pseudo-force is identified as the derivative of the constrained degree of freedom with respect to the inverse of the penalty parameter. Since this pseudo-force can be proven to be constant for large absolute values of the penalty parameter, it follows that the exact solution is bounded by the results obtained with negative and positive penalty parameters. The mathematical proofs are presented and two examples are shown to illustrate the principles.

## 1. Introduction

In many applications of engineering, the following two ingredients are encountered: a linear system of equations and a number of constraints. The linear system of equations is often derived as the spatial discretization of one or more partial differential equations (p.d.e.), possibly linearized in case the p.d.e. are nonlinear. Popular discretization methods are the finite-difference method, the finite volume method and the finite element method. The p.d.e. are usually subjected to constraints, such as Dirichlet conditions or links between various degrees of freedom (d.o.f.) of the discretized system.

Several methods exist to account for the constraint relations (Barlow 1982). A *direct imposition* of the constraint can be obtained if a suitable transformation of the system of equations is performed, i.e. by moving terms from the left-hand side to the right-hand side of the system of equations. The *Lagrange multiplier* method adds unknowns to the system of equations but offers flexibility in handling nonlinear constraints (Gutiérrez & Askes 2002). An approximate imposition of the constraints can also be obtained through penalty functions. Although the latter method is less accurate than direct imposition or Lagrange multipliers (that is, the constraint equation is not enforced identically, but rather in an approximate sense), advantages of the penalty function method are that the size of the system matrix is maintained and multiple constraints acting on one node can be handled. In the sequel, we will concentrate on penalty functions used to enforce constraints on a linear system of equations.

Imposing a constraint through penalty functions implies that the constraint equation is multiplied with a user-defined number (known as the penalty parameter), and that the resulting terms are added to the system of equations. The penalty parameter is normally taken as a large positive number in order to ensure positive-definiteness of the resulting system of equations (Bathe 1996; Hughes 2000; Zienkiewicz & Taylor 2000). Indeed, an exact enforcement of the constraint equation can be obtained asymptotically by letting the penalty parameter approach infinity. Since this is not feasible in practice, it is normally ensured that the penalty parameter is much larger than the diagonal matrix component that corresponds to the constrained d.o.f. (Bathe 1996). However, too large values of the penalty parameter destroy the conditioning of the matrix that may, in turn, prohibit the finding of the solution to the system. Moreover, the various diagonal entries of the matrix may differ significantly in magnitude, thus precluding a re-scaling of the system equations. Thus, it can be a difficult task to find a proper value for the penalty parameter that is (i) much larger than the associated diagonal matrix entry, yet (ii) not too large to cause numerical problems due to the limitation of the machine precision of the computer.

Positive-definiteness of the system matrix is the prerequisite for many proofs of stability and uniqueness; therefore the desired positive-definiteness has always dictated that *positive* penalty parameters are taken. However, it has recently been shown and proven that convergent results can also be obtained by taking *negative* values for the penalty parameter—the term ‘convergent’ implying here that for increasing (absolute) values of the penalty parameters the results converge towards the exact results obtained with the direct imposition method.

We use the term ‘exact’ here to indicate that the constraints are fulfilled identically. Penalty functions provide an approximation to the exact fulfilment of the constraints. In case the linear system of equations is obtained via discretization of the p.d.e., another approximation is involved in going from the (exact) continuum formulation to the (approximate) discrete form. However, the latter distinction between ‘exact’ and ‘approximate’ will not be considered in this study.

In a number of applications, Ilanko has demonstrated the versatility of using negative penalties (Ilanko 2002*a*,*b*, 2003, 2005; Ilanko & Tucker 2005). The two main observations of these works as regards convergence of the results are:

by letting the absolute value of the penalty parameter approach infinity, the exact solution is retrieved with positive as well as with negative penalties;

the exact solution is bracketed by the results obtained with positive and with negative penalties.

These findings have also been confirmed for a simple boundary-value problem using a meshless discretization method (Askes & Pannachet 2005). Convergence and boundedness theorems are available for the solution of eigenvalue equations depicting structural vibration and buckling problems (Ilanko 2002*a*, 2003). Proofs have also been published recently to show that the deflection profile of linear elastic structures subject to positive and negative penalty functions also converge to those of the constrained structure (Ilanko 2005). However, as acknowledged in Ilanko (2005), a rigorous proof for the *bounding nature* of the penalized solution is still lacking. This, then, is the purpose of the present contribution, whereby the focus is limited to linear systems of equations.

## 2. Constrained systems of equations

We will consider a system of linear equations written as(2.1)Adopting the terminology of structural analysis, is the stiffness matrix, are the nodal d.o.f. (e.g. displacements and/or rotations) and is the force vector. System (2.1) is accompanied by a set of constraints (for instance Dirichlet conditions) such that exists. The constraints are enforced by means of penalty functions, by which the modified version of equation (2.1) reads(2.2)For clarity of the presentation, we will initially assume a single constraint (the more general case with multiple constraints is addressed in §2*d*). The constraint is assumed to act on the *n*th d.o.f. and is written as(2.3)with a user-defined value such as a prescribed displacement. In that case,(2.4)while all other entries of and are equal to zero. In equation (2.4) and hereafter, the scalar *α* denotes the penalty parameter.

### (a) Convergence of the constrained degree of freedom

Next, the *n*th equation of system (2.2) is written out in full:(2.5)which can be re-ordered as(2.6)where is defined as the force conjugate to the constraint. The existence of in equation (2.1) implies that the left-hand side of equation (2.6) is definite. Therefore, the right-hand side must have a limit as , that is(2.7)where may be recognized as the force conjugate to the constraint given (2.3) in case this constraint is enforced identically.

From equation (2.5), can be solved as(2.8)To demonstrate that both positive penalties and negative penalties lead asymptotically to the exact solution, it is convenient to consider the results not as a function of the penalty parameter *α* itself but rather as a function of its inverse *α*^{−1}, as was recently argued for the concept of artificial inertia (Williams & Ilanko in press). In other words, the convergence of positive penalties is verified for , whereas the convergence of negative penalties is verified for . From equation (2.8) it follows that(2.9)which serves to verify that(2.10)

### (b) Boundedness of the constrained degree of freedom

It remains to be proven that the exact solution is bracketed by the results obtained with positive and negative penalties. To this end, the *slope* of the curve versus is investigated: a *constant slope* around indicates that the results converge from *opposite sides* for positive and negative penalties. The total derivative of equation (2.5) with respect to is obtained as(2.11)Dividing by *α* and substituting equation (2.6), this expression can be rewritten as(2.12)Clearly, the right-hand side of equation (2.12) expresses a quantity of the force-type, hence the derivative on the left-hand side can be understood as a pseudo-force.

For it follows from equation (2.12) that the slope of the versus diagram approaches the force which is conjugated to the constraint. With equation (2.7), the asymptotic counterpart of equation (2.12) can be elaborated as(2.13)Since is constant and finite, the slope of the versus curve is constant and finite around . Hence, it follows that as the magnitude of the penalty parameter approaches infinity, the results approach the exact solution from opposite directions. This in turn implies that the results for can be retrieved by taking a finite negative value and a finite positive value for , the results of which can be interpolated to obtain the results for .

### (c) Convergence and boundedness of other degrees of freedom

It is also of importance to verify whether the same observations hold for the other degrees of freedom (d.o.f.). While the constraint is still imposed on the *n*th d.o.f., the *m*th equation of system (2.2) is considered, assuming :(2.14)For the d.o.f. it follows that(2.15)Note that the penalty parameter *α* does not appear in this expression, therefore it trivially follows that(2.16)which proves convergence of for both signs of the penalty parameter.

For proofs of boundedness, the derivative of equation (2.14) with respect to is considered:(2.17)Next, the limit case that is taken, that is(2.18)where we have substituted equation (2.13). Since the right-hand side of equation (2.18) is independent of *α*, the sum on the left-hand side must also be independent of *α*. This situation occurs in all equations of the system. If all combinations of summed terms are to be independent of *α*, each *individual* derivative must be independent of *α*. Hence, each derivative is constant for .

In contrast to equation (2.13), the pseudo-force that is cannot be expressed in closed-form unless the system matrix is explicitly inverted. However, for bracketing the exact solution it is sufficient to establish that is constant.

### (d) Multiple constraints

In general, linear systems of equations are subjected to multiple constraints, which means that more than one equation in expression (2.2) contains terms including a penalty parameter. However, this does not influence our proofs:

Even if other d.o.f. are penalized, they would still enter the

*n*th equation without a penalty term. Therefore, they would enter the summation on the right-hand side of equation (2.9) and not affect convergence of .For the same reason, the other penalized terms would enter the summation on the right-hand side of equation (2.12) and not affect the boundedness of .

Equation (2.15) does not depend on the number of penalized d.o.f., therefore convergence of the unconstrained d.o.f. is not affected.

All constrained d.o.f. would move to the right-hand side of equation (2.18), thereby generating a summation of stiffnesses×constraint forces. However, this summation would still be constant, thus the boundedness of unconstrained d.o.f. is not affected.

## 3. Critical values for the penalty parameter

In the above-mentioned proofs it was assumed that all exist. In the earlier works of Ilanko (Ilanko 2005; Ilanko & Tucker 2005), existence of has been shown to hinge on the magnitude of the penalty parameter being larger than the magnitude of the so-called *critical penalty parameter*, denoted here as . The critical penalty parameter denotes the value of *α* beyond which convergence of the results are monotonic. The question remains how to quantify the critical penalty parameter for linear systems of equations.

Returning to equation (2.9), a first estimate for the critical penalty parameter would be to take the negative diagonal entry of the stiffness matrix that corresponds to the constrained d.o.f. This would be correct for a single d.o.f. system, where no other stiffness contributions affect the constrained d.o.f. . Indeed, can be obtained by setting all other d.o.f. to zero, which eliminates the interaction between d.o.f. and renders the system overly stiff. However, the effective stiffness for is *lowered* once the interaction with other d.o.f. is accounted for. This also implies that the actual value of is lowered through the stiffness contributions of the other d.o.f., but as a bound one can still use the diagonal entry:(3.1)Therefore, a safe choice for negative values of the penalty parameter is . Closed-form expressions (rather than bounds) for the critical penalty parameter are not available unless the stiffness matrix is inverted in symbolic form.

To establish that is being bounded from below by , we have used the basic principles from structural analysis. However, this need not be valid for other linear systems of equations. Alternative physical principles may then be required to find an estimate for .

## 4. Examples

The developed concepts will be illustrated analytically by means of an Euler–Bernoulli beam and numerically by means of a plane truss.

### (a) Euler–Bernoulli beam

The first example concerns an Euler–Bernoulli beam and is depicted in figure 1. The beam stiffness is *EI* and its length equals *L*. A force *F* is acting at the centre of the beam. At the left end, the beam is clamped, whereas a spring with stiffness *α* is placed at the right end in order to model a pin support. The nodes are numbered from left to right, and the considered d.o.f. are the deflection *w* and the rotation *θ* for each node.

Within the indicated coordinate system, the system of equilibrium equations (after accounting for the encastre at node 1) reads(4.1)Via some straightforward algebra, the d.o.f. can be elaborated as(4.2)where is the penalty parameter normalized with respect to the other stiffness parameters.

In figure 2, the results of equation (4.2) are plotted for a range of positive and negative values of the inverse normalized penalty parameter . The deflections are normalized with and the rotations with . As can be verified, all four curves are continuous around the asymptotic values obtained with . Moreover, the slopes of all four curves attain constant values around . This demonstrates that the results obtained with positive and negative penalty parameters both converge towards the exact solution, and that the exact solution is bracketed by the results of positive and negative penalties.

Analytical values for the slopes around can be obtained by elaborating the derivatives of the d.o.f. with respect to the inverse penalty parameter , e.g. for the constrained d.o.f. :(4.3)The factor is the asymptotic value of the spring force, i.e. the reaction force that would appear if the springs were replaced by a pin support. This is indeed the value of the slope of the solid line in figure 2 at .

Since in this particular example the stiffness matrix has been inverted in symbolic form, all d.o.f. are obtained as closed-form expressions in terms of the penalty parameter. Hence, it is possible to find analytical solutions for the slopes , and straightforwardly from equation (4.2). However, this will in general not be possible as explained in §2*c*.

Similarly, the followed analytical solution method enables identification of the critical normalized penalty parameter as . Comparing this value with the diagonal entry from equation (4.1), it can be verified that these values are in accordance with equation (3.1).

### (b) Plane truss

The second example concerns a plane truss as depicted in figure 3. All members have a cross-sectional area *A*=25×10^{−4} m^{2} and a Young's modulus *E*=200×10^{9} N m^{−2}. The dimensions of the truss are set by *L*=2 m, and the applied load *F*=10×10^{3} N. Two sets of methods to model the constraints have been considered. Firstly, the lower pin was modelled with penalty functions in both directions while the upper pin was accounted for via the direct imposition method. In the second analysis, both pins were modelled with penalty functions in both directions.

Figure 4 shows the horizontal displacement of the lower pin for a range of penalty parameters. Note that the displacement has been normalized with and the penalty parameter *α* with . For this d.o.f., there is no difference between the two ways to model the upper pin since the force conjugate to the constraint is also the reaction force which must satisfy global equilibrium. A simple inspection of equilibrium reveals that the force conjugated to the horizontal displacement in the lower pin equals 3*F*=30×10^{3} N. After normalization, this is indeed equal to the slope of the curve at .

In addition, the vertical displacement at the tip (i.e. where the force *F* is applied) has also been analysed. The results are plotted in figure 5, and for this d.o.f., there is a difference between the two ways to model the upper pin. The value of this particular d.o.f. is set by the way the constraints are imposed. If the upper pin is modelled with the direct imposition method, the results converge faster towards the asymptotic solution. However, either way the results converge for positive and negative values of the penalty parameter, and the asymptotic results are bracketed by the two ranges of penalty parameters.

## 5. Discussion

In this contribution, we have proven that convergent results can be obtained for linear systems of equations with negative penalty functions as well as with positive penalty functions. Moreover, the exact solutions are bounded from opposite sides by the results obtained with positive and negative penalties. This can be of use for large-scale computations: if the conditioning of the matrix and the machine precision prohibit the use of *very large penalty parameters* (‘very large’ meaning here much larger than the corresponding entries of the stiffness matrix), an alternative technique can be used to enforce constraints by means of penalty methods. A *moderate* value for the penalty parameter can be selected (‘moderate’ meaning larger, but not much larger, than the corresponding entries of the stiffness matrix), after which two strategies are possible:

The analysis is carried out twice: once with the penalty parameter taken positive and once taken negative. An accurate estimate for the asymptotic results can be obtained by a

*linear interpolation*of the two sets of results in the d.o.f. versus inverse penalty parameter plots.The analysis is carried out by taking the penalty parameter either negative or positive. The asymptotic results can be obtained by a

*linear extrapolation*of the results in the d.o.f. versus inverse penalty parameter plots, whereby the force conjugated to the constraint can be used to estimate the slope in these plots.

In this study, the use of a *direct solver* has been assumed. At present, many robust direct solvers exist that can handle linear systems of equations that are not necessarily positive-definite. In contrast, most *iterative solvers* require that the system is positive-definite. However, the performance of iterative solvers is also sensitive to ill-conditioning of the matrix, hence the combination of penalty functions (be it positive or negative) with an iterative solver may not be the optimal choice.

## Acknowledgements

We are indebted to Professor Fred W. Williams of Cardiff University (UK) for suggesting checking the results against the *inverse* of the penalty parameter—this has been a key factor in developing our arguments.

## Footnotes

↵† Formerly at: Department of Mechanical Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand.

- Received September 22, 2005.
- Accepted March 16, 2006.

- © 2006 The Royal Society