## Abstract

We explore the mathematical and numerical aspects of reconstructing a potential energy profile of a molecular bond from its rupture time distribution. While reliable reconstruction of gross attributes, such as the height and the width of an energy barrier, can be easily extracted from a single first passage time (FPT) distribution, the reconstruction of finer structure is ill-conditioned. More careful analysis shows the existence of optimal bond potential amplitudes (represented by an effective Peclet number) and initial bond configurations that yield the most efficient numerical reconstruction of simple potentials. Furthermore, we show that reconstruction of more complex potentials containing multiple minima can be achieved by simultaneously using two or more measured FPT distributions, obtained under different physical conditions. For example, by changing the effective potential energy surface by known amounts, additional measured FPT distributions improve the reconstruction. We demonstrate the possibility of reconstructing potentials with multiple minima, motivate heuristic rules-of-thumb for optimizing the reconstruction, and discuss further applications and extensions.

## 1. Introduction

In many applications, one wishes to infer properties of a material or a process in an interior region of a sample not readily accessible to experimental probes. Examples of such inverse problems involving boundary data include radiological imaging, where radiation passing through tissues is detected outside the sample, electrical impedance tomography, where potentials are measured on the exterior of a body, and seismology, where reflected waves are measured at the earth’s surface. Such problems are often ill-conditioned: there may be several different interior structures that yield nearly the same measured boundary data.

One type of ‘boundary’ data that often arises in stochastic models is a first passage time distribution (FPTD), describing the probability of a random variable first reaching a particular value within a certain time window. Here, the boundary data are the probability fluxes out of the domain. Figure 1*a* shows individual trajectories of a one-dimensional stochastic process and their corresponding first passage times. The FPTD is shown in figure 1*b* along with its Laplace transform in the inset. These types of first passage problems arise in many biophysical contexts. For example, the voltage across a nerve cell membrane fluctuates due to noisy inputs from other neurons, and can be described by a biased random walk determined by a constitutive voltage–current relationship intrinsic to the cell (Tuckwell *et al*. 2003) When the fluctuating voltage exceeds a threshold, the potential rapidly spikes before resetting. The interspike times define the first passage times of the fluctuating voltage from which one might wish to reconstruct the neuron’s inherent current–voltage relationship.

Stochastic inverse problems are typically ill-posed: there may be several different interior structures that could yield identical or nearly identical measured boundary data. Nonetheless, for many physical systems, reconstruction of constitutive relations from measured data can be cast in Sturm–Liouville form with an unknown spatially dependent coefficient (McLaughlin 1986; Levitan 1987). Given the eigenvalues of the problem and assuming a symmetric coefficient function, its full reconstruction is unique (Borg 1946). However, one eigenvalue spectrum is insufficient to determine a general non-symmetric coefficient (Borg 1946). For stochastic problems, the spectrum of the corresponding Sturm–Liouville problem cannot be readily extracted from data and algorithms developed specifically for reconstruction through the eigenvalues (Rundell & Sacks 1992*a*,*b*; Brown *et al.* 2003) are of limited use in our stochastic problem. This motivates the development of new algorithms and techniques that deal directly with the boundary data.

In this paper, we investigate a stochastic inverse problem in the context of another system commonly encountered in biophysics: macromolecule unfolding and molecular adhesion bond rupturing. Macromolecular bond displacements are often described by a single bond coordinate, represented by a fluctuating Brownian ‘particle’ in a one-dimensional energy landscape. The metastable bond is considered to be broken the instant the bond coordinate reaches a critical extension. This problem is of great interest in single-molecule biophysics, particularly in the context of dynamic force spectroscopy (DFS; Evans *et al.* 1995). In DFS, a pulling force protocol is applied to the bond and the force at the instant of rupture is recorded. The mean rupture force by itself would give very little information about the molecular potential (Schlierf & Rief 2006) since many different potentials would yield the same mean rupture force. How much of the bond potential can be recovered from the measured rupture force *distribution*? All of the recent theoretical treatments of this problem have either analysed the forward problem (Heymann & Grübmuller 2000), used physical approximations to derive simple force-dependent and time-dependent dissociation rates (Bell 1978; Walton *et al.* 2008) and/or considered simple 2–3 parameter single minimum potentials (Hummer & Szabo 2003; Dudko *et al*. 2008; Freund 2009). The rate of force increase as a function of displacement (the rupture stiffness) has also been incorporated into a procedure to fit basic parameters of simple potentials (Fuhrmann *et al.* 2008). However, by imposing such simple two or three parameter forms for the reconstructed potential, one loses details such as multiple minima.

Here, we approach the inverse problem by allowing a wider class of potentials, including those with multiple minima. Within a class of potentials, we numerically determine the ones that best fit the entire measured FPTD. Although the difficulty of extracting eigenvalues from FPTD data is avoided, the inverse problem remains intrinsically ill-conditioned and it is not surprising that almost all studies have focused on reconstructing only two or three attributes of the stochastic process, typically, the energy barrier height and width. In §2, we formulate the problem through the backward equation of a Brownian process with a potential energy-derived drift. In §3, we decompose the drift function into basis functions and develop an iterative optimization procedure to find the coefficients of these basis functions. In §4, we show that using a single FPTD restricts the type of potentials we can reconstruct. We also show how the inverse problem can be optimized by tuning the amplitude of the unknown potential and the initial bond displacement. Another key finding is that multiple FPTDs greatly facilitate the reconstruction, allowing us to accurately determine potentials with multiple minima. We propose experimental protocols that can be used to generate these additional FPTDs. Finally in §5, we discuss limitations of our method and possible refinements.

## 2. Stochastic theory and the inverse problem

The general problem of stochastic bond rupturing is geometrically complex, particularly when considering deformations associated with large macromolecules carrying many degrees of freedom. Although in principle these systems can be modelled by stochastic processes in higher dimensions, for simplicity, we restrict our mathematical analysis to a one-dimensional Brownian motion described by a diffusivity *D*(*x*) and a convective drift −*D*(*x*)(*k*_{B}*T*)^{−1}(d*Φ*(*x*)/d*x*) proportional to the force derived from a time-independent molecular bond potential *Φ*(*x*), and to the mobility *D*(*x*)(*k*_{B}*T*)^{−1}. Although we restrict ourselves to time-independent potentials, corresponding to static forces, the rupture force distribution can be transformed into a first rupture time distribution (FPTD) in the quasi-adiabatic limit (Dudko *et al*. 2008). The continuous Brownian process can be described by the probability *P*(*y*,*t*|*x*) d*y* for the bond coordinate to be between positions *y* and *y*+d*y* at time *t*, given that it started at position *x* at initial time *t*=0. This probability density obeys the backward equation (Gardiner 2004)
2.1
Since the bond is irreversibly ruptured when stretched past a known position *y*=*L*, we impose the absorbing boundary condition *P*(*y*=*L*,*t*|*x*)=0. The bond survival probability at time *t*, given that it started initially at position *x* is found from integrating the probability density over all the final coordinates that an unruptured bond can take, i.e. . From *S*(*x*,*t*), we define the FPTD *w*(*x*,*t*)=−∂_{t}*S*(*x*,*t*), which obeys
2.2
subject to initial condition *w*(*x*,0)=0, and boundary conditions ∂_{x}*w*(*x*,*t*)|_{x=0}=0 and *w*(*x*=*L*,*t*)=*δ*(*t*). The condition ∂_{x}*w*(*x*,*t*)|_{x=0}=0 represents a reflecting boundary at *x*=0 and ensures a non-negative bond coordinate. Henceforth, we will only consider the problem in which the Brownian motion starts from a specific, known position *x*. Our analysis can be straightforwardly generalized to the case where a given initial distribution of starting positions is experimentally imposed.

In the forward problem, *Φ*(*x*) and *D*(*x*) are given and one solves equation (2.2) to find the function *w*(*x*,*t*), as shown in figure 2. Each fixed-*x* slice of the surface *w*(*x*,*t*) represents the FPTD for a particle that started the random walk at a specific position *x*.

In the inverse problem, the functions *Φ*(*x*) and *D*(*x*) are unknown and need to be determined from an experimentally measured or simulated FPTD and a known starting position *x*. In general, the unique pointwise reconstruction of both *D*(*x*) and *Φ*(*x*) from FPT data is impossible (Bal & Chou 2003). At best, only half of either *D*(*x*) or *Φ*(*x*) can be uniquely determined from a single FPTD (Chen *et al*. 1985; Bal & Chou 2003). It has been shown that if we assume *D* is a known constant, then *Φ*(*x*) is uniquely identifiable from a single FPTD provided it is already known over a certain interval within (0,1] (Bal & Chou 2003).

## 3. Reconstruction algorithm

Since in this problem, *Φ*(*x*) is not known on any requisite interval, it is not even clear whether *Φ*(*x*) can be uniquely reconstructed. Nevertheless in this section we express *Φ*(*x*) as a superposition of (smooth) basis functions and attempt to reconstruct the coefficients, deferring a rigorous analysis of our algorithm for a future publication. We find that representing *Φ*(*x*) using a relatively small number of basis functions renders the problem computationally tractable, yielding a unique solution in many cases.

Henceforth, we non-dimensionalize the problem by measuring distance in units of *L*, time in units of *L*^{2}/*D*, and the potential *Φ*(*x*) in units of the thermal energy *k*_{B}*T*. Finally, to avoid numerically representing the *δ*-function in the boundary condition *w*(1,*t*)=*δ*(*t*), we work with the Laplace transform , which obeys the infinite set (for each ) of uncoupled ODEs
3.1
subject to the associated Laplace-transformed boundary conditions and .

In equation (3.1), we have defined the dimensionless drift *U*(*x*)≡−d*Φ*(*x*)/d*x*. Equation (3.1) is a differential equation in the initial bond position for the Laplace-transformed rupture time distribution .

Mathematically, our objective is to reconstruct *U*(*x*) and then extract, modulo an irrelevant constant, the unknown potential *Φ*(*x*). However, to do so, we must choose a reduced representation of *U*(*x*) that renders, for a chosen value of *x*, and all , as close as possible to the measured or simulated Laplace-transformed FPTD . Since it is known that approximating a function using a basis of monomials leads to a very ill-conditioned problem (Keller & Isaacson 1975), we represent the convection in terms of orthonormal polynomials:
3.2
where {*a*_{i}}≡**a** is the vector of expansion coefficients, and the first few orthonormal basis functions are
3.3
Our method for reconstructing *U*(*x*) consists of using a spectral method (Trefethen 2000) to repeatedly solve the forward problem equation (3.1) to refine our estimate for the potential.^{1}

Starting with an initial guess for the drift (say, *U*(*x*)=0, the ‘null’ hypothesis), we solve equation (3.1) for many positive values of *s* to obtain a numerical approximation for . We then compute the ‘distance’ between the and the given data using the objective function^{2}
3.4
where *g*(*s*) is a function that weights FPT data differently for different *s*. By appropriately adjusting *U*(*x*), implemented through small changes in **a**, *Π*(**a**) is decreased. The incremental adjustments in **a** are repeated until *Π*(**a**) is minimized. Although many different algorithms can be used to minimize *Π*(**a**), we first consider *g*(*s*)=1 and choose a safe-guarded Newton strategy that relies essentially on computing the Hessian of *Π*(**a**). Details of the algorithm are described in appendix A.

## 4. Results and discussion

We test our algorithm and discuss reconstructing the drift function *U*(*x*) from (i) a single, perfectly measured distribution of rupturing times, and (ii) multiple perfectly measured distributions of rupturing times, realized under different experimental conditions. We first generate *perfect* ‘data’ by solving the forward problem using a hypothetical target potential function *Φ**(*x*) (and corresponding ). After generating the data , we pretend we did not know the coefficients **a***, and try to reconstruct them by minimizing *Π*(**a**) through successive iterations *k* of the numerical algorithm detailed in 5. Starting from an initial guess for **a**(*k*=0)=**0**, we investigate if and how **a**(*k*) approaches **a***, and the number of coefficients *a*_{i} that can be reliably reconstructed. Using a single FPTD, we find that reconstruction of *Φ**(*x*) is badly conditioned for , but that using two or more distinct FPTDs allows us to easily find five or more coefficients of *Φ**(*x*) in many cases.

### (a) Single measurement

We first assume a target potential corresponding to a target drift function *U**(*x*) parametrized by . Figure 3*a* shows that starting with the initial guess *U*(*x*)=0, minimizing the objective function equation (3.4) leads to accurate convergence to the unknown target drift *U**(*x*) within ∼10 iterations. We find in figure 3*b* that a five-parameter potential is typically a marginal case in that it can only be occasionally reconstructed, and only after a large number of iterations. However, we are typically not able to accurately reconstruct a potential with six parameters (see figure 3*c*), regardless of the number of iterations.

When the unknown target drift function *U**(*x*) is structurally more complex, extremely slow or non-convergence to **a*** arises because the curvature of *Π*(**a**) near the true minimum, in at least one direction, becomes extremely small. Since our minimization algorithm relies on essentially inverting the *n*×*n* Hessian matrix (see appendix A), the mathematical feasibility of the reconstruction is limited by its condition number *κ*≡*λ*_{max}/*λ*_{min}. Here, *λ*_{max} and *λ*_{min} are the largest and smallest eigenvalues of *H*, representing the largest and smallest curvatures of *Π*(**a**) at **a***, along the corresponding eigendirections, respectively. In addition to increasing the number of eigendirections, increasing *n* rapidly decreases, in particular, the *minimum* curvature *λ*_{min}, thereby increasing *κ*≡*λ*_{max}/*λ*_{min} and making the minimum in *Π*(**a**) harder to find. As suggested in appendix B, larger values of *i* and *j*, corresponding to more oscillatory basis functions, reduce the magnitude of *H*_{ij}. This property renders the problem badly conditioned and is the underlying mathematical reason for the difficulty of extracting more than three parameters from a given potential landscape. To explicitly illustrate the ill-conditioning of the problem, we plot in figure 4*a* the three-parameter objective function (with *g*(*s*)=1) as a function of the parameters *a*_{0} and *a*_{1}. Although the global minimum in *Π*(**a**) occurs at , it is clear that the curvature near the minimum is extremely small along at least one direction, making the minimum difficult to find numerically.

Since all three target potentials considered in figure 3 were chosen to have |**a***|^{2}=1, figure 3 fairly compares the reconstruction of different-shaped potential functions with equal amplitude While increasing the dimension *n* makes the problem more ill-conditioned, for fixed *n*, reconstruction efficiency may nonetheless depend on the typical magnitude of the potential to be reconstructed. To compare reconstructions of potentials of different expected magnitudes, we define the amplitude factor
4.1
for each target drift function. While the amplitude *A** needs to be found from reconstructing the values of , its value sets the scale of the unknown drift function relative to thermal diffusion and defines an effective Peclet number for this problem. Experimentally, the shapes of potentials are fixed by molecular details; however, the Peclet number *A** is inversely proportional to temperature and can in principle be tuned experimentally.

Figure 4*b* shows that for a fixed-shape target drift function *U**(*x*) of the form , and a single FPTD measurement, the inverse of the condition number is maximized in the limit . Therefore, the problem has the best conditioning and the most efficient mathematical reconstruction in the zero Peclet number limit, when the potentials are weak. Computationally, a single FPTD data set arising from a vanishingly small drift perturbing the purely diffusive problem gives the most numerical ‘signal’ for reconstructing the coefficients of *U**(*x*). Although the magnitudes of *a*_{i} are vanishingly small, their incremental effect on reducing the condition number *κ* is nonetheless greatest in this limit. This optimal limit arises from a mathematical analysis and is not predicted by physical considerations. However, system and experimental constraints may preclude measurement of effective potentials at extremely low Peclet numbers *A** (high temperatures), suggesting that an optimal, intermediate temperature may still arise in practice.

In figure 5, the behaviour of *κ*^{−1} as a function of a fixed starting position is even more intriguing. For constant *g*(*s*) and all potentials we tested, the optimal value of the starting position occurs roughly near *x*=0.7–0.9. This starting position is close to the rupture point at *x*=1 and is somewhat insensitive to the amplitude *A**, except for very large *A**. The robustness of this optimal starting position arises from analysing the Hessian matrix, in particular the dependence of its condition number on *x*.

From the form of *H*_{ij} (see equation (B2)), we can show numerically that *λ*_{min} has a maximum for *x*≈0.7–0.9 and that the behaviour of *κ*^{−1} is rather insensitive to changes in *λ*_{max}; thus, *κ*^{−1} is typically maximal near *x*=0.75. For the three qualitatively different potentials used in figure 5, the optimal starting positions all fall approximately within *x*= 0.7–0.9 for a wide range of amplitudes *A**. In these examples, the best conditioning occurs in the limit , consistent with figure 4*b*. For non-constant *g*(*s*), the approximate optimal starting position *x* typically ranges from 0.5 to 0.9, depending on the form of *g*(*s*) (cf. figure 7 in appendix B).

In physical problems such as bond rupturing, the initial position cannot be exactly determined, and is more likely to be drawn from an imposed starting position distribution. For example, if the bond is held by a strong optical or mechanical trap before being released, the distribution of starting positions would typically be gaussian in *x*, reflecting a quadratic confining potential. The ease of reconstruction for a process started from a distribution of starting positions would be measured by an effective condition number that is a weighted average of the condition numbers associated with each specific starting position. Finally, for typical macromolecular bonds, the physical binding energies are on the order of many *k*_{B}*T*, implying that the amplitude . In this limit, the condition number is large (cf. figures 4 and 5), and reconstruction is difficult. Note that the successful reconstructions illustrated in figure 3 were of drift functions of unit amplitude. Since a single measurement is typically insufficient to reconstruct the potentials well beyond three coefficients, particularly for potential with large associated *A**, we consider how additional data can be used to refine the reconstruction of *U**(*x*).

### (b) Multiple measurements

Even after optimizations with respect to Peclet number *A**(*e.g.* ) and starting position *x*, reconstruction of *Φ**(*x*) from a single FPTD measurement is extremely difficult. However, as indirectly suggested by the analysis of varying *A** and *x*, an unknown potential can be changed by a specified amount to yield a FPTD different from that of the original unchanged potential. By imposing any number of perturbations, *multiple* FPTD data can be measured and used to aid the reconstruction of the original potential. Therefore, we propose three protocols for modifying the potential to be reconstructed. Experimentally, these protocols correspond to changing the system temperature, applying and then quickly removing a force to change the starting position, and adding an applied force at the start of the stochastic process. Mathematically, these perturbations correspond to specific changes in *A**, *x*, and the form of the potential *Φ**, respectively. The multiple FPT distributions, measured under different conditions, can then be combined into a multi-distribution objective function. We summarize the protocols below:

*— Changing amplitude via temperature*. One way to obtain additional data is by changing the amplitude (or effective Peclet number)*A** of the unknown drift. For each distinct value of*A**, a separate FPTD can be measured. These different FPTDs all arise from bond potentials with the same underlying shape, and can be used together to better reconstruct*Φ**(*x*). While the absolute value of*A** needs to be determined from the reconstruction of**a***, the relative temperature at which a second measurement is taken can be used to determine the ratio .*— Tuning starting positions*. By adding a force to the system*before*the start of the process, one can adjust the initial position*x*of the bond. At*t*=0, this force is released, and the stochastic process proceeds under the original target drift*U**(*x*), provided the potential relaxes quickly to*Φ**(*x*). Stochastic bond dynamics starting at different positions*x*yield different measured FPT distributions.*— Adding probe forces*. Finally, one can add known potentials to the original target potential immediately*after*the start of the stochastic process to obtain additional FPT distribution data. Here, , where Δ*Φ*(*x*) is known. The associated drift then changes according to , where Δ*U*(*x*) is implemented through a known change in the expansion coefficients Δ**a**and represents an externally applied force imposed by e.g., a pulling device such as an AFM tip or an optical tweezer. The associated external potential in such cases may be of the form Δ*Φ*(*x*)=−*F*_{ext}*x*−*Kx*^{2}/2, where*F*_{ext}is the externally applied time-independent force and*K*is the elastic response of the pulling device. In this case, the new total bond potential*Φ**+Δ*Φ*(*x*) induces a drift*U**(*x*)+Δ*U*defined by**a***+Δ**a**where Δ*a*_{0}=*F*_{ext}+*K*/2 and . This new drift gives rise to another, different FPTD.

An objective function that incorporates all *M* FPT distributions measured under *M* different conditions described above can be defined as
4.2
where *x*_{m}, Δ**a**_{m}, and denote the known starting position, added pulling force, and relative temperature of the *m*th measurement, respectively. The function *g*_{m}(*s*) weights each of the *m* measurements differently. If the data in equation (4.2) were generated from a target drift *U**(*x*), as is the case in our analyses, then it is defined as . Measured data should be obtained with different starting position *x*_{m}, applied force Δ**a**_{m}, and/or different temperature ratios with at least one of the known parameters different among the *M* measurements. Multiple data sets provide additional constraints, increasing the curvature of the objective function near **a***. In general, the smallest eigenvalue *λ*_{min} of the Hessian matrix associated with *Π*_{M} increases with *M*. Upon minimizing the multi-FPTD objective function *Π*_{M}, we obtain **a***.

To illustrate how additional data can improve potential reconstruction, we compare how including two FPT distributions (*M*=2) in our objective function *Π* simultaneously affects reconstruction relative to using each FPTD separately. The two distributions will arise from two ideal measurements taken under two different conditions within each of the proposed experimental protocols described above. In figure 6*a*, we attempt to reconstruct the five-parameter, double-well potential *Φ**(*x*)=−28*x*+(1451/10)*x*^{2}−307*x*^{3}+290*x*^{4}−100*x*^{5} using two ‘temperatures’. This potential corresponds to We then generated data associated with *Φ**(*x*)/5, corresponding to Reconstruction of the original *Φ**(*x*) using each individual FPTD fails, as does using both FPTD data sets. In figure 6*b*, we see that when using data sets corresponding to either initial position *x*_{1}=0.182 or *x*_{2}=0.727, the algorithm fails to reconstruct the target potential. However, using both initial positions together allows us to accurately determine *Φ**(*x*). Similarly, adding the perturbing potential Δ*Φ*_{2}=−*x*^{2}/2 (Δ*U*_{2}=*x*) provides another FPTD that allows accurate reconstruction of a double-well potential (figure 6*c*).

## 5. Summary and conclusions

We have analysed the mathematical aspects of reconstructing the drift of a one-dimensional stochastic process from perfectly measured first passage time distributions. In practice, insufficient number of bond rupture events are currently measured to enable quantitative potential reconstruction; therefore, we used numerically generated data to illustrate our main mathematical results. For single distributions, only very coarse attributes (approximately three parameters) can be reconstructed. We demonstrate how to optimize the efficiency of the reconstruction by controlling the effective amplitude or Peclet number *A** and starting position *x* of the stochastic process. If only one FPTD can be measured, our analysis suggests that and *x*∼0.75 are the most probable parameters to give the best chance for reconstructing relatively simple potentials. However, these findings were found numerically by assuming *perfect* data, uniform diffusivity, precisely defined starting positions *x*, and a constant weighting function *g*(*s*). In practice, finite-time resolution, noisy data, and other experimental limitations may be accounted for by a more suitable weighting function *g*(*s*) (or *g*(*t*) if *Π* were a functional of *w*(*x*,*t*) and the data *W*(*x*,*t*)). We show in appendix B that the optimal parameters *A** and *x* can change when a non-constant weighting function *g*(*s*) is used. While the optimal values of and *x*∼0.75 (for *g*(*s*)=1) were found numerically, without physically realistic limitations, they nonetheless provide a possible experimental starting point.

We also showed that additional measurements in the form of multiple FPTDs can be used to provide dramatically better conditioning of the problem, allowing finer details of the drift function to be extracted. The total objective function including the constraints from all *M* measurements has a sharper minimum, increasing the efficiency of standard optimization algorithms. We proposed three ways of obtaining additional measurements under different experimental conditions: tuning the effective amplitude or Peclet number *A** through the system temperature, adjusting the starting position *x* via an initially applied force, and adding a known ‘probe’ force through a potential Δ*Φ*. This later potential can be realized in a number of ways, from directly mechanically pulling on the bond, to using mutagenesis to systematically change local properties of the bond energy profile. Such mutant bonds may provide additional FPTD data facilitating reconstruction of the original ‘wild-type’ potential.

A number of refinements are suggested by our analysis, and some are discussed in the appendices. For example, rather than Laplace-transforming the data, one can directly fit to *W*(*x*,*t*). Although this approach is computationally more expensive, it would allow us to treat time-dependent potentials *U*(*x*,*t*), and directly analyse dynamic force spectroscopy experiments (Evans *et al*. 1995; Heymann & Grubmüller 2000; Fuhrmann *et al*. 2008), or scenarios in which the temperature is changed in a time-dependent way (Getfert & Reimann 2009). Moreover, specific to the bond rupture problem, data involving bond coordinates as a function of time, if accurately measured, can also be incorporated into an objective function. These additional data may be useful in combating the noise problem and help improve the overall conditioning. Our assumption of an effective one-dimensional potential of mean force can also be relaxed to include higher dimensional stochastic processes if internal degrees of freedom at a point which defines bond rupturing can also be experimentally resolved.

Other more mathematical refinements and extensions can also be implemented, including exploring effects of using different basis functions for *U*(*x*), using more sophisticated optimization methods, quantifying the reconstruction efficiency for large *M*, defining the experimentally imposed weighting function *g*(*s*), reconstruction of the diffusivity *D*(*x*) itself, and systematically exploring the effects of noise in the data. Here, a Bayesian approach to estimate likelihood functions for the coefficients **a***, or information criteria to choose the size *n* of the basis expansion might be useful (Getfert *et al*. 2009).

## Acknowledgements

The authors thank S. Getfert, A. Fuhrmann, and A. Landsman for helpful comments. This work was supported by NSF grant DMS-0349195 and NIH grant K25 AI058672.

## Appendix A. Numerical methods

#### (a) Numerical scheme for the solution of the backward equation

In the forward problem, with *D*(*x*) and *Φ*(*x*) given, equation (2.2) can be solved using standard finite difference schemes. Figure 2*b* shows numerically computed FPTDs for two different starting positions with *D*=*L*=1, *U*(*x*)=4−10*x*+6*x*^{2}. The singular boundary condition *w*(*x*=1,*t*)=*δ*(*t*) is treated by taking Laplace transforms in time of equation (2.2) and solving equation (3.1) for discretized values of the Laplace-transformed variable *s*. Moreover, the numerical solution of equation (3.1) for a set of values *s* can be found much more quickly than solving the full partial differential equation on a large *x*−*t* grid. For *D*=1 and a given drift function *U*(*x*), we use a spectral method (Trefethen 2000) to solve the Laplace-transformed backward equation (3.1). First, the spatial domain is mapped from [0,1] to [−1,1] using a change of variable. Then, the function is represented by and interpolated between the *N* Chebyshev points with polynomials. The resulting *N*×*N* system of equations for are
A1
where the matrix *Q*_{ij} is
A2
where **D**_{N} is the usual *N*×*N* pseudospectral differentiation matrix (Trefethen 2000) and the diagonal matrix **U** is defined by
A3
We used *N*=51 spectral points in all of our computations within the iterative algorithm. Fixed-*x* slices of the numerically obtained functions are qualitatively similar to the plots shown in the inset of figure 2*b*.

#### (b) Evaluation of the objective function

In our analysis, we generate data by numerically computing distributions derived from a target drift function *U**(*x*) (defined by its polynomial coefficients **a***). Since the data are generated numerically, we use in the objective function and write
A4
Note that the ℓth moment of the FPTD is given by
A5
Therefore for two FPTDs with identical first few moments, their first few derivatives are also identical. Such FPTDs can only be distinguished by their differences at larger values of *s* (and the function *g*(*s*) in equation (A4) can be used to weight these differences accordingly). Only information contained in the tails of the Laplace-transformed distributions can distinguish two FPT distributions with equal lower moments. Therefore for the algorithm described in appendix B to be effective, a numerical approximation to the integral in equation (A4) must evaluate the integrand for sufficiently large values of *s*. This can be done by mapping to *ξ*∈[0,1] through a change in variable *s*(*ξ*)=*ξ*/(1−*ξ*), and computing
A6
In order to choose *g*(*s*) such that *Π*(**a**) remains convergent, we assume that *g*(*s*) has no singularities in and consider the behaviour of the integrand at the end points *s*=0 and through the asymptotic expansions
A7
and
A8
Since *w*(*t*=0;**a**)=*w*(*t*=0;**a***)=0 for any two sets of drift coefficients **a** and **a***, the asymptotic expansion (A7) implies that as . If the first *k* time derivatives of *w*(*t*;**a**) and *w*(*t*;**a***) match, equation (A7) implies that . For a weighting function of the form *g*(*s*)=*s*^{q}, the integrand in equation (A6) is *O*((1−*ξ*)^{2−q}) as *ξ*→1 and is integrable at *ξ*=1 provided *q*<3.

Since , the asymptotic expansion (A8) implies that as *s*→0. If the first *k* *s*-derivatives agree (i.e. the first *k* moments of *w*(*t*;**a**) and *w*(*t*;**a***) are identical), equation (A8) implies that . If *g*(*s*)=*s*^{q} then the integrand in equation (A6) is *O*(*ξ*^{2+q}) as *ξ*→0 and is integrable at *ξ*=0 provided *q*>−3.

To summarize, if we take *g*(*s*)=*s*^{q}, then convergence of the integral in equation (A6) requires −3<*q*<3. When evaluating *Π*(**a**), a fourth-order open trapezoid rule (that does not require evaluation of the integrand at the end points) was used and typically 100–1000 uniformly spaced trapezia were found to give sufficient accuracy for the plots shown in figures 3 and 6.

#### (c) Minimization of *Π*(**a**)

The linear system (3.1) must be solved for many different values of *s* so that it can be used in the discrete approximation to the objective function (3.4). To minimize equation (3.4), we use a safe-guarded Newton strategy:
A9
Here, **G** is a positive definite matrix and *σ*>0 is the step size chosen to minimize *Π*(**a**(*k*+1)) along the descent direction **G**^{−1}∇_{a}*Π*.

To compute **G**, we adopt the following procedure. If the Hessian *H*_{ij}≡∂_{ai}∂_{aj}*Π*(**a**) is positive definite, we set **G**≡**H**. If the Hessian **H** is not positive definite, we choose a small Tikhonov regularization (Vogel 2002) parameter *α* such that **H**+*α***I** is safely positive definite (**I** is the identity matrix) and set **G**=**H**+*α***I**. In either case, **G**^{−1} is also positive definite and moving in the direction of −*σ***G**^{−1}∇_{a}*Π* is guaranteed to decrease *Π*(**a**(*k*+1)) for sufficiently small *σ*. The value of *σ* is found by performing an exact line search to minimize *Π*(**a**(*k*+1)) along the descent direction. All Jacobian and Hessian matrices are approximated numerically using a suitably small *δ***a**, typically on the order of 10^{−5}. The algorithm terminates when the relative change in the objective function is less than 10^{−3}.

We represent the drift function *U*(*x*) using orthonormal polynomials on [0,1] given in equation (3.3). Often, the potential and drift arising from molecular interactions diverge as (such as in the Lennard-Jones potential). In this case, *Φ*(*x*) and *U*(*x*) could be represented using basis functions with the correct divergent behaviour for *x*→0. Although the Laplace-transformed backward equation (3.1) has an irregular singular point in this case, the spectral method retains its ability to find solutions as long as *x*=0 is not included in the Chebyshev grid.

Finally, although the possibility of multiple minima in *Π* exists, we find that given our example potentials and our initial guess *U*(*x*)=0, we always either converged to the correct minimum, or failed to converge because of an extremely flat minimum. Convergence to an incorrect minimum was rarely observed in our studies.

## Appendix B. Analysis of eigenvalues and condition numbers

Since the ease of minimizing *Π*(**a**) is quantified by the condition number *κ*≡*λ*_{max}/*λ*_{min} of the Hessian **H**, we now consider the behaviour of the minimum and maximum eigenvalues *λ*_{min} and *λ*_{max}. If *κ*=*O*(1), the problem is well-conditioned; if *κ*≫1, the problem is badly conditioned. Clearly, the one parameter optimization problem has *κ*≡1 and is well-conditioned. However, when the number of parameters increases, it is desirable to find conditions under which *κ*^{−1} is maximized.

#### (a) Small Peclet number analysis

We now analyse *κ* in the *A**≪1 limit. The components of the Hessian at **a*** (the drift coefficients of the target potential) are given by
B1
For weak potentials, we find solutions to in the limit. If *U*(*x*)=*O*(*A**), we expand the solution in the form where , and use the boundary conditions , and (*m*≥1) where primes denote differentiation with respect to *x*. The first two terms in the expansion are
and
where the Green’s function
satisfies *G*′′−*sG*=−*δ*(*x*−*x*′) and *x*_{<}(*x*_{>}) is the lesser(greater) value of *x*,*x*′. Upon using the full orthonormal polynomial expansion , we find explicitly
B2
Since all elements of **H** are independent of the drift coefficients *a*_{i}, they are also independent of *A**, and so are all eigenvalues. Therefore, as a function of *A**, the inverse condition number 1/*κ*≡*λ*_{min}/*λ*_{max} approaches a constant as . Moreover, for all forms of *g*(*s*) tested, we find numerically that *κ*^{−1} is maximal in the limit. These results confirm the numerical data in figures 4*b* and 5.

Within the limit, different weightings *g*(*s*) can also be used to better maximize *κ*^{−1}. Using the asymptotic form (B2) for the Hessian and *g*(*s*)=*s*^{q} (−3<*q*<3), we plot *κ*^{−1} as a function of starting position *x* and exponent *q* in figure 7. Note that *q*>0 puts more weight into the tails of , accentuating differences in higher moments of the FPTD. If *q*<0, more weight is given to small values of *s*. For each value of *q*, there is an optimal starting position *x** that minimizes the condition number and greatly improves the efficiency of reconstructing the bond potential. Although in §4 we discussed changing the starting positions to optimize the reconstruction, in cases where the starting position cannot be controlled, another strategy may be to estimate an optimal *q*=*q** from figure 7 and minimize *Π*(**a**) using the weighting function *g*(*s*)=*s*^{q*}. We also experimented with weighting functions of the form *g*(*s*)=*e*^{−ps} for *p*>0: see figures 7*c*,*d*. This class of weighting functions seems to give poorer conditioning compared with *g*(*s*)=*s*^{q} since *κ*^{−1} is generally smaller.

#### (b) Conditioning with multiple data sets

In figure 8, we assumed *g*(*s*)=1 and plot the inverse condition number *κ*^{−1}=*λ*_{min}/*λ*_{max} as a function of the Peclet number *A**. The potential used was proportional to the one in figure 3*a*: . The starting position was *x*=0.433.

The thin solid curve in figure 8 is taken from figure 4*b* and corresponds to *κ*^{−1} when only one FPTD data set (*M*=1) is used. We compare these values to those when two FPTDs (*M*=2) are used as data in *Π*_{M}(**a**). The second data set was generated using each of the protocols discussed in §4*b*. The dotted, dashed-dotted and dashed curves were generated from the Hessian of *Π*_{M}(**a**) by using both *A** and *A**/5, two starting positions *x*_{1}=0.433 and *x*_{2}=0.65, and both the original potential *Φ**(*x*) and *Φ**(*x*)−5*x*^{2}/2, respectively. In each case we see a different improvement in the conditioning at each value of *A**.

Since the largest eigenvalue *λ* does not change much with increasing *M*, the improvement in conditioning is due mostly to increasing the smallest eigenvalue *λ* (not shown). Reducing the Peclet number *A** only improves the conditioning for large *A**. Increasing the contraction factor in the two values of *A** (e.g. from *A**→*A**/5 to *A**→*A**/10) further improves the conditioning by increasing *κ*^{−1} at ever smaller values of *A**. In this example, changing the starting position is perhaps the most reliable way of facilitating the reconstruction: the condition numbers *κ* are decreased by at least an order of magnitude over a wide range of *A** and the improvement becomes even better for *larger* *A**. Finally adding a probe force greatly enhances the reconstruction for moderate *A**=*O*(1) and there is now an optimal *A** (and hence system temperature) where the reconstruction is easiest. This is in contrast to the *M*=1 case, here we always find a monotonically increasing *κ*^{−1} as *A** decreases.

## Appendix C. Reconstruction of many-parameter potentials

Using additional FPT data, we demonstrate the feasibility of reconstructing complex potentials that are described by many parameters.

In figure 9 we successfully reconstruct 7-parameter and 8-parameter potential wells containing multiple minima using *g*(*s*)=1 and three (*M*=3) FPTDs. For both (*a*) the 7-parameter and (*b*) the 8-parameter potential, we see that any combination of two first passage time distribution data sets fails to accurately reproduce the original *Φ**(*x*). However, using the three defined data sets (*M*=3), we were able converge to the correct *Φ**(*x*) in fewer than 20 iterations. Importantly, we were able to quantitatively resolve the multiple minima. It should be noted, however, that with the current optimization algorithm we were only able to obtain about 2 significant digits of accuracy on the coefficients *a*_{i} for these potentials.

## Footnotes

↵† Present Address: Department of Mathematical Sciences, University of Delaware, Newark, DE 19716-2553, USA.

↵1 More general potentials and drift functions that diverge at

*x*=0 lead to highly singular differential equations, but can still be solved using spectral methods (Trefethen 2000).↵2 If we had chosen to work in the time domain, a reasonable objective function that measures the difference between measured and computed FPTDs would be .

- Received February 13, 2010.
- Accepted May 6, 2010.

- © 2010 The Royal Society