Nonlinear Least Squares Regression (Curve Fitter)


This page lets you fit any function of up to eight parameters to a set of data. Just specify the function, the data points, and initial guesses for the parameters. When you click the Iterate button, the JavaScript program refines these estimates to produce what should be a better set of parameters. This process is iterative, and with good guesses (and good luck) usually converges to the least squares solution in five to ten iterations. This program can also fit nonlinear Least-Absolute-Value curves and Percentile Curves (having a specified fraction of the points below the curve).


Background Info (just what is nonlinear curve-fitting, anyway?):

Simple linear curve fitting deals with functions that are linear in the parameters, even though they may be nonlinear in the variables. For example, a parabola y=a+b*x+c*x*x is a nonlinear function of x (because of the x-squared term), but fitting a parabola to a set of data is a relatively simple linear curve-fitting problem because the parameters enter into the formula as simple multipliers of terms that are added together. Another example of a linear curve-fitting problem is y= a+b*Log(x)+c/x; the terms involve nonlinear functions of the independent variable x, but the parameters enter into the formula in a simple, linear way.

Unfortunately, many functions that arise in real world situations are nonlinear in the parameters, like the curve for exponential decay y=a*Exp(-b*x), where b is "wrapped up" inside the exponential function. Some nonlinear functions can be linearized by transforming the independent and/or dependent variables. The exponential decay curve, for example, can be linearized by taking logarithms: Log(y)=a'-b*x. The a' parameter in this new equation is the logarithm of a in the original equation, so once a' has been determined by a simple linear curve-fit, we can just take its antilog to get a.
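This log-transform trick is easy to sketch in code. The page itself runs in JavaScript; the following is just an illustrative Python sketch (not the page's actual code): fit a straight line to the points (x, Log(y)), then take the antilog of the intercept to recover a.

```python
import math

def fit_exponential_by_log(xs, ys):
    """Fit y = a*Exp(-b*x) by ordinary linear least squares on Log(y) = a' - b*x."""
    n = len(xs)
    ly = [math.log(y) for y in ys]
    sx, sy = sum(xs), sum(ly)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * v for x, v in zip(xs, ly))
    # slope and intercept of the straight-line fit to (x, Log(y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    a = math.exp(intercept)   # antilog of a' recovers a
    b = -slope                # Log(y) = a' - b*x, so b is minus the slope
    return a, b

# exact data generated from y = 2*Exp(-0.5*x) recovers a = 2, b = 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
a, b = fit_exponential_by_log(xs, ys)
```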

But we often encounter functions that cannot be linearized by any such tricks, a simple example being exponential decay that levels off to some unknown value: y=a*Exp(-b*x)+c. Applying a logarithmic transformation in this case produces Log(y-c)=a'-b*x. This linearizes b, but now c appears inside the logarithm; either way, we're stuck with an intrinsically nonlinear parameter estimation problem, which is considerably more difficult than linear curve-fitting. That's the situation this web page was designed to handle.

For a more in-depth treatment of this topic, check out Dr. Harvey Motulsky's web site: Curvefit.com -- a complete guide to nonlinear regression. Most of the information there is excerpted from Analyzing Data with GraphPad Prism, a book that accompanies the program GraphPad Prism. You can download this book as a PDF file.


Techie-stuff (for those who might be interested):

This page contains a straightforward, no-frills JavaScript implementation of the method of differential corrections, which involves expanding the function to be fitted in a Taylor series around current estimates of the parameters, retaining first-order (linear) terms, and solving the resulting linear system for incremental changes to the parameters. The program computes finite-difference approximations to the required partial derivatives, then uses a simple elimination algorithm to invert and solve the simultaneous equations.

Central-limit estimates of parameter standard errors are obtained from the diagonal terms of the inverse of the normal equations matrix. The covariance matrix is computed by multiplying each term of the inverse normal matrix by the weighted error-variance. It is used to estimate parameter error correlations and to compute confidence bands around the fitted curve. These show the uncertainty in the fitted curve arising from sampling errors in the estimated parameters, and do not include the effects of errors in the independent and dependent variables.

The page also computes a generalized correlation coefficient, defined as the square root of the fraction of total y variance explainable by the fitted function.
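The differential-corrections iteration described above can be sketched compactly. This is not the page's JavaScript, just an illustrative Python version of the same idea: approximate the partial derivatives by finite differences, form the normal equations, solve them by simple elimination (no pivoting, as on this page), and apply the increments to the parameters.

```python
import math

def model(p, x):
    # the intrinsically nonlinear example from the Background section
    a, b, c = p
    return a * math.exp(-b * x) + c

def iterate(p, xs, ys, h=1e-6):
    """One differential-corrections (Gauss-Newton) step."""
    n, m = len(xs), len(p)
    jac, resid = [], []
    for x, y in zip(xs, ys):
        f0 = model(p, x)
        row = []
        for j in range(m):
            q = list(p)
            q[j] += h
            row.append((model(q, x) - f0) / h)   # forward-difference partial
        jac.append(row)
        resid.append(y - f0)
    # normal equations: (J^T J) dp = J^T r
    A = [[sum(jac[i][j] * jac[i][k] for i in range(n)) for k in range(m)] for j in range(m)]
    g = [sum(jac[i][j] * resid[i] for i in range(n)) for j in range(m)]
    # simple elimination, no pivoting
    for j in range(m):
        for k in range(j + 1, m):
            f = A[k][j] / A[j][j]
            for l in range(j, m):
                A[k][l] -= f * A[j][l]
            g[k] -= f * g[j]
    dp = [0.0] * m
    for j in range(m - 1, -1, -1):
        dp[j] = (g[j] - sum(A[j][k] * dp[k] for k in range(j + 1, m))) / A[j][j]
    return [pi + d for pi, d in zip(p, dp)]

# exact data from y = 2*Exp(-0.5*x) + 1, refined from a rough starting guess
xs = [0.5 * i for i in range(12)]
ys = [2.0 * math.exp(-0.5 * x) + 1.0 for x in xs]
p = [1.8, 0.4, 0.8]
for _ in range(20):
    p = iterate(p, xs, ys)
```

With exact data and a reasonable guess, the parameters settle to a = 2, b = 0.5, c = 1 within a few iterations, mirroring the five-to-ten-iteration behavior described above.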

Unequal weighting is accomplished by specifying the standard error associated with the y variable. Constant errors, proportional errors, or Poisson (square root) errors can be specified by a menu, and don't have to be entered with the data. Standard errors can also be entered along with the x and y variables. Finally, replicate y measurements can be entered; the program will compute the average and standard error of the mean.

Also available are a number of simple variable transformations (log, reciprocal, square root), which might simplify the function to be fitted and improve the convergence, stability and precision of the iterative algorithm. If a transformation is applied to the y variable, the program will adjust the weights appropriately.

The page also fits least-absolute-value curves by applying an iterative reweighting scheme by which each point is given a standard error equal to the distance of that point from the fitted curve. An option allows differential weighting of above-curve points vs. below-curve points to achieve a specified split of points above and below the curve (a percentile curve fit).
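The reweighting scheme is easiest to see with the simplest possible model, a constant, whose least-absolute-value fit is the median. The following Python sketch (an illustration, not the page's code) gives each point a weight equal to the reciprocal of its distance from the current fit and repeats:

```python
def lav_constant(ys, iters=100, eps=1e-12):
    """Least-absolute-value fit of a constant by iterative reweighting:
    each point gets standard error |residual|, i.e. weight 1/|residual|,
    so each weighted-least-squares pass steps toward the median."""
    c = sum(ys) / len(ys)          # start from the least-squares answer (the mean)
    for _ in range(iters):
        w = [1.0 / max(abs(y - c), eps) for y in ys]
        c = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    return c

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
c = lav_constant(data)               # converges to the median, 3.0, not the mean, 22.0
```

Weighting above-curve and below-curve residuals differently pushes the fit toward a specified percentile in the same way, which is how the percentile-curve option works.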

No special goal-seeking methods, precision-preserving techniques (such as pivoting), convergence-acceleration methods, or iteration-stabilizing techniques (other than a simple, user-specified fractional adjustment) are used. This method may not succeed with extremely ill-conditioned systems, but it should work with most practical problems that arise in real-world situations.

The current implementation is limited to eight parameters and eight independent variables. These arbitrary limits could be increased without much trouble if necessary. I don't know what the maximum number of data points is; it's probably dependent on your browser's maximum string size, since the contents of the Data and Results windows are treated as large text strings. I've used this page to fit 500-point datasets with no problems.

The fields below are pre-loaded with a simple example: the temperature of a cup of water as it cools from boiling hot to room temperature over the course of an hour, being fit to Newton's Law of Cooling:
Temp = ( T0 - Troom ) * Exp( - k * Time ) + Troom


Instructions:

  1. Enter the number of data points:
  2. Enter the number of independent variables: (0 - 8; usually 1)
  3. Enter the number of parameters: (1 - 8)
  4. Enter the formula for the function to be fitted:
    Y =
    Be sure to adhere to the expression syntax rules described below. (In particular, note that you cannot use ^ for exponentiation; you must use the Power function instead.) Enter only the part after the = sign (if you're fitting y=a+b*x, just enter a+b*x in the box). Refer to the parameters as a, b, c, d, e, f, g, and h, or as p1, p2, p3, ..., p8. If there are fewer than eight parameters in the equation, start with a or p1, and don't skip letters in the sequence (for example, if there are 3 parameters, don't use a, c, and d). If there's a single independent variable, call it x (or x1 or t); if there is more than one, use x1 (or x or t), x2, x3, ..., x8.
  5. Type (or paste) the [x,y] data:

    Use a separate row for each data point. The independent variable (x) or variables (x1, x2, etc.) should come first, followed by the dependent variable (y), followed (if precision was specified as Data above) by the Std Err of y. If precision was specified as Replicates, several y values may be entered. Values should be separated by commas. You can copy data from another program, like a spreadsheet, and paste it into the window above. It may come in as tab-delimited text (without commas), but this will not be a problem; the program will convert tabs to commas during the first iteration.
  6. You may optionally specify the y-variable precision (Std Error), x- and y-variable transformations, and Least-Absolute-Value curve fitting.
  7. Enter your best guesses for the parameters:
    a (or p1)= ± , p=
    b (or p2)= ± , p=
    c (or p3)= ± , p=
    d (or p4)= ± , p=
    e (or p5)= ± , p=
    f (or p6)= ± , p=
    g (or p7)= ± , p=
    h (or p8)= ± , p=
    If there are fewer than eight parameters, set the extra boxes to zero. (Any values you enter in the boxes for these "extra" parameters beyond the number specified in Step 3 will be "made available" for use in the formula, but will not be adjusted by the iterative process.) Don't put anything into the boxes to the right of the ± sign; that's where the program will put the standard errors and significance level of the parameters.
  8. Click the button to perform a single iteration cycle, and observe how the parameters change in the boxes above. Also look at the RMS Error and the Output area below.
    If you encounter problems getting the parameters to converge, you can specify a fractional adjustment factor here: Values less than 1.0 will apply only that fraction of the calculated adjustment to the parameters, making the convergence slower but more stable. Change this value back to 1.0 once the iterations seem to be converging.

    RMS Err = Weighted Standard Error of Estimate: the average scatter of the points from the fitted curve. This is relative to the estimated error of the points themselves. Values much above 1 indicate the curve is not fitting the points to within their intrinsic errors.

    Note that after each iteration step the calculated yc values, differences, correlation coefficient and RMS Error are all based on the parameter values at the end of the previous iteration step. Always click the Iterate button an extra time after convergence has been attained.
  9. If the new parameter values seem reasonable, click the Iterate button again, and continue until the parameters converge.
  10. If any parameters seem to be diverging, enter a more reasonable value and click the Iterate button again.
  11. To print out results, copy and paste the contents of the Output window above into a word processor or text editor, then Print. For best appearance, use a fixed-width font like Courier.
  12. You can also transfer your results to a spreadsheet like Excel for graphing. Select the lines in the output window that have x, yo, yc, etc. (including the line with the column headers), Copy to the clipboard, paste into Excel, then use Excel's "Text to Columns" feature to split the data into separate columns. You can then have Excel prepare a graph that shows the observed points, calculated curve, and even confidence bands around the fitted curve!
    It may be easier to first check the Tab-delimited output box (below) and then click the Iterate button one more time. The output in the box above will look a lot uglier, but will paste into Excel much more nicely, and you won't have to go through Excel's Text-to-Columns process.
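The page does not show the exact formula behind the RMS Err figure in Step 8. A common definition of the weighted standard error of estimate (an assumption here, for illustration) divides the weighted sum of squared residuals by the degrees of freedom:

```python
import math

def weighted_rms_error(yo, yc, se, n_params):
    """Assumed form of the weighted standard error of estimate: the scatter
    of the points from the curve, in units of each point's own Std Err."""
    n = len(yo)
    chisq = sum(((o - c) / s) ** 2 for o, c, s in zip(yo, yc, se))
    return math.sqrt(chisq / (n - n_params))

# points sitting about one standard error off the curve give an RMS Err near 1
yo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yc = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]
se = [0.1] * 6
rms = weighted_rms_error(yo, yc, se, 2)
```

On this scale, values much above 1 signal that the curve misses the points by more than their intrinsic errors, as the instructions note.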


Optional Features:

The following optional features can be invoked. They have been collected here to avoid cluttering up the main part of the computational page.

(Optional) click here for Tab-delimited output (more convenient for pasting the results into spreadsheets).

(Optional) Specify the Standard Error associated with the Y variable:
Equal: all points are equally precise;
Relative: Std Err of each Y value is proportional to Y variable itself;
Counts: Std Err = square root of Y; this is appropriate if Y represents the # of occurrences of something (such as radioactive decay events);
Data: Std Err is specified in the data window as a separate column of numbers, immediately to the right of the Y values;
Replicates: Specify this if you have entered several Y values.
(When in doubt, choose Equal.)

(Optional) Specify any transformations to be applied to the dependent and/or independent variables:

A nonlinear curve fitter generally doesn't require that you transform the data. But transformations might make the function more nearly linear, making the curve-fitting process more stable and faster to converge. The program will automatically adjust the weighting to compensate for y-variable transformations.
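The page doesn't describe how it adjusts the weights; a standard approach (assumed here for illustration) is first-order error propagation, se(g(y)) ≈ |g'(y)| * se(y), applied to each of the available transformations:

```python
import math

def se_after_transform(y, se_y, transform):
    """First-order (delta-method) propagation of a y standard error
    through a transformation: se(g(y)) ≈ |g'(y)| * se(y). This is an
    assumed scheme, not necessarily the page's exact one."""
    if transform == "log":         # g(y) = Ln(y),   g'(y) = 1/y
        return se_y / y
    if transform == "reciprocal":  # g(y) = 1/y,     |g'(y)| = 1/y^2
        return se_y / (y * y)
    if transform == "sqrt":        # g(y) = Sqrt(y), g'(y) = 1/(2*Sqrt(y))
        return se_y / (2.0 * math.sqrt(y))
    raise ValueError(transform)

# a 5% relative error in y becomes a constant 0.05 error in Ln(y)
se_log = se_after_transform(20.0, 1.0, "log")
```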

(Optional) click here for Centered Approximation to Partial Derivatives (more accurate, but slower).

(Optional Feature -- Use at your own risk!) Click here for Least-Absolute-Value curve fitting. This option attempts to minimize the sum of the absolute values of the (yo-yc) differences, rather than the sum of the squares of the differences. Least-Abs fitting bears the same relationship to Least Squares fitting that the median of a set of numbers bears to the mean. The Least-Abs curve is much less affected by outliers than the Least Squares curve. It will also have the property that about 50% of the points will fall above the curve and 50% below. Alternatively, you can specify the percentage of points you want to fall below the curve here: (as a number greater than 0 and less than 100). This allows you to fit percentile curves to your data!
Convergence may be very slow and/or erratic (parameters bouncing around aimlessly).
Setting the Adjustment Factor to 0.2 or 0.5 may (or may not) improve convergence.
An N-parameter curve will usually pass almost exactly through N of the points.

Expression Syntax:

Operators: + - * / and parentheses (note that there is no ^ for raising to a power; instead use the Power function, described below)
Constants: Pi [=3.14...], Deg [=180/Pi = 57.2...]
Conditional Expressions: (Condition) ? ValueIfTrue : ValueIfFalse
Built-in Functions... [Unless otherwise indicated, all functions take a single numeric argument, enclosed in parentheses after the name of the function.]
Algebraic: Abs, Sqrt, Power(x,y) [= x raised to the power of y], Fact [factorial], Min(x,y) [= the lesser of x or y], Max(x,y) [= the greater of x or y]
Transcendental: Exp, Ln [natural], Log10, Log2
Trigonometric: Sin, Cos, Tan, Cot, Sec, Csc
Inverse Trig: ASin, ACos, ATan, ACot, ASec, ACsc
Hyperbolic: SinH, CosH, TanH, CotH, SecH, CscH
Inverse Hyp: ASinH, ACosH, ATanH, ACotH, ASecH, ACscH
Statistical: Norm, Gauss, Erf, ChiSq(csq,df), StudT(t,df), FishF(F,df1,df2)
Inverse Stat: ANorm, AGauss, AErf, AChiSq(p,df), AStudT(p,df), AFishF(p,df1,df2)
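The conditional-expression syntax above makes piecewise (segmented) models possible. As a hypothetical illustration, here is the Python equivalent of the expression (x < c) ? a : a + b*(x - c), a "hockey stick" model that is flat until a breakpoint and then rises linearly:

```python
def hinge(p, x):
    """Piecewise-linear model: the Python equivalent of the page's
    conditional expression (x < c) ? a : a + b*(x - c)."""
    a, b, c = p
    return a if x < c else a + b * (x - c)

# flat at a=1 until the breakpoint c=3, then rising with slope b=2
left = hinge((1.0, 2.0, 3.0), 2.0)
right = hinge((1.0, 2.0, 3.0), 5.0)
```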

Note: This program is not case-sensitive, so you can refer to the variables as x, X, X1, x2, t, or T, etc., and the parameters as p1, P1, p2, or P2, etc.; and the function names can be written as (for square root, for example) sqrt, SQRT, Sqrt, SqRt, sQRt, etc.

Note: The trig functions work in radians. For degrees, multiply or divide by the Deg variable. For example: Sin(30/Deg) will return 0.5, and ATan(1)*Deg will return 45.
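The two degree conversions in that note can be checked directly; in Python (an illustrative equivalent, not the page's code):

```python
import math

# Deg = 180/Pi, so dividing degrees by Deg converts to radians,
# and multiplying radians by Deg converts to degrees
Deg = 180.0 / math.pi

sin30 = math.sin(30.0 / Deg)       # the page's Sin(30/Deg)
atan1_deg = math.atan(1.0) * Deg   # the page's ATan(1)*Deg
```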

Note: The factorial function is implemented for all real numbers -- integers or non-integers, positive or negative. For non-integers its accuracy is about 6 significant figures. For negative integers it returns either a very large positive or negative number, or a division-by-zero error.
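A real-argument factorial like this is conventionally defined through the gamma function, x! = Gamma(x + 1). The page's own 6-figure approximation isn't shown; Python's math.gamma serves as a stand-in for illustration:

```python
import math

def fact(x):
    """Real-argument factorial via the gamma function: x! = Gamma(x + 1).
    Like the page's Fact, this blows up at the negative integers, where
    the gamma function has poles (math.gamma raises ValueError there)."""
    return math.gamma(x + 1.0)

f4 = fact(4)      # ordinary integer factorial: 24
fhalf = fact(0.5) # (1/2)! = Sqrt(Pi)/2
```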

Note: The statistical functions Norm and StudT return 2-tail p-values (e.g., Norm(1.96)=0.05), while ChiSq and FishF return 1-tail values. This is consistent with the way these functions are most frequently used in statistical testing. Gauss(x) returns the integral from -infinity to x of Exp(-z*z/2)/Sqrt(2*Pi) with respect to z. It is closely related to the Norm function, differing only in the range of integration (being a "left integral" rather than a "both tails" integral). Its primary use on this page is for fitting s-shaped "probit" functions. Similarly, Erf(x) returns the Error function, which is very closely related to Gauss.
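The relationship between Gauss, Norm, and the Error function can be written out explicitly. In Python (an illustrative sketch, not the page's implementation), both reduce to the standard library's math.erf:

```python
import math

def gauss(x):
    """Left integral of the standard normal density from -infinity to x,
    expressed via the Error function: Gauss(x) = (1 + Erf(x/Sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm(x):
    """Two-tail p-value: the area in both tails beyond |x|."""
    return 2.0 * (1.0 - gauss(abs(x)))

p = norm(1.96)   # close to 0.05, matching the page's example
```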


Return to the Interactive Statistics page or to the JCP Home Page

Send e-mail to John C. Pezzullo at statpages.org@gmail.com