Curve Fitting Toolbox
For Use with MATLAB®
User’s Guide
Version 1
How to Contact The MathWorks:
www.mathworks.com                      Web
comp.soft-sys.matlab                   Newsgroup
508-647-7000 Phone
508-647-7001 Fax
For contact information about worldwide offices, see the MathWorks Web site.
Contents
      Moving Average Filtering
      Lowess and Loess: Local Regression Smoothing
      Savitzky-Golay Filtering
      Example: Smoothing Data
3   Fitting Data
      The Fitting Process
      Selected Bibliography
4   Function Reference
      Functions - Categorical List
        Fitting Data
        Getting Information and Help
        Getting and Setting Properties
        Preprocessing Data
        Postprocessing Data
        General Purpose
Index
1   Getting Started with the Curve Fitting Toolbox
This chapter describes a particular example in detail to help you get started with the Curve Fitting
Toolbox. In this example, you will fit census data to several toolbox library models, find the best fit,
and extrapolate the best fit to predict the US population in future years. The example illustrates the
basic steps involved in any curve fitting scenario. These steps are covered in the following sections:
What Is the Curve Fitting Toolbox?   The toolbox and the kinds of tasks it can perform.
Opening the Curve Fitting Tool       The Curve Fitting Tool is the main toolbox interface.
Importing the Data                   The data must exist as vectors in the MATLAB workspace. After
                                     importing, you can view the data, mark data points to be excluded
                                     from the fit, and smooth the data.
Fitting the Data                     Explore various parametric and nonparametric fits, and compare fit
                                     results graphically and numerically.
Analyzing the Fit                    Evaluate (interpolate or extrapolate), differentiate, or integrate the fit.
Saving Your Work                     Save your work for documentation purposes or for later analysis.
                              Click the GUI Help buttons to learn how to proceed. Additionally, you can
                              follow the examples in the tutorial sections of this guide, which are all GUI
                              oriented.
                                                     What Is the Curve Fitting Toolbox?
To explore the command line environment, you can list the toolbox functions by
typing
  help curvefit
You can change the way any toolbox function works by copying and renaming
the M-file, and then modifying your copy. However, these changes will not be
reflected in the graphical environment.
You can also extend the toolbox by adding your own M-files, or by using it in
combination with other products such as the Statistics Toolbox or the
Optimization Toolbox.
                              • Visually explore one or more data sets and fits as scatter plots.
                              • Graphically evaluate the goodness of fit using residuals and prediction
                                bounds.
                              • Access additional interfaces for
                                 - Importing, viewing, and smoothing data
                                 - Fitting data, and comparing fits and data sets
                                 - Marking data points to be excluded from a fit
                                 - Selecting which fits and data sets are displayed in the tool
                                 - Interpolating, extrapolating, differentiating, or integrating fits
                              You open the Curve Fitting Tool with the cftool command.
                                  cftool
                                                                              Importing the Data
The workspace now contains two new variables, cdate and pop.
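As a minimal sketch of this step, the commands below load the census data and inspect the two variables before they are imported into the tool. The sketch assumes that the census.mat sample file supplied with MATLAB is on the path; the plot and axis labels are illustrative only.

```matlab
% Load the sample census data into the MATLAB workspace.
load census

% List the two variables and their sizes.
whos cdate pop

% Quick look at the raw data before importing it into the tool.
plot(cdate, pop, 'o')
xlabel('Census year')
ylabel('Population')
```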
           You can import data into the Curve Fitting Tool with the Data GUI. You open
           this GUI by clicking the Data button on the Curve Fitting Tool. As shown
           below, the Data GUI consists of two panes: Data Sets and Smooth. The Data
           Sets pane allows you to
           • Import predictor (X) data, response (Y) data, and weights. If you do not
             import weights, then they are assumed to be 1 for all data points.
           • Specify the name of the data set.
           • Preview the data.
                              To load cdate and pop into the Curve Fitting Tool, select the appropriate
                              variable names from the X Data and Y Data lists. The data is then displayed
                              in the Preview window. Click the Create data set button to complete the data
                              import process.
                                                                                         Fitting the Data
             • Specify the fit name, the current data set, and the exclusion rule.
             • Explore various fits to the current data set using a library or custom
               equation, a smoothing spline, or an interpolant.
             • Override the default fit options such as the coefficient starting values.
             • Compare fit results including the fitted coefficients and goodness of fit
               statistics.
             • Keep track of all the fits and their data sets for the current session.
             • Display a summary of the fit results.
             • Save or delete the fit results.
               Note that this action always defaults to a linear polynomial fit type. You use
               New Fit at the beginning of your curve fitting session, and when you are
               exploring different fit types for a given data set.
             2 Because the initial fit uses a second degree polynomial, select quadratic
               polynomial from the Polynomial list. Name the fit poly2.
             3 Click the Apply button or select the Immediate apply check box. The
               library model, fitted coefficients, and goodness of fit statistics are displayed
               in the Results area.
                                  For fits of a given type (for example, polynomials), you should use Copy Fit
                                  instead of New Fit. Copying a fit retains the current fit type state, so it
                                  requires fewer steps than creating a new fit each time.
                              The Fitting GUI is shown below with the results of fitting the census data with
                              a quadratic polynomial.
The data, fit, and residuals are shown below. You display the residuals as a line
plot by selecting the menu item View->Residuals->Line plot from the Curve
Fitting Tool.
The residuals indicate that a better fit may be possible. Therefore, you should
continue fitting the census data following the procedure outlined in the
beginning of this section.
The residuals from a good fit should look random with no apparent pattern. A
pattern, such as a tendency for consecutive residuals to have the same sign, can
be an indication that a better model exists.
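The residual check described above can also be sketched at the command line. The example below uses the base MATLAB functions polyfit and polyval rather than the Curve Fitting Tool, and the sign-change count is an informal heuristic introduced here for illustration, not a statistic reported by the toolbox. A count that is small relative to the number of residuals suggests runs of same-sign residuals.

```matlab
load census                                   % provides cdate and pop

% Quadratic least-squares fit; mu centers and scales cdate internally.
[p, ~, mu] = polyfit(cdate, pop, 2);
res = pop - polyval(p, cdate, [], mu);        % residuals = data - fit

% Count how often consecutive residuals switch sign.
signChanges = sum(diff(sign(res)) ~= 0);
fprintf('%d sign changes in %d residuals\n', signChanges, numel(res));

% Line plot of the residuals against the predictor.
plot(cdate, res, '-o')
xlabel('cdate')
ylabel('Residual')
```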
                              When you fit higher degree polynomials, the Results area displays this
                              warning:
                                  Equation is badly conditioned. Remove repeated data points
                                  or try centering and scaling.
                              The warning arises because the fitting procedure uses the cdate values as the
                              basis for a matrix with very large values. The spread of the cdate values
                              results in scaling problems. To address this problem, you can normalize the
                              cdate data. Normalization is a process of scaling the predictor data to improve
                              the accuracy of the subsequent numeric computations. A way to normalize
                              cdate is to center it at zero mean and scale it to unit standard deviation.
                                  (cdate - mean(cdate))./std(cdate)
                              To normalize data with the Curve Fitting Tool, select the Center and scale X
                              data check box.
                              Note Because the predictor data changes after normalizing, the values of the
                              fitted coefficients also change when compared to the original data. However,
                              the functional form of the data and the resulting goodness of fit statistics do
                              not change. Additionally, the data is displayed in the Curve Fitting Tool using
                              the original scale.
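To see why centering and scaling helps, the sketch below compares the condition number of a sixth degree polynomial design matrix built from the raw cdate values with one built from the normalized values. The design matrices are constructed by hand here for illustration; the toolbox's internal matrices may be organized differently.

```matlab
load census

% Center cdate at zero mean and scale it to unit standard deviation.
z = (cdate - mean(cdate)) ./ std(cdate);

% Design matrices for a sixth degree polynomial: columns are x.^0 ... x.^6.
powers = 0:6;
Araw = bsxfun(@power, cdate, powers);
Az   = bsxfun(@power, z, powers);

% A very large condition number signals a badly conditioned problem.
fprintf('cond (raw cdate):        %g\n', cond(Araw));
fprintf('cond (normalized cdate): %g\n', cond(Az));
```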
                              • The fits and residuals for the polynomial equations are all similar, making it
                                difficult to choose the best one.
• The fit and residuals for the single-term exponential equation indicate it is a
  poor fit overall. Therefore, it is a poor choice for extrapolation.
Use the Plotting GUI to remove exp1 from the scatter plot display.
                              Because the goal of fitting the census data is to extrapolate the best fit to
                              predict future population values, you should explore the behavior of the fits up
                              to the year 2050. You can change the axes limits of the Curve Fitting Tool by
                              selecting the menu item Tools->Axes Limit Control.
                              The census data and fits are shown below for an upper abscissa limit of 2050.
                              The behavior of the sixth degree polynomial fit beyond the data range makes it
                              a poor choice for extrapolation.
                              As you can see, you should exercise caution when extrapolating with
                              polynomial fits because they can diverge wildly outside the data range.
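A rough command-line counterpart to this comparison is sketched below. It uses polyfit with centering and scaling in place of the toolbox fits, and the evaluation years 2000 through 2050 are an assumption chosen to match the axis limit discussed above.

```matlab
load census
xi = (2000:10:2050)';                         % years beyond the data range

[p2, ~, mu2] = polyfit(cdate, pop, 2);        % quadratic fit
[p6, ~, mu6] = polyfit(cdate, pop, 6);        % sixth degree fit

y2 = polyval(p2, xi, [], mu2);                % extrapolated quadratic
y6 = polyval(p6, xi, [], mu6);                % extrapolated sixth degree

% Columns: year, quadratic prediction, sixth degree prediction.
disp([xi y2 y6])
```

Comparing the last two columns shows how differently the two polynomials behave once they leave the range of the data.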
                              The numerical fit results are shown below. You can click the Table of Fits
                              column headings to sort by statistics results.
                              The SSE for exp1 indicates it is a poor fit, which was already determined by
                              examining the fit and residuals. The lowest SSE value is associated with poly6.
                              However, the behavior of this fit beyond the data range makes it a poor choice
                              for extrapolation. The next best SSE value is associated with the fifth degree
                              polynomial fit, poly5, suggesting it may be the best fit. However, the SSE and
                              adjusted R-square values for the remaining polynomial fits are all very close to
                              each other. Which one should you choose?
To resolve this issue, examine the confidence bounds for the remaining fits. By
default, 95% confidence bounds are calculated. You can change this level by
selecting the menu item View->Confidence Level from the Curve Fitting Tool.
The confidence bounds on the p1, p2, and p3 coefficients of the fifth degree
polynomial suggest that it overfits the census data. In contrast, the confidence
bounds for the quadratic fit, poly2, indicate that its fitted coefficients are known
fairly accurately. Therefore, after examining both the graphical and numerical fit
results, it appears that you should use poly2 to extrapolate the census data.
Note The fitted coefficients associated with the constant, linear, and
quadratic terms are nearly identical for each polynomial equation. However,
as the polynomial degree increases, the coefficient bounds associated with the
higher degree terms increase, which suggests overfitting.
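The same comparison can be sketched at the command line with the toolbox functions fit and confint. Normalizing the fifth degree fit is a choice made here to avoid the conditioning warning, and flagging intervals that contain zero is an informal check added for illustration, not a rule stated by the toolbox.

```matlab
load census

f2 = fit(cdate, pop, 'poly2');                      % quadratic fit
f5 = fit(cdate, pop, 'poly5', 'Normalize', 'on');   % centered and scaled

ci2 = confint(f2);      % row 1: lower bounds, row 2: upper bounds (95%)
ci5 = confint(f5);

% Flag coefficients whose confidence interval contains zero.
crossesZero2 = ci2(1,:) < 0 & ci2(2,:) > 0;
crossesZero5 = ci5(1,:) < 0 & ci5(2,:) > 0;

disp(coeffnames(f5)')
disp(crossesZero5)
```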
                              The cfit object display includes the model, the fitted coefficients, and the
                              confidence bounds for the fitted coefficients.
                                  fittedmodel1
                                  fittedmodel1 =
                                       Linear model Poly2:
                                         fittedmodel1(x) = p1*x^2 + p2*x + p3
                                       Coefficients (with 95% confidence bounds):
                                         p1 =    0.006541 (0.006124, 0.006958)
                                         p2 =      -23.51 (-25.09, -21.93)
                                         p3 = 2.113e+004 (1.964e+004, 2.262e+004)
                                  goodness1 =
                                             sse:    159.0293
                                         rsquare:    0.9987
                                             dfe:    18
                                      adjrsquare:    0.9986
                                            rmse:    2.9724
                              The output1 structure contains additional information associated with the fit.
                                  output1
                                  output1 =
                                         numobs:     21
                                       numparam:     3
                                      residuals:     [21x1 double]
                                       Jacobian:     [21x3 double]
                                       exitflag:     1
                                      algorithm:     'QR factorization and solve'
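For comparison, a quadratic fit of the census data can be produced directly at the command line with the fit function, which returns the fit object, the goodness of fit structure, and the output structure. This is a sketch of an equivalent call, assuming default fit options; the variable names simply mirror the ones displayed above.

```matlab
load census

% Fit a quadratic polynomial and request all three outputs.
[fittedmodel1, goodness1, output1] = fit(cdate, pop, 'poly2');

disp(fittedmodel1)      % model, coefficients, and confidence bounds
disp(goodness1)         % sse, rsquare, dfe, adjrsquare, rmse

fprintf('%d observations, %d parameters\n', ...
    output1.numobs, output1.numparam);
```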
                                                                                                Analyzing the Fit
                              The extrapolated values and the census data set are displayed together in a
                              new figure window.
                                  analysisresults1 =
                                        xi: [6x1 double]
                                      yfit: [6x1 double]
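A structure with the same two fields can be built by hand, as in the sketch below. The evaluation years 2000 through 2050 are an assumption that gives six points, and the fit is recreated with the fit function rather than saved from the tool.

```matlab
load census
fittedmodel1 = fit(cdate, pop, 'poly2');

% Evaluate the fit at six years beyond the data range.
analysisresults1.xi   = (2000:10:2050)';      % assumed evaluation years
analysisresults1.yfit = feval(fittedmodel1, analysisresults1.xi);

% Show the census data and the extrapolated values together.
plot(cdate, pop, 'o', analysisresults1.xi, analysisresults1.yfit, 'x')
legend('census data', 'extrapolated poly2', 'Location', 'northwest')
```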
                                                                                     Saving Your Work
          Before performing any of these tasks, you may want to remove unwanted data
          sets and fits from the Curve Fitting Tool display. An easy way to do this is with
          the Plotting GUI. The Plotting GUI shown below is configured to display only
          the census data and the best fit, poly2.
                              The session is stored in binary form in a cfit file, and contains this
                              information:
                              To avoid saving unwanted data sets, you should delete them from the Curve
                              Fitting Tool. You delete data sets using the Data Sets pane of the Data GUI. If
                              there are fits associated with the unwanted data sets, they are deleted as well.
                              You can load a saved session by selecting the menu item File->Load Session
                              from the Curve Fitting Tool. When the session is loaded, the saved state of the
                              Curve Fitting Tool display is reproduced, and may display the data, fits,
                              residuals, and so on. If you open the Fitting GUI, then the loaded fits are
                              displayed in the Table of Fits. Select a fit from this table to continue your curve
                              fitting session.
Generating an M-File
You may want to generate an M-file so that you can continue data exploration
and analysis from the MATLAB command line. You can run the M-file without
modification to recreate the fits and results that you created with the Curve
Fitting Tool, or you can edit and modify the file as needed. For detailed
descriptions of the functions provided by the toolbox, refer to Chapter 4,
“Function Reference.”
If you have many data sets to fit and you want to automate the fitting process,
you should use the Curve Fitting Tool to select the appropriate model and fit
options, generate an M-file, and then run the M-file in batch mode.
Save your work to an M-file by selecting the menu item File->Save M-file from
the Curve Fitting Tool. The Save M-File dialog is shown below.
The M-file can capture this information from the Curve Fitting Tool:
You can recreate the saved fits in a new figure window by typing the name of
the M-file at the MATLAB command line. Note that you must provide the
appropriate data variables as inputs to the M-file. These variables are given in
the M-file help.
                              For example, the help for the censusfit M-file indicates that the variables
                              cdate and pop are required to recreate the saved fit.
                                  help censusfit
                                       Number of datasets:    1
                                       Number of fits: 6
2   Importing, Viewing, and Preprocessing Data
This chapter describes how to import, view, and preprocess data with the Curve Fitting Toolbox. You
import data with the Data GUI, and view data graphically as a scatter plot using the Curve Fitting
Tool. The main preprocessing steps are smoothing, and excluding and sectioning data. You smooth
data with the Data GUI, and exclude and section data with the Exclude GUI. The sections are as
follows.
Importing Data Sets               Select workspace variables that compose the data set, list all imported
                                  and generated data sets, and delete one or more data sets.
Viewing Data                      View the data graphically as a scatter plot.
Smoothing Data                    Reduce noise in a data set using moving average filtering, lowess or
                                  robust lowess, loess or robust loess, or Savitzky-Golay filtering.
Excluding and Sectioning Data     Mark individual data points (outliers) to be excluded from a fit, or mark
                                  a range of data points (sectioning) to be excluded from a fit.
Additional Preprocessing Steps    Additional preprocessing steps not available through the Data GUI,
                                  such as transforming the response data and removing Infs, NaNs, and
                                  outliers from a data set.
Selected Bibliography             Resources for additional information.
The Data Sets pane is shown below followed by a description of its features.
[Figure: Data Sets pane of the Data GUI. Callout: Construct and name the data set.]
                                                                   Importing Data Sets
                                The predictor and response data are displayed graphically in the Preview
                                window. Weights and data points containing Infs or NaNs are not displayed.
                                You should specify a meaningful name when you import multiple data sets.
                                If you do not specify a name, the default name, which is constructed from the
                                selected variable names, is used.
   The Data sets list box displays all the data sets added to the toolbox. Note
   that you can construct data sets from workspace variables, or by smoothing
   an existing data set.
   If your data contains Infs or complex values, a warning message such as the
   message shown below is displayed.
The Data Sets pane shown below displays the imported ENSO data in the
Preview window. After you click the Create data set button, the data set enso
is added to the Data sets list box. You can then view, rename, or delete enso by
selecting it in the list box and clicking the appropriate button.
      Viewing Data
                             The Curve Fitting Toolbox provides two ways to view imported data:
[Figure: Curve Fitting Tool. Callouts: GUI toolbar, Tools menu.]
You can change the color, line width, line style, and marker type of the
displayed data points using the right-click menu shown below. You activate
this menu by placing your mouse over a data point and right-clicking. Note that
a similar menu is available for fitted curves.
[Figure: Right-click menu.]
The ENSO data is shown below after the display has been enhanced using
several of these tools.
                             • Data set — Lists the names of the viewed data set and the associated
                               variables. The data is displayed graphically below this list.
                               The index, predictor data (X), response data (Y), and weights (if imported)
                               are displayed numerically in the table. If the data contains Infs or NaNs,
                               those values are labeled “ignored.” If the data contains complex numbers,
                               only the real part is displayed.
                             • Exclusion rules — Lists all the exclusion rules that are compatible with the
                               viewed data set. When you select an exclusion rule, the data points marked
                               for exclusion are grayed in the table, and are identified with an “x” in the
                               graphical display. To exclude the data points while fitting, you must select
                               the exclusion rule in the Fitting GUI.
                               An exclusion rule is compatible with the viewed data set if their lengths are
                               the same, or if it is created by sectioning only.
Smoothing Data
          If your data is noisy, you might need to apply a smoothing algorithm to expose
          its features, and to provide a reasonable starting approach for parametric
          fitting. The two basic assumptions that underlie smoothing are
          • The relationship between the response data and the predictor data is
            smooth.
          • The smoothing process results in a smoothed value that is a better estimate
            of the original value because the noise has been reduced.
            The smoothing process attempts to estimate the average of the distribution
            of each response value. The estimation is based on a specified number of
            neighboring response values.
          You can think of smoothing as a local fit because a new response value is
          created for each original response value. Therefore, smoothing is similar to
          some of the nonparametric fit types supported by the toolbox, such as
          smoothing spline and cubic interpolation. However, this type of fitting is not
          the same as parametric fitting, which results in a global parameterization of
          the data.
          Note You should not fit data with a parametric model after smoothing,
          because the act of smoothing invalidates the assumption that the errors are
          normally distributed. Instead, you should consider smoothing to be a data
          exploration technique.
          There are two common types of smoothing methods: filtering (averaging) and
          local regression. Each smoothing method requires a span. The span defines a
          window of neighboring points to include in the smoothing calculation for each
          data point. This window moves across the data set as the smoothed response
          value is calculated for each predictor value. A large span increases the
          smoothness but decreases the resolution of the smoothed data set, while a
          small span decreases the smoothness but increases the resolution of the
          smoothed data set. The optimal span value depends on your data set and the
          smoothing method, and usually requires some experimentation to find.
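The trade-off between span and smoothness can be illustrated with a small experiment. The sketch below uses synthetic data, base MATLAB's conv function for a plain moving average over the interior points only, and the standard deviation of successive differences as an ad hoc roughness measure; all of these are assumptions made for illustration and do not reproduce the toolbox's smoothing code.

```matlab
rng(0);                                  % fix the seed for repeatability
x = linspace(0, 2*pi, 200)';
y = sin(x) + 0.2*randn(size(x));         % hypothetical noisy data

smallSpan = 5;
largeSpan = 41;

% Plain moving averages; 'valid' keeps only points with a full window.
ysSmall = conv(y, ones(smallSpan, 1)/smallSpan, 'valid');
ysLarge = conv(y, ones(largeSpan, 1)/largeSpan, 'valid');

% Roughness: spread of the point-to-point changes in the smoothed data.
roughSmall = std(diff(ysSmall));
roughLarge = std(diff(ysLarge));
fprintf('roughness, span %2d: %.4f\n', smallSpan, roughSmall);
fprintf('roughness, span %2d: %.4f\n', largeSpan, roughLarge);
```

The larger span yields the smoother curve, but it also averages over more of the underlying signal, which is the loss of resolution described above.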
                             Note that you can also smooth data using a smoothing spline. Refer to
                             “Nonparametric Fitting” on page 3-69 for more information.
                             You smooth data with the Smooth pane of the Data GUI. The pane is shown
                             below followed by a description of its features.
[Figure: Smooth pane of the Data GUI. Callouts: Data sets; Smoothing method and parameters.]
Data Sets
• Original data set — Select the data set you want to smooth.
• Smoothed data set — Specify the name of the smoothed data set. Note that
  the process of smoothing the original data set always produces a new data
  set containing smoothed response values.
                               - Click Delete to delete one or more data sets. To select multiple data sets,
                                 you can use the Ctrl key and the mouse to select data sets one by one, or
                                 you can use the Shift key and the mouse to select a range of data sets.
                               - Click Save to workspace to save a single data set to a structure.
                                 $$ y_s(i) = \frac{1}{2N+1} \bigl( y(i+N) + y(i+N-1) + \dots + y(i-N) \bigr) $$
                             where ys(i) is the smoothed value for the ith data point, N is the number of
                             neighboring data points on either side of ys(i), and 2N+1 is the span.
                             The moving average smoothing method used by the Curve Fitting Toolbox
                             follows these rules:
                             Note that you can use the filter function to implement difference equations
                             such as the one shown above. However, because of the way that the end points
                             are treated, the toolbox moving average result will differ from the result
                             returned by filter. Refer to “Difference Equations and Filtering” in the
                             MATLAB documentation for more information.
                             For example, suppose you smooth data using a moving average filter with a
                             span of 5. Using the rules described above, the first four elements of ys are
                             given by
                                ys(1)     =   y(1)
                                ys(2)     =   (y(1)+y(2)+y(3))/3
                                ys(3)     =   (y(1)+y(2)+y(3)+y(4)+y(5))/5
                                ys(4)     =   (y(2)+y(3)+y(4)+y(5)+y(6))/5
Note that ys(1), ys(2), ... ,ys(end) refer to the order of the data after sorting,
and not necessarily the original order.
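The span-5 example above can be reproduced with a short loop. This is a minimal sketch that assumes the span shrinks symmetrically near the end points, as the four expressions for ys suggest; the data values are hypothetical, and the code is not the toolbox's own implementation. The call to filter is included only to show how its lagged output differs.

```matlab
y = [3 8 4 10 6 12 9]';          % hypothetical data, already sorted by x
span = 5;                        % must be odd
N = (span - 1) / 2;              % neighbors on either side
n = numel(y);

ys = zeros(n, 1);
for i = 1:n
    % Shrink the half-width where fewer than N neighbors exist on a side.
    k = min([N, i - 1, n - i]);
    ys(i) = mean(y(i-k:i+k));
end

% filter averages the current and previous span-1 points, so its output
% lags the centered result: yf(5) equals ys(3).
yf = filter(ones(span, 1)/span, 1, y);

disp([ys yf])
```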
The smoothed values and spans for the first four data points of a generated
data set are shown below.
[Figure: four panels, (a) through (d), each plotting Data and Smoothed value.]
Plot (a) indicates that the first data point is not smoothed because a span
cannot be constructed. Plot (b) indicates that the second data point is
smoothed using a span of three. Plots (c) and (d) indicate that a span of five
is used to calculate the smoothed value.
                             1 Compute the regression weights for each data point in the span. The weights
                                are given by the tricube function shown below.
                                \[ w_i = \left( 1 - \left| \frac{x - x_i}{d(x)} \right|^3 \right)^3 \]
                                x is the predictor value associated with the response value to be smoothed,
                                the x_i are the nearest neighbors of x as defined by the span, and d(x) is the
                                distance along the abscissa from x to the most distant predictor value within
                                the span. The weights have these characteristics:
                               - The data point to be smoothed has the largest weight and the most
                                 influence on the fit.
                               - Data points outside the span have zero weight and no influence on the fit.
If the smooth calculation involves the same number of neighboring data points
on either side of the smoothed data point, the weight function is symmetric.
However, if the number of neighboring points is not symmetric about the
smoothed data point, then the weight function is not symmetric. Note that
unlike the moving average smoothing process, the span never changes. For
example, when you smooth the data point with the smallest predictor value,
the shape of the weight function is truncated by one half, the leftmost data
point in the span has the largest weight, and all the neighboring points are to
the right of the smoothed value.
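The symmetric and truncated shapes of the tricube weight function can be checked numerically. The sketch below uses made-up, evenly spaced predictor values and a span of five; the variable names are illustrative only.

```matlab
% Tricube regression weights for one smoothed point.
% x0 is the predictor value being smoothed; xi holds the predictor
% values in its span (made-up, evenly spaced here).
xi = 1:5;

x0 = 3;                              % interior point: neighbors on both sides
d  = max(abs(xi - x0));              % distance to the farthest point in the span
w  = (1 - abs((x0 - xi) / d).^3).^3;
disp(w)

x0 = 1;                              % end point: all neighbors lie to the right
d  = max(abs(xi - x0));
wEnd = (1 - abs((x0 - xi) / d).^3).^3;
disp(wEnd)
```

For the interior point the weights are symmetric about the smoothed point. For the end point the span still holds five points, but the weights fall off in one direction only.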
The weight function for an end point and for an interior point is shown below
for a span of 31 data points.
[Figure: two panels showing the weight function for an end point and for an interior point.]
                             Using the lowess method with a span of five, the smoothed values and
                             associated regressions for the first four data points of a generated data set are
                             shown below.
                             [Figure: Lowess Smoothing. Four panels, (a) through (d), each plotting Data
                             and Smoothed value.]
                             Notice that the span does not change as the smoothing process progresses from
                             data point to data point. However, depending on the number of nearest
                             neighbors, the regression weight function might not be symmetric about the
                             data point to be smoothed. In particular, plots (a) and (b) use an asymmetric
                             weight function, while plots (c) and (d) use a symmetric weight function.
                             For the loess method, the graphs would look the same except the smoothed
                             value would be generated by a second-degree polynomial.
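A lowess pass with a fixed span can be sketched as a weighted linear least squares fit at each data point. This is an illustrative sketch in base MATLAB, not the toolbox implementation; the data is made up so that it lies exactly on a line, which makes the result easy to check.

```matlab
% Lowess sketch: weighted linear least squares over a fixed span.
x = (1:10)';
y = 2*x + 1;                    % made-up data lying exactly on a line
span = 5;
ys = zeros(size(y));
for i = 1:numel(x)
    % The span always holds the same number of nearest neighbors.
    [dSorted, order] = sort(abs(x - x(i)));
    idx = order(1:span);
    d   = dSorted(span);        % distance to the farthest point in the span
    w   = (1 - (abs(x(idx) - x(i)) / d).^3).^3;   % tricube weights
    % Weighted least squares for a first-degree polynomial.
    A  = [ones(span, 1), x(idx)];
    sw = sqrt(w);
    coef = (A .* [sw, sw]) \ (y(idx) .* sw);
    ys(i) = coef(1) + coef(2) * x(i);
end
```

Because the data is exactly linear, the smoothed values reproduce the data. A loess sketch would add a column x(idx).^2 to A and evaluate the resulting second-degree polynomial at x(i).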
2 Compute the robust weights for each data point in the span. The weights are
  given by the bisquare function shown below.
  \[ w_i = \begin{cases} \left( 1 - (r_i / 6\,\mathrm{MAD})^2 \right)^2, & |r_i| < 6\,\mathrm{MAD} \\ 0, & |r_i| \ge 6\,\mathrm{MAD} \end{cases} \]
  r_i is the residual of the ith data point produced by the regression smoothing
  procedure, and MAD is the median absolute deviation of the residuals:
  \[ \mathrm{MAD} = \mathrm{median}(|r|) \]
  The median absolute deviation is a measure of how spread out the residuals
  are. If r_i is small compared to 6MAD, then the robust weight is close to 1. If
  the magnitude of r_i is 6MAD or more, the robust weight is 0 and the associated
  data point is excluded from the smooth calculation.
3 Smooth the data again using the robust weights. The final smoothed value
  is calculated using both the local regression weight and the robust weight.
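The robust weights in step 2 can be computed directly from a vector of residuals. The residual values below are made up, with one deliberately large entry standing in for an outlier.

```matlab
% Bisquare robust weights from a vector of residuals (made-up values).
r   = [0.1 -0.2 0.15 -0.1 5.0 0.05 -0.15];   % one large residual (outlier)
MAD = median(abs(r));                         % median absolute deviation
u   = r / (6 * MAD);
w   = (1 - u.^2).^2;
w(abs(r) >= 6 * MAD) = 0;                     % zero weight at or beyond 6 MAD
disp(w)
```

In step 3, each point's robust weight would multiply its local regression weight before the data is smoothed again, so the outlier no longer contributes.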
                             The smoothing results of the lowess procedure are compared below to the
                             results of the robust lowess procedure for a generated data set that contains a
                             single outlier. The span for both procedures is 11 data points.
                             [Figure: three panels, (a) through (c). Panel (b) plots the residuals; panel (c)
                             plots the data and the robust lowess result.]
                             Plot (a) shows that the outlier influences the smoothed value for several
                             nearest neighbors. Plot (b) suggests that the residual of the outlier is greater
                             than six median absolute deviations. Therefore, the robust weight is zero for
                             this data point. Plot (c) shows that the smoothed values neighboring the
                             outlier reflect the bulk of the data.
Savitzky-Golay Filtering
Savitzky-Golay filtering can be thought of as a generalized moving average.
You derive the filter coefficients by performing an unweighted linear least
squares fit using a polynomial of a given degree. For this reason, a
Savitzky-Golay filter is also called a digital smoothing polynomial filter or a
least squares smoothing filter. Note that a higher degree polynomial makes it
possible to achieve a high level of smoothing without attenuation of data
features.
The Savitzky-Golay filtering method is often used with frequency data or with
spectroscopic (peak) data. For frequency data, the method is effective at
preserving the high-frequency components of the signal. For spectroscopic
data, the method is effective at preserving higher moments of the peak such as
the line width. By comparison, the moving average filter tends to filter out a
significant portion of the signal's high-frequency content, and it can only
preserve the lower moments of a peak such as the centroid. However,
Savitzky-Golay filtering can be less successful than a moving average filter at
rejecting noise.
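The idea of a generalized moving average can be sketched as an unweighted polynomial least squares fit in each window, keeping only the fitted value at the window center. This is an illustration in base MATLAB under stated assumptions, not the toolbox implementation: the data is made up and exactly quadratic, the predictor is uniformly spaced, and points without a full window are simply left unsmoothed.

```matlab
% Savitzky-Golay sketch: unweighted polynomial least squares per window.
x = (1:21)';
y = 0.5*x.^2 - 3*x + 4;          % made-up data: exactly quadratic
span   = 5;                      % odd number of points per window
degree = 2;                      % must be less than span
N  = (span - 1) / 2;
ys = y;                          % points without a full window are left as-is here
for i = N+1 : numel(y)-N
    idx = (i-N : i+N)';
    % Center the window so the fitted value at x(i) is the constant term.
    p = polyfit(x(idx) - x(i), y(idx), degree);
    ys(i) = p(end);              % polynomial evaluated at offset 0
end

% A moving average is the degree-0 case of the same procedure.
ma = y;
for i = N+1 : numel(y)-N
    ma(i) = mean(y(i-N : i+N));
end
```

On this curved data the quadratic fit reproduces the interior points, while the moving average is offset from them by a constant. That offset is a small-scale version of the feature attenuation described above.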
The Savitzky-Golay smoothing method used by the Curve Fitting Toolbox
follows these rules:
The plot shown below displays generated Gaussian data and several attempts
at smoothing using the Savitzky-Golay method. The data is very noisy and the
peak widths vary from broad to narrow. The span is equal to 5% of the number
of data points.
                             [Figure: Savitzky-Golay Smoothing. Three panels: (a) noisy data; (b) data and
                             S-G quadratic; (c) data and S-G quartic.]
                             Plot (a) shows the noisy data. To more easily compare the smoothed results,
                             plots (b) and (c) show the data without the added noise.
                             Plot (b) shows the result of smoothing with a quadratic polynomial. Notice
                             that the method performs poorly for the narrow peaks. Plot (c) shows the
                             result of smoothing with a quartic polynomial. In general, higher degree
                             polynomials can more accurately capture the heights and widths of narrow
                             peaks, but can do poorly at smoothing wider peaks.
                             The Smooth pane shown below displays all the new data sets generated by
                             smoothing the original ENSO data set. Whenever you smooth a data set, a new
                             data set of smoothed values is created. The smoothed data sets are
                             automatically displayed in the Curve Fitting Tool. You can also display a single
                             data set graphically and numerically by clicking the View button.
Use the Plotting GUI to display only the data sets of interest. As shown below,
the structure of the ENSO data set becomes apparent when it is smoothed
using a moving average filter with the default span. The uncovered structure
is periodic, which suggests that a reasonable parametric model should include
trigonometric functions.
Refer to “General Equation: Fourier Series Fit” on page 3-52 for an example
that fits the ENSO data using a sum of sine and cosine functions.
                             The saved structure contains the original predictor data x and the smoothed
                             data y.
                                smootheddata1
                                smootheddata1 =
                                    x: [168x1 double]
                                    y: [168x1 double]
                                                                      Excluding and Sectioning Data
           • Marking Outliers — Outliers are defined as individual data points that you
             exclude because they are inconsistent with the statistical nature of the bulk
             of the data.
           • Sectioning — Sectioning excludes a window of response or predictor data.
             For example, if many data points in a data set are corrupted by large
             systematic errors, you might want to section them out of the fit.
           For each of these methods, you must create an exclusion rule, which captures
           the range, domain, or index of the data points to be excluded.
           To exclude data while fitting, you use the Fitting GUI to associate the
           appropriate exclusion rule with the data set to be fit. Refer to “Example: Robust
           Fit” on page 3-62 for more information about fitting a data set using an
           exclusion rule.
                             You mark data to be excluded from a fit with the Exclude GUI, which you open
                             from the Curve Fitting Tool. The GUI is shown below followed by a description
                             of its features.
                             [Figure: the Exclude GUI, with callouts for the exclusion rule and for
                             excluding individual data points.]
                             Exclusion Rule
                             • Exclusion rule name — Specify the name of the exclusion rule that
                               identifies the data points to be excluded from subsequent fits.
                             • Existing exclusion rules — Lists the names of all exclusion rules created
                               during the current session. When you select an existing exclusion rule, you
                               can perform these actions:
                               - Click Copy to copy the exclusion rule. The exclusions associated with the
                                 original exclusion rule are recreated in the GUI. You can modify these
                                 exclusions and then click Create exclusion rule to save them to the copied
                                 rule.
                               - Click Rename to change the name of the exclusion rule.
                               - Click Delete to delete the exclusion rule. To select multiple exclusion
                                 rules, you can use the Ctrl key and the mouse to select exclusion rules one
                                 by one, or you can use the Shift key and the mouse to select a range of
                                 exclusion rules.
                               - Click View to display the exclusion rule graphically. If a data set is
                                 associated with the exclusion rule, the data is also displayed.
Marking Outliers
Outliers are defined as individual data points that you exclude from a fit
because they are inconsistent with the statistical nature of the bulk of the data,
and will adversely affect the fit results. Outliers are often readily identified by
a scatter plot of response data versus predictor data.
Marking outliers with the Curve Fitting Toolbox follows these rules:
Two types of influential data points are shown below for generated data. Also
shown are cubic polynomial fits and a robust fit that is resistant to outliers.
[Figure: three panels, (a) through (c), plotting the data and the cubic fit. Panel (a)
carries the annotation "These outliers adversely affect the fit."]
Plot (a) shows that the two influential data points are outliers and adversely
affect the fit. Plot (b) shows that the two influential data points are consistent
with the model and do not adversely affect the fit. Plot (c) shows that a robust
fitting procedure is an acceptable alternative to marking outliers for exclusion.
Robust fitting is described in “Robust Least Squares” on page 3-11.
                             Sectioning
                             Sectioning involves specifying a range of response data or a range of predictor
                             data to exclude. You might want to section a data set because different parts of
                             the data set are described by different models, or because many contiguous data
                             points are corrupted by noise, large systematic errors, and so on.
                             Sectioning data with the Curve Fitting Toolbox follows these rules:
                             • If you are only sectioning data and not excluding individual data points, then
                               you can create an exclusion rule without specifying a data set name.
                               Note that you can associate the exclusion rule with any data set provided
                               that the range or domain of the exclusion rule overlaps with the range or
                               domain of the data set. This is useful if you have multiple data sets from
                               which you want to exclude data points using the same range or domain.
                             • Using the Exclude GUI, you specify a range or domain of data to include. The
                               excluded data lies outside this specification.
                               Additionally, you can specify only a single range, domain, or box (range and
                               domain) of included data points. Therefore, at most, you can define two
                               vertical strips, two horizontal strips, or a border of excluded data. Refer to
                               “Example: Excluding and Sectioning Data” on page 2-32 for an example.
                               To exclude multiple sections of data, you can use the excludedata function
                               from the MATLAB command line.
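The call syntax of excludedata is not shown in this section, so the following sketch builds an equivalent exclusion mask for several sections with plain logical indexing in base MATLAB. The data and the section limits are made up for illustration.

```matlab
% Build one exclusion mask from several sections using logical indexing.
x = (0:20)';                         % made-up predictor data
y = x.^2;                            % made-up response data

% Hypothetical sections to exclude, one [low high] domain per row.
sections = [0 2; 9 11; 18 20];

excluded = false(size(x));
for k = 1:size(sections, 1)
    excluded = excluded | (x >= sections(k, 1) & x <= sections(k, 2));
end

xKept = x(~excluded);                % data that remains available to a fit
yKept = y(~excluded);
```

A mask like this marks three separate strips, which a single range, domain, or box of included data cannot express.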
Two examples of sectioning by domain are shown below for generated data.
                   [Figure: Sectioning Data. Two plots of the data with a linear fit and a cubic fit.
                   An annotation marks the section that is fit with a linear polynomial.]
                   The upper plot shows the data set sectioned by fit type. The section to the left of 4
                   is fit with a linear polynomial, as shown by the bold, dashed line. The section
                   to the right of 4 is fit with a cubic polynomial, as shown by the bold, solid line.
                   The lower plot shows the data set sectioned by fit type and by valid data. Here,
                   the rightmost section is not part of any fit because the data is corrupted by
                   noise.
                   Note: For illustrative purposes, the preceding figures have been enhanced to
                   show portions of the curves with bold markers. The Curve Fitting Toolbox
                   does not use bold markers in plots.
                             Import the variables month and yy as the new data set enso1, and open the
                             Exclude GUI.
                             Assume that the first and last eight months of the data set are unreliable, and
                             should be excluded from subsequent fits. The simplest way to exclude these
                             data points is to section the predictor data. To do this, specify the range of data
                             you want to include in the Exclude X outside of field of the Section pane.
                             There are two ways to exclude individual data points: using the Check to
                             exclude point table or graphically. For this example, the simplest way to
                             exclude the outliers is graphically. To do this, select the data set name and click
                             the Exclude graphically button, which opens the Select Points for Exclusion
                             Rule GUI.
To mark data points for exclusion in the GUI, place the mouse cursor over the
data point and left-click. The excluded data point is marked with a red X. To
include an excluded data point, right-click the data point or select the Include
Them radio button and left-click. Included data points are marked with a blue
circle. To select multiple data points, click the left mouse button and drag the
selection rubber band so that the rubber band box encompasses the desired
data points. Note that the GUI identifies sectioned data with gray strips. You
cannot graphically include sectioned data.
As shown below, the first and last eight months of data are excluded from the
data set by sectioning, and the two outliers are excluded graphically. Note that
the graphically excluded data points are identified in the Check to exclude
point table. If you decide to include an excluded data point using the table, the
graph is automatically updated.
If there are fits associated with the data, you can exclude data points based on
the residuals of the fit by selecting the residual data in the Y list.
                             To save the exclusion rule, click the Create exclusion rule button. To exclude
                             the data from a fit, you must select the exclusion rule from the Fitting GUI.
                             Because the exclusion rule created in this example uses individually excluded
                             data points, you can use it only with data sets that are the same size as the
                             ENSO data set.
Import the variables t and noisysine, and fit the data with a single-term sine
equation. The Fitting GUI, Fit Options GUI, and Curve Fitting Tool are shown
below. To display the fit starting values, click the Fit options button. Note that
                             the amplitude starting point is reasonably close to the expected value, but the
                             frequency and phase constant are not, which produces a poor fit.
1 Create an exclusion rule that includes one or two periods, and excludes the
   remaining data.
   [Figure callout: Exclude data graphically.]
                             2 Create a new fit using the single-term sine equation with the exclusion rule
                                1Period applied.
                                The fit looks reasonable throughout the entire data set. However, because
                                the global fit was based on a small fraction of data, goodness of fit statistics
                                will not provide much insight into the fit quality.
3 Fit the entire data set using the fitted coefficient values from the previous
  step as starting values.
  The Fitting GUI, Fit Options GUI, and Curve Fitting Tool are shown below.
  Both the numerical and graphical fit results indicate a reasonable fit.
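The three steps above can be imitated at the command line. The sketch below is only an analogy to the GUI workflow: it uses fminsearch from base MATLAB rather than the toolbox fitting routines, and the data, the starting values p0, and the cutoff that selects roughly one period are all made up.

```matlab
% Two-stage fit sketch: fit a subset first, then reuse the coefficients.
t = linspace(0, 20, 401)';
y = 3 * sin(2*t + 0.5);              % made-up, noise-free sine data

model = @(p, t) p(1) * sin(p(2)*t + p(3));        % amplitude, frequency, phase
sse   = @(p, t, y) sum((y - model(p, t)).^2);     % sum of squared errors

% Stage 1: fit only the first period or so of the data.
sub = t <= 4;
p0  = [2.5 1.5 0];                   % hypothetical rough starting values
p1  = fminsearch(@(p) sse(p, t(sub), y(sub)), p0);

% Stage 2: refit all the data, starting from the stage 1 coefficients.
p2  = fminsearch(@(p) sse(p, t, y), p1);

disp(p2)
```

Whether stage 1 lands near the true coefficients depends on the starting values and the subset chosen, which is why the result should be checked against the full data set as in step 3.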
                             Note: You must transform variables at the MATLAB command line, and then
                             import those variables into the Curve Fitting Toolbox. You cannot transform
                             variables using any of the graphical user interfaces.
For example, suppose you want to use the following model to fit your data.
                                \[ y = \frac{1}{ax^2 + bx + c} \]
                             If you decide to use the power transform y^{-1}, then the transformed model is
                             given by
                                \[ y^{-1} = ax^2 + bx + c \]
                             As another example, the equation
                                \[ y = ae^{bx} \]
                             becomes linear if you take the log transform of both sides.
                                \[ \ln(y) = \ln(a) + bx \]
                             You can now use linear least squares fitting procedures.
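As a command-line illustration of the log transform, the sketch below fits a straight line to ln(y) with polyfit and then recovers a and b. The data is made up and noise-free, so the recovered coefficients should match the values used to generate it.

```matlab
% Linearize y = a*exp(b*x) by fitting a line to ln(y).
x = (0:0.5:5)';
y = 2 * exp(0.7 * x);            % made-up data with a = 2, b = 0.7

p = polyfit(x, log(y), 1);       % fit ln(y) = ln(a) + b*x
b = p(1);                        % slope is b
a = exp(p(2));                   % intercept is ln(a)
```

With noisy data, the transform also changes how the errors are weighted, so the coefficients from the linearized fit can differ from those of a direct nonlinear fit.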
                                                         Additional Preprocessing Steps
  Note that the residual plot associated with the Curve Fitting Tool does not
  support transformed scales.
To remove NaNs, you can use the isnan function. For examples that remove
NaNs and outliers from a data set, refer to “Data Preprocessing” in the MATLAB
documentation.
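A minimal sketch of removing NaNs with isnan before importing the data is shown below. The vectors are made up; a point is dropped when either its predictor or its response value is NaN, so the two vectors stay the same length.

```matlab
% Remove data points where either x or y is NaN.
x = [1 2 3 4 5 6]';
y = [2.1 NaN 3.9 5.2 NaN 7.0]';

keep   = ~(isnan(x) | isnan(y));   % true for points with valid x and y
xClean = x(keep);
yClean = y(keep);
```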
       Selected Bibliography
                             [1] Cleveland, W.S., “Robust Locally Weighted Regression and Smoothing
                             Scatterplots,” Journal of the American Statistical Association, Vol. 74, pp.
                             829-836, 1979.
                             [2] Cleveland, W.S. and S.J. Devlin, “Locally Weighted Regression: An
                             Approach to Regression Analysis by Local Fitting,” Journal of the American
                             Statistical Association, Vol. 83, pp. 596-610, 1988.
                             [3] Chambers, J., W.S. Cleveland, B. Kleiner, and P. Tukey, Graphical Methods
                             for Data Analysis, Wadsworth International Group, Belmont, CA, 1983.
                             [4] Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery,
                             Numerical Recipes in C, The Art of Scientific Computing, Cambridge
                             University Press, Cambridge, England, 1993.
                             [5] Goodall, C., “A Survey of Smoothing Techniques,” Modern Methods of Data
                             Analysis, (J. Fox and J.S. Long, eds.), Sage Publications, Newbury Park, CA,
                             pp. 126-176, 1990.
                             [6] Hutcheson, M.C., “Trimmed Resistant Weighted Scatterplot Smooth,”
                             Master’s Thesis, Cornell University, Ithaca, NY, 1995.
                             [7] Orfanidis, S.J., Introduction to Signal Processing, Prentice-Hall, Englewood
                             Cliffs, NJ, 1996.
3   Fitting Data
Curve fitting refers to fitting curved lines to data. The curved line comes from regression techniques,
a spline calculation, or interpolation. The data can come from sensor measurements, a simulation,
historical records, and so on. The goal of curve fitting is to gain insight into your data. The insight
will enable you to improve data acquisition techniques for future experiments, accept or refute a
theoretical model, extract physical meaning from fitted coefficients, and draw conclusions about the
data’s parent population.
This chapter describes how to fit data and evaluate the goodness of fit with the Curve Fitting Toolbox.
The sections are as follows.
The Fitting Process (p. 3-2) The general steps you use when fitting any data set.
Parametric Fitting (p. 3-4)    Fit your data using parametric models such as polynomials and
                               exponentials, specify fit options such as the fitting algorithm and
                               coefficient starting points, and evaluate the goodness of fit using
                               graphical and numerical techniques.
                               Parametric fitting produces coefficients that describe the data globally,
                               and often have physical meaning.
Nonparametric Fitting (p. 3-69) Fit your data using nonparametric fit types such as splines and
                               interpolants.
                               Nonparametric fitting is useful when you want to fit a smooth curve
                               through your data, and you are not interested in interpreting fitted
                               coefficients.
Selected Bibliography (p. 3-76) Resources for additional information.
                                                                        The Fitting Process
  - Select the name of the current fit. When you click New fit or Copy fit, a
    default fit name is automatically created in the Fit name field. You can
    specify a new fit name by editing this field.
  - Select the name of the current data set from the Data set list. All imported
    and smoothed data sets are listed.
  If you want to exclude data from a fit, select an exclusion rule from the
  Exclusion rule list. The list contains only exclusion rules that are
  compatible with the current data set. An exclusion rule is compatible with
  the current data set if their lengths are identical, or if it is created by
  sectioning only.
3 Select a fit type and fit options, fit the data, and evaluate the goodness of fit.
4 Compare fits.
  - Compare the current fit and data set to previous fits and data sets by
    examining the goodness of fit statistics.
  - Use the Table Options GUI to modify which goodness of fit statistics are
    displayed in the Table of Fits. You can sort the table by clicking on any
    column heading.
  If the fit is good, save the results as a structure to the MATLAB workspace.
  Otherwise, modify the fit options or select another model.
      Parametric Fitting
                   Parametric fitting involves finding coefficients (parameters) for one or more
                   models that you fit to data. The data is assumed to be statistical in nature and
                   is divided into two components: a deterministic component and a random
                   component.
                      data = deterministic component + random component
                   The deterministic component is given by the fit and the random component is
                   often described as error associated with the data.
                      data = fit + error
                   The fit is given by a model that is a function of the independent (predictor)
                   variable and one or more coefficients. The error represents random variations
                   in the data that follow a specific probability distribution (usually Gaussian).
                   The variations can come from many different sources, but are always present
                   at some level when you are dealing with measured data. Systematic variations
                   can also exist, but they can be difficult to quantify.
                   The fitted coefficients often have physical significance. For example, suppose
                   you have collected data that corresponds to a single decay mode of a radioactive
                   nuclide, and you want to find the half-life (T1/2) of the decay. The law of
                   radioactive decay states that the activity of a radioactive substance decays
                   exponentially in time. Therefore, the model to use in the fit is given by
                      $$ y = y_0 e^{-\lambda t} $$
                   Both y0 and λ are coefficients determined by the fit. Because T1/2 = ln(2)/λ, the
                   fitted value of the decay constant yields the half-life. However, because the
                   data contains some error, the deterministic component of the equation cannot
                   completely describe the variability in the data. Therefore, the coefficients and
                   half-life calculation will have some uncertainty associated with them. If the
                   uncertainty is acceptable, then you are done fitting the data. If the uncertainty
                   is not acceptable, then you might have to take steps to reduce the error and
                   repeat the data collection process.
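
                   As a small illustration of how a fitted coefficient translates into a
                   physical quantity, the sketch below converts a decay constant into a
                   half-life using T1/2 = ln(2)/λ. It is written in Python purely for
                   illustration. The function name half_life and the numeric value of λ
                   are hypothetical and do not come from the toolbox or from real data.

```python
import math

def half_life(decay_constant):
    """Return T_1/2 = ln(2) / lambda for a positive decay constant."""
    if decay_constant <= 0:
        raise ValueError("decay constant must be positive")
    return math.log(2) / decay_constant

# Hypothetical fitted value of lambda, in 1/seconds (not real data).
lam = 0.0231
t_half = half_life(lam)

# After one half-life the model y = y0*exp(-lambda*t) has dropped to y0/2.
ratio = math.exp(-lam * t_half)
print(f"half-life: {t_half:.2f} s, y/y0 at that time: {ratio:.3f}")
```

                   Any uncertainty in the fitted λ carries over to the half-life, which is
                   why the coefficient uncertainty matters for this calculation.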
Normal Distribution
The errors are assumed to be normally distributed because the normal
distribution often provides an adequate approximation to the distribution of
many measured quantities. Although the least squares fitting method does not
assume normally distributed errors when calculating parameter estimates, the
method works best for data that does not contain a large number of random
errors with extreme values. The normal distribution is one of the probability
distributions in which extreme random errors are uncommon. However,
statistical results such as confidence and prediction bounds do require
normally distributed errors for their validity.
Zero Mean
If the mean of the errors is zero, then the errors are purely random. If the mean
is not zero, then it might be that the model is not the right choice for your data,
or the errors are not purely random and contain systematic errors.
Constant Variance
A constant variance in the data implies that the “spread” of errors is constant.
Data that has the same variance is sometimes said to be of equal quality.
The assumption that the random errors have constant variance is not implicit
to weighted least squares regression. Instead, it is assumed that the weights
provided in the fitting procedure correctly indicate the differing levels of
quality present in the data. The weights are then used to adjust the amount of
influence each data point has on the estimates of the fitted coefficients to an
appropriate level.
                      $$ r_i = y_i - \hat{y}_i $$
                      residual = data - fit
                      $$ S = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
                   where n is the number of data points included in the fit and S is the sum of
                   squares error estimate. The supported types of least squares fitting include
y = p1 x + p2
To solve this equation for the unknown coefficients p1 and p2, you write S as a
system of n simultaneous linear equations in two unknowns. If n is greater
than the number of unknowns, then the system of equations is overdetermined.
   $$ S = \sum_{i=1}^{n} \left( y_i - (p_1 x_i + p_2) \right)^2 $$
Because the least squares fitting process minimizes the summed square of the
residuals, the coefficients are determined by differentiating S with respect to
each parameter, and setting the result equal to zero.
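   $$ \frac{\partial S}{\partial p_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - (p_1 x_i + p_2) \right) = 0 $$
   $$ \frac{\partial S}{\partial p_2} = -2 \sum_{i=1}^{n} \left( y_i - (p_1 x_i + p_2) \right) = 0 $$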
∑ xi ( yi – ( b1 xi + b2 ) ) = 0
∑ ( yi – ( b1 xi + b2 ) ) = 0
where the summations run from i =1 to n. The normal equations are defined as
   $$ b_1 \sum x_i^2 + b_2 \sum x_i = \sum x_i y_i $$
   $$ b_1 \sum x_i + n b_2 = \sum y_i $$
Solving for b1
   $$ b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} $$
                      $$ b_2 = \frac{1}{n} \left( \sum y_i - b_1 \sum x_i \right) $$
                   As you can see, estimating the coefficients p1 and p2 requires only a few simple
                   calculations. Extending this example to a higher degree polynomial is
                   straightforward although a bit tedious. All that is required is an additional
                   normal equation for each linear term added to the model.
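
                   The closed-form solution of the normal equations for a first-degree
                   polynomial can be sketched in a few lines. The example below is
                   written in Python for illustration only. The function name fit_line
                   and the data values are made up, and the code is not how the
                   toolbox computes the fit.

```python
def fit_line(x, y):
    """Least-squares estimates (b1, b2) for y = p1*x + p2 from the normal equations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (n * sxy - sx * sy) / denom   # slope estimate
    b2 = (sy - b1 * sx) / n            # intercept estimate
    return b1, b2

# Made-up data scattered around y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
b1, b2 = fit_line(x, y)

# Residuals r_i = y_i - (b1*x_i + b2); the normal equations force
# sum(r_i) and sum(x_i * r_i) to zero (up to rounding).
residuals = [yi - (b1 * xi + b2) for xi, yi in zip(x, y)]
print("slope:", b1, "intercept:", b2)
```

                   Checking that the residuals sum to zero is a quick way to confirm
                   that the estimates satisfy both normal equations.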
                   In matrix form, linear models are given by the formula
                      y = Xβ + ε
                   where
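                      $$ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \times \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} $$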
                   The least squares solution to the problem is a vector b, which estimates the
                   unknown vector of coefficients β. The normal equations are given by
                      $$ (X^T X) b = X^T y $$
   $$ b = (X^T X)^{-1} X^T y $$
In MATLAB, you can use the backslash operator to solve a system of
simultaneous linear equations for unknown coefficients. Because inverting
$X^T X$ can lead to unacceptable rounding errors, MATLAB uses QR
decomposition with pivoting, which is numerically very stable.
Refer to “Arithmetic Operators” in the MATLAB documentation for more
information about the backslash operator and QR decomposition.
You can plug b back into the model formula to get the predicted response
values, ŷ .
   ŷ = Xb = Hy
   $$ H = X (X^T X)^{-1} X^T $$
   $$ r = y - \hat{y} = (1 - H) y $$
Refer to [1] or [2] for a complete description of the matrix representation of
least squares regression.
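
The matrix H can be formed explicitly for a small straight-line problem. The
sketch below is written in Python for illustration only and inverts the
2-by-2 matrix X^T X directly, which is acceptable for a tiny, well-scaled
example but is not the QR-based approach described above. The function name
hat_matrix and the data values are made up.

```python
def hat_matrix(x):
    """Return H = X (X^T X)^{-1} X^T for the design matrix with rows [x_i, 1]."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx              # determinant of X^T X = [[sxx, sx], [sx, n]]
    if det == 0:
        raise ValueError("X^T X is singular")
    # Explicit inverse of the 2-by-2 matrix X^T X (illustration only).
    inv = [[n / det, -sx / det], [-sx / det, sxx / det]]
    rows = [(v, 1.0) for v in x]         # rows of X

    def quad(r, c):
        # r (X^T X)^{-1} c^T for two rows r and c of X
        return (r[0] * (inv[0][0] * c[0] + inv[0][1] * c[1])
                + r[1] * (inv[1][0] * c[0] + inv[1][1] * c[1]))

    return [[quad(r, c) for c in rows] for r in rows]

# Made-up data scattered around y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
H = hat_matrix(x)

y_hat = [sum(hij * yj for hij, yj in zip(row, y)) for row in H]  # y_hat = H y
r = [yi - yh for yi, yh in zip(y, y_hat)]                        # r = y - y_hat
leverages = [H[i][i] for i in range(len(x))]                     # diagonal of H
print("fitted values:", y_hat)
print("leverages:", leverages)
```

The diagonal elements of H sum to the number of fitted coefficients, which is
two for a straight line.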
   $$ S = \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2 $$
                   where wi are the weights. The weights determine how much each response
                   value influences the final parameter estimates. A high-quality data point
                   influences the fit more than a low-quality data point. Weighting your data is
                   recommended if the weights are known, or if there is justification that they
                   follow a particular form.
                   The weights modify the expression for the parameter estimates b in the
                   following way,
                        $$ b = \hat{\beta} = (X^T W X)^{-1} X^T W y $$
                   where W is given by the diagonal elements of the weight matrix w.
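
                   For a straight line, the weighted estimates reduce to weighted sums.
                   The sketch below is written in Python for illustration only. The
                   function name fit_line_weighted, the data, and the weight values
                   are made up, and the code is not the toolbox's implementation.

```python
def fit_line_weighted(x, y, w):
    """Weighted least-squares estimates (b1, b2) for y = p1*x + p2."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * swxx - swx * swx
    if denom == 0:
        raise ValueError("weighted normal equations are singular")
    b1 = (sw * swxy - swx * swy) / denom
    b2 = (swy - b1 * swx) / sw
    return b1, b2

# Made-up data near y = 2x + 1; the last point is of poor quality.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.0, 15.0]
equal = fit_line_weighted(x, y, [1.0] * 5)
downweighted = fit_line_weighted(x, y, [1.0, 1.0, 1.0, 1.0, 0.01])
print("equal weights:           ", equal)
print("last point downweighted: ", downweighted)
```

                   With equal weights the result is the ordinary least squares fit.
                   Giving the poor-quality point a small weight moves the slope back
                   toward the trend of the remaining points.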
                   You can often determine whether the variances are not constant by fitting the
                   data and plotting the residuals. In the plot shown below, the data contains
                   replicate data of various quality and the fit is assumed to be correct. The poor
                   quality data is revealed in the plot of residuals, which has a “funnel” shape
                   where small predictor values yield a bigger scatter in the response values than
                   large predictor values.
                   [Figure: two panels plotted against x. Top panel: data and fitted curve (y versus x). Bottom panel: residuals.]
The weights you supply should transform the response variances to a constant
value. If you know the variances of your data, then the weights are given by
   $$ w_i = 1 / \sigma^2 $$
If you don’t know the variances, you can approximate the weights using an
equation such as
   $$ w_i = \left( \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 \right)^{-1} $$
This equation works well if your data set contains replicates. In this case, n is
the number of sets of replicates. However, the weights can vary greatly. A
better approach might be to plot the variances and fit the data using a sensible
model. The form of the model is not very important — a polynomial or power
function works well in many cases.
• Least absolute residuals (LAR) — The LAR scheme finds a curve that
  minimizes the absolute difference of the residuals, rather than the squared
  differences. Therefore, extreme values have a lesser influence on the fit.
• Bisquare weights — This scheme minimizes a weighted sum of squares,
  where the weight given to each data point depends on how far the point is
  from the fitted line. Points near the line get full weight. Points farther from
                     the line get reduced weight. Points that are farther from the line than would
                     be expected by random chance get zero weight.
                     For most cases, the bisquare weight scheme is preferred over LAR because it
                     simultaneously seeks to find a curve that fits the bulk of the data using the
                     usual least squares approach, and it minimizes the effect of outliers.
                     ri are the usual least squares residuals and hi are leverages that adjust the
                     residuals by downweighting high-leverage data points, which have a large
                     effect on the least squares fit. The standardized adjusted residuals are given
                     by
                      $$ u = \frac{r_{adj}}{K s} $$
                     Note that if you supply your own regression weight vector, the final weight
                     is the product of the robust weight and the regression weight.
4 If the fit converges, then you are done. Otherwise, perform the next iteration
    of the fitting procedure by returning to the first step.
The plot shown below compares a regular linear fit with a robust fit using
bisquare weights. Notice that the robust fit follows the bulk of the data and is
not strongly influenced by the outliers.
[Figure: y versus x, showing the data, the regular linear fit, and the robust fit with bisquare weights.]
Instead of minimizing the effects of outliers by using robust regression, you can
mark data points to be excluded from the fit. Refer to “Excluding and
Sectioning Data” on page 2-25 for more information.
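
One reweighting pass of the bisquare scheme can be sketched as follows. The
example is written in Python for illustration only and rests on several
assumptions that are not stated in the text above: the adjusted residual is
taken as r/sqrt(1 - h), the tuning constant K is set to 4.685, s is taken
as MAD/0.6745 with MAD computed as the median absolute deviation from the
median residual, and the weight is (1 - u^2)^2 for |u| < 1 and zero
otherwise. The function name and the residual and leverage values are made up.

```python
import math
import statistics

def bisquare_weights(residuals, leverages, K=4.685):
    """Return bisquare robust weights from residuals and leverages (a sketch)."""
    # Adjusted residuals: downweight high-leverage points (assumed form).
    r_adj = [r / math.sqrt(1.0 - h) for r, h in zip(residuals, leverages)]
    # Robust scale estimate s = MAD / 0.6745 (assumed definition of MAD).
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    s = mad / 0.6745
    if s == 0:
        return [1.0] * len(residuals)   # no spread: leave all points at full weight
    weights = []
    for ra in r_adj:
        u = ra / (K * s)                # standardized adjusted residual
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights

# Made-up residuals from a first-pass fit; the last point is an outlier.
residuals = [0.1, -0.2, 0.15, -0.05, 8.0]
leverages = [0.2, 0.2, 0.2, 0.2, 0.2]   # hypothetical leverage values
weights = bisquare_weights(residuals, leverages)
print(weights)
```

In a full robust fit these weights would be passed to a weighted least
squares fit, and the two steps would repeat until the coefficients stop
changing.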
                      y = f ( X, β ) + ε
                   where
                   Nonlinear models are more difficult to fit than linear models because the
                   coefficients cannot be estimated using simple matrix techniques. Instead, an
                   iterative approach is required that follows these steps:
                   1 Start with an initial estimate for each coefficient. For some nonlinear
                     models, a heuristic approach is provided that produces reasonable starting
                     values. For other models, random values on the interval [0,1] are provided.
                   2 Produce the fitted curve for the current set of coefficients. The fitted
                     response value ŷ is given by
                      ŷ = f ( X, b )
                     and involves the calculation of the Jacobian of f(X,b), which is defined as a
                     matrix of partial derivatives taken with respect to the coefficients.
3 Adjust the coefficients and determine whether the fit improves. The
  direction and magnitude of the adjustment depend on the fitting algorithm.
  The toolbox provides these algorithms:
  - Trust-region — This is the default algorithm and must be used if you
    specify coefficient constraints. It can solve difficult nonlinear problems
    more efficiently than the other algorithms and it represents an
    improvement over the popular Levenberg-Marquardt algorithm.
  - Levenberg-Marquardt — This algorithm has been used for many years
    and has proved to work most of the time for a wide range of nonlinear
    models and starting values. If the trust-region algorithm does not produce
    a reasonable fit, and you do not have coefficient constraints, you should try
    the Levenberg-Marquardt algorithm.
  - Gauss-Newton — This algorithm is potentially faster than the other
    algorithms, but it assumes that the residuals are close to zero. It’s included
    with the toolbox for pedagogical reasons and should be the last choice for
    most models and data sets.
  For more information about the trust region algorithm, refer to [4] and to
  “Trust Region Methods for Nonlinear Minimization” in the Optimization
  Toolbox documentation. For more information about the
  Levenberg-Marquardt and Gauss-Newton algorithms, refer to “Nonlinear
  Least Squares Implementation” in the same guide. Additionally, the
  Levenberg-Marquardt algorithm is described in [5] and [6].
4 Iterate the process by returning to step 2 until the fit reaches the specified
  convergence criteria.
You can use weights and robust fitting for nonlinear models, and the fitting
process is modified accordingly.
Because of the nature of the approximation process, no algorithm is foolproof
for all nonlinear models, data sets, and starting points. Therefore, if you do not
achieve a reasonable fit using the default starting points, algorithm, and
convergence criteria, you should experiment with different options. Refer to
“Specifying Fit Options” on page 3-23 for a description of how to modify the
default options. Because nonlinear models can be particularly sensitive to the
starting points, this should be the first fit option you modify.
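
The iterative procedure can be sketched for the one-term exponential model
y = a*exp(b*x). The example below is a minimal Gauss-Newton loop with step
halving, written in Python for illustration only. It is not the toolbox's
implementation, and the function names, tolerance, starting values, and
synthetic data are all assumptions.

```python
import math

def sse(x, y, a, b):
    """Sum of squared residuals for the model y = a*exp(b*x)."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

def gauss_newton_exp(x, y, a, b, tol=1e-10, max_iter=50):
    """Fit y = a*exp(b*x) starting from the initial estimates (a, b)."""
    for _ in range(max_iter):
        # Fitted curve, residuals, and Jacobian columns df/da and df/db.
        e = [math.exp(b * xi) for xi in x]
        r = [yi - a * ei for yi, ei in zip(y, e)]
        ja = e
        jb = [a * xi * ei for xi, ei in zip(x, e)]
        # Solve the 2-by-2 system (J^T J) delta = J^T r for the adjustment.
        s11 = sum(v * v for v in ja)
        s12 = sum(u * v for u, v in zip(ja, jb))
        s22 = sum(v * v for v in jb)
        g1 = sum(u * v for u, v in zip(ja, r))
        g2 = sum(u * v for u, v in zip(jb, r))
        det = s11 * s22 - s12 * s12
        if det == 0:
            break
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        # Accept the adjustment only if the fit does not get worse;
        # otherwise halve the step.
        s_old = sse(x, y, a, b)
        step = 1.0
        while step > 1e-8 and sse(x, y, a + step * da, b + step * db) > s_old:
            step /= 2.0
        a += step * da
        b += step * db
        # Stop when the adjustment is negligible.
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b

# Synthetic, noise-free data generated from a = 2, b = -1.5.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(-1.5 * xi) for xi in x]
a_fit, b_fit = gauss_newton_exp(x, y, a=1.0, b=-1.0)
print(a_fit, b_fit)
```

Changing the starting values passed to gauss_newton_exp is an easy way to
see how sensitive a nonlinear fit can be to its starting points.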
                   Library Models
                   The parametric library models provided by the Curve Fitting Toolbox are
                   described below.
                   Exponentials
                   The toolbox provides a one-term and a two-term exponential model.
                      $$ y = a e^{bx} $$
                      $$ y = a e^{bx} + c e^{dx} $$
                   Exponentials are often used when the rate of change of a quantity is
                   proportional to the current amount of the quantity. If the coefficient in
                   the exponent (b or d) is negative, the corresponding term represents
                   exponential decay. If the coefficient is positive, the term represents
                   exponential growth.
                   For example, a single radioactive decay mode of a nuclide is described by a
                   one-term exponential. a is interpreted as the initial number of nuclei, b is the
                   decay constant, x is time, and y is the number of remaining nuclei after a
                   specific amount of time passes. If two decay modes exist, then you must use the
                   two-term exponential model. For each additional decay mode, you add another
                   exponential term to the model.
                   Examples of exponential growth include contagious diseases for which a cure
                   is unavailable, and biological populations whose growth is uninhibited by
                   predation, environmental factors, and so on.
                   Fourier Series
                   The Fourier series is a sum of sine and cosine functions that is used to describe
                   a periodic signal. It is represented in either the trigonometric form or the
                   exponential form. The toolbox provides the trigonometric Fourier series form
                   shown below,
                      $$ y = a_0 + \sum_{i=1}^{n} \left( a_i \cos(i w x) + b_i \sin(i w x) \right) $$
                   where a0 models any DC offset in the signal and is associated with the i = 0
                   cosine term, w is the fundamental frequency of the signal, n is the number of
                   terms (harmonics) in the series, and 1 ≤ n ≤ 8 .
For more information about the Fourier series, refer to “Fourier Analysis and
the Fast Fourier Transform” in the MATLAB documentation. For an example
that fits the ENSO data to a custom Fourier series model, refer to “General
Equation: Fourier Series Fit” on page 3-52.
Gaussian
The Gaussian model is used for fitting peaks, and is given by the equation
   $$ y = \sum_{i=1}^{n} a_i e^{-\left( \frac{x - b_i}{c_i} \right)^2} $$
Polynomials
Polynomial models are given by
   $$ y = \sum_{i=1}^{n+1} p_i x^{n+1-i} $$
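
The indexing in this formula means that p1 multiplies the highest power of x
and the last coefficient is the constant term. The sketch below evaluates such
a polynomial with Horner's rule. It is written in Python for illustration
only, and the function name polyval_desc is hypothetical.

```python
def polyval_desc(p, x):
    """Evaluate p[0]*x**n + p[1]*x**(n-1) + ... + p[n] using Horner's rule."""
    result = 0.0
    for coeff in p:
        result = result * x + coeff
    return result

# Coefficients [p1, p2, p3] of the quadratic 2*x**2 + 0*x + 1.
p = [2.0, 0.0, 1.0]
value = polyval_desc(p, 3.0)
print(value)
```
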
Polynomials are often used when a simple empirical model is required. The
model can be used for interpolation or extrapolation, or it can be used to
characterize data using a global fit. For example, the temperature-to-voltage
                   Note If you do not require a global parametric fit and want to maximize the
                   flexibility of the fit, piecewise polynomials might provide the best approach.
                   Refer to “Nonparametric Fitting” on page 3-69 for more information.
                   The main advantages of polynomial fits are reasonable flexibility for data
                   that is not too complicated, and linearity in the coefficients, which means
                   the fitting process is simple. The main disadvantage is that high-degree fits
                   can become unstable.
                   Additionally, polynomials of any degree can provide a good fit within the data
                   range, but can diverge wildly outside that range. Therefore, you should
                   exercise caution when extrapolating with polynomials. Refer to “Determining
                   the Best Fit” on page 1-10 for examples of good and poor polynomial fits to
                   census data.
                   Note that when you fit with high-degree polynomials, the fitting procedure
                   uses the predictor values as the basis for a matrix with very large values, which
                   can result in scaling problems. To deal with this, you should normalize the data
                   by centering it at zero mean and scaling it to unit standard deviation. You
                   normalize data by selecting the Center and scale X data check box on the
                   Fitting GUI.
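
                   The effect of centering and scaling can be sketched numerically. The
                   example below is written in Python for illustration only. It assumes
                   that the scaling uses the sample standard deviation, and the function
                   name center_and_scale and the predictor values are made up.

```python
import statistics

def center_and_scale(x):
    """Return (z, mu, sigma) where z = (x - mu) / sigma."""
    mu = statistics.mean(x)
    sigma = statistics.stdev(x)          # sample standard deviation (assumed)
    if sigma == 0:
        raise ValueError("all predictor values are identical")
    return [(v - mu) / sigma for v in x], mu, sigma

# Hypothetical predictor values resembling calendar years.
years = [1790 + 10 * k for k in range(22)]
z, mu, sigma = center_and_scale(years)

# Largest magnitude in the x**6 column of a degree-6 polynomial fit,
# before and after normalization.
raw_max = max(abs(v) ** 6 for v in years)
scaled_max = max(abs(v) ** 6 for v in z)
print(f"raw: {raw_max:.3e}, normalized: {scaled_max:.3e}")
```

                   Coefficients fitted to normalized data apply to the normalized
                   predictor, so the same mu and sigma must be used when you evaluate
                   the fit at new predictor values.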
                   Power Series
                   The toolbox provides a one-term and a two-term power series model.
                      $$ y = a x^b $$
                      $$ y = a + b x^c $$
                   Power series models are used to describe a variety of data. For example, the
                   rate at which reactants are consumed in a chemical reaction is generally
                   proportional to the concentration of the reactant raised to some power.
Rationals
Rational models are defined as ratios of polynomials and are given by
    $$ y = \frac{\sum_{i=1}^{n+1} p_i x^{n+1-i}}{x^m + \sum_{i=1}^{m} q_i x^{m-i}} $$
Like polynomials, rationals are often used when a simple empirical model is
required. The main advantage of rationals is their flexibility with data that has
complicated structure. The main disadvantage is that they become unstable
when the denominator is around zero. For an example that uses rational
polynomials of various degrees, refer to “Example: Rational Fit” on page 3-41.
Sum of Sines
The sum of sines model is used for fitting periodic functions, and is given by the
equation
   $$ y = \sum_{i=1}^{n} a_i \sin(b_i x + c_i) $$
where a is the amplitude, b is the frequency, and c is the phase constant for
each sine wave term. n is the number of terms in the series and 1 ≤ n ≤ 8 . This
equation is closely related to the Fourier series described previously. The main
                   difference is that the sum of sines equation includes the phase constant, and
                   does not include a DC offset term.
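
                   The relationship between the two models follows from the
                   angle-addition identity for the sine function. The short derivation
                   below is a sketch. The symbols A_i and B_i are introduced here only
                   for illustration and stand for the cosine and sine coefficients of a
                   single Fourier term.

```latex
a_i \sin(b_i x + c_i)
  = a_i \sin(c_i) \cos(b_i x) + a_i \cos(c_i) \sin(b_i x)
  = A_i \cos(b_i x) + B_i \sin(b_i x),
\qquad A_i = a_i \sin(c_i), \quad B_i = a_i \cos(c_i)
```

                   When the frequencies satisfy b_i = i w, each sine term is therefore
                   one harmonic of a Fourier series with no constant term. In the sum of
                   sines model the frequencies b_i are fitted independently and need not
                   be multiples of a single fundamental frequency.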
                   Weibull Distribution
                   The Weibull distribution is widely used in reliability and life (failure rate) data
                   analysis. The toolbox provides the two-parameter Weibull distribution
                                             b
                                b – 1 – ax
                      y = abx       e
                   where a is the scale parameter and b is the shape parameter. Note that there
                   is also a three-parameter Weibull distribution with x replaced by x – c where c
                   is the location parameter. Additionally, there is a one-parameter Weibull
                   distribution where the shape parameter is fixed and only the scale parameter
                   is fitted. To use these distributions, you must create a custom equation.
                   Note that the Curve Fitting Toolbox does not fit Weibull probability
                   distributions to a sample of data. Instead, it fits curves to response and
                   predictor data such that the curve has the same shape as a Weibull
                   distribution.
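
                   The shape of the two-parameter curve can be explored numerically. The
                   sketch below is written in Python for illustration only. It evaluates
                   the curve for made-up coefficient values and checks, with a simple
                   trapezoidal sum, that the area under it is close to one, as expected
                   for a curve with the shape of a probability density. The function name
                   weibull_curve, the coefficient values, and the integration settings
                   are assumptions.

```python
import math

def weibull_curve(x, a, b):
    """Evaluate y = a*b*x**(b-1)*exp(-a*x**b) for x >= 0 (with b >= 1 at x = 0)."""
    return a * b * x ** (b - 1) * math.exp(-a * x ** b)

# Made-up coefficient values and integration grid.
a, b = 1.0, 2.0
upper = 5.0
n_steps = 5000
h = upper / n_steps

ys = [weibull_curve(k * h, a, b) for k in range(n_steps + 1)]
# Trapezoidal rule: end points count half.
area = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
print(f"area under the curve on [0, {upper}]: {area:.6f}")
```
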
                   Custom Equations
                   If the toolbox library does not contain the desired parametric equation, you
                   must create your own custom equation. However, if possible, you should use
                   the library equations because they offer the best chance for rapid convergence.
                   This is because
                   • For most models, optimal default coefficient starting points are calculated.
                     For custom equations, the default starting points are chosen at random on
                     the interval [0,1]. Refer to “Default Coefficient Parameters” on page 3-26 for
                     more information.
                   • An analytic Jacobian is used instead of finite differencing.
                   • When using the Analysis GUI, analytic derivatives are calculated as well as
                     analytic integrals if the integral can be expressed in closed form.
                   Note To save custom equations for later use, you should save the
                   curve-fitting session with the File-> Save Session menu item.
You create custom equations with the Create Custom Equation GUI. The GUI
contains two panes: a pane for creating linear equations and a pane for creating
general (nonlinear) equations. These panes are described below.
Linear Equations
Linear equations are defined as equations that are linear in the parameters.
For example, the polynomial library equations are linear. The Linear
Equations pane is shown below followed by a description of its parameters.
                   General Equations
                   General (nonlinear) equations are defined as equations that are nonlinear in
                   the parameters, or are a combination of linear and nonlinear in the
                   parameters. For example, the exponential library equations are nonlinear. The
                   General Equations pane is shown below followed by a brief description of its
                   parameters.
Note that even if you define a linear equation, a nonlinear fitting procedure is
used. Although this is allowed by the toolbox, it is an inefficient process and can
result in less than optimal fitted coefficients. Instead, you should use the
Linear Equations pane to define the equation.
The available GUI options depend on whether you are fitting your data using
a linear model, a nonlinear model, or a nonparametric fit type. All the options
described below are available for nonlinear models. Method, Robust, and
coefficient constraints (Lower and Upper) are available for linear models.
Interpolants and smoothing splines include Method, but no configurable
options.
                   • Any nonlinear custom equation — that is, a nonlinear equation that you
                     write.
                   • Some, but not all, of the nonlinear equations provided with the Curve Fitting
                     Toolbox.
Coefficient Parameters
• Unknowns — Symbols for the unknown coefficients to be fitted.
• StartPoint — The coefficient starting values. The default values depend on
  the model. For rational, Weibull, and custom models, default values are
  randomly selected within the range [0,1]. For all other nonlinear library
  models, the starting values depend on the data set and are calculated
  heuristically.
• Lower — Lower bounds on the fitted coefficients. The bounds are used only
  with the trust region fitting algorithm. The default lower bounds for most
  library models are -Inf, which indicates that the coefficients are
  unconstrained. However, a few models have finite default lower bounds. For
  example, Gaussians have the width parameter constrained so that it cannot
  be less than 0.
• Upper — Upper bounds on the fitted coefficients. The bounds are used only
  with the trust region fitting algorithm. The default upper bounds for all
  library models are Inf, which indicates that the coefficients are
  unconstrained.
For more information about these fit options, refer to “Optimization Options
Parameters” in the Optimization Toolbox documentation.
                   Note that the sum of sines and Fourier series models are particularly sensitive
                   to starting points, and the optimized values might be accurate for only a few
                   terms in the associated equations. For an example that overrides the default
                   starting values for the sum of sines model, refer to “Example: Sectioning
                   Periodic Data” on page 2-35.
• Residuals
• Goodness of fit statistics
• Confidence and prediction bounds
You can group these measures into two types: graphical and numerical. The
residuals and prediction bounds are graphical measures, while the goodness of
fit statistics and confidence bounds are numerical measures.
Generally speaking, graphical measures are more beneficial than numerical
measures because they allow you to view the entire data set at once, and they
can easily display a wide range of relationships between the model and the
data. The numerical measures are more narrowly focused on a particular
aspect of the data and often try to compress that information into a single
number. In practice, depending on your data and analysis requirements, you
might need to use both types to determine the best fit.
Note that it is possible that none of your fits can be considered the best one. In
this case, it might be that you need to select a different model. Conversely, it is
also possible that all the goodness of fit measures indicate that a particular fit
is the best one. However, if your goal is to extract fitted coefficients that have
physical meaning, but your model does not reflect the physics of the data, the
resulting coefficients are useless. In this case, understanding what your data
represents and how it was measured is just as important as evaluating the
goodness of fit.
Residuals
The residuals from a fitted model are defined as the differences between the
response data and the fit to the response data at each predictor value.
   residual = data - fit
You display the residuals in the Curve Fitting Tool by selecting the menu item
View->Residuals.
                            r = y – ŷ
                   Assuming the model you fit to the data is correct, the residuals approximate
                   the random errors. Therefore, if the residuals appear to behave randomly, it
                   suggests that the model fits the data well. However, if the residuals display a
                   systematic pattern, it is a clear sign that the model fits the data poorly.
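A minimal sketch of the residual calculation, using core MATLAB functions and made-up data (the x and y values are assumptions chosen only for illustration):

```matlab
% Hypothetical data that is roughly linear.
x = (1:10)';
y = [1.3 2.1 2.8 4.2 4.9 6.1 6.8 8.3 8.9 10.2]';

p    = polyfit(x, y, 1);     % first-degree polynomial fit
yhat = polyval(p, x);        % fit evaluated at each predictor value
r    = y - yhat;             % residual = data - fit

% Plot the residuals relative to the fit, which is the zero line.
plot(x, r, 'o', [0 11], [0 0], '-')
```

For a least squares fit that includes a constant term, the residuals sum to zero, so it is the pattern of the residuals, not their average, that indicates how well the model fits.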
                   A graphical display of the residuals for a first degree polynomial fit is shown
                   below. The top plot shows that the residuals are calculated as the vertical
                   distance from the data point to the fitted curve. The bottom plot shows that the
                   residuals are displayed relative to the fit, which is the zero line.
                   [Figure: data with linear fit (top panel); residuals (bottom panel).]
                   The residuals appear randomly scattered around zero indicating that the
                   model describes the data well.
[Figure: data with quadratic fit (top panel); residuals (bottom panel).]
The residuals are systematically positive for much of the data range indicating
that this model is a poor fit for the data.
                   For the current fit, these statistics are displayed in the Results list box in the
                   Fit Editor. For all fits in the current curve-fitting session, you can compare the
                   goodness of fit statistics in the Table of fits.
                   Sum of Squares Due to Error. This statistic measures the total deviation of the
                   response values from the fit to the response values. It is also called the summed
                   square of residuals and is usually labeled as SSE.
                      $$\mathrm{SSE} = \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$$
                   A value closer to 0 indicates a better fit. Note that the SSE was previously
                   defined in “The Least Squares Fitting Method” on page 3-6.
                   R-Square. This statistic measures how successful the fit is in explaining the
                   variation of the data. Put another way, R-square is the square of the correlation
                   between the response values and the predicted response values. It is also called
                   the square of the multiple correlation coefficient and the coefficient of multiple
                   determination.
                   R-square is defined as the ratio of the sum of squares of the regression (SSR)
                   and the total sum of squares (SST). SSR is defined as
                      $$\mathrm{SSR} = \sum_{i=1}^{n} w_i (\hat{y}_i - \bar{y})^2$$
                   SST is also called the sum of squares about the mean, and is defined as
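                      $$\mathrm{SST} = \sum_{i=1}^{n} w_i (y_i - \bar{y})^2$$
                   Given these definitions, R-square is expressed as
                      $$\text{R-square} = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$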
                   R-square can take on any value between 0 and 1, with a value closer to 1
                   indicating a better fit. For example, an R-square value of 0.8234 means that the fit
                   explains 82.34% of the total variation in the data about the average.
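The following sketch computes SSE, SSR, SST, and R-square directly from the definitions above. It assumes unit weights and made-up data, and it uses polyfit from core MATLAB in place of the Curve Fitting Tool.

```matlab
% Hypothetical data and a first-degree polynomial fit.
x = (1:10)';
y = [1.3 2.1 2.8 4.2 4.9 6.1 6.8 8.3 8.9 10.2]';
w = ones(size(y));                     % unit weights (assumption)

yhat = polyval(polyfit(x, y, 1), x);   % predicted response values
ybar = mean(y);                        % mean of the response values

SSE = sum(w .* (y    - yhat).^2);      % sum of squares due to error
SSR = sum(w .* (yhat - ybar).^2);      % sum of squares of the regression
SST = sum(w .* (y    - ybar).^2);      % total sum of squares

rsquare = SSR / SST;
fprintf('SSE = %.4f, R-square = %.4f\n', SSE, rsquare);
```

For a linear least squares fit that contains a constant term, SSR/SST and 1 - SSE/SST give the same number to within rounding.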
If you increase the number of fitted coefficients in your model, R-square might
increase although the fit may not improve. To avoid this situation, you should
use the degrees of freedom adjusted R-square statistic described below.
Note that it is possible to get a negative R-square for equations that do not
contain a constant term. If R-square is defined as the proportion of variance
explained by the fit, and if the fit is actually worse than just fitting a horizontal
line, then R-square is negative. In this case, R-square cannot be interpreted as
the square of a correlation.
Degrees of Freedom Adjusted R-Square. This statistic uses the R-square statistic
defined above, and adjusts it based on the residual degrees of freedom. The
residual degrees of freedom is defined as the number of response values n
minus the number of fitted coefficients m estimated from the response values.
   v = n–m
v indicates the number of independent pieces of information involving the n
data points that are required to calculate the sum of squares. Note that if
parameters are bounded and one or more of the estimates are at their bounds,
then those estimates are regarded as fixed. The degrees of freedom is increased
by the number of such parameters.
The adjusted R-square statistic is generally the best indicator of the fit quality
when you add additional coefficients to your model.
   $$\text{adjusted R-square} = 1 - \frac{\mathrm{SSE}\,(n-1)}{\mathrm{SST}\,(v)}$$
The adjusted R-square statistic can take on any value less than or equal to 1,
with a value closer to 1 indicating a better fit.
Root Mean Squared Error. This statistic is also known as the fit standard error
and the standard error of the regression
   $$\mathrm{RMSE} = s = \sqrt{\mathrm{MSE}}$$
where MSE is the mean square error or the residual mean square
   $$\mathrm{MSE} = \frac{\mathrm{SSE}}{v}$$
An RMSE value closer to 0 indicates a better fit.
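As an illustration of the degrees of freedom adjustment, the sketch below compares a first-degree and a third-degree polynomial fit to the same made-up data. The data and the choice of degrees are assumptions; no coefficients are bounded, so v = n - m.

```matlab
% Hypothetical data.
x = (1:10)';
y = [1.3 2.1 2.8 4.2 4.9 6.1 6.8 8.3 8.9 10.2]';
n   = numel(y);
SST = sum((y - mean(y)).^2);

degs  = [1 3];
R2    = zeros(size(degs));
adjR2 = zeros(size(degs));
rmse  = zeros(size(degs));
for k = 1:numel(degs)
    p   = polyfit(x, y, degs(k));
    SSE = sum((y - polyval(p, x)).^2);
    m   = degs(k) + 1;                        % number of fitted coefficients
    v   = n - m;                              % residual degrees of freedom
    R2(k)    = 1 - SSE/SST;
    adjR2(k) = 1 - SSE*(n - 1)/(SST*v);       % adjusted R-square
    rmse(k)  = sqrt(SSE/v);                   % RMSE = sqrt(MSE)
    fprintf('degree %d: R2 = %.4f, adjR2 = %.4f, RMSE = %.4f\n', ...
            degs(k), R2(k), adjR2(k), rmse(k));
end
```

R-square cannot decrease when coefficients are added to a nested model, whereas the adjusted statistic and the RMSE can get worse if the extra coefficients do not reduce SSE enough to offset the lost degrees of freedom.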
                   Confidence and prediction bounds define the lower and upper values of the
                   associated interval, and define the width of the interval. The width of the
                   interval indicates how uncertain you are about the fitted coefficients, the
                   predicted observation, or the predicted fit. For example, a very wide interval
                   for the fitted coefficients can indicate that you should use more data when
                   fitting before you can say anything very definite about the coefficients.
                   The bounds are defined with a level of certainty that you specify. The level of
                   certainty is often 95%, but it can be any value such as 90%, 99%, 99.9%, and so
                   on. For example, you might want to take a 5% chance of being incorrect about
                   predicting a new observation. Therefore, you would calculate a 95% prediction
                   interval. This interval indicates that you have a 95% chance that the new
                   observation is actually contained within the lower and upper prediction
                   bounds.
Calculating and Displaying Confidence Bounds. The confidence bounds for fitted
coefficients are given by
   $$C = b \pm t\sqrt{S}$$
where b are the coefficients produced by the fit, t is the inverse of Student's t
cumulative distribution function, and S is a vector of the diagonal elements
from the covariance matrix of the coefficient estimates, $(X^T X)^{-1} s^2$. X is the
design matrix, $X^T$ is the transpose of X, and $s^2$ is the mean squared error.
Refer to the tinv function, included with the Statistics Toolbox, for a
description of t. Refer to “Linear Least Squares” on page 3-6 for more
information about X and $X^T$.
The confidence bounds are displayed in the Results list box in the Fit Editor
using the following format.
   p1 = 1.275     (1.113, 1.437)
The fitted value for the coefficient p1 is 1.275, the lower bound is 1.113, the
upper bound is 1.437, and the interval width is 0.324. By default, the
confidence level for the bounds is 95%. You can change this level to any value
with the View->Confidence Level menu item in the Curve Fitting Tool.
You can calculate confidence intervals at the command line with the confint
function.
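The calculation can be sketched for a linear model as follows. The data is made up, and the use of the two-sided quantile 1 - (1 - level)/2 in tinv is an assumption about how the confidence level maps to t. The tinv function requires the Statistics Toolbox.

```matlab
% Hypothetical data and a first-degree polynomial model p1*x + p2.
x = (1:10)';
y = [1.3 2.1 2.8 4.2 4.9 6.1 6.8 8.3 8.9 10.2]';

X  = [x ones(size(x))];            % design matrix
b  = X \ y;                        % fitted coefficients
v  = numel(y) - numel(b);          % residual degrees of freedom
s2 = sum((y - X*b).^2) / v;        % mean squared error
S  = diag(inv(X'*X) * s2);         % diagonal of the covariance matrix

level = 0.95;
t = tinv(1 - (1 - level)/2, v);    % two-sided quantile (assumption)
C = [b - t*sqrt(S), b + t*sqrt(S)];

for k = 1:numel(b)
    fprintf('p%d = %.4g  (%.4g, %.4g)\n', k, b(k), C(k,1), C(k,2));
end
```

For a fit object returned by the fit function, confint returns the corresponding bounds without the intermediate steps.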
                      $$P_{n,o} = \hat{y} \pm t\sqrt{s^2 + x S x'}$$
                      $$P_{s,o} = \hat{y} \pm f\sqrt{s^2 + x S x'}$$
                      $$P_{n,f} = \hat{y} \pm t\sqrt{x S x'}$$
                   The simultaneous prediction bounds for the function and for all predictor
                   values are given by
                      $$P_{s,f} = \hat{y} \pm f\sqrt{x S x'}$$
                   You can graphically display prediction bounds two ways: using the Curve
                   Fitting Tool or using the Analysis GUI. With the Curve Fitting Tool, you can
                   display nonsimultaneous prediction bounds for new observations with
                   View->Prediction Bounds. By default, the confidence level for the bounds is
                   95%. You can change this level to any value with View->Confidence Level.
                   With the Analysis GUI, you can display nonsimultaneous prediction bounds for
                   the function or for new observations.
                   You can display numerical prediction bounds of any type at the command line
                   with the predint function.
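A short sketch of the command-line route, assuming a poly1 library fit to made-up data. The data and query points are illustrative; the 'observation'/'functional' and 'on'/'off' arguments select the four bound types.

```matlab
% Hypothetical data with a small deterministic wiggle in place of noise.
x = (0:0.5:10)';
y = 0.5*x + 1 + 0.3*sin(7*x);

cfun = fit(x, y, 'poly1');
xq   = (0:10)';                                        % query points
lvl  = 0.95;

pno = predint(cfun, xq, lvl, 'observation', 'off');    % nonsimultaneous, observation
pso = predint(cfun, xq, lvl, 'observation', 'on');     % simultaneous, observation
pnf = predint(cfun, xq, lvl, 'functional',  'off');    % nonsimultaneous, function
psf = predint(cfun, xq, lvl, 'functional',  'on');     % simultaneous, function

% Interval widths (upper bound minus lower bound) at each query point.
widths = [diff(pno,1,2) diff(pso,1,2) diff(pnf,1,2) diff(psf,1,2)];
disp(widths)
```

Each call returns a two-column array of lower and upper bounds, one row per query point.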
   $$y_{n+1}(x_{n+1}) = f(x_{n+1}) + e_{n+1}$$
where $f(x_{n+1})$ is the true but unknown function you want to estimate at $x_{n+1}$.
The likely values for the new observation or for the estimated function are
provided by the nonsimultaneous prediction bounds.
If instead you want the likely value of the new observation to be associated
with any predictor value, the previous equation becomes
   $$y_{n+1}(x) = f(x) + e$$
The likely values for this new observation or for the estimated function are
provided by the simultaneous prediction bounds.
The types of prediction bounds are summarized below.
                   are wider than the fitted function intervals because of the additional
                   uncertainty in predicting a new response value (the fit plus random errors).
                   [Figure: four panels showing y plotted against x.]
After you import the data, fit it using a cubic polynomial and a fifth degree
polynomial. The data, fits, and residuals are shown below. You display the
residuals in the Curve Fitting Tool with the View->Residuals menu item.
Both models appear to fit the data well, and the residuals appear to be
randomly distributed around zero. Therefore, a graphical evaluation of the fits
does not reveal any obvious differences between the two equations.
                   As expected, the fit results for poly3 are reasonable because the generated data
                   is cubic. The 95% confidence bounds on the fitted coefficients indicate that they
                   are acceptably accurate. However, the 95% confidence bounds for poly5
                   indicate that the fitted coefficients are not known accurately.
                   The goodness of fit statistics are shown below. By default, the adjusted
                   R-square and RMSE statistics are not displayed in the Table of Fits. To
                   display these statistics, open the Table Options GUI by clicking the Table
                   options button. The statistics do not reveal a substantial difference between
                   the two equations.
The 95% nonsimultaneous prediction bounds for new observations are shown
below. To display prediction bounds in the Curve Fitting Tool, select the
View->Prediction Bounds menu item. Alternatively, you can view prediction
bounds for the function or for new observations using the Analysis GUI.
The prediction bounds for poly3 indicate that new observations can be
predicted accurately throughout the entire data range. This is not the case for
poly5. It has wider prediction bounds in the area of the missing data,
apparently because the data does not contain enough information to estimate
the higher degree polynomial terms accurately. In other words, a fifth-degree
polynomial overfits the data. You can confirm this by using the Analysis GUI
to compute bounds for the functions themselves.
The 95% prediction bounds for poly5 are shown below. As you can see, the
uncertainty in estimating the function is large in the area of the missing data.
                   Therefore, you would conclude that more data must be collected before you can
                   make accurate predictions using a fifth-degree polynomial.
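One way to see why the higher degree model is less certain where data is missing is to compare the quantity x(X'X)^(-1)x' for the two models at a point inside the gap. Because S = (X'X)^(-1)s^2, this quantity is the xSx' term of the function bounds divided by s^2. The sketch below assumes a hypothetical set of predictor values with a gap between 3 and 7; it is not the data set used in this example.

```matlab
% Hypothetical predictor values with a gap in the middle.
x  = [0:0.25:3, 7:0.25:10]';
xq = 5;                                   % query point inside the gap

% Center and scale x. This improves conditioning without changing the result.
xs  = (x  - 5)/5;
xqs = (xq - 5)/5;

degs = [3 5];
lev  = zeros(size(degs));
for k = 1:numel(degs)
    pw = degs(k):-1:0;                    % polynomial powers
    X  = bsxfun(@power, xs, pw);          % design matrix
    xr = xqs.^pw;                         % design row at the query point
    [Q, R] = qr(X, 0);                    % X'X = R'R
    z  = xr / R;
    lev(k) = z*z';                        % x (X'X)^(-1) x'
    fprintf('degree %d: x(X''X)^(-1)x'' = %.4f\n', degs(k), lev(k));
end
```

The value for the fifth-degree model is larger, so for a comparable mean squared error its function bounds are wider at that point.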
                   In conclusion, you should examine all available goodness of fit measures before
                   deciding on the best fit. A graphical examination of the fit and residuals should
                   always be your initial approach. However, some fit characteristics are revealed
                   only through numerical fit results, statistics, and prediction bounds.
The workspace now contains two new variables, temp and thermex:
Import these two variables into the Curve Fitting Tool and name the data set
CuThermEx.
For this data set, you will find the rational equation that produces the best fit.
As described in “Library Models” on page 3-16, rational models are defined as
a ratio of polynomials
   $$y = \frac{p_1 x^n + p_2 x^{n-1} + \dots + p_{n+1}}{x^m + q_1 x^{m-1} + \dots + q_m}$$
where n is the degree of the numerator polynomial and m is the degree of the
denominator polynomial. Note that the rational equations are not associated
with physical parameters of the data. Instead, they provide a simple and
flexible empirical model that you can use for interpolation and extrapolation.
                   As you can see by examining the shape of the data, a reasonable initial choice
                   for the rational model is quadratic/quadratic. The Fitting GUI configured for
                   this equation is shown below.
The fit clearly misses the data for the smallest and largest predictor values.
Additionally, the residuals show a strong pattern throughout the entire data
set indicating that a better fit is possible.
                   For the next fit, try a cubic/cubic equation. The data, fit, and residuals are
                   shown below.
The numerical results shown below indicate that the fit did not converge.
Although the message in the Results window indicates that you might improve
the fit if you increase the maximum number of iterations, a better choice at this
stage of the fitting process is to use a different rational equation because the
current fit contains several discontinuities. These discontinuities occur because
the function becomes unbounded at predictor values that correspond to the zeros
of the denominator.
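To check a rational fit for this problem, you can locate the real zeros of the denominator and see whether any fall inside the data range. The sketch below uses made-up cubic/cubic coefficients, not the coefficients from this fit; the denominator is written with a leading coefficient of 1 to match the model definition.

```matlab
% Hypothetical cubic/cubic rational model coefficients.
pnum = [2 1 0 5];        % p1*x^3 + p2*x^2 + p3*x + p4
qden = [1 -4 1 6];       % x^3 + q1*x^2 + q2*x + q3 = (x+1)(x-2)(x-3)

% Real zeros of the denominator are where the model is discontinuous.
poles     = roots(qden);
realPoles = sort(real(poles(abs(imag(poles)) < 1e-10)));

% Hypothetical predictor range.
xmin = 0;  xmax = 2.5;
inRange = realPoles(realPoles >= xmin & realPoles <= xmax);

fprintf('Denominator zeros inside [%g, %g]: %s\n', xmin, xmax, mat2str(inRange', 4));
```

If the list is empty, the model is continuous over the data range, although it can still misbehave when you extrapolate.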
As the next try, fit the data using a cubic/quadratic equation. The data, fit, and
residuals are shown below.
The fit is well behaved over the entire data range, and the residuals are
randomly scattered about zero. Therefore, you can confidently use this fit for
further analysis.
                   The Create Custom Equation GUI contains two panes: one for creating linear
                   custom equations and one for creating general (nonlinear) custom equations.
                   These panes are described in the following examples.
   $$y(x) = \sum_{n=0} a_n P_n(x)$$
   n    Pn(x)
   0    1
   1    x
   2    (1/2)(3x^2 - 1)
   3    (1/2)(5x^3 - 3x)
   4    (1/8)(35x^4 - 30x^2 + 3)
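As a quick check of the tabulated polynomials, the sketch below compares them with the core MATLAB legendre function, whose first output row (order 0) is Pn(x). The evaluation grid is an arbitrary choice.

```matlab
% Legendre polynomials P0 through P4 as listed in the table.
P = {@(x) ones(size(x)), ...
     @(x) x, ...
     @(x) (1/2)*(3*x.^2 - 1), ...
     @(x) (1/2)*(5*x.^3 - 3*x), ...
     @(x) (1/8)*(35*x.^4 - 30*x.^2 + 3)};

x   = linspace(-1, 1, 11);
err = zeros(1, 5);
for n = 0:4
    L = legendre(n, x);                          % rows are orders 0..n
    err(n+1) = max(abs(L(1,:) - P{n+1}(x)));     % row 1 is Pn(x)
end
disp(err)
```

All entries of err should be at the level of rounding error.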
                   The first step is to load the 12C alpha-emission data from the file
                   carbon12alpha.mat, which is provided with the toolbox.
                       load carbon12alpha
The workspace now contains two new variables, angle and counts:
                   • angle is a vector of angles (in radians) ranging from 10° to 240° in 10°
                     increments.
                   • counts is a vector of raw alpha particle counts that correspond to the
                     emission angles in angle.
                   Import these two variables into the Curve Fitting Tool and name the data
                   set C12Alpha.
                   The Fit Editor for a custom equation fit type is shown below.
Fit the data using a fourth-degree Legendre polynomial with only even terms:
Because the Legendre polynomials depend only on the predictor variable and
constants, you use the Linear Equations pane on the Create Custom Equation
GUI. This pane is shown below for the model given by y1(x). Note that because
angle is given in radians, the argument of the Legendre terms is given by
cos(θα).
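Because the model is linear in its coefficients, the same fit can be sketched with a design matrix and the backslash operator. The example below uses synthetic, noise-free counts generated from made-up coefficients (aTrue), not the carbon12alpha data.

```matlab
% Emission angles in radians, 10 to 240 degrees in 10 degree steps.
theta = (10:10:240)' * pi/180;
c     = cos(theta);                 % argument of the Legendre terms

% Even Legendre terms P0, P2, and P4 evaluated at cos(theta).
P0 = ones(size(c));
P2 = (1/2)*(3*c.^2 - 1);
P4 = (1/8)*(35*c.^4 - 30*c.^2 + 3);
X  = [P0 P2 P4];                    % one column per linear term

% Synthetic response from hypothetical coefficients.
aTrue  = [300; 150; 200];
counts = X * aTrue;

a = X \ counts;                     % linear least squares estimate of a0, a2, a4
disp(a')
```

To try this on the real data, replace theta with angle and use the counts variable loaded from carbon12alpha.mat.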
[Figure callout: Specify a meaningful equation name.]
                   The fit and residuals are shown below. The fit appears to follow the trend of the
                   data well, while the residuals appear to be randomly distributed and do not
                   exhibit any systematic behavior.
                   The numerical fit results are shown below. The 95% confidence bounds indicate
                   that the coefficients associated with P0(x) and P4(x) are known fairly
                   accurately, but that the P2(x) coefficient has a relatively large uncertainty.
   $$y_2(x) = y_1(x) + a_1 x + a_3 \left(\frac{1}{2}\right)(5x^3 - 3x)$$
The Linear Equations pane of the Create Custom Equation GUI is shown below
for the model given by y2(x).
The numerical results indicate that the odd Legendre terms do not contribute
significantly to the fit, and the even Legendre terms are essentially unchanged
from the previous fit. This confirms that the initial model choice is the best one.
                   where ai and bi are the amplitudes, and ci are the periods (cycles) of the data.
                   The question to be answered in this example is how many cycles exist? As a
                   first attempt, assume a 12 month cycle and fit the data using one sine term and
                   one cosine term.
                   If the fit does not describe the data well, add additional sine and cosine terms
                   with unique period coefficients until a good fit is obtained.
                   Because there is an unknown coefficient c1 included as part of the
                   trigonometric function arguments, the equation is nonlinear. Therefore, you
                   must specify the equation using the General Equations pane of the Create
Custom Equation GUI. This pane is shown below for the equation given by
y1(x).
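To illustrate why the period coefficient makes the equation nonlinear, the sketch below assumes a model of the form a0 + a1*cos(2*pi*x/c1) + b1*sin(2*pi*x/c1). This form is an assumption for illustration, as are the synthetic data and the search grid. For any fixed c1 the amplitudes follow from linear least squares, so only c1 needs a search.

```matlab
% Synthetic monthly data with a 12 month cycle (hypothetical values).
t = (1:168)';
y = 10 + 3*cos(2*pi*t/12) + 2*sin(2*pi*t/12);

cGrid = 8:0.5:16;                   % candidate periods in months
sse   = zeros(size(cGrid));
for k = 1:numel(cGrid)
    c    = cGrid(k);
    X    = [ones(size(t)) cos(2*pi*t/c) sin(2*pi*t/c)];
    coef = X \ y;                   % amplitudes for this fixed period
    sse(k) = sum((y - X*coef).^2);
end

[sseBest, kBest] = min(sse);
cBest = cGrid(kBest);
fprintf('Best period on the grid: %g months\n', cBest);
```

A grid search such as this can also supply a reasonable starting value for c1 before you run the nonlinear fit.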
[Figure callout: Specify a meaningful equation name.]
                   Note that the toolbox includes the Fourier series as a nonlinear library
                   equation. However, the library equation does not meet the needs of this
                   example because its terms are defined as fixed multiples of the fundamental
                   frequency w. Refer to “Fourier Series” on page 3-16 for more information.
                   The numerical results shown below indicate that the fit does not describe the
                   data well. In particular, the fitted value for c1 is unreasonably small. Because
                   the starting points are randomly selected, your initial fit results might differ
                   from the results shown here.
The fit appears to be reasonable for some of the data points but clearly does not
describe the entire data set very well. As predicted, the numerical results
indicate a cycle of approximately 12 months. However, the residuals show a
systematic periodic distribution indicating that there are additional cycles that
you should include in the fit equation. Therefore, as a second attempt, add an
additional sine and cosine term to y1(x)
and constrain the upper and lower bounds of c2 to be roughly twice the bounds
used for c1.
                   The fit appears to be reasonable for most of the data points. However, the
                   residuals indicate that you should include another cycle in the fit equation.
                   Therefore, as a third attempt, add an additional sine and cosine term to y2(x)
and constrain the lower bound of c3 to be roughly three times the value of c1.
The fit is an improvement over the previous two fits, and appears to account
for most of the cycles present in the ENSO data set. The residuals appear
random for most of the data, although a pattern is still visible, indicating that
additional cycles may be present or that the fitted amplitudes can be improved.
In conclusion, Fourier analysis of the data reveals three significant cycles. The
annual cycle is the strongest, but cycles with periods of approximately 44 and
22 months are also present. These cycles correspond to El Niño and the
Southern Oscillation (ENSO).
The workspace now contains two new variables, xpeak and ypeak:
                   Import these two variables into the Curve Fitting Tool and accept the
                   default data set name ypeak vs. xpeak.
                   You will fit the data with the following equation
                      $$y(x) = a e^{-bx} + a_1 e^{-\left(\frac{x - b_1}{c_1}\right)^2} + a_2 e^{-\left(\frac{x - b_2}{c_2}\right)^2}$$
                   where ai are the peak amplitudes, bi are the peak centroids, and ci are related
                   to the peak widths. Because there are unknown coefficients included as part of
                   the exponential function arguments, the equation is nonlinear. Therefore, you
                   must specify the equation using the General Equations pane of the Create
                   Custom Equation GUI. This pane is shown below for y(x).
The data, fit, and numerical fit results are shown below. Clearly, the fit is poor.
Because the starting points are randomly selected, your initial fit results might
differ from the results shown here.
                   To improve the fit for this example, specify reasonable starting points for the
                   coefficients. Deducing the starting points is particularly easy for the current
                   model because the Gaussian coefficients have a straightforward interpretation
                   and the exponential background is well defined. Additionally, as the peak
                   amplitudes and widths cannot be negative, constrain a1, a2, c1, and c2 to be
                   greater than zero.
                   To define starting values and constraints for unknown coefficients, use the Fit
                   Options GUI, which you open by clicking the Fit options button. The starting
                   values and constraints are shown below.
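A command-line sketch of the same idea, using fittype and fitoptions with synthetic data. The data, the starting values, and the bounds are assumptions chosen for illustration; they are not the values used for the xpeak and ypeak data.

```matlab
% Synthetic two-peak data on an exponential background (hypothetical values).
x = linspace(0, 10, 201)';
y = 100*exp(-0.1*x) + 50*exp(-((x - 3)/0.5).^2) ...
    + 30*exp(-((x - 7)/0.8).^2) + 0.2*sin(40*x);

% Custom nonlinear equation with an explicit coefficient order.
ft = fittype('a*exp(-b*x) + a1*exp(-((x-b1)/c1)^2) + a2*exp(-((x-b2)/c2)^2)', ...
             'independent', 'x', ...
             'coefficients', {'a','b','a1','b1','c1','a2','b2','c2'});

% Starting values read off the data, with nonnegative amplitudes and widths.
opts = fitoptions('Method', 'NonlinearLeastSquares', ...
                  'StartPoint', [90 0.2 40 2.8 0.6 25 7.2 1.0], ...
                  'Lower',      [-Inf -Inf 0 -Inf 0 0 -Inf 0]);

cfun = fit(x, y, ft, opts);
disp(coeffvalues(cfun))
```

The entries of StartPoint and Lower follow the order given in the 'coefficients' list.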
The data, fit, residuals, and numerical results are shown below.
                   • buchanan is a vector of votes for the Reform Party candidate Pat Buchanan.
                   • bush is a vector of votes for the Republican Party candidate George Bush.
                   • gore is a vector of votes for the Democratic Party candidate Al Gore.
                   For this example, assume that the relationship between the response and
                   predictor data is linear with an offset of zero.
                      buchanan votes = (bush votes)(m1)
                      buchanan votes = (gore votes)(m2)
                   m1 is the number of Buchanan votes expected for each Bush vote, and m2 is
                   the number of Buchanan votes expected for each Gore vote.
To create a first-degree polynomial equation with zero offset, you must create
a custom linear equation. As described in “Example: Fitting with Custom
Equations” on page 3-46, you can create a custom equation using the Fitting
GUI by selecting Custom Equations from the Type of fit list, and then
clicking the New Equation button.
The Linear Equations pane of the Create Custom Equation GUI is shown
below.
[Figure callout: Create a first-degree polynomial with zero offset.]
Before fitting, you should exclude the data point associated with the absentee
ballots from each data set because these voters did not use the butterfly ballot.
As described in “Marking Outliers” on page 2-27, you can exclude individual
data points from a fit either graphically or numerically using the Exclude GUI.
For this example, you should exclude the data numerically. The index of the
absentee ballot data is given by
   ind = find(strcmp(counties,'Absentee Ballots'))
   ind =
       68
                   The exclusion rule is named AbsenteeVotes. You use the Fitting GUI to
                   associate an exclusion rule with the data set to be fit.
                   For each data set, perform a robust fit with bisquare weights using the
                   FlaElection equation defined above. For comparison purposes, also perform a
                   regular linear least squares fit. Refer to “Robust Least Squares” on page 3-11
                   for a description of the robust fitting methods provided by the toolbox.
                   You can identify the Palm Beach County data in the scatter plot by using the
                   data tips feature, and knowing the index number of the data point.
                      ind = find(strcmp(counties,'Palm Beach'))
                      ind =
                          50
The Fit Editor and the Fit Options GUI are shown below for a robust fit.
The data, robust and regular least squares fits, and residuals for the buchanan
vs. bush data set are shown below.
                   The graphical results show that the linear model is reasonable for the majority
                   of data points, and the residuals appear to be randomly scattered around zero.
                   However, two residuals stand out. The largest residual corresponds to Palm
                   Beach County. The other residual is at the largest predictor value, and
                   corresponds to Miami/Dade County.
                   The numerical results are shown below. The inverse slope of the robust fit
                   indicates that Buchanan should receive one vote for every 197.4 Bush votes.
                   The data, robust and regular least squares fits, and residuals for the buchanan
                   vs. gore data set are shown below.
Again, the graphical results show that the linear model is reasonable for the
majority of data points, and the residuals appear to be randomly scattered
around zero. However, three residuals stand out. The largest residual
corresponds to Palm Beach County. The other residuals are at the two largest
predictor values, and correspond to Miami/Dade County and Broward County.
The numerical results are shown below. The inverse slope of the robust fit
indicates that Buchanan should receive one vote for every 189.3 Gore votes.
Using the fitted slope value, you can determine the expected number of votes
that Buchanan should have received for each fit. For the Buchanan versus
Bush data, you evaluate the fit at a predictor value of 152,951. For the
Buchanan versus Gore data, you evaluate the fit at a predictor value of
269,732. These results are shown below for both data sets and both fits.
The robust results for the Buchanan versus Bush data suggest that Buchanan
received 3411 – 775 = 2636 excess votes, while robust results for the Buchanan
versus Gore data suggest that Buchanan received 3411 – 1425 = 1986 excess
votes.
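The arithmetic behind these figures can be reproduced as follows. This sketch uses only the inverse slopes, predictor values, and the Buchanan total of 3411 quoted in the text. The variable names are illustrative, and the predictor values are assumed to be the Palm Beach County counts.

```matlab
% Values quoted in the text.
buchananPB   = 3411;      % Buchanan votes (assumed Palm Beach County total)
bushPB       = 152951;    % predictor value for the Buchanan versus Bush fit
gorePB       = 269732;    % predictor value for the Buchanan versus Gore fit
invSlopeBush = 197.4;     % Bush votes per Buchanan vote (robust fit)
invSlopeGore = 189.3;     % Gore votes per Buchanan vote (robust fit)

% Expected Buchanan votes: evaluate each zero-offset fit at its predictor value.
expectedFromBush = round(bushPB/invSlopeBush);
expectedFromGore = round(gorePB/invSlopeGore);

% Excess votes relative to each robust fit.
excessFromBush = buchananPB - expectedFromBush;
excessFromGore = buchananPB - expectedFromGore;
```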
                   Therefore, the voter intention comes into play because in both cases, the
                   margin of victory (537 votes) is less than the excess Buchanan votes.
                   In conclusion, the analysis of the 2000 United States presidential election
                   results for the state of Florida suggests that the Reform Party candidate
                   received an excess number of votes in Palm Beach County, and that this excess
                   number was a crucial factor in determining the election outcome. However,
                   additional analysis is required before a final conclusion can be made.
Nonparametric Fitting
            In some cases, you are not concerned about extracting or interpreting fitted
            parameters. Instead, you might simply want to draw a smooth curve through
            your data. Fitting of this type is called nonparametric fitting. The Curve Fitting
            Toolbox supports these nonparametric fitting methods:
            Interpolants
            Interpolation is a process for estimating values that lie between known data
            points. The supported interpolant methods are shown below.
Method Description
                   The type of interpolant you should use depends on the characteristics of the
                   data being fit, the required smoothness of the curve, speed considerations,
                   postfit analysis requirements, and so on. The linear and nearest neighbor
                   methods are fast, but the resulting curves are not very smooth. The cubic spline
                   and shape-preserving methods are slower, but the resulting curves are often
                   very smooth.
                   For example, the nuclear reaction data from the file carbon12alpha.mat is
                   shown below with a nearest neighbor interpolant fit and a shape-preserving
                   (PCHIP) interpolant fit. Clearly, the nearest neighbor interpolant does not
                   follow the data as well as the shape-preserving interpolant. The difference
                   between these two fits can be important if you are interpolating. However, if
                   you want to integrate the data to get a sense of the total unnormalized strength
                   of the reaction, then both fits provide nearly identical answers for reasonable
                   integration bin widths.
                   [Figure: C12Alpha data with nearest neighbor and pchip interpolant fits; counts versus angle.]
Note Goodness of fit statistics, prediction bounds, and weights are not
defined for interpolants. Additionally, the fit residuals are always zero (within
computer precision) because interpolants pass through the data points.
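
The trade-off between the nearest neighbor and shape-preserving methods can be sketched in base MATLAB with interp1 and trapz. The example below uses synthetic data rather than carbon12alpha.mat, and the variable names are illustrative. It is not the toolbox's own implementation.

```matlab
% Synthetic, smoothly varying data on a coarse grid (not the carbon12alpha data).
x = (0:0.5:4.5)';
y = 300*exp(-(x - 1.5).^2) + 20;

% Evaluate both interpolants on a finer grid.
xx = (0:0.01:4.5)';
yNearest = interp1(x, y, xx, 'nearest');
yPchip   = interp1(x, y, xx, 'pchip');

% Interpolants pass through the data, so residuals at the data points
% are zero to within computer precision.
resNearest = y - interp1(x, y, x, 'nearest');
resPchip   = y - interp1(x, y, x, 'pchip');

% Integrate each interpolant over the data range with the trapezoidal rule.
areaNearest = trapz(xx, yNearest);
areaPchip   = trapz(xx, yPchip);
relDiff = abs(areaNearest - areaPchip)/areaPchip;
```

For smooth data like this, the two curves look quite different point by point, yet relDiff is small, which matches the remark that both fits give nearly identical integrals.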
Smoothing Spline
If your data is noisy, you might want to fit it using a smoothing spline.
Alternatively, you can use one of the smoothing methods described in
“Smoothing Data” on page 2-9.
The smoothing spline s is constructed for the specified smoothing parameter p
and the specified weights $w_i$. The smoothing spline minimizes

$$p \sum_i w_i \left( y_i - s(x_i) \right)^2 + (1 - p) \int \left( \frac{d^2 s}{dx^2} \right)^2 dx$$
If the weights are not specified, they are assumed to be 1 for all data points.
p is defined between 0 and 1. p = 0 produces a least squares straight line fit to
the data, while p = 1 produces a cubic spline interpolant. If you do not specify
the smoothing parameter, it is automatically selected in the “interesting
range.” The interesting range of p is often near 1/(1 + h^3/6), where h is the
average spacing of the data points, and it is typically much smaller than the
allowed range of the parameter. Because smoothing splines have an associated
                   Note The smoothing spline algorithm used by the Curve Fitting Toolbox is
                   based on the csaps function included with the Spline Toolbox. Refer to the
                   csaps reference pages for detailed information about smoothing splines.
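
The rule of thumb for the interesting range can be evaluated directly. The sketch below assumes a vector x of predictor values (illustrative data, not carbon12alpha.mat) and computes 1/(1 + h^3/6). It does not call the toolbox, and the value the toolbox selects automatically may differ.

```matlab
% Illustrative predictor values (an assumption, not the toolbox data).
x = (0:0.25:4.5)';

% Average spacing of the data points.
h = mean(diff(x));

% Approximate center of the "interesting range" of the smoothing parameter.
pInteresting = 1/(1 + h^3/6);
```

For closely spaced data the result is very close to 1, which is consistent with the interesting range being much narrower than the allowed range of p.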
                   The nuclear reaction data from the file carbon12alpha.mat is shown below
                   with three smoothing spline fits. The default smoothing parameter (p = 0.99)
                   produces the smoothest curve. The cubic spline curve (p = 1) goes through all
                   the data points, but is not quite as smooth. The third curve (p = 0.95) misses
                   the data by a wide margin and illustrates how small the “interesting range” of p
                   can be.
                   [Figure: C12Alpha data with smoothing spline fits for p = default, p = 1, and p = 0.95; counts versus angle.]
As shown below, you can fit the data with a cubic spline by selecting
Interpolant from the Type of fit list.
The results shown below indicate that goodness of fit statistics are not defined
for interpolants.
                   As shown below, you can fit the data with a smoothing spline by selecting
                   Smoothing Spline in the Type of fit list.
The data and fits are shown below. The default abscissa scale was increased to
show the fit behavior beyond the data limits. You change the axes limits with
the Tools->Axes Limit Control menu item.
Note that the default smoothing parameter produces the smoothest curve. As
the smoothing parameter increases beyond the default value, the associated
curve approaches the cubic spline curve.
       Selected Bibliography
                   [1] Draper, N.R and H. Smith, Applied Regression Analysis, 3rd Ed., John
                   Wiley & Sons, New York, 1998.
                   [2] Bevington, P.R. and D.K. Robinson, Data Reduction and Error Analysis for
                   the Physical Sciences, 2nd Ed., WCB/McGraw-Hill, Boston, 1992.
                   [3] Daniel, C. and F.S. Wood, Fitting Equations to Data, John Wiley & Sons,
                   New York, 1980.
                   [4] Branch, M.A., T.F. Coleman, and Y. Li, “A Subspace, Interior, and
                   Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization
                   Problems,” SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp.
                   1-23, 1999.
                   [5] Levenberg, K., “A Method for the Solution of Certain Problems in Least
                   Squares,” Quart. Appl. Math, Vol. 2, pp. 164-168, 1944.
                   [6] Marquardt, D., “An Algorithm for Least Squares Estimation of Nonlinear
                   Parameters,” SIAM J. Appl. Math, Vol. 11, pp. 431-441, 1963.
                   [7] DuMouchel, W. and F. O’Brien, “Integrating a Robust Option into a
                   Multiple Regression Computing Environment,” in Computing Science and
                   Statistics: Proceedings of the 21st Symposium on the Interface, (K. Berk and L.
                   Malone, eds.), American Statistical Association, Alexandria, VA, pp. 297-301,
                   1989.
                   [8] DeAngelis, D.J., J.R. Calarco, J.E. Wise, H.J. Emrich, R. Neuhausen, and
                   H. Weyand, “Multipole Strength in 12C from the (e,e'α) Reaction for
                   Momentum Transfers up to 0.61 fm-1,” Phys. Rev. C, Vol. 52, Number 1, pp.
                   61-75 (1995).
                                                                              4
Function Reference
This chapter describes the toolbox M-file functions that you use directly. A number of other M-file
helper functions are provided with this toolbox to support the functions listed below. These helper
functions are not documented because they are not intended for direct use.
                         Preprocessing Data
                         excludedata   Specify data to be excluded from a fit
                         smooth        Smooth the response data
                                                             Functions — Categorical List
Postprocessing Data
confint            Compute confidence bounds for fitted coefficients
differentiate      Differentiate a fit result object
integrate          Integrate a fit result object
predint            Compute prediction bounds for new observations or for the function
General Purpose
cftool          Open the Curve Fitting Tool
datastats       Return descriptive statistics about the data
feval           Evaluate a fit result object or a fit type object
plot            Plot data, fit, prediction bounds, outliers, and residuals
                                                                                                  cfit
Purpose       Create a cfit object
Remarks       cfit is called by the fit function. You should call cfit directly if you want to
              assign coefficients and problem parameters to a model without performing a
              fit.
Example       Create a fit type object and assign values to the coefficients and to the problem
              parameter.
                      m = fittype('a*x^2+b*exp(n*x)','prob','n');
                      f = cfit(m,pi,10.3,3);
cflibhelp
Purpose       Display information about library models, splines, and interpolants
Syntax        cflibhelp
              cflibhelp group
Description   cflibhelp displays the names, equations, and descriptions for all the fit types
              in the curve fitting library. You can use the fit type name as an input parameter
              to the fit, cfit, and fittype functions.
              cflibhelp group displays the names, equations, and descriptions for the fit
              type group specified by group. The supported fit type groups are given below.
Group Description
              For more information about the toolbox library models, refer to “Library
              Models” on page 3-16. For more information about the toolbox library
              interpolants and splines, refer to “Nonparametric Fitting” on page 3-69.
Example    Display the names and descriptions for the spline fit type group.
             cflibhelp spline
SPLINES
SPLINETYPE DESCRIPTION
           Display the model names and equations for the polynomial fit type group.
             cflibhelp polynomial
POLYNOMIAL MODELS
MODELNAME EQUATION
                         poly1                    Y = p1*x+p2
                         poly2                    Y = p1*x^2+p2*x+p3
                         poly3                    Y = p1*x^3+p2*x^2+...+p4
                         ...
                         poly9                    Y = p1*x^9+p2*x^8+...+p10
cftool
Purpose       Open the Curve Fitting Tool
Syntax        cftool
              cftool(xdata,ydata)
Remarks The Curve Fitting Tool is a graphical user interface (GUI) that allows you to
The Curve Fitting Tool is shown below. The data is from the census MAT-file,
and the fit is a quadratic polynomial. The residuals are shown as a line plot
below the data and fit.
The Curve Fitting Tool provides several features that facilitate data and fit
exploration. Refer to “Viewing Data” on page 2-6 for a description of these
features.
By clicking the Data, Fitting, Exclude, Plotting, or Analysis buttons, you can
open the associated GUIs, which are described below. For a complete example
that uses many of these GUIs, refer to Chapter 1, “Getting Started with the
Curve Fitting Toolbox.”
The Data GUI is shown below with the census data loaded.
The Fitting GUI shown below displays the results of fitting the census data to
a quadratic polynomial.
The Analysis GUI shown below displays the numerical results of extrapolating
the census data from the year 2000 to the year 2050 in 10-year increments.
Refer to “Analyzing the Fit” on page 1-17 for an example that uses the Analysis
GUI.
coeffvalues
Purpose       Return coefficient values from a cfit object
Syntax coeffvalues(cfobj)
Description   coeffvalues(cfobj) returns the values of the coefficients of the cfit object
              cfobj as a row vector.
cfobj =
coeffs =
1.0e+004 *
                                                                                       confint
Purpose       Compute confidence bounds for fitted coefficients
Syntax        ci = confint(fresult)
              ci = confint(fresult,level)
Remarks       To calculate confidence bounds, confint uses $R^{-1}$ (the inverse R factor from QR
              decomposition of the Jacobian), the degrees of freedom for error, and the root
              mean squared error. This information is automatically returned by the fit
              function and contained within the fit result object.
              If coefficients are bounded and one or more of the estimates are at their bounds,
              those estimates are regarded as fixed and do not have confidence bounds. Note
              that you cannot calculate confidence bounds for the smoothing spline and
              interpolant fit types.
Example    Fit the census data to a second-degree polynomial. The display for fresult
           includes the 95% confidence bounds for the fitted coefficients.
              load census
              fresult = fit(cdate,pop,'poly2')
              fresult =
                   Linear model Poly2:
                     fresult(x) = p1*x^2 + p2*x + p3
                   Coefficients (with 95% confidence bounds):
                     p1 =    0.006541 (0.006124, 0.006958)
                     p2 =      -23.51 (-25.09, -21.93)
                     p3 = 2.113e+004 (1.964e+004, 2.262e+004)
           Calculate 95% confidence bounds for the fitted coefficients using confint.
              ci = confint(fresult,0.95)
              ci =
           Note that the fit display and the array returned by confint present the
           confidence bounds using slightly different formats. The fit display mimics an
           n-by-3 array where n is the number of coefficients, the first column is the
           coefficient variable, the second column is the fitted coefficient value, and the
           third column is the lower and upper bound. confint returns a 2-by-n array
           where the top row contains the lower bound and the bottom row contains the
           upper bound for each coefficient.
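
The two layouts can be related with a few lines of base MATLAB. The sketch below types in the bounds from the fit display above as a 2-by-n array of the kind confint returns (the values are copied from the display, so they are rounded) and rearranges them into an n-by-3 array. The variable names are illustrative.

```matlab
% Bounds copied from the fit display: top row lower, bottom row upper.
ci = [0.006124  -25.09  1.964e4
      0.006958  -21.93  2.262e4];

% Midpoint and half-width of each interval.
center    = mean(ci, 1);
halfWidth = diff(ci, 1, 1)/2;

% n-by-3 layout: midpoint, lower bound, upper bound.
tbl = [center' ci(1,:)' ci(2,:)'];
```

To the displayed precision, the midpoints agree with the fitted values p1, p2, and p3 in the display, because these bounds are symmetric about the estimates.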
                                                                                  datastats
Purpose       Return descriptive statistics about the data
Description   xds = datastats(xdata) returns statistics for xdata to the structure xds. The
              structure contains the fields shown below.
Field Description
Remarks       If xdata or ydata contains complex values, only the real part of the value is
              used in the statistics computations. If the data contains Infs or NaNs, they are
              processed using the usual MATLAB rules.
xds =
                      num:   21
                      max:   1990
                      min:   1790
                     mean:   1890
                   median:   1890
                    range:   200
                      std:   62.048
yds =
                      num:   21
                      max:   248.7
                      min:   3.9
                     mean:   85.729
                   median:   62.9
                    range:   244.8
                      std:   78.601
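
The fields in xds can be cross-checked with base MATLAB functions. The sketch below assumes the predictor is the set of years 1790 through 1990 in 10-year steps, which is consistent with the num, min, and max values displayed for xds above. The structure s is an illustrative stand-in, not the output of datastats.

```matlab
% Assumed predictor data: years at 10-year intervals.
xdata = (1790:10:1990)';

% Build a structure with the same field names as the display above.
s.num    = numel(xdata);
s.max    = max(xdata);
s.min    = min(xdata);
s.mean   = mean(xdata);
s.median = median(xdata);
s.range  = max(xdata) - min(xdata);
s.std    = std(xdata);
```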
                                                                                               differentiate
Purpose       Differentiate a fit result object
Remarks       For library equations with closed forms, analytic derivatives are calculated.
              For all other equations, the first derivative is calculated using the central
              difference quotient

              $$y' = \frac{y_{x+h} - y_{x-h}}{2h}$$

              where x is the predictor value at which the derivative is calculated, h is a small
              number, $y_{x+h}$ is fresult evaluated at x+h, and $y_{x-h}$ is fresult evaluated at x-h.
              The second derivative is calculated using the expression

              $$y'' = \frac{y_{x+h} + y_{x-h} - 2y_x}{h^2}$$
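
The two difference quotients can be tried out in base MATLAB on any function handle. The sketch below uses an illustrative function, sin(2x), in place of a fit result object. The step size h and the evaluation points are assumptions chosen for the example, not values used by the toolbox.

```matlab
% Illustrative function and evaluation points (assumptions).
f = @(x) sin(2*x);
x = (0:0.5:3)';
h = 1e-3;                      % small step size

% Central difference quotients for the first and second derivatives.
d1 = (f(x + h) - f(x - h))/(2*h);
d2 = (f(x + h) + f(x - h) - 2*f(x))/h^2;

% Compare with the analytic derivatives 2*cos(2x) and -4*sin(2x).
err1 = max(abs(d1 - 2*cos(2*x)));
err2 = max(abs(d2 + 4*sin(2*x)));
```

Both errors shrink roughly in proportion to h^2 until round-off error starts to dominate, which is why h should be small but not arbitrarily small.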
            Create a custom fit type, and fit the data using reasonable starting values.
                ftype = fittype('a*sin(b*x)');
                fopts = fitoptions('Method','Nonlinear','start',[1 1]);
                fit1 = fit(x,y,ftype,fopts);
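            Calculate the first derivative of the fit at each value of x, and then plot the
            data, the fit to the data, and the first derivatives.
                deriv1 = differentiate(fit1,x);
                plot(fit1,'k-',x,y,'b.');hold on
                plot(x,deriv1,'ro')
                legend('data','fitted curve','derivatives')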
            [Figure: data, fitted curve, and derivatives; y versus x.]
                                                                                         disp
Purpose       Display descriptive information for Curve Fitting Toolbox objects
Syntax        obj
              disp(obj)
Description   obj or disp(obj) displays descriptive information for obj. You can create obj
              with the fit or cfit function, the fitoptions function, or the fittype
              function.
Example       The display for a custom fit type object is shown below.
                      ftype = fittype('a*x^2+b*x+c+d*exp(-e*x)')
                      ftype =
                           General model:
                             ftype(a,b,c,d,e,x) = a*x^2+b*x+c+d*exp(-e*x)
                      fopts =
                              Normalize:   'on'
                                Exclude:   []
                                Weights:   []
                                 Method:   'NonlinearLeastSquares'
                                 Robust:   'Off'
                             StartPoint:   []
                                  Lower:   []
                                  Upper:   []
                              Algorithm:   'Trust-Region'
                          DiffMinChange:   1e-008
                          DiffMaxChange:   0.1
                                Display:   'Notify'
                            MaxFunEvals:   600
                                MaxIter:   400
                                 TolFun:   1e-006
                                   TolX:   1e-006
           Note that all fit types have the Normalize, Exclude, Weights, and Method fit
           options. Additional fit options are available depending on the Method value. For
           example, if Method is SmoothingSpline, the SmoothingParam fit option is
           available.
           The display for a fit result object is shown below.
              fresult = fit(cdate,pop,ftype,fopts)
fresult =
                    General model:
                      fresult(x) = a*x^2+b*x+c+d*exp(-e*x)
                      where x is normalized by mean 1890 and std 62.05
                    Coefficients (with 95% confidence bounds):
                      a =       21.14 (-27.61, 69.89)
                      b =       64.49 (-188.5, 317.4)
                      c =       49.92 (-421.5, 521.4)
                      d =       11.96 (-458, 481.9)
                      e =     -0.7745 (-10.25, 8.701)
                                                                              excludedata
Purpose       Specify data to be excluded from a fit
Method Description
Remarks    You can combine data exclusion methods using logical operators. For example,
           to combine methods using the | (OR) operator
              outliers = excludedata(xdata,ydata,'indices',[3 5]);
              outliers = outliers|excludedata(xdata,ydata,'box',[1 10 0 90]);
           In some cases, you might want to use the ~ (NOT) operator to specify a box that
           contains all the data to exclude.
              outliers = ~excludedata(xdata,ydata,'box',[1 10 0 90]);
Example    Generate random data in the interval [0, 15], create a sine wave with noise, and
           add two outliers with the value 2.
              rand('state',0);
              x = 15*rand(150,1);
              y = sin(x) + (rand(size(x))-0.5)*0.5;
              y(ceil(length(x)*rand(2,1))) = 2;
           Identify outliers that are outside the interval [-1.5, 1.5] using the range
           method.
              outliers = excludedata(x,y,'range',[-1.5 1.5]);
           You can pass outliers to the fit function to exclude the specified data points
           from a fit.
              ftype = fittype('a*sin(b*x)');
              fresult = fit(x,y,ftype,'startpoint',[1 1],'exclude',outliers);
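
The logical combinations described in Remarks can be imitated with ordinary logical vectors. The sketch below assumes, as the text implies, that an exclusion vector is true for excluded points. It builds such vectors by hand in base MATLAB rather than calling excludedata, and the data values are made up.

```matlab
% Made-up response values with two obvious outliers.
y = [0.2; -0.4; 2.0; 1.1; -1.8; 0.9];

% Points outside the interval [-1.5, 1.5], analogous to the range method.
outsideRange = (y < -1.5) | (y > 1.5);

% Points marked individually by index, analogous to the indices method.
byIndex = false(size(y));
byIndex([1 3]) = true;

% Combine rules with OR, and invert the result with NOT.
outliers = outsideRange | byIndex;
kept     = ~outliers;
yKept    = y(kept);
```

Here outliers marks points 1, 3, and 5, so yKept contains the remaining three values.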
                                                                                              feval
Purpose       Evaluate a fit result object or a fit type object
Syntax        f = feval(fresult,x)
              f = feval(ftype,coef1,coef2,...,x)
Description   f = feval(fresult,x) evaluates the fit result object fresult at the values
              specified by x, and returns the result to f. You create a fit result object with the
              fit function.
Remarks       You can also evaluate a fit result or a fit type object using the following syntax.
                       f = fresult(x);
                       f = ftype(coef1,coef2,...,x);
Example       Create a fit type object and evaluate the object at x using the specified model
              coefficients.
                       x = (0:0.1:10)';
                       ftype = fittype('a*x^2+b*x');
                       f = feval(ftype,1,2,x);
           Create a fit result object and evaluate the object over a finer range in x.
              y = x.^2+(rand(size(x))-0.5);
              xx = (0:0.05:10)';
              fresult = fit(x,y,ftype);
              f = feval(fresult,xx);
                                                                                                  fit
Purpose       Fit data using a library or custom model, a smoothing spline, or an interpolant
       fresult = fit(xdata,ydata,'ltype','PropertyName',
       PropertyValue,...) fits the data using the options specified by PropertyName
       and PropertyValue. You can display the fit options available for the specified
       library fit type with the fitoptions function.
Field Description
          example, the information returned for nonlinear least squares fits is given
          below.
Field Description
Remarks   For rationals and Weibull library models, the coefficient starting values are
          randomly selected in the range [0,1]. Therefore, if you perform multiple fits to
          a data set using the same equation, you might get different coefficient results
          due to different starting values. To avoid this situation, you should pass in a
          vector of starting values each time you fit, or define a specific state for the
          random number generator, rand or randn, before fitting.
          For all other library models, optimal starting points are automatically
          calculated. These values depend on the data, and are based on model-specific
          heuristics.
Example    Fit the census data with a second-degree polynomial library model and return
           the goodness of fit statistics and the output structure.
              load census
              [fit1,gof1,out1] = fit(cdate,pop,'poly2');
           Create a fit options object, and try to find a better fit by overriding the default
           starting points for the fit coefficients.
              opts = fitoptions('exp1','Norm','on','start',[100 0.1]);
              [fit3,gof3,out3] = fit(cdate,pop,'exp1',opts);
           Fit the data to a custom model that contains the problem parameter n.
              mymodel = fittype('a*exp(b*n*x)+c','problem','n');
              opts = fitoptions(mymodel);
              set(opts,'normalize','on')
              [fit4,gof4,out4] = fit(cdate,pop,mymodel,opts,'problem',{2});
           This fit displays a warning. The warning occurs whenever you fit data with a
           custom nonlinear model and do not provide starting points.
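One way to supply starting points for the custom model is sketched below. The starting values are illustrative guesses, not recommended values, and fit5 is a hypothetical variable name. The entries of StartPoint follow the alphabetical order of the coefficients (a, b, c).

```matlab
load census                                   % provides cdate and pop
mymodel = fittype('a*exp(b*n*x)+c','problem','n');
opts = fitoptions(mymodel);

% Normalize the predictor data and give one starting value per
% coefficient, in the order a, b, c.
set(opts,'Normalize','on','StartPoint',[100 0.1 0])
fit5 = fit(cdate,pop,mymodel,opts,'problem',{2});
```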
fitoptions
Purpose       Create or modify a fit options object
Description   opts = fitoptions creates the empty fit options object opts. The returned
              options are supported by all fitting methods, and are given by the following
              properties. Note that curly braces denote default property values.
Property Description
                 Exclude               A vector of one or more data points to exclude from the fit.
                                       You can use the excludedata function to create this vector.
                 Method                The fitting method. The value is None for an empty object. A
                                       complete list of supported fitting methods is given below.
             opts = fitoptions('method',value,'PropertyName',PropertyValue,...)
             creates a default fit options object for the specified fitting method, and with the
             specified property names and property values.
Remarks      To display the possible fit options property values, use the set function.
                set(opts)
             To display the current fit options property values, use the get function.
                get(opts)
             Note that you can configure or display a single property value using the dot
             notation. See below for an example.
Property Description
              Algorithm       Algorithm used for the fitting procedure. The value can be
                              'Levenberg-Marquardt','Gauss-Newton', or
                              {'Trust-Region'}.
          Note For the properties Upper, Lower, and StartPoint, the order of the
          entries in the vector corresponds to the alphabetical order of the coefficients,
          not the order in which they appear in the expression ftype. For example, if
          you create ftype by the command ftype = fittype('b*x^2+c*x+a'), setting
          StartPoint to [1 3 5] assigns a = 1, b = 3, and c = 5.
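To check the coefficient order before setting these properties, you can list the coefficient names of the fit type. The following is a small sketch of the case described in the note; the variable name names is illustrative.

```matlab
% The expression lists the coefficients as b, c, a ...
ftype = fittype('b*x^2+c*x+a');

% ... but the fit type stores them alphabetically.
names = coeffnames(ftype)                     % a, b, c

% StartPoint entries therefore map to a, b, c in that order.
opts = fitoptions(ftype);
opts.StartPoint = [1 3 5];                    % a = 1, b = 3, c = 5
```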
Example   Create an empty fit options object and configure the object so that data is
          normalized before fitting.
             opts = fitoptions;
             opts.Normal = 'on'
             opts =
                  Normalize: 'on'
                    Exclude: []
                    Weights: []
                     Method: 'None'
             Creating an empty fit options object is particularly useful when you want to
             configure only the Normalize, Exclude, or Weights properties for a data set,
             and then fit the data using the same fit options object, but with different fitting
             methods. For example, fit the census data using a third-degree polynomial, a
             one-term exponential, and a cubic spline.
                load census
                f1 = fit(cdate,pop,'poly3',opts);
                f2 = fit(cdate,pop,'exp1',opts);
                f3 = fit(cdate,pop,'cubicsp',opts);
You can return values for some fit options with the fit function. For example,
fit the census data using a smoothing spline and return the default smoothing
parameter. Note that this value is based on the data passed to fit.
  [f,gof,out] = fit(cdate,pop,'smooth');
  smoothparam = out.p
  smoothparam =
      0.0089
Increase the default smoothing parameter by about 10% and fit again.
  opts = fitoptions('Method','Smooth','SmoothingParam',0.0098);
  [f,gof,out] = fit(cdate,pop,'smooth',opts);
Create two noisy Gaussian peaks — one with a small width, and one with a
large width.
  a1 = 15; b1 = 3; c1 = 0.02;
  a2 = 35; b2 = 7.5; c2 = 4;
  x = (1:0.01:10)';
  rand('state',0)
  gdata = a1*exp(-((x-b1)/c1).^2) + a2*exp(-((x-b2)/c2).^2) ...
      + 5*(rand(size(x))-.5);
Because the Display property is set to its default value Notify, a message is
included as part of the display due to the fit not converging. The message
indicates that you should try increasing the number of function evaluations.
             The fitted coefficients show that the algorithm has difficulty fitting the
             narrow peak but does a good job fitting the broad peak. In particular, note
             that the fitted value of the a2 coefficient is negative.
             To help the fitting procedure converge, specify that the lower bounds of the
             amplitude and width parameters for both peaks must be greater than zero. To
             do this, create a fit options object for the gauss2 model and configure the Lower
             property to zero for a1, c1, a2, and c2, but leave b1 and b2 unconstrained.
                opts = fitoptions('gauss2');
                opts.Lower = [0 -Inf 0 0 -Inf 0];
This is a much better fit, although you can still improve the a2 value.
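The bounded fit described above can be sketched end to end as follows. The data generation repeats the commands from this example so that the block is self-contained; gfit is a hypothetical variable name, and the fitted values depend on the random data.

```matlab
% Recreate the two noisy Gaussian peaks from the example.
a1 = 15; b1 = 3;   c1 = 0.02;
a2 = 35; b2 = 7.5; c2 = 4;
x = (1:0.01:10)';
rand('state',0)
gdata = a1*exp(-((x-b1)/c1).^2) + a2*exp(-((x-b2)/c2).^2) ...
    + 5*(rand(size(x))-.5);

% Constrain the amplitudes and widths to be nonnegative. The entries
% of Lower follow the coefficient order a1, b1, c1, a2, b2, c2.
opts = fitoptions('gauss2');
opts.Lower = [0 -Inf 0 0 -Inf 0];
gfit = fit(x,gdata,'gauss2',opts)
```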
fittype
Purpose       Create a fit type object
Description   ftype = fittype('ltype') creates the fit type object ftype from the library
              model, spline, or interpolant specified by ltype. You can display the library fit
              type names with the cflibhelp function.
              ftype = fittype('expr') creates the fit type object from the expression
              specified by expr. The expression expr represents the custom model you will
              use to fit your data. To create a general (nonlinear) custom model, specify the
              entire equation as one expression. To create a linear custom model, pass in a
              cell array of expressions to expr but do not include the coefficients. Each
              element of the cell array corresponds to one term of the model. If there is a
              constant term, use “1” as the corresponding element in the cell array.
              By default, the independent variable is assumed to be x, the dependent
              variable is assumed to be y, there are no problem-dependent variables, and all
              other variables are assumed to be coefficients of the model. All coefficients
              must be scalars.
options Specify the default fit options for the current expression.
Example   Create a fit type object for a custom general equation and define the
          problem-dependent name to be n.
            ftype = fittype('a*x+b*exp(n*x)','problem','n')
            ftype =
                 General model:
                   ftype(a,b,n,x) = a*x+b*exp(n*x)
          Create a fit type object for a custom linear equation and specify names for the
          coefficients.
            ftype = fittype({'cos(x)','1'},'coeff',{'a1','a2'})
            ftype =
                 Linear model:
                   ftype(a1,a2,x) = a1*cos(x) + a2
Create a fit type object for the rat33 library model. Note that the display
includes the full equation.
   ftype = fittype('rat33')
   ftype =
      General model Rat33:
      ftype(p1,p2,p3,p4,q1,q2,q3,x) = (p1*x^3 + p2*x^2 + p3*x + p4)/
                   (x^3 + q1*x^2 + q2*x + q3)
Create a fit options object opts, include it in a new fit type object, and fit
the census data.
   load census
   opts = fitoptions('Method','Nonlinear','Normalize','On');
   ftype = fittype('a*exp(b*x)+c','options',opts);
   f1 = fit(cdate,pop,ftype);
get
Purpose       Return properties for a fit options object
Syntax        get(opts)
              a = get(opts)
              a = get(opts,'PropertyName')
Description   get(opts) returns all property names and their current values to the
              command line for the fit options object opts.
              a = get(opts) returns the structure a where each field name is the name of a
              property of opts, and each field contains the value of that property.
Example       Create a fit options object for a second-degree polynomial, and return the
              current property values to the command line.
                     opts = fitoptions('poly2');
                     get(opts)
                     ans =
                         Normalize:   'off'
                           Exclude:   []
                           Weights:   []
                            Method:   'LinearLeastSquares'
                            Robust:   'Off'
                             Lower:   []
                             Upper:   []
integrate
Purpose       Integrate a fit result object
              Create a custom fit type, and fit the data using reasonable starting values.
                    ftype = fittype('a*sin(b*x)');
                    fit1 = fit(x,y,ftype,'startpoint',[1 1]);
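A self-contained sketch of the whole workflow follows. The noisy sine data is an assumption made for illustration and is not the data used in this example; inty and x0 are hypothetical variable names. The call integrates the fitted curve from x0 to each value in x.

```matlab
% Illustrative data: a noisy sine wave on [0, 2*pi].
rand('state',0)
x = (0:0.1:2*pi)';
y = sin(x) + 0.1*(rand(size(x))-0.5);

% Fit a custom model using reasonable starting values.
ftype = fittype('a*sin(b*x)');
fit1 = fit(x,y,ftype,'StartPoint',[1 1]);

% Integrate the fit from x0 = 0 up to each predictor value.
x0 = 0;
inty = integrate(fit1,x,x0);
```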
plot
Purpose       Plot data, fit, prediction bounds, outliers, and residuals
Syntax        plot(fresult)
              plot(fresult,xdata,ydata)
              plot(fresult,xdata,ydata,'s')
              plot(fresult,'s1',xdata,ydata,'s2')
              plot(fresult,xdata,ydata,outliers)
              plot(fresult,xdata,ydata,outliers,'s')
              plot(...,'ptype1','ptype2',...)
              plot(...,'ptype1','ptype2',...,conflev)
              h = plot(...)
              'ptype'         The plot type. You can specify multiple plot types as a cell
                              array of strings.
Description   plot(fresult) plots the fit result object fresult. fresult is a fit result object
              generated by the fit function.
predfunc Same as fit but with prediction bounds for the function.
residuals Plot the residuals. The fit corresponds to the zero line.
Remarks   To plot error bars, use the errorbar function. For example, if you have a vector
          of weights w (reciprocal variances) associated with the response data ydata, you
          can plot symmetric error bars with the following command.
             errorbar(xdata,ydata,1./sqrt(w))
Example   Create a noisy sine wave on the interval [-2π, 2π] and add two outliers with the
          value 2.
             rand('state',2);
             x = (-2*pi:0.1:2*pi)';
             y = sin(x) + (rand(size(x))-0.5)*0.2;
             y(ceil(length(x)*rand(2,1))) = 2;
          Identify outliers that are outside the interval [-1.5, 1.5] using the range
          method.
             outliers = excludedata(x,y,'range',[-1.5 1.5]);
          Create a custom fit type, define fit options that exclude the outliers from the fit
          and define reasonable starting values, and fit the data.
             ftype = fittype('a*sin(b*x)');
             opts = fitoptions('Method','NonLinear','excl',outliers,...
                 'Start',[1 1]);
             fit1 = fit(x,y,ftype,opts);
          Plot the data, the fit to the data, and mark the outliers.
             subplot(2,1,1)
             plot(fit1,'k-',x,y,'b.',outliers,'ro');
          [Figure: two panels plotted against x. Top panel (y): data, excluded data, and fitted curve. Bottom panel (y residual): data and zero line.]
           Plot 99% confidence and prediction bounds for the function and for a new
           observation.
                plot(fit1,'k-',x,y,'b.','predfunc','predobs',0.99);
           [Figure: two panels of y plotted against x, each showing the data, the fitted curve, and confidence bounds.]
predint
Purpose       Compute prediction bounds for new observations or for the function
Syntax        ci = predint(fresult,x)
              ci = predint(fresult,x,level)
              ci = predint(fresult,x,level,'intopt','simopt')
              [ci,ypred] = predint(...)
          Fit the data using a single-term exponential and define the range over which
          prediction bounds are calculated.
             fresult = fit(x,y,'exp1');
          Return the prediction bounds for the function as well as the predicted values of
          the fit using nonsimultaneous and simultaneous bounds with a 95% confidence
          level. For nonsimultaneous bounds, given a single predetermined predictor
          value, you have 95% confidence that the true function lies between the
          confidence bounds. For simultaneous bounds, you have 95% confidence that the
          function at all predictor values lies between the bounds.
             [c1,ypred1] = predint(fresult,x,0.95,'fun','off');
             [c2,ypred2] = predint(fresult,x,0.95,'fun','on');
          Return the prediction bounds for new observations as well as the predicted
          values of the fit using nonsimultaneous and simultaneous bounds with a 95%
          confidence level. For nonsimultaneous bounds, given a single predictor value,
          you have 95% confidence that a new observation lies between the confidence
          bounds. For simultaneous bounds, regardless of the predictor value, you have
          95% confidence that a new observation lies between the bounds.
             [c3,ypred3] = predint(fresult,x,0.95,'obs','off');
             [c4,ypred4] = predint(fresult,x,0.95,'obs','on');
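The difference between bounds for the function and bounds for a new observation can be seen in the following sketch. The noisy exponential data is an assumption made for illustration, and cfun, cobs, widthfun, and widthobs are hypothetical variable names.

```matlab
% Illustrative data: a noisy exponential decay.
rand('state',0)
x = (0:0.2:10)';
y = 2*exp(-0.5*x) + 0.1*(rand(size(x))-0.5);
fresult = fit(x,y,'exp1');

% Nonsimultaneous 95% bounds for the function and for a new observation.
[cfun,ypred] = predint(fresult,x,0.95,'functional','off');
cobs         = predint(fresult,x,0.95,'observation','off');

% Each row holds the lower and upper bound for one predictor value.
% Observation bounds also account for the scatter of a new data
% point, so they are wider than the bounds for the function.
widthfun = cfun(:,2) - cfun(:,1);
widthobs = cobs(:,2) - cobs(:,1);
```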
          [Figure: four panels plotted against x.]
set
Purpose       Configure or display property values for a fit options object
Syntax        set(opts)
              a = set(opts)
              set(opts,'PropertyName',PropertyValue,...)
              set(opts,PN,PV)
              set(opts,S)
Description   set(opts) displays all configurable property values for the fit options object
              opts. If a property has a finite list of possible string values, these values are
              also displayed.
              set(opts,S) configures the named properties to the specified values for opts.
              The structure S has field names given by the fit options object properties, and
              the field values are the values of the corresponding properties.
Example    Create a custom nonlinear model, and create a default fit options object for the
           model.
              mymodel = fittype('a*x^2+b*exp(n*c*x)','prob','n');
              opts = fitoptions(mymodel);
           Configure the Display, Lower, and Algorithm properties using cell arrays of
           property names and property values.
              set(opts,{'Disp','Low','Alg'},{'Final',[0 0 0],'Levenberg'})
smooth
Purpose       Smooth the response data
Syntax        yy = smooth(ydata)
              yy = smooth(ydata,span)
              yy = smooth(ydata,'method')
              yy = smooth(ydata,span,'method')
              yy = smooth(ydata,'sgolay',degree)
              yy = smooth(ydata,span,'sgolay',degree)
              yy = smooth(xdata,ydata,...)
Description   yy = smooth(ydata) smooths the response data specified by ydata using the
              moving average method. The default number of data points in the average (the
              span) is five. yy is the smoothed response data. Note that you need not specify
              the predictor data if it is sorted and uniform.
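To make the moving average concrete, the following sketch computes a centered average with a span of five using only core MATLAB functions. The treatment of the end points (shrinking the window so that it stays symmetric about each point) is an assumption made for this illustration, and the data values are arbitrary.

```matlab
% Centered moving average with a span of 5.
y = [1 4 9 16 25 36 49]';
span = 5;
n = numel(y);
yy = zeros(n,1);
for i = 1:n
    % Half-width of the window, reduced near the ends so that the
    % window never extends past the first or last data point.
    h = min([(span-1)/2, i-1, n-i]);
    yy(i) = mean(y(i-h:i+h));
end
```

With this data, the interior points are averages of five neighbors (for example, yy(3) is 11), the second and next-to-last points are averages of three, and the first and last points are unchanged.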
are given below. For the Savitzky-Golay method, the default polynomial degree
is 2.
Method Description
Remarks   For the moving average and Savitzky-Golay methods, span must be odd. If an
          even span is specified, it is reduced by 1. If span is greater than the length of
          ydata, it is reduced to the length of ydata.
          Use robust smoothing when you want to assign lower weight to outliers. The
          robust smoothing algorithm uses the 6MAD method, which assigns zero weight
          to data outside six median absolute deviations.
          Another way to generate a vector of smoothed response values is to fit your
          data using a smoothing spline. Refer to the fit function for more information.
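The 6MAD rule can be illustrated with a few lines of core MATLAB. The sketch below assumes a bisquare weight function and takes MAD as the median of the absolute residuals; the residual values are made up, and the code is not the toolbox implementation.

```matlab
% Hypothetical residuals from a smoothing pass; the fourth is an outlier.
r = [0.1 -0.2 0.05 3.0 -0.15]';

% Median absolute deviation of the residuals.
madr = median(abs(r));

% Bisquare weights: zero outside 6*MAD, between 0 and 1 inside.
w = zeros(size(r));
inside = abs(r) < 6*madr;
w(inside) = (1 - (r(inside)/(6*madr)).^2).^2;
```

Here madr is 0.15, so any residual with magnitude of 0.9 or more receives zero weight, which removes the outlier from the next smoothing pass.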
Example   Suppose you want to smooth traffic count data with a moving average filter to
          see the average traffic flow over a 5-hour window (span is 5).
             load count.dat
             y = count(:,1);
             yy = smooth(y);
          [Figure: original data and smoothed data using 'moving'.]
Because of the way that the end points are treated, the result shown above
differs from the result returned by the filter function described in “Difference
Equations and Filtering” in the MATLAB documentation.
In this example, generate random data between 0 and 15, create a sine wave
with noise, and add two outliers with the value 3.
   rand('state',2);
   x = 15*rand(150,1);
   y = sin(x) + (rand(size(x))-0.5)*0.5;
   y(ceil(length(x)*rand(2,1))) = 3;
Smooth the data using the loess and rloess methods with the span specified
as 10% of the data.
   yy1 = smooth(x,y,0.1,'loess');
   yy2 = smooth(x,y,0.1,'rloess');
Note how the outliers have less effect with the robust method.
[Figure: two panels. Top: original data and smoothed data using 'loess'. Bottom: original data and smoothed data using 'rloess'.]