Data Fitting

Introduction

Given a two-variable data set, we are interested in writing a polynomial equation that relates one quantity, x, to another, y. The equation can then be used for prediction, that is, for finding unknown values of y that correspond to values of x that we may be interested in.

The standard form of a polynomial equation of degree n is:

$$y = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

Notice that the number of coefficients, n+1, is one more than the degree of the polynomial, n. To solve the data fitting problem, we must find these n+1 coefficients.
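Once the n+1 coefficients are known, prediction is just evaluation of the polynomial. The following is a minimal sketch, assuming the coefficients are listed from the highest power down to the constant term. The function name `evaluate_polynomial` and the sample polynomial are illustrative and not part of the original notes, and Horner's rule is simply one common way to do the evaluation.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate a_n x^n + ... + a_1 x + a_0 using Horner's rule.

    `coefficients` lists a_n first and a_0 last, so a polynomial of
    degree n needs n + 1 entries.
    """
    result = 0.0
    for coefficient in coefficients:
        # Multiply the running value by x, then add the next coefficient.
        result = result * x + coefficient
    return result


# Hypothetical example: y = 2x^2 + 1 has degree 2 and 3 coefficients.
quadratic = [2, 0, 1]
print(evaluate_polynomial(quadratic, 3))
```

The list has three entries for a degree-two polynomial, which mirrors the count of n+1 coefficients described above.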

For the linear case, we have already studied two methods that will solve the data fitting problem. These are illustrated in the following examples:

Example 1: Find the equation of the line that fits the data

x | 10 | 50
y | 3.25 | 6.25

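Solution: Use the point-slope form of a linear equation, $y - y_1 = m(x - x_1)$, where the slope is $m = (6.25 - 3.25)/(50 - 10) = 0.075$. Substituting the point (10, 3.25) gives $y - 3.25 = 0.075(x - 10)$, which simplifies to $y = 0.075x + 2.5$.

The graph shows the data points and the line containing them.
The line y = 0.075x + 2.5 is an exact fit for the data.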
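The result of Example 1 can be checked with a few lines of code. This is a sketch only: the helper name `line_through` is made up for illustration, and it assumes the two points have different x-values.

```python
def line_through(point_a, point_b):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = point_a, point_b
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    slope = (y2 - y1) / (x2 - x1)
    # Point-slope form y - y1 = m(x - x1), rearranged to y = m*x + (y1 - m*x1).
    intercept = y1 - slope * x1
    return slope, intercept


slope, intercept = line_through((10, 3.25), (50, 6.25))
print(f"y = {slope:.3f}x + {intercept:.2f}")
```

With the data of Example 1, the slope and intercept agree with the line y = 0.075x + 2.5 up to floating-point rounding.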



 

Example 2: Find the equation of the line that fits the data

x | 1 | 2 | 4 | 7
y | 5 | 14 | 21 | 39

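Solution: Use the least squares method, where the slope m and intercept b of the line y = mx + b for N data points are given by

$$m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}, \qquad b = \frac{\sum y - m\sum x}{N}$$

For this data, $N = 4$, $\sum x = 14$, $\sum y = 79$, $\sum xy = 390$, and $\sum x^2 = 70$, so $m = 454/84 \approx 5.40$ and $b \approx 0.83$.

The graph shows the data points and the least squares line.
The line y = 5.40x + 0.83 is the best fit for the data in the least squares sense.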
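The line in Example 2 can be reproduced numerically. The sketch below uses the standard closed-form sums for the slope and intercept of a least squares line. That choice is an assumption about the formula intended in the example, and the function name `least_squares_line` is illustrative.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least squares line through the points."""
    count = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    denominator = count * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal; no unique line exists")
    slope = (count * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / count
    return slope, intercept


data = [(1, 5), (2, 14), (4, 21), (7, 39)]
slope, intercept = least_squares_line(data)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Rounded to two decimal places, the computed slope and intercept match the line y = 5.40x + 0.83 quoted in the example.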



Many data sets do not follow a linear relationship but are instead modeled by a polynomial of higher degree. For these data sets we will study two methods that lead to the polynomial equation. In one case, the polynomial we find will be an exact fit for the data; in the other, the best fit. Which method to use depends on the size of the data set and the degree of the polynomial we want to find.


Polynomial Interpolation

The method of polynomial interpolation is used to find the equation of a polynomial of degree n that fits a data set of size n+1. For example, to find the equation of a quadratic polynomial (degree two) that fits 3 data points, we can use polynomial interpolation. If the data set has 4 or more points, this method cannot be used to find a quadratic polynomial.

Suppose we want to find a cubic (degree three) polynomial that lies on the data:

x | 0 | 1 | 2 | 4
y | 3.07 | 4.92 | 8.13 | 22

For this situation, n = 3, the degree of the polynomial, and n+1 = 4, the size of the data set.

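Each one of these data points must satisfy the equation:

$$y = a_3 x^3 + a_2 x^2 + a_1 x + a_0$$

for the same coefficients $a_3, a_2, a_1, a_0$. By substituting each point into the standard cubic equation we get the system:

$$\begin{aligned} a_3(0)^3 + a_2(0)^2 + a_1(0) + a_0 &= 3.07 \\ a_3(1)^3 + a_2(1)^2 + a_1(1) + a_0 &= 4.92 \\ a_3(2)^3 + a_2(2)^2 + a_1(2) + a_0 &= 8.13 \\ a_3(4)^3 + a_2(4)^2 + a_1(4) + a_0 &= 22 \end{aligned}$$

The matrix equation that corresponds to this system follows:

$$\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 64 & 16 & 4 & 1 \end{bmatrix} \begin{bmatrix} a_3 \\ a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 3.07 \\ 4.92 \\ 8.13 \\ 22 \end{bmatrix}$$

The 4 by 4 matrix above is called the Vandermonde matrix. It is a square matrix and has an inverse, which can be used to solve the system for $a_3, a_2, a_1, a_0$. In general, the Vandermonde matrix is always square. Why? Furthermore, it is non-singular whenever the x-values of the data points are distinct, meaning it has an inverse. Consequently, our system is guaranteed to have a solution.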
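To see why the matrix is square, it can help to build it directly from the x-values. The sketch below assumes the columns are ordered from the highest power of x down to the constant term. The function name `vandermonde` is illustrative, and other texts order the columns the other way.

```python
def vandermonde(x_values, degree):
    """Build the matrix whose rows are [x^degree, ..., x^2, x, 1]."""
    return [[x ** power for power in range(degree, -1, -1)] for x in x_values]


# Four data points and a cubic (degree 3): 4 rows and 3 + 1 = 4 columns.
matrix = vandermonde([0, 1, 2, 4], degree=3)
for row in matrix:
    print(row)
```

Each data point contributes one row and each coefficient contributes one column, so n+1 points paired with a polynomial of degree n always give an (n+1) by (n+1) matrix.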

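Solving the matrix equation, we get (to four decimal places):

$$a_3 \approx 0.1404, \quad a_2 \approx 0.2588, \quad a_1 \approx 1.4508, \quad a_0 = 3.07$$

We now have the cubic polynomial that fits the data:

$$y \approx 0.1404x^3 + 0.2588x^2 + 1.4508x + 3.07$$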

Using the equation, we can predict the value of y that corresponds to any value of x we are interested in.


The illustration shows the graph of the polynomial and the data points (0, 3.07), (1, 4.92), (2, 8.13), and (4, 22).



The graph of this polynomial will lie on all four data points as shown above. In fact, it is the only polynomial of degree three that exactly fits the four data points. These four points uniquely determine the polynomial we have found. In general, n+1 points with distinct x-values uniquely determine a polynomial of degree at most n.
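The whole interpolation procedure can be sketched in code. Note one swap: the notes solve the system with the matrix inverse, while this sketch uses Gaussian elimination with partial pivoting, which gives the same solution without forming the inverse. The function names and the prediction point x = 3 are hypothetical choices made for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    rows = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; are the x-values distinct?")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (rows[r][size] - known) / rows[r][r]
    return solution


def interpolate(points):
    """Return coefficients [a_n, ..., a_0] of the polynomial through n + 1 points."""
    degree = len(points) - 1
    matrix = [[x ** p for p in range(degree, -1, -1)] for x, _ in points]
    rhs = [y for _, y in points]
    return solve_linear_system(matrix, rhs)


def evaluate(coefficients, x):
    """Evaluate the polynomial with coefficients listed from a_n down to a_0."""
    result = 0.0
    for coefficient in coefficients:
        result = result * x + coefficient
    return result


data = [(0, 3.07), (1, 4.92), (2, 8.13), (4, 22)]
coefficients = interpolate(data)
print("coefficients a_3..a_0:", coefficients)
# Hypothetical prediction point, chosen only for illustration.
prediction = evaluate(coefficients, 3)
print("predicted y at x = 3:", prediction)
```

Evaluating the result at the four original x-values should return the original y-values up to rounding, which is a quick way to confirm that the fit is exact.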

Problems

1. Find the quadratic function whose graph passes through the points (-1, 2), (1, 1), and (3, 3). What is the value of this function at x = 0?

2. Given the following data:

x | 0 | 1 | 2 | 4 | 5
y | 3.07 | 4.92 | 8.13 | 22.1 | 36.5

Find the equation of the fourth degree polynomial that fits the data. Use it to estimate the value of y when x=3.

3. A snowball is melting in such a way that at time 0 its radius is 30 cm, at time 1 its radius is 22 cm, and at time 2 its radius is 6 cm. Suppose the radius is a quadratic function of time. Use quadratic interpolation to estimate its radius at any time. What do you expect its radius to be at time 1.5?


Method of Least Squares

When data are collected from an experiment, recording a large number of observations generally gives more accurate results. The resulting data set may be too large for the method of polynomial interpolation. To find a polynomial of degree n that fits a data set of more than n+1 points, we can use the method of least squares. In this case the graph of the polynomial will often not lie on all the data points, but it will be the best fit for the data in the least squares sense.

Suppose we want to find a quadratic polynomial that fits the following data:

x | 1 | 2 | 3 | 5
y | 2 | 10 | 25 | 62

Here, we are finding a quadratic polynomial (degree n = 2) for a data set of 4 points. Since 4 > n+1 = 3 in this setting, we cannot use the method of polynomial interpolation. However, we can start the process in a similar manner.

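Each of the data points must satisfy the standard quadratic equation:

$$y = a_2 x^2 + a_1 x + a_0$$

for the same coefficients $a_2, a_1, a_0$.

Substitute each data point into this equation to get the following system:

$$\begin{aligned} a_2(1)^2 + a_1(1) + a_0 &= 2 \\ a_2(2)^2 + a_1(2) + a_0 &= 10 \\ a_2(3)^2 + a_1(3) + a_0 &= 25 \\ a_2(5)^2 + a_1(5) + a_0 &= 62 \end{aligned}$$

This system has more equations than unknowns and may not have a solution.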

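Consider the matrix equation that corresponds to the system:

$$\begin{bmatrix} 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 25 & 5 & 1 \end{bmatrix} \begin{bmatrix} a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 2 \\ 10 \\ 25 \\ 62 \end{bmatrix}$$

Unlike the Vandermonde matrix, the first matrix in the equation is not square and therefore has no inverse. At this point we will employ the transpose of this matrix to find the normal equations. This will enable us to find a solution.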

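Multiply each side of the matrix equation on the left by the transpose to get:

$$\begin{bmatrix} 1 & 4 & 9 & 25 \\ 1 & 2 & 3 & 5 \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 25 & 5 & 1 \end{bmatrix} \begin{bmatrix} a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 9 & 25 \\ 1 & 2 & 3 & 5 \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 10 \\ 25 \\ 62 \end{bmatrix}$$

which simplifies to

$$\begin{bmatrix} 723 & 161 & 39 \\ 161 & 39 & 11 \\ 39 & 11 & 4 \end{bmatrix} \begin{bmatrix} a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 1817 \\ 407 \\ 99 \end{bmatrix}$$

Corresponding to this matrix equation are the normal equations:

$$\begin{aligned} 723a_2 + 161a_1 + 39a_0 &= 1817 \\ 161a_2 + 39a_1 + 11a_0 &= 407 \\ 39a_2 + 11a_1 + 4a_0 &= 99 \end{aligned}$$

Solving this system, we get the values for the coefficients (to three decimal places):

$$a_2 \approx 1.909, \quad a_1 \approx 3.673, \quad a_0 \approx -3.964$$

And the resulting polynomial:

$$y \approx 1.909x^2 + 3.673x - 3.964$$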
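The normal equations can also be formed and solved in code. The sketch below builds the non-square matrix, multiplies by its transpose, and solves the resulting square system by Gaussian elimination. The function names are illustrative, and the ordering of coefficients from the highest power down is an assumption.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    rows = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (rows[r][size] - known) / rows[r][r]
    return solution


def least_squares_polynomial(points, degree):
    """Return coefficients [a_n, ..., a_0] minimizing the sum of squared errors."""
    # One row [x^n, ..., x, 1] per data point; not square when there are
    # more points than coefficients.
    design = [[x ** p for p in range(degree, -1, -1)] for x, _ in points]
    ys = [y for _, y in points]
    cols = degree + 1
    # Left-multiply both sides by the transpose: (A^T A) a = A^T y.
    normal_matrix = [
        [sum(row[i] * row[j] for row in design) for j in range(cols)]
        for i in range(cols)
    ]
    normal_rhs = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(cols)]
    return solve_linear_system(normal_matrix, normal_rhs)


data = [(1, 2), (2, 10), (3, 25), (5, 62)]
coefficients = least_squares_polynomial(data, degree=2)
print("coefficients a_2, a_1, a_0:", coefficients)
```

For larger or badly scaled data sets, forming the transpose product can lose accuracy, so numerical libraries often use other factorizations. For small textbook problems like this one the normal equations work well.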

Problems

1. Find the least squares line for the data: (1, 5), (2, 14), (4, 21), (7, 39).
Compare your result with Example 2.

2. Find a quadratic polynomial that fits the data: (3, 70), (-1, 5), (1, 13), (2, 46).
Predict the value at x=5.


eobrien@osf1.gmu.edu