Least Squares Regression Curves & Coefficient of Determination (DP IB Applications & Interpretation (AI)): Revision Note
Non-linear regression
What is non-linear regression?
You have already seen that linear regression is when you can use a straight line to fit bivariate data
Non-linear regression is when you can use a curve (rather than a straight line) to fit bivariate data
In your exam the regression could be:
Linear: y = ax + b
Quadratic: y = ax² + bx + c
Cubic: y = ax³ + bx² + cx + d
Exponential: y = abˣ or y = aeᵇˣ
Power: y = axᵇ
Sine: y = a sin(bx + c) + d
How do I find the equation of the non-linear regression model?
Using your GDC:
Type the two sets of data into your GDC
Select the relevant model
The exam question will tell you which model to use
Your GDC will calculate the constants
You can use logarithms to linearise exponential and power relationships
Power: y = axᵇ
then ln y = b ln x + ln a
and so ln x and ln y will have a linear relationship
Exponential: y = abˣ
then ln y = x ln b + ln a
and so x and ln y will have a linear relationship
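The linearisation above can be sketched in Python. This is a minimal illustration, not your GDC's algorithm: it fits a least squares straight line to the transformed data, (ln x, ln y) for a power model and (x, ln y) for an exponential model, and then undoes the logarithms. Because this minimises the squared residuals of ln y rather than of y, the constants can differ slightly from a curve fitted directly to y. The function names and test data are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Least squares line y = m*x + c for paired data; returns (m, c)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def fit_power(xs, ys):
    """Power model y = a * x**b via the line ln y = b ln x + ln a (needs x, y > 0)."""
    b, ln_a = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_exponential(xs, ys):
    """Exponential model y = a * b**x via the line ln y = (ln b) x + ln a (needs y > 0)."""
    ln_b, ln_a = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), math.exp(ln_b)


# Hypothetical data generated exactly from y = 3x^2 and y = 5 * 2^x
xs = [1, 2, 3, 4, 5]
print(fit_power(xs, [3 * x ** 2 for x in xs]))        # approximately (3.0, 2.0)
print(fit_exponential(xs, [5 * 2 ** x for x in xs]))  # approximately (5.0, 2.0)
```

Since the test data lie exactly on each model, the straight-line fit on the logged values recovers the original constants up to floating-point rounding.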
Examiner Tips and Tricks
You can use your GDC to plot the scatter diagram and include the graph of a regression model
This will allow you to get a sense of how well the model fits the data
Worked Example
Scarlett and Violet collect data on the length of a film (x minutes) and the audience rating (y %).
Length, x (minutes) | 75 | 93 | 101 | 107 | 115 | 124 | 132 | 140 | 171
Rating, y (%) | 83 | 75 | 51 | 38 | 47 | 56 | 76 | 91 | 70
a) Scarlett claims that there is a cubic relationship. Find the equation of a cubic regression model of the form y = ax³ + bx² + cx + d.

b) Violet claims that there is a sine relationship. Find the equation of a sine regression model of the form y = a sin(bx + c) + d.

c) Whose model predicts a higher audience rating for a film which is 100 minutes long?

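For part (a), the GDC's cubic regression solves a least squares problem. The sketch below uses a hypothetical helper, `polyfit_exact`, which is not any GDC's actual method: it solves the normal equations with exact fractions (Python's `fractions` module), so large powers such as 171⁶ cause no rounding trouble. It then evaluates the cubic at x = 100 for part (c). The sine model in part (b) needs an iterative non-linear fit and is left to the GDC. Check the printed values against your own GDC rather than treating them as the official answer.

```python
from fractions import Fraction


def polyfit_exact(xs, ys, degree):
    """Least squares polynomial of the given degree, solved exactly with fractions.

    Returns the coefficients lowest power first: [c0, c1, ..., c_degree].
    """
    xs = [Fraction(str(x)) for x in xs]
    ys = [Fraction(str(y)) for y in ys]
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, where column j of X holds x**j
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gauss-Jordan elimination; exact arithmetic means no rounding error
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [ar - factor * ac for ar, ac in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(m)]


length = [75, 93, 101, 107, 115, 124, 132, 140, 171]  # x (minutes)
rating = [83, 75, 51, 38, 47, 56, 76, 91, 70]         # y (%)

d, c, b, a = polyfit_exact(length, rating, 3)


def cubic(x):
    """Cubic regression model y = ax^3 + bx^2 + cx + d."""
    return a * x ** 3 + b * x ** 2 + c * x + d


print(f"y = {float(a):.3g}x^3 {float(b):+.3g}x^2 {float(c):+.3g}x {float(d):+.3g}")
print("Predicted rating at 100 minutes:", float(cubic(100)))
```

Printing the coefficients to 3 significant figures mirrors exam practice; keep full precision when making the prediction.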
Least squares regression curves
What is a residual?
Given a set of n pairs of data and a regression model y = f(x)
A residual is the actual y-value (from the data) minus the predicted y-value (using the regression model)
The sum of the squared residuals is denoted by SS_res
SS_res = Σ (yᵢ − f(xᵢ))², summing over all n pairs of data (xᵢ, yᵢ)
If you have two regression models using the same data then the one with the smaller SS_res fits the data better
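The definition of SS_res translates directly into code. Here is a minimal sketch using hypothetical data and two hypothetical models, chosen so that the arithmetic is easy to check by hand:

```python
def residuals(xs, ys, f):
    """Residual for each pair: actual y minus the model's predicted f(x)."""
    return [y - f(x) for x, y in zip(xs, ys)]


def ss_res(xs, ys, f):
    """Sum of the squared residuals for the model y = f(x)."""
    return sum(r ** 2 for r in residuals(xs, ys, f))


# Hypothetical data and models, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]


def model_1(x):
    return 2 * x        # predictions 2, 4, 6, 8


def model_2(x):
    return x ** 2 / 2   # predictions 0.5, 2.0, 4.5, 8.0


print(ss_res(xs, ys, model_1))  # 1    (residuals 0, 0, -1, 0)
print(ss_res(xs, ys, model_2))  # 6.5  (residuals 1.5, 2.0, 0.5, 0.0)
```

Model 1 has the smaller SS_res, so it fits this data better.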
What is a least squares regression curve?
The least squares regression curve can be thought of as a “curve of best fit” y = f(x)
For a given type of model the least squares regression curve minimises the sum of the squared residuals
Your GDC calculates the constants for the least squares regression curves
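To see what "minimises the sum of the squared residuals" means, the sketch below fits hypothetical data with the standard closed-form least squares line (gradient S_xy / S_xx), then nudges each constant slightly. Every change away from the least squares values increases SS_res. The same idea applies to curves, although their constants generally need the GDC.

```python
def least_squares_line(xs, ys):
    """Closed-form least squares line y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def ss_res_line(xs, ys, m, c):
    """Sum of the squared residuals for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = least_squares_line(xs, ys)
best = ss_res_line(xs, ys, m, c)

# Nudge each constant away from its least squares value
for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(dm, dc, ss_res_line(xs, ys, m + dm, c + dc) > best)  # True every time
```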
Why is the sum of the squared residuals not always a good measure of fit?
If two models are formed using the same number of pairs of data then the sum of the squared residuals is a good measure of fit
If two models use different numbers of pairs of data then the sum of the squared residuals is not always a good measure of fit
The sum tends to increase as more pairs of data are included, so it cannot be compared fairly with the sum for a data set with a different number of pairs
Compare the two scenarios
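10 pairs of data and the absolute value of each residual is 15, then the sum of the squared residuals is 10 × 15² = 2250
2250 pairs of data and the absolute value of each residual is 1, then the sum of the squared residuals is 2250 × 1² = 2250
They have the same sum of squared residuals, but the residuals in the second scenario are much smaller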
Your GDC may give you the mean squared error
This is a better measure of fit
You do not need to know this for your exam but it might help with your understanding
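The two scenarios can be checked numerically. This sketch assumes the simplest definition, MSE = SS_res / n; some GDCs may divide by n minus the number of model parameters instead, so their values could differ.

```python
def sum_sq(residuals):
    """SS_res: the sum of the squared residuals."""
    return sum(r ** 2 for r in residuals)


def mse(residuals):
    """Mean squared error in its simplest form: SS_res divided by n."""
    return sum_sq(residuals) / len(residuals)


scenario_1 = [15] * 10   # 10 pairs, each residual of size 15
scenario_2 = [1] * 2250  # 2250 pairs, each residual of size 1

print(sum_sq(scenario_1), sum_sq(scenario_2))  # 2250 2250
print(mse(scenario_1), mse(scenario_2))        # 225.0 1.0
```

The sums are identical, but dividing by n exposes how much smaller the typical residual is in the second scenario.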
Worked Example
Jet is the owner of a gym and he is testing different price options. The table below shows the number of new members per month and the price of a monthly membership.
10 | 20 | 30 | |
97 | 68 | 55 |
Jet believes that he can fit the data with either the model or the model
.
Jet wants to choose the model with the smallest value for the sum of the squared residuals.
Determine which model Jet should choose.

The coefficient of determination
What is the coefficient of determination?
The coefficient of determination is a measure of fit for a model
If the coefficient of determination is 0.57, this means 57% of the variation in the y-variable can be explained by the variation in the x-variable
The other 43% is due to other factors
The higher this proportion, the better the model fits the data
The coefficient of determination is denoted by R²
R² ≤ 1
R² = 1 means the model is a perfect fit for the data
The closer to 1 the better the fit
R² is usually greater than or equal to zero
R² can be negative but this is outside the scope of this course
If the regression model is linear then the coefficient of determination is equal to the square of the PMCC
R² = r² for linear models
Some GDCs will simply denote R² as r² due to its connection to the PMCC for linear models
How do I calculate the coefficient of determination?
When finding the constants for regression models your GDC might give you the value of R²
You will only be asked to calculate the coefficient of determination for models for which GDCs give the value of R²
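The coefficient of determination can be calculated by R² = 1 − SS_res / SS_tot
Where SS_res is the sum of the squared residuals and SS_tot = Σ (yᵢ − ȳ)², with ȳ the mean of the y-values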
You do not need to know this formula but it might help with your understanding
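The formula can be sketched in Python using hypothetical data. For a least squares straight line, the result also matches the square of the PMCC, as stated for linear models above:

```python
import math


def least_squares_line(xs, ys):
    """Closed-form least squares line y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys, f):
    """R^2 = 1 - SS_res / SS_tot for the model y = f(x)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

m, c = least_squares_line(xs, ys)
print(r_squared(xs, ys, lambda x: m * x + c))  # approximately 0.81
print(pmcc(xs, ys) ** 2)                       # approximately 0.81
```

Here SS_res = 1.9 and SS_tot = 10, so R² = 1 − 0.19 = 0.81, and r = 0.9.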
Does the coefficient of determination determine the validity of a model?
If R² is close to 1 then the model fits the data well
However this alone does not guarantee that it is a good model for the relationship between the two variables
Consider the scenario where there are big gaps between data points and a model which fits the data well
The model is only known to fit well at the data points
As there are gaps between the data points, the model might not be a good fit in these regions
Different types of models have different numbers of parameters
Therefore fitting different types of model to the same data will give different levels of accuracy
Linear models need at least two pairs of data
Quadratic models need at least three pairs of data
Cubic models need at least four pairs of data
Using four pairs of data will mean the cubic model will have R² = 1
This is because the cubic graph will pass through all four data points; the value of R² is likely to decrease as extra pairs of data are included
However, this does not mean it is a better fit than the quadratic model
The quadratic model could be more accurate as it uses one more pair of data than it needs
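Here is a quick check of the four-point claim using four hypothetical data points that look roughly linear. The cubic through four points with distinct x-values is unique, and because it makes every residual zero it is also the least squares cubic, so R² comes out as exactly 1. The sketch builds that cubic by Lagrange interpolation, a technique swapped in here for convenience rather than the GDC's method.

```python
def cubic_through(points):
    """Unique cubic through four points with distinct x-values (Lagrange form)."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def r_squared(xs, ys, f):
    """R^2 = 1 - SS_res / SS_tot for the model y = f(x)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Four hypothetical points that look roughly linear
points = [(1, 2.0), (2, 2.9), (3, 4.2), (4, 4.8)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

cubic = cubic_through(points)
print(r_squared(xs, ys, cubic))  # 1.0
```

A perfect R² here says nothing about how the cubic behaves between or beyond the points.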
Worked Example
Data is collected on the lengths of cheetahs (x metres) and their average running speeds (y m s⁻¹).
Length, x (m) | 1.21 | 1.33 | 1.12 | 1.45 | 1.42 | 1.39 | 1.24 | 1.19 | 1.32
Speed, y (m s⁻¹) | 24.3 | 25.1 | 22.2 | 35.1 | 35.1 | 33.4 | 27.1 | 23.1 | 24.8
a) Find the equation of the least squares regression curve using:
(i) a quadratic model y = ax² + bx + c.
(ii) an exponential model y = abˣ.

b) Based solely on the coefficients of determination, suggest which model is a better fit for the data.

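The sketch below shows how both fits and their R² values could be computed for the cheetah data. It assumes the exponential model takes the form y = abˣ and fits it with a least squares line through (x, ln y). The quadratic uses a hypothetical exact-fraction helper, `polyfit_exact`, that solves the least squares normal equations. R² is computed on the original y-values with R² = 1 − SS_res / SS_tot. Some GDCs may report R² (or r²) for the linearised data instead, so compare these numbers with your own GDC before relying on them.

```python
import math
from fractions import Fraction


def polyfit_exact(xs, ys, degree):
    """Least squares polynomial via exact normal equations; lowest power first."""
    xs = [Fraction(str(x)) for x in xs]
    ys = [Fraction(str(y)) for y in ys]
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [ar - factor * ac for ar, ac in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(m)]


def fit_exponential(xs, ys):
    """y = a * b**x via a least squares line through (x, ln y); returns (a, b)."""
    lys = [math.log(y) for y in ys]
    n = len(xs)
    x_bar, l_bar = sum(xs) / n, sum(lys) / n
    sxy = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, lys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return math.exp(l_bar - slope * x_bar), math.exp(slope)


def r_squared(xs, ys, f):
    """R^2 = 1 - SS_res / SS_tot, computed on the original y-values."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


length = [1.21, 1.33, 1.12, 1.45, 1.42, 1.39, 1.24, 1.19, 1.32]  # x (m)
speed = [24.3, 25.1, 22.2, 35.1, 35.1, 33.4, 27.1, 23.1, 24.8]   # y (m s^-1)

c0, c1, c2 = (float(v) for v in polyfit_exact(length, speed, 2))
a, b = fit_exponential(length, speed)

r2_quad = r_squared(length, speed, lambda x: c2 * x ** 2 + c1 * x + c0)
r2_exp = r_squared(length, speed, lambda x: a * b ** x)

print(f"Quadratic:   y = {c2:.3g}x^2 {c1:+.3g}x {c0:+.3g},  R^2 = {r2_quad:.3f}")
print(f"Exponential: y = {a:.3g} * {b:.3g}^x,  R^2 = {r2_exp:.3f}")
```

Whichever model shows the larger R² is the better fit on this measure alone, subject to the caveats about validity discussed above.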