Linear Regression (DP IB Analysis & Approaches (AA)): Revision Note

Linear regression

What is linear regression?

  • If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model

    • Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line

  • The least squares regression line is the line of best fit that minimises the sum of the squares of the distances between the line and each data value

  • It can be calculated by looking at either:

    • vertical distances between the line and the data values

      • This is the regression line of y on x

    • horizontal distances between the line and the data values

      • This is the regression line of x on y
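
As a minimal sketch of what "least squares" means, the Python below computes the y on x line and checks that nudging the line only increases the sum of squared vertical distances. It assumes the standard formulas a = S_xy / S_xx and b = ȳ − a·x̄, which this note does not state because it relies on the GDC. The data are the maths and English scores from the worked example at the end of this note, and the function names are illustrative only.

```python
def fit_y_on_x(xs, ys):
    """Least squares line y = a*x + b (minimises squared vertical distances)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b


def sum_squared_vertical(xs, ys, a, b):
    """Sum of squared vertical gaps between each point and the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

a, b = fit_y_on_x(maths, english)
best = sum_squared_vertical(maths, english, a, b)

# Any other line gives a larger total, for example a slightly
# steeper gradient or a slightly higher intercept.
steeper = sum_squared_vertical(maths, english, a + 0.05, b)
higher = sum_squared_vertical(maths, english, a, b + 1)
print(best < steeper, best < higher)  # prints: True True
```

Swapping the roles of the two lists in the same calculation would minimise the horizontal distances instead, which gives the x on y line.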

How do I find the regression line of y on x?

  • The regression line of y on x is written in the form $y = ax + b$

  • a is the gradient of the line

    • It represents the change in y for each individual unit change in x

      • If a is positive this means y increases by a when x increases by one

      • If a is negative this means y decreases by |a| when x increases by one

  • b is the y-intercept

    • It shows the value of y when x is zero

  • You are expected to use your GDC to find the equation of the regression line

    • Enter the bivariate data and choose the model “ax + b”

    • Remember the mean point $(\bar{x}, \bar{y})$ will lie on the regression line
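
The interpretation of a and b, and the mean point property, can be checked numerically. The sketch below is not a replacement for the GDC: it assumes the standard least squares formulas (not given in this note), uses the scores from the worked example further down, and the helper name `predict_y` is made up for illustration.

```python
maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(english) / n

# Gradient a and y-intercept b of the y on x line.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, english))
s_xx = sum((x - x_bar) ** 2 for x in maths)
a = s_xy / s_xx
b = y_bar - a * x_bar


def predict_y(x):
    """Predicted y from the y on x line."""
    return a * x + b


# Gradient: a one-unit increase in x changes the predicted y by a.
unit_change = predict_y(41) - predict_y(40)

# Intercept: the predicted y when x is zero is b.
at_zero = predict_y(0)

# The mean point lies on the line (up to floating-point error).
mean_point_on_line = abs(predict_y(x_bar) - y_bar) < 1e-9

print(f"a = {a:.3f}, b = {b:.3f}")
print(abs(unit_change - a) < 1e-9, at_zero == b, mean_point_on_line)
```

Here a is positive, so the predicted English score rises by about a marks for each extra mark on the maths test.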

How do I find the regression line of x on y?

  • The regression line of x on y is written in the form $x = cy + d$

  • c is the gradient of the line when x is plotted against y (that is, with y on the horizontal axis)

    • It represents the change in x for each individual unit change in y

      • If c is positive this means x increases by c when y increases by one

      • If c is negative this means x decreases by |c| when y increases by one

  • d is the x-intercept

    • It shows the value of x when y is zero

  • You are expected to use your GDC to find the equation of the regression line

    • It is found the same way as the regression line of y on x but with the two data sets switched around

    • Remember the mean point $(\bar{x}, \bar{y})$ will lie on the regression line
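
"Switching the two data sets around" can be sketched as calling the same routine with the lists swapped. The function `least_squares` below is a hypothetical helper based on the standard least squares formulas, not something defined in this note, and the data are the scores from the worked example.

```python
def least_squares(independent, dependent):
    """Return (gradient, intercept) for dependent = gradient * independent + intercept."""
    n = len(independent)
    u_bar = sum(independent) / n
    v_bar = sum(dependent) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(independent, dependent))
    s_uu = sum((u - u_bar) ** 2 for u in independent)
    gradient = s_uv / s_uu
    return gradient, v_bar - gradient * u_bar


maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

# y on x: x is the independent list, giving y = a*x + b.
a, b = least_squares(maths, english)

# x on y: same routine with the lists swapped, giving x = c*y + d.
c, d = least_squares(english, maths)

# The x on y line is not just the y on x line rearranged:
# rearranging y = a*x + b would give a coefficient of 1/a, not c.
print(f"c = {c:.3f}, 1/a = {1 / a:.3f}")

# The mean point still lies on the x on y line.
x_bar = sum(maths) / len(maths)
y_bar = sum(english) / len(english)
mean_point_on_line = abs(c * y_bar + d - x_bar) < 1e-9
print(mean_point_on_line)
```

The two coefficients only agree when the correlation is perfect, which is why each line should be used for its own direction of prediction.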

How do I use a regression line?

  • The regression line can be used to decide what type of correlation there is if there is no scatter diagram

    • If the gradient is positive then the data set has positive correlation

    • If the gradient is negative then the data set has negative correlation

  • The regression line can also be used to predict the value of a dependent variable from an independent variable

    • The equation for the y on x line should only be used to make predictions for y

      • Using a y on x line to predict x is not always reliable

    • The equation for the x on y line should only be used to make predictions for x

      • Using an x on y line to predict y is not always reliable

    • Making a prediction within the range of the given data is called interpolation

      • This is usually reliable

      • The stronger the correlation the more reliable the prediction

    • Making a prediction outside of the range of the given data is called extrapolation

      • This is much less reliable

    • The prediction will be more reliable if the number of data values in the original sample set is bigger

  • The y on x and x on y regression lines intersect at the mean point $(\bar{x}, \bar{y})$
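
The points above can be sketched in Python: reading the type of correlation from the gradient, labelling a prediction as interpolation or extrapolation, and checking that the two lines cross at the mean point. The helpers `least_squares` and `predict` are hypothetical names introduced for illustration, the formulas are the standard least squares ones (an assumption, since this note uses the GDC), and the data come from the worked example below.

```python
def least_squares(independent, dependent):
    """Return (gradient, intercept) for dependent = gradient * independent + intercept."""
    n = len(independent)
    u_bar = sum(independent) / n
    v_bar = sum(dependent) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(independent, dependent))
    s_uu = sum((u - u_bar) ** 2 for u in independent)
    gradient = s_uv / s_uu
    return gradient, v_bar - gradient * u_bar


def predict(line, observed, value):
    """Predict from a (gradient, intercept) line and label the kind of prediction."""
    gradient, intercept = line
    inside = min(observed) <= value <= max(observed)
    kind = "interpolation" if inside else "extrapolation"
    return gradient * value + intercept, kind


maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

y_on_x = least_squares(maths, english)       # use to predict y from x
x_on_y = least_squares(english, maths)       # use to predict x from y

# A positive gradient indicates positive correlation.
print("positive correlation:", y_on_x[0] > 0)

# x = 50 is inside the maths range 7 to 82; x = 95 is outside it.
print(predict(y_on_x, maths, 50))
print(predict(y_on_x, maths, 95))

# The two lines cross at the mean point:
# substitute y = a*x + b into x = c*y + d and solve for x.
a, b = y_on_x
c, d = x_on_y
x_cross = (c * b + d) / (1 - a * c)
y_cross = a * x_cross + b
print(round(x_cross, 3), round(y_cross, 3))
```

For this data the mean point is (50, 29.125), so the crossing point should match those values up to floating-point error.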

Examiner Tips and Tricks

Once you calculate the values of a and b (or c and d), store them in your GDC.

This helps to avoid rounding errors, as you can use the full display values rather than the rounded values when using the linear regression equation to predict other values.
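
To see the size of the effect, the sketch below compares a prediction made with the full stored values of a and b against one made with the same values rounded to 3 significant figures. It uses the worked example data, and the helper names `least_squares` and `to_3sf` are hypothetical, introduced for illustration only.

```python
def least_squares(independent, dependent):
    """Return (gradient, intercept) for dependent = gradient * independent + intercept."""
    n = len(independent)
    u_bar = sum(independent) / n
    v_bar = sum(dependent) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(independent, dependent))
    s_uu = sum((u - u_bar) ** 2 for u in independent)
    gradient = s_uv / s_uu
    return gradient, v_bar - gradient * u_bar


def to_3sf(value):
    """Round to 3 significant figures, as a written answer might be."""
    return float(f"{value:.3g}")


maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

# "Stored" values: full floating-point precision.
a, b = least_squares(maths, english)

# Values as they would be copied down to 3 significant figures.
a_rounded, b_rounded = to_3sf(a), to_3sf(b)

x = 80
full = a * x + b
rounded = a_rounded * x + b_rounded
print(f"full: {full:.4f}  rounded: {rounded:.4f}  gap: {abs(full - rounded):.4f}")
```

The gap is small here, but it grows with larger x values and with coarser rounding, and it can change the final significant figure of an answer.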

Worked Example

The table below shows the scores of eight students for a maths test and an English test.

Maths (x)      7   18   37   52   61   68   75   82
English (y)    5    3    9   12   17   41   49   97

a) Write down the value of Pearson’s product-moment correlation coefficient, r.

[Image: worked solution to part a)]

b) Write down the equation of the regression line of y on x, giving your answer in the form $y = ax + b$ where a and b are constants to be found.

[Image: worked solution to part b)]

c) Write down the equation of the regression line of x on y, giving your answer in the form $x = cy + d$ where c and d are constants to be found.

[Image: worked solution to part c)]

d) Use the appropriate regression line to predict the score on the maths test of a student who got a score of 63 on the English test.

[Image: worked solution to part d)]
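
As a way to check GDC answers for parts a) to d), the sketch below computes the same quantities directly from the table. It assumes the standard formulas for Pearson's r and the least squares coefficients, which this note does not state. In the exam these values are expected to come from the GDC.

```python
from math import sqrt

maths = [7, 18, 37, 52, 61, 68, 75, 82]      # x values
english = [5, 3, 9, 12, 17, 41, 49, 97]      # y values

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(english) / n

s_xx = sum((x - x_bar) ** 2 for x in maths)
s_yy = sum((y - y_bar) ** 2 for y in english)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, english))

# a) Pearson's product-moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# b) Regression line of y on x: y = a*x + b.
a = s_xy / s_xx
b = y_bar - a * x_bar

# c) Regression line of x on y: x = c*y + d.
c = s_xy / s_yy
d = x_bar - c * y_bar

# d) Predicting a maths score (x) from an English score (y) uses
#    the x on y line; 63 lies within the English scores, so this is
#    interpolation.
maths_prediction = c * 63 + d

print(f"a) r = {r:.3g}")
print(f"b) y = {a:.3g}x + ({b:.3g})")
print(f"c) x = {c:.3g}y + ({d:.3g})")
print(f"d) predicted maths score = {maths_prediction:.3g}")
```

To 3 significant figures this gives r ≈ 0.794, y = 0.944x − 18.1 and x = 0.669y + 30.5, with a predicted maths score of about 73.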

Author: Dan Finlay

Expertise: Maths Subject Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.

Reviewer: Roger B

Expertise: Maths Content Creator

Roger's teaching experience stretches all the way back to 1992, and in that time he has taught students at all levels between Year 7 and university undergraduate. Having conducted and published postgraduate research into the mathematical theory behind quantum computing, he is more than confident in dealing with mathematics at any level the exam boards might throw at you.