Linear Regression (DP IB Analysis & Approaches (AA)): Revision Note

Linear Regression

What is linear regression?

  • If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model

    • Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line

  • The least squares regression line is the line of best fit that minimises the sum of the squares of the gaps between the line and the data values

  • It can be found by looking at either:

    • vertical distances between the line and the data values

      • This is the regression line of y on x

    • horizontal distances between the line and the data values

      • This is the regression line of x on y
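
As a sketch of what "minimises the sum of the squares" means, write the n data points as (x_i, y_i) (an indexing assumed here for illustration) and use the coefficient names a, b, c and d from the sections that follow. Each regression line then minimises one of these sums:

```latex
\begin{align*}
\text{y on x (vertical distances):}\quad & \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2 \\
\text{x on y (horizontal distances):}\quad & \sum_{i=1}^{n} \bigl(x_i - (c y_i + d)\bigr)^2
\end{align*}
```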

How do I find the regression line of y on x?

  • The regression line of y on x is written in the form y = ax + b

  • a is the gradient of the line

    • It represents the change in y for each individual unit change in x

      • If a is positive this means y increases by a for a unit increase in x

      • If a is negative this means y decreases by |a| for a unit increase in x

  • b is the y-intercept

    • It shows the value of y when x is zero

  • You are expected to use your GDC to find the equation of the regression line

    • Enter the bivariate data and choose the model “ax + b”

    • Remember the mean point (x̄, ȳ) will lie on the regression line
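
Your GDC does this calculation for you, but as a rough sketch of what sits behind the "ax + b" model, the Python below uses the standard least squares formulas a = Sxy / Sxx and b = ȳ − a x̄. The function name `regression_y_on_x` and the data values are made up for illustration and do not come from this note.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of products / squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # gradient
    b = y_bar - a * x_bar    # y-intercept, chosen so the mean point is on the line
    return a, b


# Hypothetical illustration data (mean point is (3, 4))
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
y_at_mean_x = a * 3 + b      # should equal the mean of ys
print(a, b, y_at_mean_x)
```

For this made-up data the line is approximately y = 0.6x + 2.2, and substituting x̄ = 3 gives ȳ = 4, so the mean point lies on the line.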

How do I find the regression line of x on y?

  • The regression line of x on y is written in the form x = cy + d

  • c is the gradient of the line

    • It represents the change in x for each individual unit change in y

      • If c is positive this means x increases by c for a unit increase in y

      • If c is negative this means x decreases by |c| for a unit increase in y

  • d is the x-intercept

    • It shows the value of x when y is zero

  • You are expected to use your GDC to find the equation of the regression line

    • It is found the same way as the regression line of y on x but with the two data sets switched around

    • Remember the mean point (x̄, ȳ) will lie on the regression line
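
To illustrate "the two data sets switched around", the sketch below fits a least squares line with a single helper and calls it twice, once for each order of the lists. The helper name `fit_line` and the data are hypothetical. It also shows that the x on y line is generally not just a rearrangement of the y on x line.

```python
def fit_line(inputs, outputs):
    """Least squares line: outputs = gradient * inputs + intercept."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    gradient = s_io / s_ii
    return gradient, out_bar - gradient * in_bar


# Hypothetical illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)   # y on x:  y = a*x + b
c, d = fit_line(ys, xs)   # x on y:  x = c*y + d  (lists swapped)

# Rearranging y = a*x + b would give a gradient of 1/a for x in terms of y
rearranged_gradient = 1 / a
print(c, d, rearranged_gradient)
```

Here c is about 1 while 1/a is about 1.67, so the two lines are different unless the correlation is perfect.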

How do I use a regression line?

  • The regression line can be used to decide what type of correlation there is if there is no scatter diagram

    • If the gradient is positive then the data set has positive correlation

    • If the gradient is negative then the data set has negative correlation

  • The regression line can also be used to predict the value of a dependent variable from an independent variable

    • The equation for the y on x line should only be used to make predictions for y

      • Using a y on x line to predict x is not always reliable

    • The equation for the x on y line should only be used to make predictions for x

      • Using an x on y line to predict y is not always reliable

    • Making a prediction within the range of the given data is called interpolation

      • This is usually reliable

      • The stronger the correlation the more reliable the prediction

    • Making a prediction outside of the range of the given data is called extrapolation

      • This is much less reliable

    • The prediction will be more reliable if the number of data values in the original sample set is bigger

  • The y on x and x on y regression lines intersect at the mean point (x̄, ȳ)
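
The points above can be sketched in code. The example below, using hypothetical data and helper names (`fit_line`, `predict_y`), predicts y from the y on x line, flags whether the prediction is interpolation or extrapolation, and checks where the two regression lines cross.

```python
def fit_line(inputs, outputs):
    """Least squares line: outputs = gradient * inputs + intercept."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    gradient = s_io / s_ii
    return gradient, out_bar - gradient * in_bar


def predict_y(x_new, xs, ys):
    """Predict y with the y on x line; also report if x_new is inside the data range."""
    a, b = fit_line(xs, ys)
    is_interpolation = min(xs) <= x_new <= max(xs)
    return a * x_new + b, is_interpolation


# Hypothetical illustration data (mean point is (3, 4))
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

inside = predict_y(2.5, xs, ys)    # within the x data range: interpolation
outside = predict_y(10, xs, ys)    # outside the x data range: extrapolation

# Intersection of y = a*x + b and x = c*y + d, found by substitution
a, b = fit_line(xs, ys)
c, d = fit_line(ys, xs)
x_cross = (c * b + d) / (1 - a * c)
y_cross = a * x_cross + b
print(inside, outside, (x_cross, y_cross))
```

The substitution step assumes a·c ≠ 1, which holds whenever the correlation is not perfect.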

Examiner Tips and Tricks

  • Once you have calculated the values of a and b, store them in your GDC

    • This means you can use the full display values rather than the rounded values when using the linear regression equation to predict values

    • This avoids rounding errors
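
A small sketch of why this matters, using made-up coefficients (a = 2/3 and b = 1/7 are hypothetical, as is the helper `round_sig`): the same prediction is made once with full-precision values and once with values rounded to 3 significant figures.

```python
def round_sig(value, sig=3):
    """Round value to the given number of significant figures."""
    return float(f"{value:.{sig}g}")


# Hypothetical full-precision coefficients, as a calculator would store them
a_full, b_full = 2 / 3, 1 / 7

# The same coefficients written down to 3 significant figures
a_round, b_round = round_sig(a_full), round_sig(b_full)

x_new = 150
y_full = a_full * x_new + b_full      # prediction from stored values
y_round = a_round * x_new + b_round   # prediction from rounded values
rounding_error = y_round - y_full
print(y_full, y_round, rounding_error)
```

The rounded coefficients shift the prediction by roughly 0.05 here, and the shift grows with the size of x.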

Worked Example

The table below shows the scores of eight students for a maths test and an English test.

Maths (x)      7   18   37   52   61   68   75   82
English (y)    5    3    9   12   17   41   49   97

a) Write down the value of Pearson’s product-moment correlation coefficient, r.

[Image: worked solution to part (a)]

b) Write down the equation of the regression line of y on x, giving your answer in the form y = ax + b where a and b are constants to be found.

[Image: worked solution to part (b)]

c) Write down the equation of the regression line of x on y, giving your answer in the form x = cy + d where c and d are constants to be found.

[Image: worked solution to part (c)]

d) Use the appropriate regression line to predict the score on the maths test of a student who got a score of 63 on the English test.

[Image: worked solution to part (d)]
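
In the exam you would read these values from your GDC. As an independent check of the four parts, the sketch below computes them from the table using the standard formulas r = Sxy / √(Sxx·Syy), a = Sxy / Sxx and c = Sxy / Syy. It is not the original worked solution, and the variable names are chosen here for illustration.

```python
from math import sqrt

# Data from the table in the question
maths = [7, 18, 37, 52, 61, 68, 75, 82]     # x
english = [5, 3, 9, 12, 17, 41, 49, 97]     # y

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(english) / n

# Sums of squares and products of deviations from the means
s_xx = sum((x - x_bar) ** 2 for x in maths)
s_yy = sum((y - y_bar) ** 2 for y in english)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, english))

r = s_xy / sqrt(s_xx * s_yy)    # (a) Pearson's product-moment correlation coefficient

a = s_xy / s_xx                 # (b) y on x:  y = a*x + b
b = y_bar - a * x_bar

c = s_xy / s_yy                 # (c) x on y:  x = c*y + d
d = x_bar - c * y_bar

# (d) Predicting a maths score (x) from an English score (y): use the x on y line
predicted_maths = c * 63 + d

print(r, (a, b), (c, d), predicted_maths)
```

To 3 significant figures this gives r ≈ 0.794, y = 0.944x − 18.1 and x = 0.669y + 30.5, with a predicted maths score of about 72.7. An English score of 63 lies within the range of the given y values, so the prediction in part (d) is interpolation.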

Author: Dan Finlay

Expertise: Maths Subject Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.