Regression Lines (AQA Level 3 Mathematical Studies (Core Maths))

Revision Note

Author: Naomi C

Expertise: Maths

Calculating & Using Regression Lines

How do I draw a line of best fit?

  • If strong linear correlation exists on a scatter diagram, then a line of best fit can be drawn

  • A line of best fit can be drawn by eye following the trend of the data points

    • For a linear correlation it is a straight line

    • It must go through the mean point $(\bar{x}, \bar{y})$

    • There should be an approximately equal number of points above and below the line

What is a regression line?

  • A line of best fit drawn by eye can be inaccurate; however, the equation of an accurate line of best fit can be calculated

    • This is called the regression line

  • The least squares regression line is the line of best fit that minimises the sum of the squares of the distances between the line and each data point

    • If the regression line is calculated by looking at the vertical distances it is called the regression line of y on x

    • If the regression line is calculated by looking at the horizontal distances it is called the regression line of x on y

      • The regression line of x on y is rarely used and is not in this course

  • The regression line of y on x is written in the form $y = a + bx$

    • b is the gradient of the line and represents the change in y for each one-unit increase in x

    • a is the y-intercept and shows the value of y when x is zero

    • The point $(\bar{x}, \bar{y})$ will lie on the regression line

  • You may be expected to:

    • work out the equation of the regression line from raw data using your calculator

    • draw a line of regression onto a scatter diagram

    • interpret or use a regression line to predict values

    • define a and b in the context of the question
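For reference, the least squares idea can be written out as a short set of formulas. This is a sketch using standard textbook results; the symbols $S_{xx}$ and $S_{xy}$ are notation introduced here and do not appear in the note itself, and in this course the calculator does this work for you.

```latex
\text{Choose } a \text{ and } b \text{ to minimise } \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2

b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

Rearranging $a = \bar{y} - b\bar{x}$ gives $\bar{y} = a + b\bar{x}$, which is why the mean point always lies on the regression line.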

How do I find the line of regression?

  • Most calculators will be able to calculate the equation for the line of regression when you input raw data

    • Select the statistics function on your calculator

    • Input the x and y values from the given data set

    • Use the option to generate the regression line or the line $y = a + bx$

  • The calculated values for a and b are likely to be given to a number of decimal places

    • Round to an appropriate degree of accuracy when writing them into the equation, e.g. 3 s.f.
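If you want to check the rounding step away from a calculator, the following is a minimal Python sketch. The helper name `round_sf` is hypothetical (it is not a built-in function), and the two input values are the calculator outputs quoted in the worked example further down.

```python
from math import floor, log10


def round_sf(value: float, sig_figs: int = 3) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 4.6584 -> 0 and 0.0271 -> -2
    magnitude = floor(log10(abs(value)))
    return round(value, sig_figs - 1 - magnitude)


# Calculator outputs from the worked example, rounded to 3 s.f.
a_rounded = round_sf(4.65840)
b_rounded = round_sf(1.56567)
print(a_rounded, b_rounded)
```

Rounding both values to the same number of significant figures keeps the written equation consistent with the accuracy of the data.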

Exam Tip

Make sure you know how to find the line of regression on your calculator as every calculator is different!

How do I draw the regression line from the equation?

  • Drawing a regression line is done in the same way as drawing a straight line graph

    • Plot the point $(0, a)$

    • Substitute values for x into the equation to find values of y

    • Connect two or more found points with a straight line

  • The equation of the regression line can be used to decide what type of correlation there is if there is no scatter diagram

    • If b is positive then the data set has positive correlation

    • If b is negative then the data set has negative correlation
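The two ideas above (finding points to plot, and reading the direction of the correlation from the sign of the gradient) can be sketched in Python. The function names `line_points` and `correlation_direction` are hypothetical, and the line $y = 2 + 0.5x$ is made up purely for illustration.

```python
def line_points(a: float, b: float, x_values):
    """Return (x, y) points on the line y = a + b*x, ready for plotting."""
    return [(x, a + b * x) for x in x_values]


def correlation_direction(b: float) -> str:
    """Describe the direction of correlation from the sign of the gradient b."""
    if b > 0:
        return "positive"
    if b < 0:
        return "negative"
    # A gradient of exactly zero means no linear correlation
    return "none"


# Hypothetical line y = 2 + 0.5x, chosen only for illustration
points = line_points(2, 0.5, [0, 10, 20])
direction = correlation_direction(0.5)
print(points, direction)
```

Two points are enough to fix a straight line, but plotting a third is a useful check that no arithmetic slip has been made.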

Exam Tip

Remember that the value of b is the gradient of the regression line; it does not measure the strength of the correlation.

Worked Example

The table of values below shows the number of Save My Exams question packs completed by a group of students, x, and the percentage score they received in their Statistics exam, y.

| No. SME question packs completed (x) | Percentage scored on exam (y) |
|---|---|
| 10 | 0 |
| 31 | 32 |
| 12 | 38 |
| 51 | 100 |
| 29.5 | 65 |
| 24.5 | 35 |
| 39 | 59 |
| 50 | 92 |
| 30.5 | 76 |
| 60 | 78 |

(a) Find the equation of the regression line of y on x in the form $y = a + bx$

Input the values into your calculator and calculate the line of regression

$a = 4.65840\ldots$
$b = 1.56567\ldots$

$y = 4.66 + 1.57x$
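The calculator result can be checked with a short Python sketch. It assumes the table values are read as (x, y) pairs in the order given, and it uses the standard least squares formulas rather than any particular calculator's method; the variable names are chosen here for illustration.

```python
# (x, y) pairs from the worked example table:
# x = question packs completed, y = percentage scored
data = [(10, 0), (31, 32), (12, 38), (51, 100), (29.5, 65),
        (24.5, 35), (39, 59), (50, 92), (30.5, 76), (60, 78)]

n = len(data)
x_mean = sum(x for x, _ in data) / n
y_mean = sum(y for _, y in data) / n

# Sums of products and squares of deviations from the means
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in data)
s_xx = sum((x - x_mean) ** 2 for x, _ in data)

b = s_xy / s_xx           # gradient
a = y_mean - b * x_mean   # y-intercept

# Print the equation with both values rounded to 3 significant figures
print(f"y = {a:.3g} + {b:.3g}x")
```

Rounded to 3 s.f., the values agree with $a = 4.66$ and $b = 1.57$, and the mean point (33.75, 57.5) lies on the line.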

The scatter graph of the results is shown below.

[Figure: scatter diagram showing the number of Save My Exams question packs completed against the percentage scored in the Statistics exam.]

(b) Draw the regression line onto the scatter diagram.

Substitute values into the equation

When $x = 0$, $y = 4.66 + 1.57(0) = 4.66$
When $x = 50$, $y = 4.66 + 1.57(50) = 83.16$

Plot the points on the graph and draw a straight line through them

[Figure: scatter diagram showing the number of Save My Exams question packs completed against the percentage scored in the Statistics exam, with the regression line $y = 4.66 + 1.57x$ drawn on it.]

(c) Explain, in context, the meaning of the values of a and b.

a is the y-intercept: the value of y when $x = 0$

a is the score (4.66%) that a student would be predicted to get in their Statistics exam if they completed no SME question packs

b is the gradient of the line: the change in y divided by the change in x

b is the predicted increase in percentage score (1.57 percentage points) in the Statistics exam for each additional SME question pack completed

Interpolation & Extrapolation

  • The equation of the regression line can also be used to predict the value of a dependent variable (y) from an independent variable (x)

    • The equation should only be used to make predictions for y

      • Using a y on x line to predict x is not always reliable

  • Predictions should only be made using values of the independent variable (x) that are within the range of the given data

    • Making a prediction within the range of the given data is called interpolation

    • Making a prediction outside of the range of the given data is called extrapolation and is much less reliable

  • The prediction will be more reliable if the original sample contains a larger number of data values
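As a rough illustration of the difference between interpolation and extrapolation, here is a Python sketch. The function `predict` is hypothetical; the line $y = 4.66 + 1.57x$ and the x range of 10 to 60 are taken from the first worked example, while the inputs 40 and 80 are chosen here only as examples.

```python
def predict(a: float, b: float, x: float, x_min: float, x_max: float):
    """Predict y from the y-on-x line y = a + b*x and label the prediction.

    Returns (predicted_y, kind), where kind is "interpolation" when x lies
    within the range of the original data and "extrapolation" otherwise.
    """
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Line from the first worked example; the x data there ran from 10 to 60
y_in, kind_in = predict(4.66, 1.57, 40, 10, 60)
y_out, kind_out = predict(4.66, 1.57, 80, 10, 60)
print(y_in, kind_in)
print(y_out, kind_out)
```

The second prediction is for 80 question packs, well outside the data, and it gives a score above 100%, which is impossible. This is one way to see why extrapolated values should be treated with caution.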

Worked Example

The table below shows the daily mean pressure, p (hPa), and daily total sunshine, s (hours), in Camborne for a random sample of 12 days in 2015.

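| p | 1007 | 1023 | 1011 | 1022 | 1011 | 1019 | 1017 | 1016 | 1022 | 997 | 1030 | 1023 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s | 0 | 6.3 | 2.4 | 6.2 | 1.7 | 8.4 | 1.9 | 6.7 | 7.7 | 2.3 | 10.3 | 4.1 |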


The equation of the regression line of s on p is $s = -270.5 + 0.271p$

(a) Give an interpretation of the value of the gradient of the regression line.

For each unit increase on the x-axis there is an increase of 0.271 on the y-axis

There is an increase of 0.271 hours of sunshine for every 1 hPa increase in pressure

(b) Use the regression line equation to find an estimate for the daily total sunshine in hours when the daily mean pressure is 1003 hPa.

Substitute $p = 1003$ into the equation of the regression line

$s = -270.5 + 0.271(1003) = 1.313$

1.313 hours

(c) Explain why the regression line given should not be used to estimate the daily total hours of sunshine when the daily mean pressure is 1035.

The pressure value is outside the range of the sample data, so this would be extrapolation and the result would be less reliable

1035 is outside the range of values for p ($997 \le p \le 1030$)
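The three parts of this example can be checked with a short Python sketch. It uses the standard least squares formulas on the table data (an assumption about method, since the question simply states the equation), and the variable names are chosen here for illustration.

```python
# Daily mean pressure p (hPa) and daily total sunshine s (hours) from the table
p = [1007, 1023, 1011, 1022, 1011, 1019, 1017, 1016, 1022, 997, 1030, 1023]
s = [0, 6.3, 2.4, 6.2, 1.7, 8.4, 1.9, 6.7, 7.7, 2.3, 10.3, 4.1]

n = len(p)
p_mean = sum(p) / n
s_mean = sum(s) / n

# Least squares gradient and intercept for the regression line of s on p
s_ps = sum((pi - p_mean) * (si - s_mean) for pi, si in zip(p, s))
s_pp = sum((pi - p_mean) ** 2 for pi in p)
b = s_ps / s_pp
a = s_mean - b * p_mean

# Part (b): estimate using the rounded equation given in the question
estimate = -270.5 + 0.271 * 1003

# Part (c): 1035 is an extrapolation if it lies outside the sampled pressures
is_extrapolation = not (min(p) <= 1035 <= max(p))

print(f"s = {a:.4g} + {b:.3g}p")
print(f"estimate = {estimate:.3f} hours, extrapolation at 1035: {is_extrapolation}")
```

The calculated gradient and intercept round to the values in the given equation, and the range check confirms that 1035 lies above the largest sampled pressure of 1030.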



Author: Naomi C

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.