Residuals (College Board AP® Statistics): Revision Note

Syllabus Edition

First teaching 2026

First exams 2027

Written by: Mark Curtis

Reviewed by: Dan Finlay

Residuals

What are residuals?

  • The residual of a data point on a scatterplot is its signed vertical distance from the regression line

    • A positive residual means the point lies above the regression line

    • A negative residual means the point lies below the regression line

  • When a residual is positive, the regression line underestimates the y-value of that data point

    • whereas when a residual is negative, the regression line overestimates it

  • An outlier has a residual that is larger in size (absolute value) than the residuals of the other points

A scatterplot with data points, a dashed regression line, and one point (an outlier) circled, with an arrow pointing upwards from the regression line to the circled point.
An outlier with a large positive residual
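The sign rules above can be sketched in a few lines of Python. This is only an illustration: the function name `describe_residual` and the sample residuals are made up for the example and do not come from the note.

```python
def describe_residual(residual: float) -> str:
    """Describe where a point lies relative to the regression line,
    using only the sign of its residual (hypothetical helper)."""
    if residual > 0:
        return "point is above the line; the line underestimates y"
    if residual < 0:
        return "point is below the line; the line overestimates y"
    return "point is on the line; the prediction is exact"


# Illustrative residuals only (assumed values, not taken from a data set)
for r in (2, -3, 0):
    print(r, "->", describe_residual(r))
```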

What is the formula for calculating a residual?

  • The formula for calculating a residual is $\text{residual} = y - \hat{y}$

    • The residual is the actual y-value ($y$) minus the predicted y-value ($\hat{y}$)

Examiner Tips and Tricks

Residuals can be negative. Make sure you input the values into the formula in the correct order. You subtract the predicted value from the actual value.
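A minimal Python sketch of why the order of subtraction matters. The function name `residual` and the numbers used are assumptions chosen for illustration; they are not from the note.

```python
def residual(actual: float, predicted: float) -> float:
    """Residual = actual y-value minus predicted y-value (hypothetical helper)."""
    return actual - predicted


# Assumed values for illustration: actual y = 10, predicted y = 13
print(residual(10, 13))  # correct order (actual - predicted): -3
print(13 - 10)           # wrong order (predicted - actual): 3, the sign is flipped
```

Swapping the order gives a number of the same size but the opposite sign, which would reverse the conclusion about over- or underestimating.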

Worked Example

A scatterplot and regression line are shown below. Calculate the residual for each of the five data points.

A scatterplot with points shown and a dashed regression line on a grid.

Answer:

The residuals are the numbers shown in brackets on the diagram below

The residuals +2, -3, 0, +3, -2 shown between data points on a scatterplot and the regression line.

Worked Example

A city planner is investigating the relationship between the distance a commuter lives from the downtown business district (in miles) and their average morning commute time (in minutes). The planner selects a random sample of 20 commuters and records their distance and commute time. A least-squares regression line is fit to the data, yielding the following equation:

$\text{predicted commute time} = 12.4 + 2.6 \times (\text{distance})$

One commuter in the sample lives 15 miles from the downtown business district and has an average morning commute time of 48 minutes. Calculate and interpret the residual for this commuter. Based on the residual, does the linear model overpredict or underpredict this commuter's commute time?

Answer:

Calculate the predicted commute time for a commuter living 15 miles away

$12.4 + 2.6 \times (15) = 51.4 \text{ minutes}$

Subtract the predicted value from the actual value

$48 - 51.4 = -3.4 \text{ minutes}$

Interpret the residual

This commuter's actual average morning commute time is 3.4 minutes less than the commute time predicted by the least-squares regression model

Because the residual is negative, the linear model overpredicts (overestimates) the commuter's average morning commute time
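The arithmetic in this worked example can be checked with a short Python sketch. The intercept, slope, distance and actual commute time come from the question; the names `INTERCEPT`, `SLOPE` and `predicted_commute_time` are made up for illustration.

```python
# Coefficients of the least-squares regression line given in the question
INTERCEPT = 12.4  # minutes
SLOPE = 2.6       # minutes per mile


def predicted_commute_time(distance_miles: float) -> float:
    """Predicted commute time (minutes) from the regression line (hypothetical helper)."""
    return INTERCEPT + SLOPE * distance_miles


actual = 48                              # actual commute time in minutes
predicted = predicted_commute_time(15)   # commuter lives 15 miles away
res = actual - predicted                 # residual = actual - predicted

# Values are formatted to 1 decimal place to hide floating-point rounding noise
print(f"predicted = {predicted:.1f} minutes")  # predicted = 51.4 minutes
print(f"residual = {res:.1f} minutes")         # residual = -3.4 minutes

if res < 0:
    print("The model overpredicts this commute time")
elif res > 0:
    print("The model underpredicts this commute time")
else:
    print("The model predicts this commute time exactly")
```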

Author: Mark Curtis

Expertise: Maths Content Creator

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.

Reviewer: Dan Finlay

Expertise: Maths Subject Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.