Validating Models (College Board AP® Precalculus): Revision Note

Written by: Roger B

Reviewed by: Mark Curtis

Residuals and residual plots

How can residuals be used to validate a model?

  • A residual is the difference between

    • the actual value of the dependent variable

      • and the value predicted by the regression model

    • Residual = actual value - predicted value

  • After fitting a regression model (linear, quadratic, exponential, etc.) to a data set

    • You can plot the residuals to assess how well the model fits

  • A residual plot graphs the residuals against the input variable

    • A model is appropriate for a data set if the residual plot shows no clear pattern

      • The points should be scattered randomly above and below the horizontal axis

    • If the residuals show a clear pattern (e.g. a curve, a systematic trend from positive to negative, or clusters)

      • then the model is not appropriate

    • The presence of a pattern suggests that a different function type would better capture the relationship in the data

  • E.g. if a linear regression is fitted to data and the residual plot shows a U-shaped curve

    • I.e. the residuals are negative in the middle and positive at the ends (or vice versa)

    • This pattern indicates the data has a curvature that the linear model cannot capture

    • A quadratic or exponential model might be more appropriate
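The residual calculation can be sketched in Python. This is a minimal illustration using a small hypothetical data set generated from y = x² (not data from this note). It fits a least-squares line, computes Residual = actual value - predicted value for each point, and prints the residuals. They come out positive at the ends and negative in the middle, which is the U-shaped pattern described above.

```python
# Hypothetical curved data for illustration: y = x^2
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

def fit_line(xs, ys):
    """Least-squares line y = m*x + c (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c

def residuals(xs, ys, model):
    """Residual = actual value - predicted value, for each data point."""
    return [y - model(x) for x, y in zip(xs, ys)]

m, c = fit_line(xs, ys)
res = residuals(xs, ys, lambda x: m * x + c)
print(m, c)  # 6.0 -5.0
print(res)   # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

Plotting these residuals against x would show the U-shape, and the curvature suggests trying a quadratic model instead of a linear one.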

Examiner Tips and Tricks

When considering a residual plot, it is not about whether there are more points above or below zero, or whether the largest residuals are balanced.

  • It is specifically about whether there is a clear and obvious pattern
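One reason balance is not the test: a least-squares regression line (with an intercept) always produces residuals that sum to zero, however poorly the line fits. The sketch below assumes hypothetical data that doubles at each step (y = 2^x). It fits a line and shows that the residuals sum to zero and split evenly between positive and negative, yet their signs run +, +, -, -, -, +, which is a systematic pattern.

```python
def fit_line(xs, ys):
    """Least-squares line y = m*x + c (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

# Hypothetical curved data: y doubles at each step
xs = [0, 1, 2, 3, 4, 5]
ys = [2 ** x for x in xs]

m, c = fit_line(xs, ys)
res = [y - (m * x + c) for x, y in zip(xs, ys)]  # actual - predicted

signs = "".join("+" if r > 0 else "-" for r in res)
print(abs(sum(res)) < 1e-9)                # True: residuals sum to zero
print(signs.count("+"), signs.count("-"))  # 3 3: perfectly "balanced"
print(signs)                               # ++---+: but clearly patterned
```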

Worked Example

A scientist collects data on the growth of a plant over time and fits a linear regression model to the data. The residual plot is shown below.

[Figure: residual plot of Residuals (vertical axis) against Time (horizontal axis), with the points forming a U-shape]

Which of the following statements about the linear regression is true?

(A) The linear model is not appropriate, because there is a clear pattern in the graph of the residuals.

(B) The linear model is not appropriate, because the graph of the residuals has more points below 0 than above 0.

(C) The linear model is appropriate, because the positive and negative residuals are roughly balanced.

(D) The linear model is appropriate, because there is a clear pattern in the graph of the residuals.

Answer:

The residual plot shows a clear curved pattern

  • The residuals are not randomly scattered but instead follow a systematic arc

  • This indicates the linear model is not capturing the full relationship in the data

(A) The linear model is not appropriate, because there is a clear pattern in the graph of the residuals.

Errors in a model

What is the error in a model, and why does it matter?

  • The error in a model refers to the difference between the predicted and actual values

    • Since residuals provide information about errors, either residuals or errors can be used to discuss overestimates and underestimates

  • A model produces an overestimate when the predicted value is greater than the actual value

    • This corresponds to a negative residual

  • A model produces an underestimate when the predicted value is less than the actual value

    • This corresponds to a positive residual

  • Depending on the context, it may be more appropriate to have an underestimate or overestimate for a given interval

    • E.g. when estimating the strength of a bridge, it might be safer to underestimate

      • This builds in a safety margin

    • But when estimating project costs, it might be better to overestimate

      • This would ensure that there is a sufficient budget
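The sign rules above can be expressed as a small function. This is a sketch; the function name `describe_prediction` and the example numbers are hypothetical.

```python
def describe_prediction(actual, predicted):
    """Classify a prediction using Residual = actual value - predicted value."""
    residual = actual - predicted
    if residual < 0:
        return residual, "overestimate"   # predicted value > actual value
    if residual > 0:
        return residual, "underestimate"  # predicted value < actual value
    return residual, "exact"

# Hypothetical values
print(describe_prediction(actual=120, predicted=135))  # (-15, 'overestimate')
print(describe_prediction(actual=80, predicted=72))    # (8, 'underestimate')
```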

Examiner Tips and Tricks

The course specification materials note that 'error' is sometimes defined as predicted minus actual, and sometimes as the absolute value of this difference.

The sign of the residual is valid content for the AP® Precalculus exam

  • and is based on Residual = actual value - predicted value

However, the sign of the error will not be assessed on the exam.
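Writing y for the actual value and ŷ for the predicted value (notation assumed here, not used elsewhere in this note), the conventions above relate as follows. In particular, the absolute error equals the size of the residual, whichever sign convention is used.

```latex
\text{residual} = y - \hat{y}, \qquad
\text{error} = \hat{y} - y = -(\text{residual}), \qquad
|\text{error}| = |\hat{y} - y| = |\text{residual}|
```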

When does the error in a model grow over time?

  • A model's error can increase when the model's predictions diverge from the actual behavior of the real-world quantity

  • This commonly happens when, for example

    • The model predicts the quantity will continue increasing, but the actual quantity starts decreasing (or vice versa)

    • The model is used to predict beyond the range of the data it was constructed from (this is called extrapolation)

  • In such cases

    • the gap between the model's predicted values and the actual values

    • grows larger over time, meaning the error increases

  • E.g. a logarithmic model G(t) = a + b ln(t + 1) might fit sales data well for 0 ≤ t ≤ 91 days

    • But if daily sales start decreasing after day 91

    • while the model G continues to increase

      • (because a logarithmic function of that form always increases when b > 0)

    • then over time the model's predictions will diverge further and further from reality

  • At t = 91 the model and actual values might agree

    • but for t > 91 the gap grows

    • so the error increases
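The growing gap can be simulated. The sketch below is a hypothetical illustration: the constants a = 50 and b = 10, and the "actual" sales that agree with G up to day 91 and then fall by 0.5 per day, are invented for the example and are not real data.

```python
import math

A, B = 50.0, 10.0  # hypothetical constants; B > 0, so G always increases

def G(t):
    """Logarithmic model G(t) = a + b ln(t + 1)."""
    return A + B * math.log(t + 1)

def actual_sales(t):
    """Hypothetical actual sales: follow G until day 91, then fall 0.5 per day."""
    if t <= 91:
        return G(t)
    return G(91) - 0.5 * (t - 91)

days = [91, 100, 120, 150]
errors = [abs(G(t) - actual_sales(t)) for t in days]
for t, e in zip(days, errors):
    print(f"t = {t}: error = {e:.2f}")
```

The error is 0 at t = 91 and is larger at every later day shown, because the model keeps rising while the actual values fall.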

Examiner Tips and Tricks

Free response question 2(C) on the 2024 exam asked students to explain why the error in a model increases. Only 8% of students earned this point!

The key is to explicitly connect two things

  • what the model does (e.g. increases)

  • and what the actual quantity does (e.g. decreases)

Simply stating that "the model is inaccurate" is not enough. You need to explain why the error grows.

When explaining model limitations, use precise language.

  • Say "the model predicts increasing values while actual sales are decreasing"

  • rather than vague statements like "the model doesn't work anymore"

Worked Example

The number of daily visitors to a new website, in thousands, can be modeled by the function V given by V(t) = 20 + 8.5 ln(t + 1). The constants 20 and 8.5 in the model were calculated based on the number of visitors on the initial day (t = 0) and on the number of visitors 60 days later (t = 60).

The website owner reports that daily visitors began to decrease each day after t = 60. Explain why the error in the model V increases after t = 60.

Answer:

The important thing here is that the logarithmic function V(t) is an increasing function

  • As t continues to increase, V(t) will never decrease

  • This conflicts with the fact that daily visitors decrease after t = 60

The model V was constructed to match the actual visitor data at t = 60, so at that point the predicted and actual values agree.

For t > 60, the actual number of daily visitors is decreasing. However, the model V(t) = 20 + 8.5 ln(t + 1) is a logarithmic function with a positive coefficient (8.5 > 0), so it is always increasing.

Because the model predicts increasing values while the actual visitor count is decreasing, the gap between the predicted and actual values grows larger each day after t = 60. Therefore, the error in the model increases for t > 60.
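A quick numerical check of the worked example, as a sketch: it evaluates V from the question, confirms V(0) = 20, and checks that V increases from each day to the next for t = 60 to 90. This agrees with the argument that ln(t + 1) is increasing and its coefficient 8.5 is positive. No actual visitor data is used, since none is given in the question.

```python
import math

def V(t):
    """Model from the worked example: daily visitors (thousands) on day t."""
    return 20 + 8.5 * math.log(t + 1)

print(V(0))             # 20.0, the initial-day value
print(round(V(60), 2))  # 54.94, the model value at t = 60

# Check that the model keeps increasing after t = 60
values = [V(t) for t in range(60, 91)]
print(all(later > earlier for earlier, later in zip(values, values[1:])))  # True
```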

Author: Roger B

Expertise: Maths Content Creator

Roger's teaching experience stretches all the way back to 1992, and in that time he has taught students at all levels between Year 7 and university undergraduate. Having conducted and published postgraduate research into the mathematical theory behind quantum computing, he is more than confident in dealing with mathematics at any level the exam boards might throw at you.

Reviewer: Mark Curtis

Expertise: Maths Content Creator

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.