Validating Models (College Board AP® Precalculus): Revision Note
Residuals and residual plots
How can residuals be used to validate a model?
A residual is the difference between
the actual value of the dependent variable
and the value predicted by the regression model
Residual = actual value - predicted value
After fitting a regression model (linear, quadratic, exponential, etc.) to a data set
You can plot the residuals to assess how well the model fits
A residual plot graphs the residuals against the input variable
A model is appropriate for a data set if the residual plot shows no pattern
The points should be scattered randomly above and below the horizontal axis
If the residuals show a clear pattern (e.g. a curve, a systematic trend from positive to negative, or clusters)
then the model is not appropriate
The presence of a pattern suggests that a different function type would better capture the relationship in the data
E.g. if a linear regression is fitted to data and the residual plot shows a U-shaped curve
I.e. the residuals are negative in the middle and positive at the ends (or vice versa)
This pattern indicates the data has a curvature that the linear model cannot capture
A quadratic or exponential model might be more appropriate
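The U-shaped pattern described above can be reproduced with a minimal Python sketch. The data set is hypothetical (points generated from y = x^2), and the helper name `fit_line` is an illustrative assumption, not part of any course material. The sketch fits a least-squares line and lists the sign of each residual.

```python
# Hypothetical data generated from y = x^2, so a straight line cannot fit it well.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
ys = [x**2 for x in xs]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)

# Residual = actual value - predicted value.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Record only the sign of each residual, in order of increasing x.
signs = ["+" if r > 0 else "-" for r in residuals]
print(signs)
```

The residuals are positive at both ends and negative in the middle, which is the kind of systematic pattern that signals a linear model is not appropriate for this data.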
Examiner Tips and Tricks
When considering a residual plot, it is not about whether there are more points above or below zero, or whether the largest residuals are balanced.
It is specifically about whether there is a clear and obvious pattern
Worked Example
A scientist collects data on the growth of a plant over time and fits a linear regression model to the data. The residual plot is shown below.

Which of the following statements about the linear regression is true?
(A) The linear model is not appropriate, because there is a clear pattern in the graph of the residuals.
(B) The linear model is not appropriate, because the graph of the residuals has more points below 0 than above 0.
(C) The linear model is appropriate, because the positive and negative residuals are roughly balanced.
(D) The linear model is appropriate, because there is a clear pattern in the graph of the residuals.
Answer:
The residual plot shows a clear curved pattern
The residuals are not randomly scattered but instead follow a systematic arc
This indicates the linear model is not capturing the full relationship in the data
(A) The linear model is not appropriate, because there is a clear pattern in the graph of the residuals.
Errors in a model
What is the error in a model, and why does it matter?
The error in a model refers to the difference between the predicted and actual values
Since residuals provide information about errors
either residuals or errors can be used to discuss overestimates and underestimates
A model produces an overestimate when the predicted value is greater than the actual value
This corresponds to a negative residual
A model produces an underestimate when the predicted value is less than the actual value
This corresponds to a positive residual
Depending on the context, it may be more appropriate to have an underestimate or overestimate for a given interval
E.g. when estimating the strength of a bridge, it might be safer to underestimate
This builds in a safety margin
But when estimating project costs, it might be better to overestimate
This would ensure that there is a sufficient budget
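The link between the sign of a residual and over- or underestimates can be checked with a short Python sketch. The function name `classify` and the sample values are hypothetical, chosen only for illustration.

```python
def classify(actual, predicted):
    """Describe a single prediction using the sign of its residual."""
    residual = actual - predicted  # residual = actual value - predicted value
    if residual < 0:
        return residual, "overestimate"   # predicted value is greater than actual
    if residual > 0:
        return residual, "underestimate"  # predicted value is less than actual
    return residual, "exact"

# Hypothetical values, chosen only for illustration.
print(classify(actual=12, predicted=15))  # (-3, 'overestimate')
print(classify(actual=12, predicted=10))  # (2, 'underestimate')
```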
Examiner Tips and Tricks
The course specification materials note that 'error' is sometimes defined as predicted minus actual, and sometimes as the absolute value of this difference.
The sign of the residual is valid content for the AP® Precalculus exam
and is based on Residual = actual value - predicted value
However, the sign of the error will not be assessed on the exam.
When does the error in a model grow over time?
A model's error can increase when the model's predictions diverge from the actual behavior of the real-world quantity
This commonly happens when, for example
The model predicts the quantity will continue increasing, but the actual quantity starts decreasing (or vice versa)
The model is used to predict beyond the range of the data it was constructed from (this is called extrapolation)
In such cases
the gap between the model's predicted values and the actual values
grows larger over time, meaning the error increases
E.g. a logarithmic model might fit sales data well up to day 91
But if daily sales start decreasing after day 91
while the model continues to increase
(because an increasing logarithmic function keeps increasing for all later inputs)
then over time the model's predictions will diverge further and further from reality
At day 91 the model and actual values might agree
but after day 91 the gap grows
so the error increases
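A rough Python sketch can make this divergence concrete. Everything numeric here is an assumption made for illustration: the model's form and coefficients, and the rate at which the actual values fall, are invented. Only the turning point at day 91 comes from the example above.

```python
import math

TURN = 91  # day after which actual sales start to fall (from the example above)

def model(t):
    """Hypothetical increasing logarithmic model; the coefficients are made up."""
    return 40 + 8 * math.log(t + 1)

def actual(t):
    """Hypothetical actual sales: match the model up to TURN, then fall steadily."""
    if t <= TURN:
        return model(t)
    return model(TURN) - 0.2 * (t - TURN)

days = (91, 100, 110, 120)
errors = [abs(model(t) - actual(t)) for t in days]

for t, error in zip(days, errors):
    print(f"day {t}: error = {error:.2f}")
```

The error is zero at day 91 and grows on each later day, because the model keeps rising while the hypothetical actual values fall.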
Examiner Tips and Tricks
Free response question 2(C) on the 2024 exam asked students to explain why the error in a model increases. Only 8% of students earned this point!
The key is to explicitly connect two things
what the model does (e.g. increases)
and what the actual quantity does (e.g. decreases)
Simply stating that "the model is inaccurate" is not enough. You need to explain why the error grows.
When explaining model limitations, use precise language.
Say "the model predicts increasing values while actual sales are decreasing"
rather than vague statements like "the model doesn't work anymore"
Worked Example
The number of daily visitors to a new website, in thousands, can be modeled by a logarithmic function of time, in days. The constants 20 and 8.5 in the model were calculated based on the number of visitors on the initial day and on the number of visitors 60 days later.
The website owner reports that daily visitors began to decrease each day after the 60-day mark. Explain why the error in the model increases after that point.
Answer:
The important thing here is that the logarithmic function is an increasing function
As time continues to increase, the model's predicted value will never decrease
This clashes with the fact that visitors decrease after the 60-day mark
The model was constructed to match the actual visitor data at the 60-day mark, so at that point the model and actual values should match.
After the 60-day mark, the actual number of daily visitors is decreasing. However, the model is an increasing logarithmic function, so its predicted values keep rising.
Because the model predicts increasing values while the actual visitor count is decreasing, the gap between the predicted and actual values grows larger each day after the 60-day mark. Therefore, the error in the model increases after that point.