Comparing Data using Summary Statistics (College Board AP® Statistics): Revision Note
Syllabus Edition
First teaching 2026
First exams 2027
Comparing data using summary statistics
Any of the numerical summaries (e.g., mean, standard deviation, relative frequency, etc.) can be used to compare two or more independent samples
How do I compare two data sets?
You may be given two sets of data that relate to a context
To compare data sets, you need to
compare their measures of center
Mode, median or mean
compare their measures of spread
Range, interquartile range or standard deviation
comment on the shape of the distribution of the data
Skew, symmetry
comment on any unusual features
Outliers
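The checklist above can be sketched in a few lines of Python. This is only an illustration: the two lists of scores are made-up data, `summarize` is a hypothetical helper name, and the quartiles come from Python's `statistics.quantiles`, whose default method may give slightly different values from the quartile convention used in your course or on your calculator.

```python
import statistics

# Hypothetical test scores for two classes (made-up data for illustration only)
class_a = [52, 58, 61, 63, 65, 67, 70, 72, 75, 81]
class_b = [40, 55, 60, 64, 66, 68, 71, 78, 85, 97]


def summarize(data):
    """Return measures of center and spread for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


for name, data in [("Class A", class_a), ("Class B", class_b)]:
    print(name, summarize(data))
```

With these made-up scores the two medians are close, but Class B has the larger range and IQR, so the comparison would focus on Class B's scores being less consistent.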
How do I choose which measures to use?
If the distributions are both roughly symmetrical, then you should use:
the mean
the standard deviation
If at least one of the distributions is skewed or contains outliers, then you should use:
the median
the interquartile range
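The decision rule above can be written as a small function. This is a minimal sketch: `choose_measures` and the `skewed` / `has_outliers` flags are hypothetical names, and judging whether a distribution is skewed or has outliers is still up to you (for example, from a graph or the summary statistics).

```python
def choose_measures(distributions):
    """Pick the measures of center and spread to compare.

    distributions: dict mapping a group name to a dict with two
    boolean flags, 'skewed' and 'has_outliers' (hypothetical format).
    """
    # If at least one distribution is skewed or has outliers,
    # use the resistant measures for every group being compared
    if any(d["skewed"] or d["has_outliers"] for d in distributions.values()):
        return ("median", "interquartile range")
    # Otherwise all distributions are roughly symmetric
    return ("mean", "standard deviation")


# Hypothetical judgments about two groups
groups = {
    "Group A": {"skewed": False, "has_outliers": False},
    "Group B": {"skewed": True, "has_outliers": False},
}
print(choose_measures(groups))
```

Note that the same pair of measures is used for both groups, even though only Group B is skewed, so that the comparison is like for like.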
How do I write a conclusion when comparing two data sets?
When comparing features, you need to
compare numerical values or calculate summary statistics
describe (interpret) what this means in real life
For example, some good ways to describe a measure of spread (variability) are:
A smaller spread of scores means that
the scores are closer together
the scores are more consistent
there is less variation in the scores
Examiner Tips and Tricks
When comparing data sets, always remember to relate any numerical values to the context in the question. You may need to copy the exact wording from the question a few times.
Write one sentence comparing the numbers, then write a second sentence interpreting what the numbers mean in context.
What restrictions are there when drawing conclusions?
The data sets may be too small to be truly representative
Measuring the heights of only 5 pupils in a whole school is not enough to draw reliable conclusions about averages and spreads
The data sets may be biased
Measuring the heights of just the older year groups in a school will make the average appear too high
The conclusions might be influenced by who is presenting them
A politician might select the specific type of average that helps to strengthen their argument!
You may need to choose which measure of center or measure of spread to compare
Check for outliers (extreme values) in the data
If there are outliers, avoid using the mean, standard deviation and range as they are affected by extreme values!
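One way to check for outliers is sketched below. It uses the common 1.5 × IQR rule, which is an assumption here because these notes do not name a specific rule. The heights are made-up data, `find_outliers` is a hypothetical helper, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from your course's convention.

```python
import statistics


def find_outliers(data):
    """Return values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical heights in cm, with one unusually tall value
heights_cm = [150, 152, 155, 157, 158, 160, 162, 165, 198]
outliers = find_outliers(heights_cm)
print("Outliers:", outliers)

# Compare how much each measure of center moves when outliers are removed
trimmed = [x for x in heights_cm if x not in outliers]
print("Mean:  ", statistics.mean(heights_cm), "->", statistics.mean(trimmed))
print("Median:", statistics.median(heights_cm), "->", statistics.median(trimmed))
```

In this made-up data set the mean shifts by several centimetres when the extreme value is removed, while the median barely changes, which is why the median is preferred when outliers are present.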
Worked Example
Manuel, an insurance agent, wants to compare the commute times to work (in minutes) for populations in two different areas of a region. He collects data from a random sample of residents in both areas and calculates the following summary statistics:
| Area | n | Mean | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|
| Area One | 2,887 | 55 | 7 | 35 | 53 | 74 | 101 |
| Area Five | 4,502 | 42 | 5 | 24 | 32 | 68 | 83 |
(a) Use comparisons of the summary statistics in the table to describe the most likely shape of the distribution of commute times for Area Five.
(b) Compare the distance from Q1 to the median and the distance from the median to Q3 for Area One. Explain what this comparison reveals about the shape of the distribution for Area One.
(c) Based on your answers to (a) and (b), Manuel decides to compare the centers and variability of the two areas using the medians and the interquartile ranges (IQR) rather than the means and standard deviations. Justify why Manuel is using the correct measures to compare the distributions.
Answer:
(a)
Compare the median with the mean
The mean (42) is substantially larger than the median (32), which suggests that large values in the upper tail are pulling the mean upward
Compare the median with the quartiles
The distance from the median to Q3 (68−32=36) is much larger than the distance from Q1 to the median (32−24=8), indicating that the data is stretched out further in the upper half of the distribution
The distribution of commute times for Area Five is likely skewed to the right (positively skewed)
(b)
Compare the distances
For Area One, the distance from Q1 to the median is 53−35=18, and the distance from the median to Q3 is 74−53=21
Interpret the skewness
Because these distances are relatively close to each other, and the mean (55) and the median (53) are also relatively close to each other, the summary statistics suggest that the distribution of commute times for Area One is approximately symmetric
(c)
Identify how the skewness might affect the summary statistics
The median and IQR are considered resistant (or robust) measures of center and variability, meaning their values are not greatly affected by skewness or extreme outliers
By contrast, the mean and standard deviation are non-resistant measures
Because at least one of the distributions is strongly skewed, comparing the medians and IQRs provides a better representation of the typical commute times and their spread for both areas
Therefore, Manuel is using the correct measures of center and variability because the distribution for Area Five is strongly skewed to the right
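The arithmetic in parts (a) and (b) can be checked with a short script. This is just a sketch: the mean, quartile and median values are the ones quoted in the answer above, while `shape_clues` and the dictionary keys are hypothetical names chosen for illustration.

```python
# Summary statistics quoted in the worked example (commute times in minutes)
areas = {
    "Area One": {"mean": 55, "q1": 35, "median": 53, "q3": 74},
    "Area Five": {"mean": 42, "q1": 24, "median": 32, "q3": 68},
}


def shape_clues(stats):
    """Compute the comparisons used to judge the shape of a distribution."""
    return {
        # Distance from Q1 up to the median
        "lower_gap": stats["median"] - stats["q1"],
        # Distance from the median up to Q3
        "upper_gap": stats["q3"] - stats["median"],
        # A large positive value suggests a right skew
        "mean_minus_median": stats["mean"] - stats["median"],
        "iqr": stats["q3"] - stats["q1"],
    }


for name, stats in areas.items():
    print(name, shape_clues(stats))
```

For Area One the two gaps are 18 and 21, which are similar. For Area Five they are 8 and 36, which matches the right skew described in part (a).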