
Comparing Data using Summary Statistics (College Board AP® Statistics): Revision Note

Written by: Mark Curtis

Reviewed by: Dan Finlay

Updated on 21 May 2026

Comparing data using summary statistics

Any of the numerical summaries (e.g., mean, standard deviation, relative frequency, etc.) can be used to compare two or more independent samples

How do I compare two data sets?

You may be given two sets of data that relate to a context
To compare data sets, you need to
- compare their measures of center
  - Mode, median or mean
- compare their measures of spread
  - Range, interquartile range or standard deviation
- comment on the shape of the distribution of the data
  - Skew, symmetry
- comment on any unusual features
  - Outliers
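
The numerical parts of this checklist can be sketched in a few lines of Python. This is only an illustration: the two sets of quiz scores are made up, the helper name `summarize` is hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the convention used by your calculator or textbook.

```python
import statistics

def summarize(data):
    """Return measures of center and spread for one data set."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": median,
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

# Hypothetical quiz scores, invented purely for illustration
class_a = [4, 5, 6, 6, 7, 8, 9]
class_b = [1, 3, 6, 6, 7, 9, 10]

summary_a = summarize(class_a)
summary_b = summarize(class_b)

# Print each measure side by side so the two classes can be compared
for name in summary_a:
    print(f"{name}: class A = {summary_a[name]:.2f}, class B = {summary_b[name]:.2f}")
```

Here both classes have the same median, but class B has the larger range and interquartile range, so its scores are less consistent. Shape and unusual features still need to be judged from a graph or from the data themselves.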

How do I choose which measures to use?

If the distributions are both roughly symmetrical, then you should use:
- the mean
- the standard deviation
If at least one of the distributions is skewed or contains outliers, then you should use:
- the median
- the interquartile range
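
The decision rule above can be written as a small function. This is a sketch under stated assumptions: outliers are flagged with the common 1.5 × IQR rule (which this note does not itself state), skewness is passed in as a judgment you have already made from a graph, and the names `find_outliers` and `choose_measures` and the data are hypothetical.

```python
import statistics

def find_outliers(data):
    """Flag values beyond 1.5 * IQR from the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

def choose_measures(data_sets, any_skewed):
    """Pick the measures of center and spread to use for ALL the data sets.

    any_skewed: True if at least one distribution looks skewed on a graph.
    """
    has_outliers = any(find_outliers(d) for d in data_sets)
    if any_skewed or has_outliers:
        return ("median", "interquartile range")
    return ("mean", "standard deviation")

# Hypothetical data, invented for illustration
group_1 = [12, 14, 15, 15, 16, 17, 19]
group_2 = [11, 13, 14, 15, 16, 18, 40]  # 40 is far from the rest

print(choose_measures([group_1, group_2], any_skewed=False))
print(choose_measures([group_1, group_1], any_skewed=False))
```

Note that the same pair of measures is used for every data set being compared, even if only one of them is skewed or contains outliers.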

How do I write a conclusion when comparing two data sets?

When comparing features, you need to
- compare numerical values or calculate summary statistics
- describe (interpret) what this means in real life
For example, some good ways to describe a measure of spread (variability) are:
- "A smaller spread of scores means the scores are closer together"
- "A smaller spread of scores means the scores are more consistent"
- "A smaller spread of scores means there is less variation in the scores"

Examiner Tips and Tricks

When comparing data sets, always remember to relate any numerical values to the context in the question. You may need to copy the exact wording from the question a few times.

Write one sentence comparing the numbers, then write a second sentence interpreting what the numbers mean in context.

What restrictions are there when drawing conclusions?

The data sets may be too small to be truly representative
- Measuring the heights of only 5 pupils in a whole school is not enough to draw reliable conclusions about averages and spreads
The data sets may be biased
- Measuring the heights of just the older year groups in a school will make the average appear too high
The conclusions might be influenced by who is presenting them
- A politician might select the specific type of average that helps to strengthen their argument!

You may need to choose which measure of center or measure of spread to compare
- Check for outliers (extreme values) in the data
  - If there are outliers, avoid using the mean, standard deviation and range as they are affected by extreme values!
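
To see why extreme values matter, the sketch below replaces one value in a small data set with an extreme one and recomputes each measure. The heights are hypothetical, the helper name `center_and_spread` is made up for this illustration, and the quartiles use the default method of Python's `statistics.quantiles`.

```python
import statistics

def center_and_spread(data):
    """Compute the measures discussed above for one data set."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "range": max(data) - min(data),
        "median": median,
        "iqr": q3 - q1,
    }

# Hypothetical heights in cm, invented for illustration
heights = [150, 152, 155, 157, 160, 162, 165]
# Same data, but the largest value is replaced by an extreme one
with_outlier = heights[:-1] + [210]

before = center_and_spread(heights)
after = center_and_spread(with_outlier)

for name in before:
    print(f"{name}: before = {before[name]:.2f}, after = {after[name]:.2f}")
```

In this example the median and interquartile range do not change at all, while the mean, standard deviation and range all increase.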

Worked Example

Manuel, an insurance agent, wants to compare the commute times to work (in minutes) for populations in two different areas of a region. He collects data from a random sample of residents in both areas and calculates the following summary statistics:

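| Area | $n$ | Mean ($\bar{x}$) | Min | $Q_{1}$ | Median | $Q_{3}$ | Max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Area One | 2,887 | 55 | 7 | 35 | 53 | 74 | 101 |
| Area Five | 4,502 | 42 | 5 | 24 | 32 | 68 | 83 |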

(a) Use comparisons of the summary statistics in the table to describe the most likely shape of the distribution of commute times for Area Five.

(b) Compare the distance from $Q_{1}$ to the median and the distance from the median to $Q_{3}$ for Area One. Explain what this comparison reveals about the shape of the distribution for Area One.

(c) Based on your answers to (a) and (b), Manuel decides to compare the centers and variability of the two areas using the medians and the interquartile ranges (IQR) rather than the means and standard deviations. Justify why Manuel is using the correct measures to compare the distributions.

Answer:

(a)

Compare the median with the mean

The mean (42) is substantially larger than the median (32), which suggests that high values in the upper tail are pulling the mean upward

Compare the median with the quartiles

The distance from the median to $Q_{3}$ (68−32=36) is much larger than the distance from $Q_{1}$ to the median (32−24=8), indicating that the data is stretched out further in the upper half of the distribution

The distribution of commute times for Area Five is likely skewed to the right (positively skewed)

(b)

Compare the distances

For Area One, the distance from $Q_{1}$ to the median is 53−35=18, and the distance from the median to $Q_{3}$ is 74−53=21

Interpret the skewness

Because these distances are relatively close to each other, and the mean (55) and the median (53) are also relatively close to each other, the summary statistics suggest that the distribution of commute times for Area One is approximately symmetric
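
The arithmetic used in parts (a) and (b) can be checked with a short sketch. The summary statistics are copied from the table in the question; the function name `shape_clues` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
# Summary statistics copied from the table in the worked example (minutes)
areas = {
    "Area One": {"mean": 55, "q1": 35, "median": 53, "q3": 74},
    "Area Five": {"mean": 42, "q1": 24, "median": 32, "q3": 68},
}

def shape_clues(stats):
    """Return the differences used to judge the shape of a distribution."""
    return {
        "mean_minus_median": stats["mean"] - stats["median"],
        "lower_gap": stats["median"] - stats["q1"],  # Q1 to median
        "upper_gap": stats["q3"] - stats["median"],  # median to Q3
    }

for name, stats in areas.items():
    print(name, shape_clues(stats))
```

For Area One the two gaps (18 and 21) are similar and the mean is only 2 above the median. For Area Five the upper gap (36) is much larger than the lower gap (8) and the mean is 10 above the median.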

(c)

Identify how the skewness might affect the summary statistics

The median and IQR are considered resistant (or robust) measures of center and variability, meaning their values are not greatly affected by skewness or extreme outliers

By contrast, the mean and standard deviation are non-resistant measures

Because at least one of the distributions is significantly skewed, comparing the medians and IQRs provides a better representation of the typical commute times and their spread for both areas

Therefore, Manuel is using the correct measures of center and variability because the distribution for Area Five is strongly skewed to the right
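
The question does not ask for the comparison itself, but as a sketch of how Manuel could carry it out, the medians and IQRs can be computed from the table. The variable and function names here are hypothetical.

```python
# Quartiles and medians copied from the table in the worked example (minutes)
areas = {
    "Area One": {"q1": 35, "median": 53, "q3": 74},
    "Area Five": {"q1": 24, "median": 32, "q3": 68},
}

def iqr(stats):
    """Interquartile range: Q3 minus Q1."""
    return stats["q3"] - stats["q1"]

one, five = areas["Area One"], areas["Area Five"]
median_gap = one["median"] - five["median"]
iqr_one, iqr_five = iqr(one), iqr(five)

print(f"Median commute time: Area One {one['median']} min, "
      f"Area Five {five['median']} min (difference {median_gap} min)")
print(f"IQR of commute times: Area One {iqr_one} min, Area Five {iqr_five} min")
```

In context, the typical commute time in Area One is about 21 minutes longer than in Area Five. The middle 50% of commute times spans 39 minutes in Area One and 44 minutes in Area Five, so commute times are slightly more variable in Area Five.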
