Comparing Data using Summary Statistics (College Board AP® Statistics): Study Guide

Syllabus Edition

First teaching 2026

First exams 2027

Written by: Mark Curtis

Reviewed by: Dan Finlay

Comparing data using summary statistics

  • Any of the numerical summaries (e.g., mean, standard deviation, relative frequency) can be used to compare two or more independent samples

How do I compare two data sets?

  • You may be given two sets of data that relate to a context

  • To compare data sets, you need to

    • compare their measures of center

      • Mode, median or mean

    • compare their measures of spread

      • Range, interquartile range or standard deviation

    • comment on the shape of the distribution of the data

      • Skew, symmetry

    • comment on any unusual features

      • Outliers
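The checklist above can be sketched in Python. This is a minimal illustration, not a required method: the helper name `summarize` and both data sets are made up for the example, and Python's `statistics.quantiles` uses its default ("exclusive") quartile convention, which may give slightly different quartiles from a calculator or textbook.

```python
import statistics

def summarize(data):
    """Return measures of center and spread for a list of numbers."""
    # Three cut points that split the data into quarters
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

# Hypothetical test scores for two classes (made-up data for illustration)
class_a = [62, 65, 68, 70, 71, 73, 75, 78]
class_b = [40, 55, 63, 70, 72, 80, 91, 97]

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(name, summarize(scores))
```

In this made-up example the two classes have similar centers, but Class B has a much larger range, IQR and standard deviation, so its scores are less consistent.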

How do I choose which measures to use?

  • If the distributions are both roughly symmetrical, then you should use:

    • the mean

    • the standard deviation

  • If at least one of the distributions is skewed or contains outliers, then you should use:

    • the median

    • the interquartile range
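The decision rule above can be written as a short Python sketch. The function name `choose_measures` and the dictionary keys `skewed` and `has_outliers` are hypothetical, chosen only for this illustration; judging whether a distribution is skewed still has to be done from a graph or the summary statistics.

```python
def choose_measures(shapes):
    """Pick a measure of center and spread for comparing data sets.

    `shapes` holds one dict per data set, with hypothetical boolean
    keys 'skewed' and 'has_outliers'.
    """
    # One skewed distribution (or one with outliers) is enough to
    # switch to the resistant measures for every data set compared
    needs_resistant = any(s["skewed"] or s["has_outliers"] for s in shapes)
    if needs_resistant:
        return ("median", "interquartile range")
    return ("mean", "standard deviation")

# One roughly symmetric data set and one skewed data set
print(choose_measures([
    {"skewed": False, "has_outliers": False},
    {"skewed": True, "has_outliers": False},
]))
```

The same pair of measures is returned for all the data sets, because a comparison only makes sense when both distributions are described with the same statistics.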

How do I write a conclusion when comparing two data sets?

  • When comparing features, you need to

    • compare numerical values or calculate summary statistics

    • describe (interpret) what this means in real life 

  • For example, some good ways to describe a measure of spread (variability) are:

    • A smaller spread of scores means that

      • the scores are closer together

      • the scores are more consistent

      • there is less variation in the scores

Examiner Tips and Tricks

When comparing data sets, always remember to relate any numerical values to the context in the question. You may need to copy the exact wording from the question a few times.

Write one sentence comparing the numbers, then a second sentence interpreting what the numbers mean in context.

What restrictions are there when drawing conclusions?

  • The data sets may be too small to be truly representative

    • Measuring the heights of only 5 pupils in a whole school is not enough to draw reliable conclusions about the average or spread of heights in that school

  • The data sets may be biased

    • Measuring the heights of just the older year groups in a school will make the average height for the whole school appear too high

  • The conclusions might be influenced by who is presenting them

    • A politician might select the specific type of average that helps to strengthen their argument!

  • You may need to choose which measure of center or measure of spread to compare

    • Check for outliers (extreme values) in the data

      • If there are outliers, avoid using the mean, standard deviation and range as they are affected by extreme values!
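One way to check for outliers can be sketched in Python. The guide above does not name a specific rule, so this sketch assumes the common 1.5 × IQR convention (values more than 1.5 × IQR below Q1 or above Q3 are flagged). The function name `find_outliers` and the heights are made up for illustration, and Python's default quartile method may differ slightly from a calculator's.

```python
import statistics

def find_outliers(data):
    """Flag values outside the 1.5 x IQR fences (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical heights in cm, with one extreme value
heights_cm = [150, 152, 155, 157, 158, 160, 162, 199]

print(find_outliers(heights_cm))       # [199]
print(statistics.mean(heights_cm))     # pulled upward by the extreme value
print(statistics.median(heights_cm))   # barely affected by it
```

Here the single extreme height drags the mean above the median, which is why the median and interquartile range are preferred when outliers are present.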

Worked Example

Manuel, an insurance agent, wants to compare the commute times to work (in minutes) for populations in two different areas of a region. He collects data from a random sample of residents in each area and calculates the following summary statistics:

| Area | n | Mean ($\bar{x}$) | Min | $Q_1$ | Median | $Q_3$ | Max |
|---|---|---|---|---|---|---|---|
| Area One | 2,887 | 55 | 7 | 35 | 53 | 74 | 101 |
| Area Five | 4,502 | 42 | 5 | 24 | 32 | 68 | 83 |

(a) Use comparisons of the summary statistics in the table to describe the most likely shape of the distribution of commute times for Area Five.

(b) Compare the distance from $Q_1$ to the median and the distance from the median to $Q_3$ for Area One. Explain what this comparison reveals about the shape of the distribution for Area One.

(c) Based on your answers to (a) and (b), Manuel decides to compare the centers and variability of the two areas using the medians and the interquartile ranges (IQR) rather than the means and standard deviations. Justify why Manuel is using the correct measures to compare the distributions.

Answer:

(a)

Compare the median with the mean

The mean (42) is substantially larger than the median (32), which suggests that a tail of large values is pulling the mean upward

Compare the median with the quartiles

The distance from the median to $Q_3$ ($68 - 32 = 36$) is much larger than the distance from $Q_1$ to the median ($32 - 24 = 8$), indicating that the data is stretched out further in the upper half of the distribution

The distribution of commute times for Area Five is likely skewed to the right (positively skewed)

(b)

Compare the distances

For Area One, the distance from $Q_1$ to the median is $53 - 35 = 18$, and the distance from the median to $Q_3$ is $74 - 53 = 21$

Interpret the skewness

Because these distances are relatively close to each other, and the mean (55) and the median (53) are also relatively close to each other, the summary statistics suggest that the distribution of commute times for Area One is approximately symmetric
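The comparisons used in parts (a) and (b) can be checked with a short Python sketch. The values are copied from the table in the question; the function name `shape_clues` is made up for this illustration.

```python
# Summary statistics copied from the table in the worked example
areas = {
    "Area One": {"mean": 55, "q1": 35, "median": 53, "q3": 74},
    "Area Five": {"mean": 42, "q1": 24, "median": 32, "q3": 68},
}

def shape_clues(stats):
    """Return (Q1-to-median gap, median-to-Q3 gap, mean minus median)."""
    lower_gap = stats["median"] - stats["q1"]
    upper_gap = stats["q3"] - stats["median"]
    mean_minus_median = stats["mean"] - stats["median"]
    return lower_gap, upper_gap, mean_minus_median

for name, stats in areas.items():
    print(name, shape_clues(stats))
# Area One (18, 21, 2)
# Area Five (8, 36, 10)
```

For Area One the two gaps are similar and the mean is close to the median, which is consistent with approximate symmetry. For Area Five the upper gap is far larger and the mean is well above the median, which is consistent with a right skew.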

(c)

Identify how the skewness might affect the summary statistics

The median and IQR are considered resistant (or robust) measures of center and variability, meaning their values are not greatly affected by skewness or extreme outliers

By contrast, the mean and standard deviation are non-resistant measures

Because at least one of the distributions is strongly skewed, comparing the medians and IQRs provides a better representation of the typical commute times and their spread for both areas

Therefore, Manuel is using the correct measures of center and variability because the distribution for Area Five is strongly skewed to the right
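As an illustration of the comparison Manuel would go on to make, the medians and IQRs can be read off or calculated from the table. This is a sketch only, as the question does not ask for these values; the helper name `iqr` is made up for the example.

```python
# Quartiles and medians copied from the table in the worked example
summary = {
    "Area One": {"q1": 35, "median": 53, "q3": 74},
    "Area Five": {"q1": 24, "median": 32, "q3": 68},
}

def iqr(stats):
    """Interquartile range: Q3 minus Q1."""
    return stats["q3"] - stats["q1"]

for area, stats in summary.items():
    print(f"{area}: median = {stats['median']} min, IQR = {iqr(stats)} min")
# Area One: median = 53 min, IQR = 39 min
# Area Five: median = 32 min, IQR = 44 min
```

In context, the typical commute in Area One is about 21 minutes longer than in Area Five (53 compared with 32), while the middle 50% of commute times is slightly more spread out in Area Five (44 compared with 39).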

Author: Mark Curtis

Expertise: Maths Content Creator

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.

Reviewer: Dan Finlay

Expertise: Maths Subject Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.