Sampling Distributions for Differences in Sample Means (College Board AP® Statistics): Study Guide

Written by: Mark Curtis

Reviewed by: Dan Finlay

Sampling distributions for differences in sample means

What is a one-sample problem?

  • When one random sample of size $n$ has been taken from one population

    • with population mean $\mu$ and population standard deviation $\sigma$

      • The sample mean is $\bar{x}$

      • This is a one-sample problem

What is a two-sample problem?

  • If one random sample of size $n_1$ is taken from one population with population mean $\mu_1$ and population standard deviation $\sigma_1$

    • then a different random sample of size $n_2$ is taken from a different population (that is independent of the first population) with population mean $\mu_2$ and population standard deviation $\sigma_2$

      • then this is a two-sample problem

      • The sample means are $\bar{x}_1$ and $\bar{x}_2$

What is the difference in sample means?

  • In a two-sample problem you can compare the sample means from separate samples of two independent populations

    • You can look at the difference in sample means, $\bar{x}_1 - \bar{x}_2$

      • e.g. if $\bar{x}_1 - \bar{x}_2 > 0$ then the mean of the first sample is greater than the mean of the second sample

What is the sampling distribution for differences in sample means?

  • You can build up every possible difference in sample means if

    • you take all possible samples of size $n_1$ from the first population and calculate their sample means, $\bar{x}_1$

    • then take all possible samples of size $n_2$ from the second population and calculate their sample means, $\bar{x}_2$

    • then work out all the possible values that the difference $\bar{x}_1 - \bar{x}_2$ can take

      • The distribution of all these values is called the sampling distribution for differences in sample means
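The steps above can be sketched in Python for two very small populations. The population values and sample sizes here are hypothetical, chosen only so that every possible sample can be listed, and samples are assumed to be drawn with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical tiny populations (not from the study guide), kept small
# so that every possible sample can be enumerated
population_1 = [1, 2, 3]
population_2 = [2, 4]
n1, n2 = 2, 2  # sample sizes

def all_sample_means(population, n):
    """Mean of every possible ordered sample of size n, drawn with replacement."""
    return [Fraction(sum(sample), n) for sample in product(population, repeat=n)]

means_1 = all_sample_means(population_1, n1)  # 3**2 = 9 sample means
means_2 = all_sample_means(population_2, n2)  # 2**2 = 4 sample means

# Every possible value of the difference in sample means
differences = [m1 - m2 for m1, m2 in product(means_1, means_2)]

# The sampling distribution: each possible difference and how often it occurs
sampling_distribution = Counter(differences)

for value in sorted(sampling_distribution):
    count = sampling_distribution[value]
    print(f"{float(value):5.1f} occurs {count} time(s) out of {len(differences)}")
```

Real populations are far too large to enumerate like this, which is why the mean and standard deviation formulas are used instead.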

What are the mean and standard deviation of the sampling distribution for differences in sample means?

  • If the first population has a population mean of $\mu_1$ and a population standard deviation of $\sigma_1$

    • and the second independent population has a population mean of $\mu_2$ and a population standard deviation of $\sigma_2$

  • Then the sampling distribution for differences in sample means, $\bar{x}_1 - \bar{x}_2$

    • has a mean of $\mu_1 - \mu_2$

    • and a standard deviation of $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

    • where $n_1$ is the size of the first sample

    • and $n_2$ is the size of the second sample

  • The standard deviation formula $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ assumes sampling was done with replacement

    • If sampling without replacement, make sure that each sample size is less than 10% of its population size to be able to use $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

      • otherwise the true standard deviation will be smaller than the value this formula gives
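As a rough check of these formulas, the sketch below lists every possible sample from two hypothetical tiny populations and compares the result with $\mu_1 - \mu_2$ and $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. The populations, sample sizes and helper names are assumptions made for illustration. The samples are deliberately much larger than 10% of each population, to show how the standard deviation shrinks when sampling without replacement.

```python
from itertools import permutations, product
from math import sqrt
from statistics import mean, pstdev

# Hypothetical tiny populations (not from the study guide)
population_1 = [1, 2, 3]
population_2 = [2, 4]
n1, n2 = 2, 2  # far more than 10% of each population, on purpose

def sd_formula(sigma1, n1, sigma2, n2):
    """Standard deviation of the difference in sample means (formula above)."""
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def all_differences(draw):
    """Every value of x̄1 - x̄2, where draw(population, n) lists all samples."""
    means_1 = [mean(s) for s in draw(population_1, n1)]
    means_2 = [mean(s) for s in draw(population_2, n2)]
    return [m1 - m2 for m1, m2 in product(means_1, means_2)]

# Ordered samples with replacement, then ordered samples without replacement
with_replacement = all_differences(lambda pop, n: product(pop, repeat=n))
without_replacement = all_differences(lambda pop, n: permutations(pop, n))

mu_diff = mean(population_1) - mean(population_2)
sd_diff = sd_formula(pstdev(population_1), n1, pstdev(population_2), n2)

print("formula mean:", mu_diff, " formula sd:", round(sd_diff, 4))
print("with replacement:   ", mean(with_replacement), round(pstdev(with_replacement), 4))
print("without replacement:", mean(without_replacement), round(pstdev(without_replacement), 4))
```

With replacement, the enumerated mean and standard deviation agree with the formulas. Without replacement, the mean still agrees but the standard deviation comes out smaller, which is the situation the 10% condition guards against.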

Examiner Tips and Tricks

The mean, $\mu_1 - \mu_2$, and the standard deviation, $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, are given in the exam under 'Sampling distributions for means', in the row called 'For two populations'.

What conditions are needed for normality?

  • If, in addition to the above, the two independent populations are also known to be normally distributed

    • then the sampling distribution for differences in sample means is also normally distributed

      • with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

  • You can use these properties to calculate probabilities involving differences in sample means, $\bar{x}_1 - \bar{x}_2$, as they follow a normal distribution

    • The standardized z-statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

      • $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ will be given in the question
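A minimal sketch of this calculation, using `NormalDist` from Python's standard `statistics` module. All the numbers are hypothetical and both populations are assumed to be normally distributed and independent.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values for two independent, normally distributed populations
mu1, sigma1, n1 = 70, 8, 16
mu2, sigma2, n2 = 65, 6, 9

mean_diff = mu1 - mu2                            # mean of x̄1 - x̄2
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard deviation of x̄1 - x̄2

def z_statistic(observed_difference):
    """Standardize an observed difference in sample means."""
    return (observed_difference - mean_diff) / sd_diff

# P(x̄1 - x̄2 > 0), i.e. the first sample mean exceeds the second
z = z_statistic(0)
probability = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, P(difference > 0) = {probability:.4f}")
```

Standardizing first mirrors the method used with normal tables. Calling `NormalDist(mean_diff, sd_diff).cdf(0)` and subtracting from 1 gives the same probability without the separate z-score step.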

What do I do if the populations are not normally distributed?

  • If the populations are not normally distributed, then you cannot say the sampling distribution for differences in sample means is normally distributed

    • This means you cannot use a normal distribution to work out probabilities

  • However, despite not knowing its shape, the sampling distribution for differences in sample means still has

    • a mean of $\mu_1 - \mu_2$ and a standard deviation of $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

      • i.e. you can always write these down, even though the distribution is unknown

Can I use the Central Limit theorem if populations are not normally distributed?

  • If the populations are not normally distributed, but both sample sizes are greater than or equal to 30 ($n_1 \ge 30$ and $n_2 \ge 30$)

    • then the Central Limit theorem can be applied

    • meaning the sampling distribution for differences in sample means is approximately normally distributed with the parameters above

      • i.e. mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

  • You can use these properties to estimate probabilities involving differences in sample means, $\bar{x}_1 - \bar{x}_2$, as they follow an approximate normal distribution

    • The standardized z-statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

      • $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ will be given in the question
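One way to see the Central Limit theorem at work is a small simulation. The sketch below assumes two hypothetical, strongly skewed (exponential) populations, for which the standard deviation equals the mean, and compares a simulated probability with the normal approximation. The population means, sample sizes, threshold and seed are illustrative choices only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical exponential populations: for these, sigma equals the mean
mean1, n1 = 10, 40
mean2, n2 = 8, 50
threshold = 4  # estimate P(x̄1 - x̄2 > 4)

def sample_mean(population_mean, n):
    """Mean of one random sample of size n from an exponential population."""
    return fmean(random.expovariate(1 / population_mean) for _ in range(n))

# Simulation: repeat the two-sample experiment many times
trials = 10_000
hits = sum(
    sample_mean(mean1, n1) - sample_mean(mean2, n2) > threshold
    for _ in range(trials)
)
simulated = hits / trials

# Central Limit theorem estimate using the z-statistic
sd_diff = sqrt(mean1**2 / n1 + mean2**2 / n2)
z = (threshold - (mean1 - mean2)) / sd_diff
clt_estimate = 1 - NormalDist().cdf(z)

print(f"simulated: {simulated:.3f}, normal approximation: {clt_estimate:.3f}")
```

The two values should be close but not identical, since the normal model is only approximate and the simulation has its own random error.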

Worked Example

The lifetimes of bulbs from a company called Brite have a mean of 900 hours and a standard deviation of 25 hours. The lifetimes of bulbs from a company called Shine have a mean of 800 hours and a standard deviation of 15 hours.

Estimate the probability that the mean of a sample of 40 bulbs from Brite is at least 108 hours more than the mean of a sample of 50 bulbs from Shine.

Answer:

This is a probability question about the difference in means of two samples, so it requires the sampling distribution for differences in sample means

Start by labeling each population

Population 1 is the lifetimes of bulbs from Brite

Population 2 is the lifetimes of bulbs from Shine

You are not told the lifetimes of the bulbs are normally distributed but both sample sizes are greater than 30 so the Central Limit theorem can be applied

$n_1 = 40 \ge 30$ and $n_2 = 50 \ge 30$ so use the Central Limit theorem

The difference in sample means follows an approximate normal distribution with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Substitute $\mu_1 = 900$ and $\mu_2 = 800$ into $\mu_1 - \mu_2$

$\mu_1 - \mu_2 = 900 - 800 = 100$

Substitute $\sigma_1 = 25$, $n_1 = 40$, $\sigma_2 = 15$ and $n_2 = 50$ into $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{25^2}{40} + \frac{15^2}{50}} = 4.4860896...$

The wording in the question ("at least 108 hours more") asks for the probability that $\bar{x}_1 \ge \bar{x}_2 + 108$

Rearrange this to form the difference of sample means, $\bar{x}_1 - \bar{x}_2$

$P(\bar{X}_1 \ge \bar{X}_2 + 108) = P(\bar{X}_1 - \bar{X}_2 \ge 108)$

The difference in sample means follows an approximate normal distribution with mean 100 and standard deviation 4.4860896... from above

To find the probability that the difference in sample means is at least 108, first calculate the z-score for 108

$\dfrac{108 - 100}{4.4860896...} = 1.783...$

Then find $P(Z \ge 1.783...)$, e.g. using the normal tables

$$\begin{aligned} P(Z \ge 1.783...) &= 1 - P(Z < 1.783...) \\ &= 1 - 0.9625 \\ &= 0.0375 \end{aligned}$$

The probability that the mean of a sample of 40 bulbs from Brite is at least 108 hours more than the mean of a sample of 50 bulbs from Shine is approximately 0.0375
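The arithmetic in this worked example can be checked with a short Python sketch. It uses `NormalDist` from the standard `statistics` module in place of normal tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the question
mu1, sigma1, n1 = 900, 25, 40  # Brite
mu2, sigma2, n2 = 800, 15, 50  # Shine
required_difference = 108

mean_diff = mu1 - mu2                            # 900 - 800
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # sqrt(25^2/40 + 15^2/50)
z = (required_difference - mean_diff) / sd_diff

# Using z rounded to 2 decimal places, as when reading normal tables
p_table = 1 - NormalDist().cdf(round(z, 2))

# Using the unrounded z-score
p_unrounded = 1 - NormalDist().cdf(z)

print(f"sd = {sd_diff:.7f}, z = {z:.3f}")
print(f"table-style answer: {p_table:.4f}, unrounded answer: {p_unrounded:.4f}")
```

The table-style value matches the answer above. The unrounded z-score gives a very slightly smaller probability, because normal tables only list z to two decimal places.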

Author: Mark Curtis

Expertise: Maths Content Creator

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.

Reviewer: Dan Finlay

Expertise: Maths Subject Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.