Sampling Distributions for Differences in Sample Means (College Board AP® Statistics): Study Guide

Written by: Mark Curtis

Reviewed by: Dan Finlay

Updated on 26 August 2024

Sampling distributions for differences in sample means

What is a one-sample problem?

When one random sample of size $n$ has been taken from one population
- with population mean $μ$ and population standard deviation $σ$
  - The sample mean is $\bar{x}$
  - This is a one-sample problem

What is a two-sample problem?

If one random sample of size $n_{1}$ is taken from one population with population mean $μ_{1}$ and population standard deviation $σ_{1}$
- and a second random sample of size $n_{2}$ is taken, independently of the first sample, from a different population with population mean $μ_{2}$ and population standard deviation $σ_{2}$
  - then this is a two-sample problem
  - The sample means are ${\bar{x}}_{1}$ and ${\bar{x}}_{2}$

What is the difference in sample means?

In a two-sample problem you can compare the sample means from separate samples of two independent populations
- You can look at the difference in sample means, ${\bar{x}}_{1} - {\bar{x}}_{2}$
  - e.g. if ${\bar{x}}_{1} - {\bar{x}}_{2} > 0$ then the mean of the first sample is greater than the mean of the second sample

What is the sampling distribution for differences in sample means?

To form the sampling distribution for differences in sample means:
- take all possible samples of size $n_{1}$ from the first population and calculate their sample means, ${\bar{x}}_{1}$
- take all possible samples of size $n_{2}$ from the second population and calculate their sample means, ${\bar{x}}_{2}$
- work out all the possible values that the difference ${\bar{x}}_{1} - {\bar{x}}_{2}$ can take, pairing each ${\bar{x}}_{1}$ with each ${\bar{x}}_{2}$
  - The collection of all these values, together with how often each occurs, is called the sampling distribution for differences in sample means
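The "all possible samples" construction above can be carried out exhaustively when the populations are tiny. The Python sketch below is illustrative only. It assumes sampling with replacement, so every ordered sample is equally likely. The populations [2, 4, 9] and [1, 5] and the helper names are hypothetical. Exact fractions are used to avoid rounding, so the mean and variance of the listed differences can be compared directly with the formulas that follow.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance


def all_sample_means(population, n):
    """Mean of every ordered sample of size n drawn WITH replacement."""
    return [Fraction(sum(sample), n) for sample in product(population, repeat=n)]


def all_differences(pop1, n1, pop2, n2):
    """Every value of x̄1 - x̄2: each sample mean from pop1 paired with each from pop2."""
    means1 = all_sample_means(pop1, n1)
    means2 = all_sample_means(pop2, n2)
    return [m1 - m2 for m1 in means1 for m2 in means2]


# Hypothetical toy populations (not from the study guide), samples of size 2 from each
pop1, pop2 = [2, 4, 9], [1, 5]
diffs = all_differences(pop1, 2, pop2, 2)

print(len(diffs))        # 36  (3**2 samples from pop1 times 2**2 samples from pop2)
print(mean(diffs))       # 2   (mean of the sampling distribution)
print(pvariance(diffs))  # 19/3 (variance of the sampling distribution)
```

Here $μ_{1} = 5$, $μ_{2} = 3$, $σ_{1}^{2} = 26/3$ and $σ_{2}^{2} = 4$. So $μ_{1} - μ_{2} = 2$ and $\frac{σ_{1}^{2}}{2} + \frac{σ_{2}^{2}}{2} = \frac{19}{3}$, which matches the enumeration. This holds even though neither toy population is normal.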

What are the mean and standard deviation of the sampling distribution for differences in sample means?

If the first population has a population mean of $μ_{1}$ and a population standard deviation of $σ_{1}$
- and the second independent population has a population mean of $μ_{2}$ and population standard deviation of $σ_{2}$
Then the sampling distribution for differences in sample means, ${\bar{x}}_{1} - {\bar{x}}_{2}$
- has a mean of $μ_{1} - μ_{2}$
- and a standard deviation of $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
- where $n_{1}$ is the size of the first sample
- and $n_{2}$ is the size of the second sample
The standard deviation $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$ assumes sampling is done with replacement
- If sampling without replacement, check that each sample size is less than 10% of its population size before using $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
  - otherwise the true standard deviation will be smaller than this formula gives
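The Python sketch below implements the two formulas and the 10% check. The function names are hypothetical. The inputs reuse the bulb figures from the worked example below. The population size of 1000 passed to the 10% check is made up for illustration.

```python
import math


def diff_in_means_params(mu1, sigma1, n1, mu2, sigma2, n2):
    """Mean and standard deviation of the sampling distribution of x̄1 - x̄2."""
    mean = mu1 - mu2
    sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return mean, sd


def ten_percent_condition(n, population_size):
    """True if the sample is less than 10% of its population (sampling without replacement)."""
    # Comparing 10 * n with the population size avoids floating-point error in 0.1 * N
    return 10 * n < population_size


mean, sd = diff_in_means_params(900, 25, 40, 800, 15, 50)
print(mean, round(sd, 4))  # 100 4.4861
print(ten_percent_condition(40, 1000), ten_percent_condition(150, 1000))  # True False
```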

Examiner Tips and Tricks

The mean, $μ_{1} - μ_{2}$ , and the standard deviation, $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$ , are given in the exam under 'Sampling distributions for means', in the row called 'For two populations'.

What conditions are needed for normality?

If in addition to the above, the two independent populations are also known to be normally distributed
- then the sampling distribution for differences in sample means is also normally distributed
  - with mean $μ_{1} - μ_{2}$ and standard deviation $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
You can use these properties to calculate probabilities involving the difference in sample means, ${\bar{x}}_{1} - {\bar{x}}_{2}$, as it follows a normal distribution
- Its standardized z-statistic is $\frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - (μ_{1} - μ_{2})}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}}$
  - $μ_{1}$ , $μ_{2}$ , $σ_{1}$ and $σ_{2}$ will be given in the question
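The Python sketch below shows how the z-statistic turns into a probability. It standardizes a difference and takes the upper-tail area with `statistics.NormalDist` from the standard library (Python 3.8+). The helper names are hypothetical. The example populations are also hypothetical: normal with $μ_{1} = 50$, $σ_{1} = 6$, $n_{1} = 9$ and $μ_{2} = 47$, $σ_{2} = 8$, $n_{2} = 16$.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(diff, mu1, sigma1, n1, mu2, sigma2, n2):
    """Standardize an observed difference x̄1 - x̄2."""
    sd = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (diff - (mu1 - mu2)) / sd


def prob_diff_greater(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """P(x̄1 - x̄2 > d), assuming the difference is (at least approximately) normal."""
    z = z_statistic(d, mu1, sigma1, n1, mu2, sigma2, n2)
    return 1 - NormalDist().cdf(z)  # upper-tail area of the standard normal


# P(x̄1 > x̄2) is the same as P(x̄1 - x̄2 > 0)
p = prob_diff_greater(0, 50, 6, 9, 47, 8, 16)
print(round(p, 3))  # 0.856
```

Because both hypothetical populations are normal, this probability is exact rather than an approximation, even though the samples are small.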

What do I do if the populations are not normally distributed?

If the populations are not normally distributed, then you cannot say the sampling distribution for differences in sample means is normally distributed
- This means you cannot use the normal distribution to work out probabilities
However, despite not knowing its shape, the sampling distribution for differences in sample means still has a
- mean of $μ_{1} - μ_{2}$ and a standard deviation of $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
  - i.e. you can always write these down, even though the distribution is unknown

Can I use the Central Limit theorem if populations are not normally distributed?

If the populations are not normally distributed, but both sample sizes are greater than or equal to 30 ( $n_{1} \geq 30$ and $n_{2} \geq 30$ )
- then the Central Limit theorem can be applied
- meaning the sampling distribution for differences in sample means is approximately normally distributed with the parameters above
  - i.e. mean $μ_{1} - μ_{2}$ and standard deviation $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
You can use these properties to estimate probabilities involving the difference in sample means, ${\bar{x}}_{1} - {\bar{x}}_{2}$, as it approximately follows a normal distribution
- Its standardized z-statistic is $\frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - (μ_{1} - μ_{2})}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}}$
  - $μ_{1}$ , $μ_{2}$ , $σ_{1}$ and $σ_{2}$ will be given in the question
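The conditions in the last three sections can be summarized as a small decision function. This is a hypothetical helper written for illustration. It assumes each sample is checked on its own. Under that assumption, one normal population paired with a large ($n \geq 30$) sample from a non-normal population is treated as approximately normal; the text above does not cover this mixed case explicitly.

```python
def shape_of_difference(pop1_normal, n1, pop2_normal, n2):
    """Return 'normal', 'approximately normal', or None if the shape is unknown."""
    if pop1_normal and pop2_normal:
        return "normal"
    # Assumption: each sample is checked separately (normal population OR n >= 30)
    ok1 = pop1_normal or n1 >= 30
    ok2 = pop2_normal or n2 >= 30
    if ok1 and ok2:
        return "approximately normal"  # Central Limit theorem
    return None  # only the mean and standard deviation can be stated


# Bulb example: populations not known to be normal, n1 = 40 and n2 = 50
print(shape_of_difference(False, 40, False, 50))  # approximately normal
```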

Worked Example

The lifetimes of bulbs from a company called Brite have a mean of 900 hours and a standard deviation of 25 hours. The lifetimes of bulbs from a company called Shine have a mean of 800 hours and a standard deviation of 15 hours.

Estimate the probability that the mean of a sample of 40 bulbs from Brite is at least 108 hours more than the mean of a sample of 50 bulbs from Shine.

Answer:

This is a probability question about the difference in means of two samples, so it requires the sampling distribution for differences in sample means

Start by labeling each population

Population 1 is the lifetimes of bulbs from Brite

Population 2 is the lifetimes of bulbs from Shine

You are not told the lifetimes of the bulbs are normally distributed, but both sample sizes are at least 30, so the Central Limit theorem can be applied

$n_{1} = 40 \geq 30$ and $n_{2} = 50 \geq 30$ so use the Central Limit theorem

The difference in sample means follows an approximate normal distribution with mean $μ_{1} - μ_{2}$ and standard deviation $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$

Substitute $μ_{1} = 900$ and $μ_{2} = 800$ into $μ_{1} - μ_{2}$

$μ_{1} - μ_{2} = 900 - 800 = 100$

Substitute $σ_{1} = 25$ , $n_{1} = 40$ , $σ_{2} = 15$ and $n_{2} = 50$ into $\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$

$\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} = \sqrt{\frac{25^{2}}{40} + \frac{15^{2}}{50}} = 4.4860896 . . .$

The wording in the question asks for the probability that ${\bar{x}}_{1} > {\bar{x}}_{2} + 108$

Rearrange this to form the difference of sample means, ${\bar{x}}_{1} - {\bar{x}}_{2}$

$P ({\bar{X}}_{1} > {\bar{X}}_{2} + 108) = P ({\bar{X}}_{1} - {\bar{X}}_{2} > 108)$

The difference in sample means follows an approximate normal distribution with mean 100 and standard deviation 4.4860896... from above

To find the probability that the difference in sample means is greater than 108, first calculate the z-score for 108

$\frac{108 - 100}{4.4860896 . . .} = 1.783 . . .$

Then find $P (Z > 1.783 . . .)$, e.g. using normal tables with the z-score rounded to 1.78

$\begin{array}{rcl} P (Z > 1.783 . . .) & = & 1 - P (Z < 1.783 . . .) \\ & \approx & 1 - 0.9625 \\ & = & 0.0375 \end{array}$

The probability that the mean of a sample of 40 bulbs from Brite is at least 108 hours more than the mean of a sample of 50 bulbs from Shine is approximately 0.0375 (using a calculator with the unrounded z-score gives 0.0373)
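The worked example can be reproduced in Python with `statistics.NormalDist` (Python 3.8+). This sketch also shows why a table answer and a calculator answer differ slightly. Rounding z to 1.78 before the lookup mimics using a printed table. Keeping the unrounded z-score mimics a calculator.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 900, 25, 40  # Brite
mu2, sigma2, n2 = 800, 15, 50  # Shine

mean_diff = mu1 - mu2                                # 100
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)      # 4.4860896...
z = (108 - mean_diff) / sd_diff                      # 1.783...

p_table = 1 - NormalDist().cdf(round(z, 2))  # table lookup at z = 1.78
p_exact = 1 - NormalDist().cdf(z)            # unrounded z-score

print(round(z, 3), round(p_table, 4), round(p_exact, 4))  # 1.783 0.0375 0.0373
```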
