Sampling Distributions for Differences in Sample Proportions (College Board AP® Statistics): Study Guide

Written by: Mark Curtis

Reviewed by: Dan Finlay

Updated on 28 August 2024

Sampling distributions for differences in sample proportions

What is a one-sample problem?

So far we've only considered one random sample of size $n$ being taken from one population with a population proportion of $p$
- The sample proportion is $\hat{p}$
- This is a one-sample problem

What is a two-sample problem?

If one random sample of size $n_{1}$ is taken from one population with population proportion of $p_{1}$
- and a different random sample of size $n_{2}$ is taken from a different population (that is independent of the first population) with population proportion of $p_{2}$
  - then this is a two-sample problem
  - The sample proportions are ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$

What is the difference in sample proportions?

In a two-sample problem you can compare the sample proportions from separate samples of two independent populations
- You can look at the difference in sample proportions, ${\hat{p}}_{1} - {\hat{p}}_{2}$
  - e.g. if ${\hat{p}}_{1} - {\hat{p}}_{2} > 0$ then the proportion of successes in the first sample is greater than the proportion of successes in the second sample
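
As a small illustration, the difference can be computed directly from the number of successes in each sample. The counts and variable names below are made up for this sketch and do not come from the guide.

```python
# Hypothetical sample data (not from the guide): successes and sample sizes
x1, n1 = 18, 40   # sample from population 1
x2, n2 = 15, 50   # sample from population 2

# Each sample proportion is successes divided by sample size
p_hat_1 = x1 / n1
p_hat_2 = x2 / n2

# Difference in sample proportions
diff = p_hat_1 - p_hat_2

print(f"p_hat_1 = {p_hat_1:.2f}, p_hat_2 = {p_hat_2:.2f}, difference = {diff:.2f}")
```

A positive difference here means the first sample had the larger proportion of successes.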

What is the sampling distribution for differences in sample proportions?

In a sample of size $n_{1}$ taken from the first population
- let $X_{1}$ count the number of successes in the sample
  - so $X_{1}$ is the number of successes in $n_{1}$ trials
  - and each trial is either a success or a failure
$X_{1}$ follows a binomial distribution with probability of success $p_{1}$
- where $p_{1}$ is the population proportion
The sample proportion, ${\hat{p}}_{1}$ , is given by
- ${\hat{p}}_{1} = \frac{X_{1}}{n_{1}}$
  - The number of successes in the sample divided by the total number of individuals in the sample
Similarly, for a sample of size $n_{2}$ taken from the second population with $X_{2}$ successes
- The sample proportion, ${\hat{p}}_{2}$ , is given by
  - ${\hat{p}}_{2} = \frac{X_{2}}{n_{2}}$
If the sample sizes are large enough that the following four conditions are all satisfied
- $n_{1} p_{1} \geq 10$
- $n_{1} (1 - p_{1}) \geq 10$
- $n_{2} p_{2} \geq 10$
- $n_{2} (1 - p_{2}) \geq 10$
- then the difference in sample proportions, ${\hat{p}}_{1} - {\hat{p}}_{2}$ , will follow:
  - an approximate normal distribution
  - with mean $p_{1} - p_{2}$
  - and standard deviation $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$
- This is the sampling distribution for the difference in sample proportions
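
The mean, standard deviation and four large-sample conditions can be bundled into one helper. This is a minimal sketch, not part of the guide: the function name `diff_proportions_distribution`, the small floating-point tolerance and the example inputs are all assumptions made for illustration.

```python
from math import sqrt

def diff_proportions_distribution(n1, p1, n2, p2):
    """Return (mean, sd, normal_ok) for the sampling distribution of p_hat_1 - p_hat_2."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # The four large-sample conditions: every expected count must be at least 10.
    # The tiny tolerance (an assumption of this sketch) guards against
    # floating-point rounding when a count is exactly 10.
    expected_counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    normal_ok = all(count >= 10 - 1e-9 for count in expected_counts)
    return mean, sd, normal_ok

# Hypothetical inputs: the conditions hold, so a normal approximation is reasonable
print(diff_proportions_distribution(100, 0.5, 80, 0.4))

# Hypothetical inputs: n1 * p1 = 2 < 10, so the normal approximation is not justified
print(diff_proportions_distribution(20, 0.1, 80, 0.4))
```

The third value returned only reports whether the normal approximation is justified; the mean and standard deviation are calculated either way.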

What else should I know about the sampling distribution for differences in sample proportions?

You need to know that
- The standard deviation $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$ assumes sampling was done with replacement
  - If sampling without replacement, make sure both sample sizes are less than 10% of their population size to be able to use $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$
  - otherwise the true standard deviation will be smaller than the value this formula gives
- Because the distribution is approximately normal, you can use the normal distribution to calculate probabilities involving differences of sample proportions, ${\hat{p}}_{1} - {\hat{p}}_{2}$
  - Its standardized z-statistic is $\frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - (p_{1} - p_{2})}{\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}}$
  - $p_{1}$ and $p_{2}$ , the population proportions, will be given in the question
- If the sample sizes are not large enough (i.e. the four conditions are not satisfied) then the sampling distribution is not approximately normal
  - but the mean and standard deviation formulas still hold
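
A quick simulation can illustrate the last point: with small samples the conditions fail, yet the simulated mean and standard deviation still land close to the formulas. This is only a sketch under stated assumptions: the sample sizes, proportions, number of repetitions, seed and the helper name `simulate_differences` are all chosen for illustration, and sampling is simulated as independent trials (equivalent to sampling with replacement).

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def simulate_differences(n1, p1, n2, p2, reps, seed=0):
    """Simulate `reps` values of p_hat_1 - p_hat_2 from two independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Count successes in each sample, one independent trial at a time
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Hypothetical small samples: n1 * p1 = 3 and n2 * p2 = 3, so the conditions fail
n1, p1, n2, p2 = 10, 0.3, 15, 0.2
diffs = simulate_differences(n1, p1, n2, p2, reps=20000)

formula_mean = p1 - p2
formula_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean {fmean(diffs):.4f} vs formula {formula_mean:.4f}")
print(f"simulated sd   {pstdev(diffs):.4f} vs formula {formula_sd:.4f}")
```

The simulation says nothing about the shape of the distribution; it only checks the mean and spread, which is the part that does not depend on the sample sizes being large.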

Examiner Tips and Tricks

The mean, $p_{1} - p_{2}$ , and the standard deviation, $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$ , are given in the exam under 'Sampling distributions for proportions', in the row called 'For two populations'.

Worked Example

In Twiggy National Park, 35% of all eagles are male and in Dusty National Park, 20% of all eagles are male. A sample of 40 eagles is taken from Twiggy National Park and a sample of 50 eagles is taken from Dusty National Park.

Find the probability that the proportion of male eagles sampled in Twiggy National Park is less than the proportion of male eagles sampled in Dusty National Park.

Answer:

Start by labeling each population

Population 1 consists of all eagles in Twiggy National Park

Population 2 consists of all eagles in Dusty National Park

The question is about one sample proportion being less than another, ${\hat{p}}_{1} < {\hat{p}}_{2}$

This can be rearranged into the difference of two sample proportions, ${\hat{p}}_{1} - {\hat{p}}_{2}$

$P ({\hat{p}}_{1} < {\hat{p}}_{2}) = P ({\hat{p}}_{1} - {\hat{p}}_{2} < 0)$

The difference in sample proportions follows an approximate normal distribution with mean $p_{1} - p_{2}$ and standard deviation $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$ so long as $n_{1} p_{1} \geq 10$ , $n_{1} (1 - p_{1}) \geq 10$ , $n_{2} p_{2} \geq 10$ and $n_{2} (1 - p_{2}) \geq 10$

Test the four conditions with $n_{1} = 40$ , $p_{1} = 0.35$ , $n_{2} = 50$ and $p_{2} = 0.2$

$n_{1} p_{1} = 40 \times 0.35 = 14 \geq 10$

$n_{1} (1 - p_{1}) = 40 \times (1 - 0.35) = 26 \geq 10$

$n_{2} p_{2} = 50 \times 0.2 = 10 \geq 10$

$n_{2} (1 - p_{2}) = 50 \times (1 - 0.2) = 40 \geq 10$

The conditions are satisfied

Substitute $p_{1} = 0.35$ and $p_{2} = 0.2$ into $p_{1} - p_{2}$

$p_{1} - p_{2} = 0.35 - 0.2 = 0.15$

Substitute $n_{1} = 40$ , $p_{1} = 0.35$ , $n_{2} = 50$ and $p_{2} = 0.2$ into $\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}$

$\begin{array}{rcl} \sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}} & = & \sqrt{\frac{0.35 \times 0.65}{40} + \frac{0.2 \times 0.8}{50}} \\ & = & 0.094273538 . . . \end{array}$

From above, you want to find $P ({\hat{p}}_{1} - {\hat{p}}_{2} < 0)$

The difference in sample proportions follows (approximately) a normal distribution with mean 0.15 and standard deviation 0.094273538...

To find the probability that the difference in sample proportions is less than 0, first calculate the z-score for 0

$\frac{0 - 0.15}{0.094273538 . . .} = - 1.5911 . . .$

Then find $P (Z < - 1.5911 . . .)$ , e.g. using the normal tables

$P (Z < - 1.5911 . . .) = 0.0559$

The probability that the proportion of male eagles sampled in Twiggy National Park is less than the proportion of male eagles sampled in Dusty National Park is 0.0559
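
The same calculation can be checked with software. The sketch below uses Python's standard-library `statistics.NormalDist` in place of normal tables; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values from the worked example
n1, p1 = 40, 0.35   # Twiggy National Park
n2, p2 = 50, 0.20   # Dusty National Park

# Mean and standard deviation of the sampling distribution of p_hat_1 - p_hat_2
mean = p1 - p2
sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# z-score for a difference of 0, then the left-tail probability
z = (0 - mean) / sd
prob = NormalDist().cdf(z)

print(f"mean = {mean:.2f}, sd = {sd:.6f}, z = {z:.4f}, P(diff < 0) = {prob:.4f}")
```

Software keeps the full value of z, so the probability comes out at roughly 0.0558 rather than the table value of 0.0559, which is read off using z rounded to -1.59.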
