Working with Data (Edexcel A Level Maths: Statistics): Exam Questions

Exam code: 9MA0

3 hours, 24 questions
1
2 marks

The table below gives information about the ages of passengers on an airline.

There were no passengers aged 90 or over.

| Age (x years) | Frequency |
| --- | --- |
| 0 \leq x < 5 | 5 |
| 5 \leq x < 20 | 45 |
| 20 \leq x < 40 | 90 |
| 40 \leq x < 65 | 130 |
| 65 \leq x < 80 | 60 |
| 80 \leq x < 90 | 1 |

An outlier is defined as a value greater than Q_3 + 1.5 \times \text{IQR}, where IQR is the interquartile range.

Given that Q_1 = 27.3 and Q_3 = 58.9, determine, giving a reason, whether or not the oldest passenger could be considered as an outlier.
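As a rough check of the arithmetic behind this definition, the upper fence can be computed from the two given quartiles. This is only a sketch: the variable names are illustrative, and using 90 as a bound on the oldest age is an assumption taken from the statement that no passenger was aged 90 or over.

```python
# Quartiles given in the question
q1 = 27.3
q3 = 58.9

iqr = q3 - q1                  # interquartile range
upper_fence = q3 + 1.5 * iqr   # values above this count as outliers

# Assumption: every age is below 90, so 90 bounds the oldest passenger's age.
oldest_age_bound = 90
could_be_outlier = oldest_age_bound > upper_fence

print(f"IQR = {iqr:.1f}, upper fence = {upper_fence:.1f}")
print("Oldest passenger could be an outlier:", could_be_outlier)
```

Comparing the bound on the oldest age with the fence is enough here, because the exact age is not known from the grouped table.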

2a
3 marks

In a conkers competition the number of strikes required to smash an opponent's conker (and thus win a match) is recorded for 15 matches. The results are:

6, \; 2, \; 9, \; 10, \; 9, \; 12, \; 5, \; 8, \; 7, \; 5, \; 11, \; 9, \; 17, \; 8, \; 9

(i) Find the median number of strikes.

(ii) Find the interquartile range.

2b
2 marks

An outlier is defined as any data value that falls either more than 1.5 \times \text{IQR} above the upper quartile or less than 1.5 \times \text{IQR} below the lower quartile.

Determine, giving a reason, whether there are any outliers.
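The quartile and fence calculations for the conker data can be sketched as below. The `quartile` helper is hypothetical and assumes one common A Level convention: for position n·k/4, a whole number means averaging that value with the next one, and anything else is rounded up. Other conventions (and calculators) may give slightly different quartiles.

```python
import math

def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) using the n*k/4 position rule.

    Assumed convention: if n*k/4 is a whole number, average that value and
    the next one; otherwise round the position up.
    """
    n = len(sorted_values)
    if (n * k) % 4 == 0:
        pos = n * k // 4
        return (sorted_values[pos - 1] + sorted_values[pos]) / 2
    pos = math.ceil(n * k / 4)
    return sorted_values[pos - 1]

strikes = sorted([6, 2, 9, 10, 9, 12, 5, 8, 7, 5, 11, 9, 17, 8, 9])

q1 = quartile(strikes, 1)
median = quartile(strikes, 2)
q3 = quartile(strikes, 3)
iqr = q3 - q1

# Fences from the definition in part (b)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in strikes if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```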

3a
2 marks

A hotel manager recorded the number of towels that went missing at the end of each day for 12 days. The results are below.

2, \; 4, \; 1, \; 0, \; 3, \; 4, \; 3.2, \; 9, \; 3, \; 2, \; 4, \; 5

Explain how the data will need to be cleaned before the manager can calculate summary statistics.

3b
3 marks

The manager cleans the data as required. For the remaining 11 days,

n = 11, \quad \Sigma x = 37, \quad \Sigma x^{2} = 181

Calculate the mean and the standard deviation for the number of towels missing per day.
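A minimal sketch of the summary-statistics route is given below. The function name `mean_and_sd` is hypothetical, and the formula assumes the convention of dividing by n (not n − 1) when finding the variance.

```python
import math

def mean_and_sd(n, sum_x, sum_x2):
    """Mean and standard deviation from n, sum of x and sum of x squared.

    Assumes variance = (sum of x^2)/n - mean^2, i.e. dividing by n.
    """
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, math.sqrt(variance)

# Summary statistics for the 11 cleaned towel counts
mean, sd = mean_and_sd(11, 37, 181)
print(f"mean = {mean:.3f}, standard deviation = {sd:.3f}")
```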

3c
2 marks

An outlier is defined as any data value lying more than 2 standard deviations away from the mean.

Determine, giving a reason, whether there are any outliers in the cleaned data.

3d
1 mark

State, giving a reason, whether the manager should remove this outlier from the data set.

4a
1 mark

Joe counts the number of different species of bird visiting his garden each day for a week. The results are given below.

7, \; 8, \; 5, \; 12, \; 9, \; 7, \; 3

Calculate the mean number of different species of bird visiting Joe's garden.

4b
2 marks

Joe continues to record the number of different species of bird visiting his garden each day for the rest of the month and calculates that the mean number of different species for the remaining 24 days is 9.25.

Joe claims that, using the data from the whole month, the mean number of species seen per day is exactly 9.

State, with clear working, whether Joe is correct.
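One way to test a claim like this is to rebuild the monthly total from the two parts. The sketch below assumes the month is the 7 listed days plus the 24 remaining days; the variable names are illustrative.

```python
first_week = [7, 8, 5, 12, 9, 7, 3]   # daily counts from part (a)
remaining_days = 24
remaining_mean = 9.25

# Total = sum for the first week + (mean x number of days) for the rest
total_species = sum(first_week) + remaining_days * remaining_mean
total_days = len(first_week) + remaining_days
month_mean = total_species / total_days

print(f"total = {total_species}, days = {total_days}, mean = {month_mean:.3f}")
print("Mean is exactly 9:", month_mean == 9)
```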

4c
2 marks

Joe notices that one of the recorded values is 8.8.

Explain why this is an error and state what Joe must do with this data value.

5a
3 marks

The cumulative frequency diagram below shows the lengths, in minutes, of 100 phone calls made to a computer help centre one morning.

Cumulative frequency diagram of morning call times, cumulative frequency from 0 to 100 against call time from 0 to 25 minutes

(i) Use the cumulative frequency graph to estimate the 10th and 90th percentiles.

(ii) Find the 10th to 90th interpercentile range.

5b
2 marks

In the afternoon of the same day, the lengths of another 100 phone calls to the computer help centre were recorded. The median length of these calls was 15 minutes and the 10th to 90th interpercentile range was 18 minutes.

Compare the distributions of the call times in the morning and the afternoon.

6a
3 marks

Two geologists are measuring the size of rocks found on a beach in front of a cliff.

The geologists record the greatest length, in millimetres, of each rock they find at distances of 5 \text{ m} and 25 \text{ m} from the base of the cliff. They randomly choose 20 rocks at each distance.

The mean and standard deviation for the rocks at 25 \text{ m} from the base of the cliff are \bar{x} = 111 \text{ mm} and \sigma = 120 \text{ mm} (both to 3 s.f.).

For the rocks at 5 \text{ m} from the base of the cliff, the summary statistics are

n = 20, \quad \Sigma x = 3885, \quad S_{xx} = 369513.75

Find the mean and standard deviation for the size of rocks at 5 \text{ m} from the base of the cliff.
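When S_{xx} is given instead of \Sigma x^{2}, the standard deviation follows directly from it. The sketch below assumes the convention \sigma = \sqrt{S_{xx} / n}; the variable names are illustrative.

```python
import math

# Summary statistics for the rocks at 5 m
n = 20
sum_x = 3885
s_xx = 369513.75   # S_xx = sum of (x - mean)^2

mean = sum_x / n
sd = math.sqrt(s_xx / n)   # assumes division by n, not n - 1

print(f"mean = {mean:.2f} mm, standard deviation = {sd:.1f} mm")
```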

6b
2 marks

Compare the size of the rocks at 5 \text{ m} and 25 \text{ m} from the base of the cliff.

6c
2 marks

An outlier is defined as any data value that lies outside one standard deviation of the mean, that is outside \bar{x} \pm \sigma.

Calculate the lower outlier boundary for the rocks at 25 \text{ m} from the base of the cliff.

Hence, explain why there cannot be any outliers at 25 \text{ m} which are smaller than the mean.

7a
1 mark

The incomplete box plot below shows data from the large data set regarding cloud cover between May and October 2015 in Camborne. Cloud cover is measured in oktas on a scale from 0 (no cloud cover) to 8 (full cloud cover).

Grid chart showing cloud cover in oktas with a highlighted rectangle spanning values 5 to 7. Horizontal axis labelled "Cloud cover (Oktas)", marked 0 to 8.

Find the interquartile range.

7b
3 marks

An outlier is defined as any data value that falls either more than 1.5 \times \text{IQR} above the upper quartile or less than 1.5 \times \text{IQR} below the lower quartile, where IQR is the interquartile range.

(i) Find the boundaries (fences) at which outliers are defined.

(ii) Explain why, using your knowledge of how cloud cover is measured in the large data set, there cannot be any high valued outliers.

7c
2 marks

Complete the box plot given that, where appropriate, the maximum and minimum values should be located at the boundaries (fences) at which outliers are defined. (You are not required to mark any outliers on the box plot.)

1a
1 mark

Taruni is studying the time it takes members of her company to travel to the office.

Taruni decided to ask every member of the company the time, x minutes, it takes them to travel to the office.

Taruni’s results are summarised by the box plot and summary statistics below.

Box plot showing journey times from 20 to 90 minutes, with outliers at 118 and 124 minutes. The median is 40 minutes. The quartiles are at 26 and 58.

n = 95, \quad \Sigma x = 4133, \quad \Sigma x^{2} = 202294

Write down the interquartile range for these data.

1b
3 marks

Calculate the mean and the standard deviation for these data.

1c
2 marks

State, giving a reason, whether you would recommend using the mean and standard deviation or the median and interquartile range to describe these data.

2a
4 marks

The cumulative frequency diagram below shows completion times for 100 competitors at the 2019 Rubik’s cube championships.  The quickest completion time was 9.8 seconds and the slowest time was 52.4 seconds.

Cumulative frequency graph of task completion time (seconds), S-shaped curve rising from 0 to about 100, steepest between 25 and 35 seconds.

The grid below shows a box plot of the 2020 championship data.  Draw a box plot on the grid to represent the 2019 championship data.

Box-and-whisker plot for 2020 completion times: minimum ~5s, lower quartile 20s, median 22s, upper quartile 30s, maximum 37s, on a 0–60s scale.
2b
3 marks

(i) Compare the distributions of the completion times for the 2019 and 2020 championships.

(ii) Given that the 2020 championships happened after the global pandemic, during which many competitors spent months at home, interpret your findings from part (b)(i).

3a
3 marks

Students at two karate schools, Miyagi Dojo and Cobra Kicks, measured the force, in newtons, with which they could perform a particular style of hit.

The mean and standard deviation for the students at Cobra Kicks are \bar{x} = 1740 \text{ N} and \sigma = 251 \text{ N} (both to 4 significant figures).

For the students at Miyagi Dojo, the summary statistics are

n = 12, \quad \Sigma x = 21873, \quad \Sigma x^{2} = 41532545

Calculate the mean and standard deviation for the force with which the students at Miyagi Dojo can hit.

3b
2 marks

Compare the distributions of hitting force for the two karate schools.

4a
5 marks

The heights, in metres, of a flock of 20 flamingos are recorded and shown below:

0.4, \; 0.9, \; 1.0, \; 1.0, \; 1.2, \; 1.2, \; 1.2, \; 1.2, \; 1.2, \; 1.2

1.3, \; 1.3, \; 1.3, \; 1.4, \; 1.4, \; 1.4, \; 1.4, \; 1.5, \; 1.5, \; 1.6

An outlier is an observation that falls either more than 1.5 \times \text{IQR} above the upper quartile or less than 1.5 \times \text{IQR} below the lower quartile.

(i) Find the median.

(ii) Find the interquartile range.

(iii) Determine, giving a reason, whether there are any outliers in the data.

4b
3 marks

Using your answers to part (a), draw a box plot for the data.

Blank rectangular grid with small grey squares above a horizontal axis arrow pointing right, ready for plotting data or drawing a graph.
5a
3 marks

The numbers of packages processed daily at a sorting office over a 14-day period are given below:

237, \; 264, \; 308, \; 313, \; 319, \; 352, \; 378

378, \; 405, \; 421, \; 428, \; 450, \; 465, \; 583

Given that \Sigma x = 5301 and \Sigma x^{2} = 2113195, calculate the mean and standard deviation for the number of daily packages processed.

5b
2 marks

An outlier is defined as any data value lying more than 2 standard deviations away from the mean.

Determine, giving a reason, whether there are any outliers in the data.

5c
3 marks

By removing the outlier identified in part (b), clean the data and recalculate the mean and standard deviation.

6a
3 marks

The cumulative frequency diagram below shows the distribution of income of 120 managers across a supermarket chain.

Cumulative frequency diagram of manager incomes from 0 to 220 thousand pounds, cumulative frequency from 0 to 120

The incomes of a sample of 120 other employees across the supermarket chain are recorded in the table below.

| Income I (£1000) | Frequency |
| --- | --- |
| 0 \leq I < 20 | 34 |
| 20 \leq I < 40 | 28 |
| 40 \leq I < 60 | 27 |
| 60 \leq I < 80 | 17 |
| 80 \leq I < 100 | 10 |
| 100 \leq I < 120 | 4 |

On the grid above, draw a cumulative frequency graph to show the data for the other employees.

6b
2 marks

Compare the income of the managers and the other employees.

7a
2 marks

Summary statistics from the large data set for the daily mean windspeed (knots) measured in Heathrow throughout October 1987 and October 2015 are given in the table below.

| Year | Min | Max | Median | \Sigma x | \Sigma x^{2} |
| --- | --- | --- | --- | --- | --- |
| 1987 | 2 | 16 | 5 | 185 | 1401 |
| 2015 | 3 | 10 | 6 | 197 | 1357 |

Calculate the mean of the daily mean windspeeds for each of the two years.

7b
4 marks

The standard deviation for 2015 was 1.84.

Calculate the standard deviation for 1987 and compare the daily mean windspeeds for each of the two years.

1a
1 mark

Each member of a group of 27 people was timed when completing a puzzle.

The time taken, x minutes, for each member of the group was recorded.

For these 27 people, \Sigma x = 607.5 and \Sigma x^{2} = 17623.25

Calculate the mean time taken to complete the puzzle.

1b
2 marks

Calculate the standard deviation of the times taken to complete the puzzle.

1c
1 mark

The times are summarised in the following box and whisker plot.

Box plot showing data distribution in minutes from 0 to 70. 'Box' goes from 14 to 25 with a line at 20. 'Whiskers' extend to 7 and 40. 'x' marks at 46 and 68.

Taruni defines an outlier as a value more than 3 standard deviations above the mean.

State how many outliers Taruni would say there are in these data, giving a reason for your answer.

1d
3 marks

Adam and Beth also completed the puzzle in a minutes and b minutes respectively, where a > b.

When their times are included with the data of the other 27 people

  • the median time increases

  • the mean time does not change

Suggest a possible value for a and a possible value for b, explaining how your values satisfy the above conditions.

1e
1 mark

Without carrying out any further calculations, explain why the standard deviation of all 29 times will be lower than your answer to part (b).

2a
4 marks

As part of an experiment, 15 maths teachers are asked to solve a puzzle and their times, in minutes, are recorded:

8, \; 12, \; 19, \; 20, \; 20, \; 21, \; 22, \; 23, \; 23, \; 23, \; 25, \; 26, \; 27, \; 37, \; 39

An outlier is an observation which lies more than 2 standard deviations away from the mean.

Show that there is exactly one outlier.

2b
2 marks

State, with a reason, whether the mean or the median would be the more suitable measure of central tendency for these data.

2c
2 marks

15 history teachers also completed the puzzle; their times are shown in the box plot below:

Box-and-whisker plot of time in minutes, with an outlier near 8, whiskers from about 12 to 50, and a box from about 30 to 42 showing the median

Explain what the cross (×) represents on the box plot above. Interpret this in context.

2d
2 marks

Compare the distributions of the times taken to complete the puzzle by the two sets of teachers.

3a
3 marks

Hugo, a newly appointed HR administrator for a company, has been asked to investigate the number of absences within the IT department.  The department contains 23 employees, and the box plot below summarises the data for the number of days that individual employees were absent during the previous quarter.

Box-and-whisker plot of number of absences in days, from 0 to 30, showing quartiles and spread of data along a horizontal axis.

An outlier is an observation that falls either more than 1.5 \times \text{IQR} above the upper quartile or less than 1.5 \times \text{IQR} below the lower quartile, where IQR is the interquartile range.

Show that these data have an outlier, and state its value.

3b
4 marks

For the 23 employees within the department, Hugo has the summary statistics:

\Sigma x = 286 and \Sigma x^{2} = 4328

Hugo investigates the employee corresponding to the outlier value found in part (a) and discovers that this employee had a long-term illness.  Hugo decides not to include that value in the data for the department.

Assuming that there are no other outliers, calculate the mean and standard deviation of the number of days absent for the remaining employees.

4a
4 marks

Sam, a zoologist, is a member of a group researching the masses of gentoo penguins.  The research group takes a sample of 100 male and 100 female penguins and records their masses.

An outlier is an observation that falls either more than 1.5 \times \text{IQR} above the upper quartile or less than 1.5 \times \text{IQR} below the lower quartile, where IQR is the interquartile range.

Given that values are outliers if they are less than 4.2 kg or more than 8.5 kg, calculate the upper and lower quartiles for the mass of the 200 gentoo penguins.
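The two outlier boundaries give a pair of simultaneous equations in the quartiles. A sketch of one possible route is shown below, writing Q_1 and Q_3 for the lower and upper quartiles (notation assumed here, not given in the question).

```latex
\begin{align*}
Q_1 - 1.5\,(Q_3 - Q_1) &= 4.2 \\
Q_3 + 1.5\,(Q_3 - Q_1) &= 8.5 \\
\text{Subtracting the first from the second: } 4\,(Q_3 - Q_1) &= 4.3
  \;\Rightarrow\; Q_3 - Q_1 = 1.075 \\
Q_1 &= 4.2 + 1.5 \times 1.075 = 5.8125 \\
Q_3 &= 8.5 - 1.5 \times 1.075 = 6.8875
\end{align*}
```

As a check, 6.8875 − 5.8125 = 1.075, which agrees with the interquartile range found in the third line.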

4b
2 marks

Casey is another member of Sam's research group. She believes that the masses of male and female gentoo penguins follow different distributions. The cumulative frequency graphs below show the masses of the male and female gentoo penguins in the sample.

Cumulative frequency graphs showing mass distributions for male and female gentoo penguins, mass from 4 to 10 kg, cumulative frequency from 0 to 100

Use the graphs to compare the distributions of the masses of male and female gentoo penguins.

5a
4 marks

Ms Chew is an accountant who is examining the length of time it takes her to complete jobs for her clients.  Ms Chew looks at her spreadsheet and lists the number of hours it took her to complete her last 12 jobs:

9, \; 2, \; -, \; 6, \; 5, \; 2, \; -, \; 6, \; 21, \; 5, \; 4, \; 8

‘-’ represents a job for which the length of time taken was not recorded.

An outlier is an observation which lies more than 2 standard deviations away from the mean.

By first cleaning the data, show that 21 is the only outlier.
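The cleaning step and the outlier check can be sketched as follows. Representing each unrecorded job as `None` is an assumption made for illustration, and the standard deviation is found by dividing by n.

```python
import math

# None stands in for '-' (time not recorded); this encoding is an assumption.
recorded = [9, 2, None, 6, 5, 2, None, 6, 21, 5, 4, 8]

# Cleaning: drop the jobs with no recorded time
cleaned = [x for x in recorded if x is not None]

n = len(cleaned)
mean = sum(cleaned) / n
sd = math.sqrt(sum(x ** 2 for x in cleaned) / n - mean ** 2)

# Outlier boundaries: 2 standard deviations either side of the mean
lower = mean - 2 * sd
upper = mean + 2 * sd
outliers = [x for x in cleaned if x < lower or x > upper]

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"boundaries: {lower:.2f} to {upper:.2f}, outliers: {outliers}")
```

The lower boundary comes out negative, so no recorded time can fall below it.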

5b
3 marks

Ms Chew looks at her handwritten records and finds that the value 21 was typed into the spreadsheet incorrectly.  It should have been 12.

Without further calculations, explain the effect this would have on the:

(i) mean

(ii) standard deviation

(iii) median.

6a
3 marks

David and Bowey are planning a trip in June to Beijing, Jacksonville or Perth.  The temperature of the city and the atmospheric pressure will be deciding factors, so they investigate these three cities using all of the data for June 2015 from the large data set.

Using all of the days in June 2015, the following summary statistics for the daily mean air temperatures (t °C) and the daily mean pressure (p hPa) are calculated:

| City | Daily mean air temperature, \bar{t} | Daily mean air temperature, \sigma_t | Daily mean pressure, \bar{p} | Daily mean pressure, \sigma_p |
| --- | --- | --- | --- | --- |
| Beijing | a | b | 1004 | 3.81 |
| Jacksonville | 26.4 | 1.80 | 1017 | 1.88 |
| Perth | 14.8 | 2.37 | 1021 | 5.63 |

David also has the following information for Beijing in June:

\Sigma t = 741.8 and \Sigma t^{2} = 18513.2

Calculate the values of a and b.

6b
3 marks

David suffers from headaches when the atmospheric pressure changes quickly so he would like to choose a city where the pressure does not vary a lot. Additionally, Bowey does not like it when the temperature is higher than 30 °C.

It is known that all the temperatures for Beijing in June 2015 were within 2 standard deviations of the mean, whereas in Jacksonville there were temperatures that were higher than the mean by more than 2 standard deviations.

(i) Use the data to explain why Bowey would not be happy visiting Jacksonville.

(ii) Hence, suggest with reasons which other city both David and Bowey would be happy to visit.

1a
2 marks

Marya is consistently late for work. David, Marya’s boss, records the number of minutes that she is late during the next six days. David calculates the mean is 18 minutes and the variance is 210 minutes². On one of the six days, Marya was 50 minutes late.

Show that 50 is an outlier, using the definition that outliers are more than 2 standard deviations away from the mean.

1b
2 marks

(i) Give a reason why the value of 50 should be excluded from the data set.

(ii) Give a reason why the value of 50 should be included in the data set.

1c
5 marks

Marya tells David that she was 50 minutes late that day because her car broke down on the way to work, and she shows him the breakdown receipt as evidence.

David agrees to remove the 50 from the data set. Calculate the new mean and standard deviation for the remaining values.
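Because only the mean and variance are known, one approach is to recover \Sigma x and \Sigma x^{2} first, remove the contribution of the 50, and then recalculate. The sketch below assumes the variance was found by dividing by n; the variable names are illustrative.

```python
import math

# Original summary for the six days
n = 6
mean = 18
variance = 210

# Recover the sums: variance = sum_x2 / n - mean^2
sum_x = n * mean
sum_x2 = n * (variance + mean ** 2)

# Remove the day on which Marya was 50 minutes late
removed = 50
new_n = n - 1
new_sum_x = sum_x - removed
new_sum_x2 = sum_x2 - removed ** 2

new_mean = new_sum_x / new_n
new_sd = math.sqrt(new_sum_x2 / new_n - new_mean ** 2)

print(f"new mean = {new_mean:.1f} minutes, new sd = {new_sd:.2f} minutes")
```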

2a
3 marks

The cumulative frequency graph below shows the information about the lengths of time taken for 80 students to run a lap of the sports hall.

Cumulative frequency graph of times in seconds, rising from 0 at 20s to about 80 at 100s, showing an S-shaped increasing curve over a grid.

Complete the table below:

| Time (t seconds) | Frequency |
| --- | --- |
| 20 < t \leq 40 | 8 |
| 40 < t \leq 60 |  |
| 60 < t \leq 80 |  |
| 80 < t \leq 100 |  |

2b
3 marks

Hence estimate the mean and the standard deviation of the times.

2c
3 marks

Given that the fastest time was 21 seconds and the slowest time was 100 seconds, show that these values are outliers using the definition that an outlier is more than 2 standard deviations away from the mean.

3a
3 marks

Tim has just moved to a new town and is trying to choose a doctor’s surgery to join, HealthHut or FitFirst. He wants to register with the one where patients get seen faster.
He takes a sample of 150 patients from HealthHut and calculates the range of waiting times as 45 minutes and the variance as 121 minutes².

An outlier is defined as a value which is more than 2 standard deviations away from the mean.

Prove that the sample contains an outlier.
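One way to organise such a proof is by contradiction. The outline below is a sketch only, writing \bar{x} for the sample mean and \sigma for the standard deviation (notation assumed here, not given in the question).

```latex
\begin{align*}
&\sigma = \sqrt{121} = 11, \quad \text{so } 2\sigma = 22. \\
&\text{Suppose there is no outlier. Then every value } x \text{ satisfies }
  \bar{x} - 22 \le x \le \bar{x} + 22, \\
&\text{so the range is at most } (\bar{x} + 22) - (\bar{x} - 22) = 44. \\
&\text{But the range is } 45 > 44, \text{ a contradiction, so at least one value
  lies more than } 2\sigma \text{ from the mean.}
\end{align*}
```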

3b
2 marks

Tim finds out that the outlier is a valid piece of data and decides to keep the value in his sample.

Which pair of statistical measures would be more appropriate to use when using the sample to compare the doctor’s surgeries: the mean and standard deviation or the median and interquartile range? Give a reason for your answer.

3c
1 mark

The box plots below show the waiting times for the two surgeries.

Box plots comparing HealthHut and FitFirst waiting times in minutes, with FitFirst showing a longer upper whisker and an outlier around 45 minutes

Given that there is only one outlier for HealthHut, label it on the box plot with a cross (×).

3d
2 marks

Compare the two distributions of waiting times.

4a
2 marks

Ororo, a meteorologist, is investigating the great storm of 1987 which largely affected the south of England. Ororo would like to compare the daily maximum gust in Hurn during the months of October 1987 and October 2015.

Using your knowledge of the large data set

(i) suggest one other city from the large data set that Ororo could use to investigate the great storm of 1987

(ii) state the units that are used in the large data set to measure the daily maximum gust.

4b
5 marks

Ororo calculates the following summary statistics for the daily maximum gust in Hurn using the available data for October 1987.

| Year | Number of available days | Maximum value | \Sigma x | S_{xx} |
| --- | --- | --- | --- | --- |
| 1987 | 25 | 61 | 665 | 3462 |

An outlier is defined as a value which is more than 2 standard deviations away from the mean.

(i) Show that the maximum value in 1987 is an outlier.

(ii) Give a reason why Ororo should include the outlier when comparing the data from the two years.

4c
2 marks

Ororo calculates the following statistics for the daily maximum gust in Hurn for October 2015:

  • Mean is 18.9 knots

  • Standard deviation is 4.45 knots

Compare the daily maximum gust in Hurn for October 1987 and October 2015.