Large Data Set (Edexcel A Level Maths: Statistics): Exam Questions

1 hour, 10 questions
1a
1 mark

Jiang is studying the variable Daily Mean Pressure from the large data set.

He drew the following box and whisker plot for these data for one of the months for one location using a linear scale but

  • he failed to label all the values on the scale

  • he gave an incorrect value for the median

Box plot showing daily mean pressure in hPa with a central box, whiskers, and an arrowed axis labelled Daily Mean Pressure (hPa). The centre line of the 'box' is directly above 1200 on the horizontal axis.

Using your knowledge of the large data set, suggest a suitable value for the median.

(You are not expected to have memorised values from the large data set. The question is simply looking for sensible answers.)

1b
1 mark

Using your knowledge of the large data set, suggest a suitable value for the range.

(You are not expected to have memorised values from the large data set. The question is simply looking for sensible answers.)

2a
1 mark

Fred and Nadine are investigating whether there is a linear relationship between Daily Mean Pressure, p hPa, and Daily Mean Air Temperature, t °C, in Beijing using the 2015 data from the large data set.

Fred randomly selects one month from the data set and draws the scatter diagram in Figure 1 using the data from that month.

The scale has been left off the horizontal axis.

Scatter plot showing daily mean air temperature (°C) against daily mean pressure (hPa), with points clustered between 20-30°C and overlapping pressure values.
Figure 1

Describe the correlation shown in Figure 1.

2b
1 mark

Nadine chooses to use all of the data for Beijing from 2015 and draws the scatter diagram in Figure 2.

She uses the same scales as Fred.

Scatter plot showing the relationship between daily mean air temperature (°C) and daily mean pressure (hPa), displaying a negative correlation.
Figure 2

Explain, in context, what Nadine can infer about the relationship between p and t using the information shown in Figure 2.

2c
1 mark

Using your knowledge of the large data set, state a value of p for which interpolation can be used with Figure 2 to predict a value of t.

2d
1 mark

Using your knowledge of the large data set, explain why it is not meaningful to look for a linear relationship between Daily Mean Wind Speed (Beaufort Conversion) and Daily Mean Air Temperature in Beijing in 2015.

1a
2 marks

Helen believes that the random variable C, representing cloud cover from the large data set, can be modelled by a discrete uniform distribution.

Write down the probability distribution for C.

1b
1 mark

Using this model, find the probability that cloud cover is less than 50%.

1c
1 mark

Helen used all the data from the large data set for Hurn in 2015 and found that the proportion of days with cloud cover of less than 50% was 0.315

Comment on the suitability of Helen’s model in the light of this information.

1d
1 mark

Suggest an appropriate refinement to Helen’s model.

2a
1 mark

Magali is studying the mean total cloud cover, in oktas, for Leuchars in 1987 using data from the large data set. The daily mean total cloud cover for all 184 days from the large data set is summarised in the table below.

Daily mean total cloud cover (oktas) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency (number of days) | 0 | 1 | 4 | 7 | 10 | 30 | 52 | 52 | 28

One of the 184 days is selected at random.

Find the probability that it has a daily mean total cloud cover of 6 or greater.

2b
4 marks

Magali is investigating whether the daily mean total cloud cover can be modelled using a binomial distribution.

She uses the random variable X to denote the daily mean total cloud cover and believes that $X \sim \mathrm{B}(8, 0.76)$.

Using Magali’s model,

(i) find $P(X \geq 6)$

(ii)  find, to 1 decimal place, the expected number of days in a sample of 184 days with a daily mean total cloud cover of 7.
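
One way an answer to part (b) could be checked is sketched below in Python, using only the standard library. It assumes Magali's model $X \sim \mathrm{B}(8, 0.76)$ exactly as stated; the helper name `binomial_pmf` is an illustrative choice and not part of the question.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 8, 0.76   # parameters of Magali's model
days = 184       # number of days in the sample

# (i) P(X >= 6) = P(X = 6) + P(X = 7) + P(X = 8)
p_at_least_6 = sum(binomial_pmf(k, n, p) for k in range(6, n + 1))

# (ii) expected number of days with a cloud cover of exactly 7 oktas
expected_7 = days * binomial_pmf(7, n, p)

print(round(p_at_least_6, 3), round(expected_7, 1))
```

The same two numbers can be obtained from the binomial distribution functions on a calculator; the code only makes the summation explicit.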

2c
1 mark

Explain whether or not your answers to part (b) support the use of Magali’s model.

2d
1 mark

There were 28 days that had a daily mean total cloud cover of 8.

For these 28 days, the daily mean total cloud cover for the following day is shown in the table below.

Daily mean total cloud cover (oktas) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency (number of days) | 0 | 0 | 1 | 1 | 2 | 1 | 5 | 9 | 9

Find the proportion of these days when the daily mean total cloud cover was 6 or greater.

2e
2 marks

Comment on Magali’s model in light of your answer to part (d).

3a
2 marks

Ben is studying the Daily Total Rainfall, x mm, in Leeming for 1987.

He used all the data from the large data set and summarised the information in the following table.

x | 0 | 0.1-0.5 | 0.6-1.0 | 1.1-1.9 | 2.0-4.0 | 4.1-6.9 | 7.0-12.0 | 12.1-20.9 | 21.0-32.0 | tr
Frequency | 55 | 18 | 18 | 21 | 17 | 9 | 9 | 6 | 2 | 29

Explain how the data will need to be cleaned before Ben can start to calculate statistics such as the mean and standard deviation.

3b
3 marks

Using all 184 of these values, Ben estimates $\sum x = 390$ and $\sum x^2 = 4336$.

Calculate estimates for

(i) the mean Daily Total Rainfall,

(ii) the standard deviation of the Daily Total Rainfall.
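
The calculation in part (b) can be sketched in Python as below. This is a minimal illustration that assumes the usual A Level convention of dividing by n (not n - 1) for the variance; the function name `mean_and_sd` is hypothetical.

```python
from math import sqrt


def mean_and_sd(n: int, sum_x: float, sum_x2: float) -> tuple[float, float]:
    """Mean and standard deviation from n, sum of x and sum of x squared.

    Uses variance = (sum of x^2) / n - mean^2, i.e. the divide-by-n form.
    """
    mean = sum_x / n
    variance = sum_x2 / n - mean**2
    return mean, sqrt(variance)


# Ben's summary values for the 184 days
mean, sd = mean_and_sd(184, 390, 4336)
print(round(mean, 2), round(sd, 2))
```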

3c
2 marks

Ben suggests using the statistic calculated in part (b)(i) to estimate the annual mean Daily Total Rainfall in Leeming for 1987.

Using your knowledge of the large data set,

(i) give a reason why these data would not be suitable,

(ii) state, giving a reason, how you would expect the estimate in part (b)(i) to differ from the actual annual mean Daily Total Rainfall in Leeming for 1987.

1a
1 mark

Stav is studying the large data set for September 2015.

He codes the variable Daily Mean Pressure, x, using the formula $y = x - 1010$.

The data for all 30 days from Hurn are summarised by

$\sum y = 214 \qquad \sum y^2 = 5912$

State the units of the variable x.

1b
2 marks

Find the mean Daily Mean Pressure for these 30 days.

1c
3 marks

Find the standard deviation of Daily Mean Pressure for these 30 days.
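
A short Python sketch of how the coding affects parts (b) and (c) is given below. It relies on two standard facts: subtracting a constant shifts the mean by that constant and leaves the standard deviation unchanged. The variable names are illustrative, and the divide-by-n form of the variance is assumed.

```python
from math import sqrt

n = 30                      # days in September 2015
sum_y, sum_y2 = 214, 5912   # summary values for the coded data
shift = 1010                # the coding subtracts this constant from x

# Statistics of the coded variable y
mean_y = sum_y / n
sd_y = sqrt(sum_y2 / n - mean_y**2)

# Undo the coding: the mean shifts back, the spread is unaffected
mean_x = mean_y + shift
sd_x = sd_y

print(round(mean_x, 2), round(sd_x, 2))
```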

1d
2 marks

Stav knows that, in the UK, winds circulate

  • in a clockwise direction around a region of high pressure

  • in an anticlockwise direction around a region of low pressure

The table gives the Daily Mean Pressure for 3 locations from the large data set on 26/09/2015

Location | Heathrow | Hurn | Leuchars
Daily Mean Pressure | 1029 | 1028 | 1028
Cardinal Wind Direction | | |

The Cardinal Wind Directions for these 3 locations on 26/09/2015 were, in random order,

W     NE     E

You may assume that these 3 locations were under a single region of pressure.

Using your knowledge of the large data set, place each of these Cardinal Wind Directions in the correct location in the table.

Give a reason for your answer.

2a
1 mark

Dian uses the large data set to investigate the Daily Total Rainfall, r mm, for Camborne.

Write down how a value of $0 < r \leq 0.05$ is recorded in the large data set.

2b
3 marks

Dian uses the data for the 31 days of August 2015 for Camborne and calculates the following statistics

$n = 31 \qquad \sum r = 174.9 \qquad \sum r^2 = 3523.283$

Use these statistics to calculate

(i) the mean of the Daily Total Rainfall in Camborne for August 2015,

(ii) the standard deviation of the Daily Total Rainfall in Camborne for August 2015.

2c
2 marks

Dian believes that the mean Daily Total Rainfall in August is less in the South of the UK than in the North of the UK.

The mean Daily Total Rainfall in Leuchars for August 2015 is 1.72 mm to 2 decimal places.

State, giving a reason, whether this provides evidence to support Dian's belief.

2d
1 mark

Dian uses the large data set to estimate the proportion of days with no rain in Camborne for 1987 to be 0.27 to 2 decimal places.

Explain why the distribution $\mathrm{B}(14, 0.27)$ might not be a reasonable model for the number of days without rain for a 14-day summer event.

3a
1 mark

Helen is studying one of the qualitative variables from the large data set for Heathrow from 2015.

She started with the data from 3rd May and then took every 10th reading.

There were only 3 different outcomes with the following frequencies

Outcome | A | B | C
Frequency | 16 | 2 | 1

State the sampling technique Helen used.

3b
2 marks

From your knowledge of the large data set

(i) suggest which variable was being studied,

(ii) state the name of outcome A.

3c
1 mark

George is also studying the same variable from the large data set for Heathrow from 2015.

He started with the data from 5th May and then took every 10th reading and obtained the following

Outcome | A | B | C
Frequency | 16 | 1 | 1

Helen and George decided they should examine all of the data for this variable for Heathrow from 2015 and obtained the following

Outcome | A | B | C
Frequency | 155 | 26 | 3

State what inference Helen and George could reliably make from their original samples about the outcomes of this variable at Heathrow, for the period covered by the large data set in 2015.

4a
1 mark

A random sample of 15 days is taken from the large data set for Perth in June and July 1987.

The scatter diagram in Figure 1 displays the values of two of the variables for these 15 days.

Scatter plot with points scattered across a grid, having x-axis ranging from 0 to 20 and y-axis marked from 0 upwards, displaying a downward trend.
Figure 1

Describe the correlation.

4b
2 marks

The variable on the x-axis is Daily Mean Temperature measured in °C.

Using your knowledge of the large data set,

(i) suggest which variable is on the y-axis,

(ii) state the units that are used in the large data set for this variable.

4c
3 marks

Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow.

He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains $r = -0.377$.

Carry out a suitable test to investigate Stav’s belief at a 5% level of significance.

State clearly

  • your hypotheses

  • your critical value

4d
1 mark

On a random day at Heathrow the Daily Maximum Relative Humidity was 97%.

Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer.

5a
4 marks
Partially completed box plot on a grid with horizontal axis numbered from 7 to 33. A rectangle is drawn going from 19.4 to 26.6, with a vertical line at 23.6 separating it into two parts.
Figure 1

The partially completed box plot in Figure 1 shows the distribution of daily mean air temperatures using the data from the large data set for Beijing in 2015.

An outlier is defined as a value

  • more than $1.5 \times \text{IQR}$ below $Q_1$ or

  • more than $1.5 \times \text{IQR}$ above $Q_3$

The three lowest air temperatures in the data set are 7.6 °C, 8.1 °C and 9.1 °C.

The highest air temperature in the data set is 32.5 °C.

Complete the box plot in Figure 1 showing clearly any outliers.
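
The outlier rule above can be sketched in Python as follows. The quartiles 19.4 and 26.6 are read from the box drawn in Figure 1, and the extreme temperatures are those listed in the question. Ending the whisker at the smallest value that is not an outlier is one common convention and is an assumption here; the variable names are illustrative.

```python
# Quartiles read from the box in Figure 1
q1, q3 = 19.4, 26.6
iqr = q3 - q1

# Fences: values strictly beyond these are classed as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

lowest = [7.6, 8.1, 9.1]   # three lowest temperatures in the data set
highest = 32.5             # highest temperature in the data set

outliers = [t for t in lowest + [highest]
            if t < lower_fence or t > upper_fence]

# Whiskers end at the most extreme values that are not outliers
lower_whisker = min(t for t in lowest if t >= lower_fence)
upper_whisker = highest if highest <= upper_fence else None

print(round(lower_fence, 1), round(upper_fence, 1), outliers)
```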

5b
1 mark

Using your knowledge of the large data set, suggest from which month the two outliers are likely to have come.

5c
1 mark

Using the data from the large data set, Simon produced the following summary statistics for the daily mean air temperature, x °C, for Beijing in 2015

$n = 184 \qquad \sum x = 4153.6 \qquad S_{xx} = 4952.906$

Show that, to 3 significant figures, the standard deviation is 5.19 °C.
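
As a hedged outline of the working, assuming the divide-by-n definition of the standard deviation and writing $\sigma$ for it (a symbol not used in the question):

$$\sigma = \sqrt{\frac{S_{xx}}{n}} = \sqrt{\frac{4952.906}{184}} \approx \sqrt{26.918} \approx 5.188 \approx 5.19 \text{ (3 s.f.)}$$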

5d
3 marks

Simon decides to model the air temperatures with the random variable

$T \sim \mathrm{N}(22.6, 5.19^2)$

Using Simon’s model, calculate the 10th to 90th interpercentile range.
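
A minimal Python sketch of this calculation, using `statistics.NormalDist` from the standard library and assuming Simon's model exactly as stated, is shown below. The variable names are illustrative.

```python
from statistics import NormalDist

# Simon's model: T ~ N(22.6, 5.19^2); sigma is the standard deviation
model = NormalDist(mu=22.6, sigma=5.19)

# 10th and 90th percentiles via the inverse normal function
p10 = model.inv_cdf(0.10)
p90 = model.inv_cdf(0.90)

interpercentile_range = p90 - p10
print(round(p10, 2), round(p90, 2), round(interpercentile_range, 2))
```

Because the normal distribution is symmetric, the two percentiles lie equally far either side of the mean, so the range is twice the distance from the mean to the 90th percentile.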

5e
2 marks

Simon wants to model another variable from the large data set for Beijing using a normal distribution.

State two variables from the large data set for Beijing that are not suitable to be modelled by a normal distribution. Give a reason for each answer.
