Further Correlation & Regression (Edexcel A Level Maths: Statistics): Exam Questions

Exam code: 9MA0

4 hours, 32 questions
1
3 marks

Write suitable null and alternative hypotheses for each of the following situations.

(i) A recording studio is interested in whether the increasing age of a band's lead singer decreases the number of records the band will sell.

(ii) A researcher for an online gaming company believes that the higher the number of free revivals available in a game, the more time people will spend playing the game.

(iii) A beach umbrella manufacturer is carrying out a test to see if there is any correlation between temperature and the number of beach umbrellas sold.

2
3 marks

The table below gives the critical values, for different significance levels, of the product moment correlation coefficient, r, for a sample of size 30.

| One tail | 10% | 5% | 2.5% | 1% | 0.5% |
|---|---|---|---|---|---|
| Two tail | 20% | 10% | 5% | 2% | 1% |
| n = 30 | 0.2407 | 0.3061 | 0.3610 | 0.4226 | 0.4629 |

For each set of hypotheses below, use the table above to determine the critical region for a hypothesis test at the 10% level of significance for a sample of size 30.

(i) \text{H}_{0} : \rho = 0, \text{H}_{1} : \rho > 0

(ii) \text{H}_{0} : \rho = 0, \text{H}_{1} : \rho \neq 0

(iii) \text{H}_{0} : \rho = 0, \text{H}_{1} : \rho < 0

3a
3 marks

It is claimed that there is negative correlation between two variables x and y. A hypothesis test is carried out to test this claim using the null hypothesis \text{H}_{0} : \rho = 0.

(i) Describe what the null hypothesis \rho = 0 means about the relationship between x and y.

(ii) Describe what negative correlation would suggest about the relationship between x and y.

(iii) State a suitable alternative hypothesis \text{H}_{1} to test for negative correlation.

3b
2 marks

The critical value for this hypothesis test is -0.3674.

(i) Explain what is meant by a critical value in the context of hypothesis testing.

(ii) Write down the critical region for this hypothesis test.

3c
1 mark

The product moment correlation coefficient is calculated from the sample to be r = -0.3175.

Explain the difference between the statistic r and the parameter \rho.

3d
1 mark

By comparing the test statistic with the critical value, conclude the hypothesis test.
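The comparison in part (d) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the lower-tailed critical region includes the boundary value (r ≤ critical value).

```python
def in_lower_critical_region(r: float, critical_value: float) -> bool:
    """Lower-tailed test (H1: rho < 0): reject H0 when r lies in the critical region."""
    # Assumption: the boundary value itself counts as part of the critical region.
    return r <= critical_value


r = -0.3175               # sample PMCC from part (c)
critical_value = -0.3674  # critical value from part (b)

reject_h0 = in_lower_critical_region(r, critical_value)
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because -0.3175 is greater than -0.3674, the test statistic lies outside the critical region and the sketch prints "Do not reject H0".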

4a
2 marks

Pim collects data on the amount of time she can hold plank each morning, t minutes, and the amount of sleep, s hours, she got the night before.

| Amount of sleep, s hours | 6.21 | 8.15 | 7.52 | 7.19 | 6.18 | 5.28 | 9.03 | 6.01 | 7.55 | 8.39 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time holding plank, t mins | 0.92 | 1.13 | 1.07 | x | 0.99 | 0.96 | 1.12 | 0.98 | 1.20 | 1.09 |

The product moment correlation coefficient for these data is r = 0.7536.

(i) Describe the correlation between s and t.

(ii) State, with a reason, whether a line of best fit drawn through the data should have a positive or a negative gradient.

4b
Sme Calculator
3 marks

Pim calculates the equation of the regression line of t on s to be

t = 0.08s + 0.45

(i) Using the regression line, estimate the value of x in the table above.

(ii) Give an interpretation of the value 0.45 in the equation of the regression line.

(iii) Give an interpretation of the value 0.08 in the equation of the regression line.

4c
1 mark

Pim says that if she sleeps for 13 hours she will be able to hold plank for roughly 1.5 minutes.

Give a reason why Pim's claim could be incorrect.

4d
1 mark

One morning Pim can hold plank for one minute. Explain why the regression line should not be used to predict how long Pim slept the night before.

5a
Sme Calculator
1 mark

Andy, a preschool teacher, is exploring whether a new 'Mindfulness for Toddlers' course is helping children to learn more quickly. He records the time, m minutes, each of nine toddlers spent meditating and the time, p minutes, it took them to solve a puzzle afterwards.

| m | 5 | 4 | 2 | 10 | 3 | 5 | 1 | 2 | 4 |
|---|---|---|---|---|---|---|---|---|---|
| p | 2.8 | 3.6 | 4.5 | 1.8 | 5.1 | 2.8 | 7.0 | 8.0 | 2.5 |

Andy suspects an exponential relationship between the times and codes the data using X = m and Y = \ln p.

Complete the table below for X and Y, giving each value of Y to two decimal places.

| X | 5 | 4 | 2 | 10 | 3 | 5 | 1 | 2 | 4 |
|---|---|---|---|---|---|---|---|---|---|
| Y | 1.03 | 1.28 | 1.50 | 0.59 | 1.63 | 1.03 |  |  |  |

5b
1 mark

Andy calculates the product moment correlation coefficient between m and p to be r_1 = -0.772, and between X and Y to be r_2 = -0.862.

State, giving a reason, whether there is stronger correlation between m and p, or between X and Y.

5c
Sme Calculator
2 marks

Andy calculates the equation of the regression line of Y on X to be

Y = 1.98 - 0.162X

A new student joins the class and spends 4 minutes meditating.

(i) Use the regression line to estimate the value of Y.

(ii) Hence estimate how long it takes the student to solve the puzzle.
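Part (c) amounts to evaluating the regression line and then undoing the coding Y = \ln p. A minimal sketch, assuming the regression line stated above; the function and variable names are illustrative only.

```python
import math


def predict_Y(X: float) -> float:
    """Regression line of Y on X from part (c): Y = 1.98 - 0.162X."""
    return 1.98 - 0.162 * X


Y_hat = predict_Y(4)       # coded estimate for 4 minutes of meditation
p_hat = math.exp(Y_hat)    # undo Y = ln p to recover the time in minutes

print(f"Y = {Y_hat:.3f}, p = {p_hat:.2f} minutes")
```

The exponential is the inverse of the natural logarithm, which is why `math.exp` recovers p from Y.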

6a
2 marks

It is believed that the relationship between two variables, x and y, can be modelled by y = bp^{x}.

By taking logarithms of both sides and using the laws of logarithms, show that y = bp^{x} can be written as

\log_{10} y = \log_{10} b + x \log_{10} p

6b
Sme Calculator
3 marks

The scatter diagram below shows the relationship between x and \log y. The regression line of \log y on x passes through the points (3,\ 0.7) and (7,\ 1.3).

Graph showing points scattered around a line with log y-axis and x-axis. Points (3, 0.7) and (7, 1.3) are labelled on the line.

(i) Using the given coordinates, find the gradient of the regression line.

(ii) Find the equation of the regression line of \log_{10} y on x in the form \log_{10} y = a + mx, where a and m are constants to be found.

6c
Sme Calculator
4 marks

(i) By comparing the equation in part (a) with the equation in part (b)(ii), show that b = 1.778 to three decimal places.

(ii) Find the value of p to three decimal places.
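The chain of steps in parts (b) and (c) (two points, then gradient and intercept, then the constants b and p) can be sketched as follows. The variable names are illustrative, and base-10 logarithms are assumed, as in part (a).

```python
# Two points on the regression line of log10(y) on x, taken from part (b).
x1, y1 = 3, 0.7
x2, y2 = 7, 1.3

gradient = (y2 - y1) / (x2 - x1)   # m in log10(y) = a + m*x
intercept = y1 - gradient * x1     # a in log10(y) = a + m*x

# Comparing with log10(y) = log10(b) + x*log10(p):
b = 10 ** intercept                # log10(b) = a
p = 10 ** gradient                 # log10(p) = m

print(f"gradient = {gradient:.2f}, intercept = {intercept:.2f}")
print(f"b = {b:.3f}, p = {p:.3f}")
```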

7a
2 marks

The graph below shows the heights, h metres, and the time spent sleeping, t hours, of a group of young giraffes. It is believed the data can be modelled using t = kh^{n}.

Scatter graph showing the heights, h metres, and the time spent sleeping, t hours, of the group of young giraffes.

By taking logarithms of both sides, show that t = kh^{n} can be written as

\log_{10} t = \log_{10} k + n \log_{10} h

7b
Sme Calculator
4 marks

The data are coded using the substitutions x = \log_{10} h and y = \log_{10} t. The regression line of y on x is found to be

y = 0.3 - 1.2x

(i) Find the values of x and y for a giraffe that is 2.1 metres tall and sleeps for 4.3 hours per day, giving your answers to four decimal places.

(ii) Using the regression line, show that a giraffe of height 3.2 metres would be expected to sleep for approximately half an hour per day.

(iii) State an assumption made in order to use the regression line in part (ii).

7c
Sme Calculator
3 marks

By substituting x = \log_{10} h and y = \log_{10} t into the equation of the regression line, and using the result from part (a), show that the relationship between height and sleeping time can be modelled by

t = 1.995\,h^{-1.2}
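The coding in part (b) and the back-substitution in part (c) can be checked numerically. This is a sketch under the stated regression line y = 0.3 - 1.2x; the helper name `sleep_hours` is hypothetical.

```python
import math

intercept, gradient = 0.3, -1.2   # regression line y = 0.3 - 1.2x from part (b)

# Part (b)(i): coded values for a giraffe 2.1 m tall sleeping 4.3 hours.
x_val = math.log10(2.1)
y_val = math.log10(4.3)

# Part (c): log10(t) = log10(k) + n*log10(h), so k = 10**intercept and n = gradient.
k = 10 ** intercept
n = gradient


def sleep_hours(h: float) -> float:
    """Model t = k * h**n obtained from the regression line."""
    return k * h ** n


print(f"x = {x_val:.4f}, y = {y_val:.4f}")
print(f"k = {k:.3f}, n = {n}")
print(f"Predicted sleep at h = 3.2 m: {sleep_hours(3.2):.2f} hours")
```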

8a
Sme Calculator
2 marks

The table below shows the daily total sunshine, x hours, and the daily mean total cloud cover, y oktas, for the first 10 days in May 2015 at Heathrow, taken from the large data set.

| x | a | 0.7 | 3.3 | 6.9 | 4.7 | 5.4 | 5.5 | 0.1 | 5.7 | 7.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 5 | 6 | 7 | 5 | 6 | 6 | 5 | 7 | 4 | 4 |

(i) Explain what is meant by 5 oktas of cloud cover.

(ii) Show that a = 4.4, given that there were 4 hours and 24 minutes of sunshine recorded at Heathrow on the first day of May 2015.

8b
Sme Calculator
1 mark

Calculate the product moment correlation coefficient, r, between x and y.
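In the exam this value comes from a calculator's statistics mode, but the underlying formula r = S_xy / sqrt(S_xx S_yy) can be sketched directly. The function name `pmcc` is illustrative, and the data use a = 4.4 from part (a)(ii).

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Daily total sunshine (hours) and daily mean cloud cover (oktas), with a = 4.4.
sunshine = [4.4, 0.7, 3.3, 6.9, 4.7, 5.4, 5.5, 0.1, 5.7, 7.5]
cloud = [5, 6, 7, 5, 6, 6, 5, 7, 4, 4]

r = pmcc(sunshine, cloud)
print(f"r = {r:.4f}")
```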

8c
4 marks

(i) State suitable null and alternative hypotheses to test whether there is evidence of negative correlation between daily total sunshine and daily mean cloud cover.

(ii) Using the table of critical values for correlation coefficients in your formula booklet, find the critical value for this test at the 0.5% level of significance.

(iii) Test, at the 0.5% level of significance, whether there is evidence of a negative correlation between daily total sunshine and daily mean cloud cover at Heathrow in May 2015.

1a
3 marks

Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow.

He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains r = -0.377.

Carry out a suitable test to investigate Stav’s belief at a 5% level of significance.

State clearly

  • your hypotheses

  • your critical value

1b
1 mark

On a random day at Heathrow the Daily Maximum Relative Humidity was 97%.

Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer.

2a
1 mark

Marc took a random sample of 16 students from a school and for each student recorded

  • the number of letters, x, in their last name

  • the number of letters, y, in their first name

His results are shown in the scatter diagram.

Scatter plot on a grid with points at various coordinates, displaying data distribution patterns; x-axis ranges 0-12, y-axis 0-10.

Describe the correlation between x and y.

2b
1 mark

Marc suggests that parents with long last names tend to give their children shorter first names.

Using the scatter diagram comment on Marc’s suggestion, giving a reason for your answer.

2c
Sme Calculator
1 mark

The results from Marc’s random sample of 16 observations are given in the table below.

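| x | 3 | 6 | 8 | 7 | 5 | 3 | 11 | 3 | 4 | 5 | 4 | 9 | 7 | 10 | 6 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 7 | 7 | 4 | 4 | 6 | 8 | 5 | 5 | 8 | 4 | 7 | 4 | 5 | 5 | 6 | 3 |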

Use your calculator to find the product moment correlation coefficient between x and y for these data.

2d
3 marks

Test whether or not there is evidence of a negative correlation between the number of letters in the last name and the number of letters in the first name.

You should

  • state your hypotheses clearly

  • use a 5% level of significance

3a
3 marks

Tessa owns a small clothes shop in a seaside town. She records the weekly sales figures, £w, and the average weekly temperature, t °C, for 8 weeks during the summer.

The product moment correlation coefficient for these data is −0.915

Stating your hypotheses clearly and using a 5% level of significance, test whether or not the correlation between sales figures and average weekly temperature is negative.

3b
1 mark

Suggest a possible reason for this correlation.

3c
1 mark

Tessa suggests that a linear regression model could be used to model these data.

State, giving a reason, whether or not the correlation coefficient is consistent with Tessa’s suggestion.

3d
1 mark

State, giving a reason, which variable would be the explanatory variable.

3e
1 mark

Tessa calculated the linear regression equation as w = 10755 - 171t.

Give an interpretation of the gradient of this regression equation.

4a
1 mark

A teacher, Ms Pearman, claims that there is a positive correlation between the number of hours spent studying for a test and the percentage scored on it.

Write down suitable null and alternative hypotheses to test Ms Pearman's claim.

4b
Sme Calculator
2 marks

Ms Pearman takes a random sample of 25 students and gives them a week to prepare for a test. She records the percentage they score in the test, s %, and the amount of revision they did, h hours.

Ms Pearman calculates the product moment correlation coefficient for these data as r = 0.874.

Given that the p-value for the test statistic r = 0.874 is 0.0217, test at the 5% level of significance whether Ms Pearman's claim is justified.

4c
2 marks

Ms Pearman decides to use a linear regression model for these data. She calculates the equation of the regression line of s on h to be s = 21.3 + 5.29h.

(i) Give an interpretation of the value 21.3 in context.

(ii) Give an interpretation of the value 5.29 in context.

5a
3 marks

The following table shows the number of hours spent learning to drive, d, and the number of mistakes made in the driving test, m, of ten college students.

| d | 48 | 51 | 51 | 57 | 61 | 68 | 70 | 72 | 73 | 75 |
|---|---|---|---|---|---|---|---|---|---|---|
| m | 19 | 21 | 17 | 12 | 8 | 16 | 7 | 4 | 0 | 1 |

The product moment correlation coefficient for these data is r = -0.869. A driving instructor, Dave, believes there is a negative correlation between the number of hours spent learning to drive and the number of mistakes made in the driving test.

(i) Write down suitable null and alternative hypotheses to test Dave's claim.

(ii) Test, at the 1% level of significance, whether Dave's claim is justified, given that the relevant critical value is -0.7155.

5b
1 mark

Dave calculates the equation of the regression line of m on d to be m = 50.7 - 0.642d.

State, giving a reason, whether or not the correlation coefficient is consistent with the use of a linear regression model.

5c
Sme Calculator
2 marks

(i) Explain why the linear regression model could be unreliable for predicting the number of mistakes a student would make on their driving test after learning for 30 hours.

(ii) By considering a student who has spent 80 hours learning to drive, give a limitation to the linear regression model.

6a
1 mark

The table below shows data from the United States regarding annual per capita chicken consumption (in pounds) and the unemployment rate (% of population) between the years 2005 and 2014.

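| Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|---|
| Chicken consumption (pounds) | 86.4 | 86.9 | 85.5 | 83.8 | 80.0 | 82.8 | 83.3 | 80.8 | 82.3 | 83.8 |
| Unemployment rate (%) | 5.08 | 4.62 | 4.62 | 5.78 | 9.25 | 9.63 | 8.95 | 8.07 | 7.38 | 6.17 |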

The product moment correlation coefficient for these data is r = -0.821. The critical values for a 10% two-tailed test are \pm 0.5495.

State what is measured by the product moment correlation coefficient.

6b
2 marks

(i) Write down suitable null and alternative hypotheses for a two-tailed test of the correlation coefficient.

(ii) Show that, at the 10% level of significance, there is evidence that the correlation coefficient is different from zero.

6c
1 mark

A newspaper's headline states:

"Eating chicken is the secret to reducing the unemployment rate in the US!"

Explain whether this headline is fully justified.

7a
1 mark

Jessica is researching whether there is a correlation between the productivity of university students and the number of hours of sleep they get per night.

Write suitable null and alternative hypotheses to test for linear correlation.

7b
2 marks

Jessica takes a random sample of 25 students, measures their productivity during the day, and records how many hours of sleep they had during the previous night. She calculates the product moment correlation coefficient and finds that r = -0.107.

The table below gives the critical values, for different significance levels, of the product moment correlation coefficient, r, for a sample of size 25.

| One tail | 10% | 5% | 2.5% | 1% | 0.5% |
|---|---|---|---|---|---|
| Two tail | 20% | 10% | 5% | 2% | 1% |
| n = 25 | 0.2653 | 0.3365 | 0.3961 | 0.4622 | 0.5052 |

Jessica wishes to test, at the 10% level of significance, whether there is evidence that the correlation coefficient for the population is different from zero.

(i) Find the critical regions for Jessica's test.

(ii) Show that, at the 10% level of significance, there is no evidence of a linear correlation.

7c
1 mark

State, with a reason, whether there could be a relationship between students’ hours of sleep and their productivity.

8a
3 marks

Nicole is a biologist studying the growth of bacteria. She records the number of bacteria on an organism every hour. The table below shows her results for the first eight hours.

| Hours (t) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Number of bacteria (B) | 10 | 50 | 170 | 520 | 1730 | 5200 | 17020 | 58140 |

Nicole calculates the product moment correlation coefficient as r = 0.735.

Nicole claims that there is a positive linear correlation between the number of hours and the number of bacteria.

Test, at the 1% level of significance, whether Nicole's claim is justified. State your hypotheses clearly.

8b
Sme Calculator
1 mark

Mariam, Nicole's lab assistant, claims that there is an exponential relationship between the two variables. To test this, Mariam calculates the values of \ln(B) for the different values of t.

Complete the table, giving your answers to three decimal places.

| t | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| \ln(B) | 2.303 | 3.912 | 5.136 | 6.254 | 7.456 | 8.556 |  |  |

8c
Sme Calculator
2 marks

(i) Calculate the product moment correlation coefficient between t and \ln(B).

(ii) Comment on Mariam's claim that there is an exponential relationship between B and t.
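Mariam's approach in parts (b) and (c) can be sketched as follows: if B = ab^t, then \ln(B) is linear in t, so the correlation between t and \ln(B) should be much closer to 1 than the correlation between t and B. The helper `pmcc` is illustrative, not a library function.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [1, 2, 3, 4, 5, 6, 7, 8]
bacteria = [10, 50, 170, 520, 1730, 5200, 17020, 58140]

ln_bacteria = [math.log(b) for b in bacteria]   # the coding used in part (b)

r_raw = pmcc(hours, bacteria)      # correlation between t and B
r_log = pmcc(hours, ln_bacteria)   # correlation between t and ln(B)

print([round(v, 3) for v in ln_bacteria])
print(f"r (t, B) = {r_raw:.3f}, r (t, ln B) = {r_log:.4f}")
```

A value of r between t and \ln(B) that is very close to 1 is consistent with an exponential model, although correlation alone does not prove the model is correct.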

9a
Sme Calculator
1 mark

An estate agent, Terry, claims that there is a correlation between the value of a house, v (£1000), and the distance between that house and the nearest nightclub, d (miles).

Terry has a database containing over 100 houses and he takes a random sample of seven houses to investigate his claim. The results are recorded below.

| d | 1.8 | 2.1 | 2.5 | 3.7 | 4.9 | 5.2 | 7.2 |
|---|---|---|---|---|---|---|---|
| v | 500 | 560 | 330 | 250 | 260 | 180 | 190 |

Calculate the product moment correlation coefficient for this sample.

9b
3 marks

(i) Write down suitable null and alternative hypotheses for a two-tailed test to investigate Terry's claim.

(ii) Test Terry's claim using a 5% level of significance.

9c
1 mark

Suggest one way in which Terry could improve his investigation.

10a
1 mark

A student studying plants measures the hours of sunshine per day, s, and the growth of a specific plant species, g mm, over a two-week period.

A random sample of 12 days is selected. The product moment correlation coefficient (PMCC) for these data is r = 0.534.

State, with a reason, whether a linear regression model is appropriate for this data.

10b
3 marks

The student believes there is a positive correlation between hours of sunshine and plant growth.

Stating your hypotheses clearly, test at the 5% significance level whether there is evidence to support the student's belief.

10c
2 marks

The equation of the regression line of g on s is found to be g = 1.2 + 3.8s.

Give an interpretation of:

(i) the value 3.8 in this model,

(ii) the value 1.2 in this model.

10d
1 mark

The student converts the growth data from millimetres to centimetres. State the value of the new product moment correlation coefficient.

10e
1 mark

State one limitation of using the product moment correlation coefficient to test for a relationship between two variables.

1a
3 marks

Barbara is investigating the relationship between average income (GDP per capita), x US dollars, and average annual carbon dioxide (CO₂) emissions, y tonnes, for different countries.

She takes a random sample of 24 countries and finds the product moment correlation coefficient between average annual CO₂ emissions and average income to be 0.446

Stating your hypotheses clearly, test, at the 5% level of significance, whether or not the product moment correlation coefficient for all countries is greater than zero.

1b
1 mark

Barbara believes that a non-linear model would be a better fit to the data.
She codes the data using the coding m = \log_{10} x and c = \log_{10} y and obtains the model c = -1.82 + 0.89m

The product moment correlation coefficient between c and m is found to be 0.882

Explain how this value supports Barbara’s belief.

1c
Sme Calculator
5 marks

Show that the relationship between y and x can be written in the form y = ax^{n} where a and n are constants to be found.
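One possible route through part (c), sketched under the assumption that the coded model from part (b) is used exactly as given (base-10 logarithms throughout):

```latex
\begin{align*}
c &= -1.82 + 0.89m \\
\log_{10} y &= -1.82 + 0.89\log_{10} x \\
\log_{10} y &= \log_{10}\!\left(10^{-1.82}\right) + \log_{10}\!\left(x^{0.89}\right) \\
\log_{10} y &= \log_{10}\!\left(10^{-1.82}\,x^{0.89}\right) \\
y &= 10^{-1.82}\,x^{0.89} \approx 0.0151\,x^{0.89}
\end{align*}
```

Under this working, a = 10^{-1.82} (about 0.0151) and n = 0.89.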

2a
1 mark

Anna is investigating the relationship between exercise and resting heart rate.

She takes a random sample of 19 people in her year at school and records for each person

  • their resting heart rate, h beats per minute

  • the number of minutes, m, spent exercising each week

Her results are shown on the scatter diagram.

Scatter diagram on a grid showing a downward trend. The vertical axis is labelled h and runs from 60 to 80; the horizontal axis is labelled m and runs from 0 to 400.

Interpret the nature of the relationship between h and m.

2b
Sme Calculator
3 marks

Anna codes the data using the formulae

x = \log_{10} m

y = \log_{10} h

The product moment correlation coefficient between x and y is -0.897

Test whether or not there is significant evidence of a negative correlation between x and y.

You should

  • state your hypotheses clearly

  • use a 5% level of significance

  • state the critical value used

2c
5 marks

The equation of the line of best fit of y on x is

y = -0.05x + 1.92

Use the equation of the line of best fit of y on x to find a model for h on m in the form

h = am^{k}

where a and k are constants to be found.

3a
2 marks

A snack shop owner has noticed that the sale of energy drinks seems to increase later in the school term. He conducts a hypothesis test at the 1% level of significance to see if the sale of the drinks, d, increases as the number of days until the school holidays, h, decreases.

(i) What type of correlation is the snack shop owner testing for?

(ii) State which of the two variables is the explanatory variable.

3b
4 marks

Over the final thirty days of term the owner keeps a record of the number of sales of energy drinks and, using this data, calculates the product moment correlation coefficient to be r = -0.4187.

The table below gives the critical values, for different significance levels, of the product moment correlation coefficient, r, for a sample size of 30.

| Level | 10% | 5% | 2.5% | 1% | 0.5% |
|---|---|---|---|---|---|
| n = 30 | 0.2407 | 0.3061 | 0.3610 | 0.4226 | 0.4629 |

(i) Write down the critical region for the hypothesis test.

(ii) Stating your hypotheses clearly, test the snack shop owner's suspicion that more energy drinks are sold closer to the school holidays.

3c
2 marks

The snack shop owner calculates the regression line of d on h and uses it to predict the number of energy drinks he will sell on the first day of the new term, when there are still 90 days until the holidays.

State two reasons why this is unlikely to give a reliable prediction.

4a
3 marks

Adriana is a conservationist researching whether there is any correlation between the population sizes of king cobras, c, and their biggest enemy, the Indian grey mongoose, m. She collects data on population sizes of both species from a random sample of 15 wildlife reserves and calculates the product moment correlation coefficient to be r = -0.3264.

Carry out a suitable test to investigate if there is a linear correlation between c and m. You should:

  • state your hypotheses clearly

  • use a 5% level of significance

  • state the critical value used

4b
1 mark

Adriana concludes that the test indicates that there is no correlation between population sizes of king cobras, c, and the Indian grey mongoose, m.

Explain why Adriana's conclusion is not fully correct.

5a
3 marks

A biologist is researching a connection between the mass of an animal, M kg, and its expected lifespan, L years. The biologist suggests that there exists a relationship of the form L = AM^B, where A and B are constants to be found.

Show that the relationship can be rewritten using logarithms as

\log_{10} L = \log_{10} A + B \log_{10} M

5b
Sme Calculator
2 marks

Using data from a wide range of animals, when y = \log_{10} L is plotted against x = \log_{10} M on a scatter diagram there seems to be a strong positive correlation. When the regression line of y on x is calculated, the equation is found to be y = 0.18x + 0.98.

By relating the equation of the regression line to the equation found in (a), or otherwise, find the constants A and B correct to 2 decimal places where appropriate.

5c
Sme Calculator
1 mark

Hence, estimate the lifespan of a horse with a mass of 600 kg to the nearest year.

5d
1 mark

The biologist concludes the research by suggesting that one way to increase your lifespan is to increase your mass.

Explain, based on these data, why the biologist may be incorrect.

6a
1 mark

M. Hatter has noticed that over the past 50 years there seem to be fewer hatmakers in London. He also knows that global temperatures have been rising over the same time period. He decides to see if there could be any correlation, so he collects data on the number of hatmakers, h, and the global mean temperature, t °C, from the past 50 years and represents the information in the graph below.

Scatter plot showing a negative correlation between global temperature in degrees Celsius on the x-axis and number of hatmakers on the y-axis.

Explain why a model of h = at + b is unlikely to fit these data.

6b
Sme Calculator
5 marks

Hatter suggests that the equation for h in terms of t can be written in the form h = ab^t. He codes the data using x = t and y = \log_{10} h and calculates the regression line of y on x to be y = 1.903 - 1.005x.

(i) Show that a = 80.0 correct to 3 significant figures.

(ii) Find the value of b to 3 significant figures.

(iii) Give an interpretation, in context, of the value of a in your answer to (b)(i).
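Taking base-10 logarithms of h = ab^t gives \log_{10} h = \log_{10} a + t \log_{10} b, so the intercept and gradient of the regression line correspond to \log_{10} a and \log_{10} b. A short numerical sketch of parts (b)(i) and (b)(ii), with illustrative variable names:

```python
intercept = 1.903   # log10(a), from the regression line y = 1.903 - 1.005x
gradient = -1.005   # log10(b), from the same line

a = 10 ** intercept
b = 10 ** gradient

# Report both constants to 3 significant figures.
print(f"a = {a:.1f}, b = {b:.4f}")
```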

6c
1 mark

M. Hatter calculates the product moment correlation coefficient between x and y to be r = -0.952 and concludes that the rise in mean global temperature is what is causing hatmakers in London to go out of business.

Explain whether M. Hatter's conclusion is fully justified.

7a
1 mark

A restaurant owner, Mr Capazio, suspects that there is positive correlation between the number of drinks consumed, d, and the amount of time taken to pay the bill, t minutes. He decides to conduct a one-tailed hypothesis test at the 5% level of significance to test his theory.

In the context of this question, describe what positive correlation would mean.

7b
Sme Calculator
4 marks

The table below shows the number of drinks consumed, d, and the amount of time taken to pay the bill, t minutes, for a random sample of 10 visitors to the restaurant.

| Number of drinks, d | 0 | 1 | 3 | 2 | 8 | 4 | 2 | 0 | 3 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time taken, t (minutes) | 2.6 | 3.1 | 5.3 | 2.0 | 6.3 | 9.3 | 1.5 | 3.2 | 5.7 | 4.2 |

(i) Find the product moment correlation coefficient for these data.

(ii) Carry out a suitable test at the 5% level of significance to investigate whether there is evidence to support Mr Capazio's theory. You should:

  • state your hypotheses clearly

  • state the critical value used

7c
3 marks

Mr Capazio calculates the regression line of t on d to be t = 2.75 + 0.619d.

(i) Give an interpretation of the values 2.75 and 0.619 in the context of the question.

(ii) A person took 4.5 minutes to pay their bill. Explain why the regression line should not be used to estimate the number of drinks they had had.

8a
1 mark

Medhi is a meteorologist investigating the weather in Heathrow and claims that there is negative correlation between the daily total rainfall, f mm, and daily total sunshine, s hours.

Medhi decides to use the large data set to investigate this claim and forms a sample using all the days in June 2015 relating to Heathrow.

Some values for the daily total rainfall, f mm, are labelled as 'tr'. Using your knowledge of the large data set, state the range of values Medhi could assign to these values.

8b
3 marks

Medhi uses all the days in June 2015 as a sample and calculates the product moment correlation coefficient to be r = -0.2659.

Carry out a suitable test at the 5% level of significance to investigate Medhi's claim. You should:

  • state your hypotheses clearly

  • state the critical value used

8c
2 marks

Medhi uses this data to calculate the equation for the regression line of f on s. He plans to use the regression line to estimate the amount of rainfall there will be in Heathrow during a day in December.

Give two reasons why this is unlikely to produce a reliable estimate.

1a
1 mark

A doctor is collecting data on how a certain illness affects the weight of a person. Let w be the number of weeks that a patient had the illness and d kg be the amount of weight that the patient lost whilst they were ill. The doctor suspects that d and w have a relationship of the form d = aw^b, where a and b are constants to be found.

After plotting y = \log_{10} d against x = \log_{10} w, the doctor found there to be a strong correlation and the equation of the regression line of y on x was

y = 1.47x - 0.11

Explain why the relationship between x and y must have shown positive correlation.

1b
Sme Calculator
4 marks

By using the equation of the regression line of y on x, or otherwise, find the values of the constants a and b correct to 3 significant figures where appropriate.

1c
Sme Calculator
3 marks

Stating any assumptions you make, estimate the weight loss expected of a patient who has been sick for 20 days, to the nearest whole kilogram.

2a
Sme Calculator
5 marks

Scientists in Wuhan, China, started tracking the total number of cases of the CoViD-19 virus in January 2020. The graph below shows the number of days, d, after the first reported case, and the total number of cases, c, of the virus for a period of 12 days.

Scatter plot showing the total number of cases (y-axis) over days (x-axis), with points increasing steeply over time, labelled with variables c and d.

(i) Give a reason why the scientists should not use a regression line to model the relationship between the number of days and the total number of cases.

(ii) After two days the scientists tried to model the relationship using an exponential model of the form c = kb^d. Given that after 1 day there were 278 cases and after 2 days there were 326 cases, calculate the values of k and b.

(iii) After 11 days there were 9700 reported cases in China. Comment on the suitability of this model.
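Parts (a)(ii) and (a)(iii) can be sketched numerically: dividing the two conditions c = kb^d eliminates k, and the fitted model can then be compared with the reported figure at d = 11. The variable and function names are illustrative.

```python
# Two data points (days after first case, total cases) from part (a)(ii).
d1, c1 = 1, 278
d2, c2 = 2, 326

# c2 / c1 = (k * b**2) / (k * b) = b, so the ratio gives b directly.
b = c2 / c1
k = c1 / b ** d1


def cases(d: float) -> float:
    """Two-point exponential model c = k * b**d."""
    return k * b ** d


predicted_11 = cases(11)
reported_11 = 9700   # reported cases after 11 days, from part (a)(iii)

print(f"k = {k:.2f}, b = {b:.4f}")
print(f"Model at d = 11: {predicted_11:.0f} cases (reported: {reported_11})")
```

A model built from only two early data points can fit those points exactly and still be far from later observations, which is the comparison part (a)(iii) asks about.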

2b
Sme Calculator
4 marks

Another group of scientists code the data using x = d and y = \log_{10} c. The regression line of y on x was found to be

y = 2.2476 + 0.1606x

(i) Using the regression line of y on x, find an equation for c in terms of d in the form c = ap^d. State the values of a and p to 4 significant figures.

(ii) Explain what the value of p represents in your answer to (b)(i).

(iii) One of the scientists used this model and estimated that after three months there would have been over 4.89 \times 10^{16} cases. This is more than the world's population. Give a reason to explain why this estimate was unreliable.

3a
Sme Calculator
5 marks

Rory is studying the relationship between two variables x and y. He believes they could be modelled by the equation y = ax^m where a and m are constants. He codes his data and plots a scatter graph of X = \log x against Y = \log y. Rory draws, by eye, a line of best fit between X and Y which passes through the points (2, 2.68) and (5, 3.10).

Using Rory's line of best fit, find the values of the constants a and m in his model. Give your answers correct to 3 significant figures where appropriate.

3b
5 marks

The product moment correlation coefficient between X = \log x and Y = \log y is r = 0.5047 for Rory's data.

(i) Given that the critical value at 1% significance (one-tailed) is 0.6581, write down the size of Rory's sample.

(ii) Carry out a suitable test at the 1% level of significance to investigate whether there is positive linear correlation between X and Y.

You should:

  • state your hypotheses clearly

  • state the critical value used

(iii) Comment on the suitability of Rory's equation y = ax^m.

3c
Sme Calculator
3 marks

Rory later discovers that the relationship between x and y is better suited to the model y = kb^x. From his raw data he calculates the regression line of \log_{10} y on x to be

\log_{10} y = 2.56 + 1.12x

Find the values of the constants k and b in Rory's new model, giving your answers to 3 significant figures.

4
Sme Calculator
5 marks

A researcher has been collecting data within a particular city on the number of sleeveless T-shirts sold per week, T, and the number of new gym memberships per week, G. The data is shown in the table below along with the values of \log_{10} T and \log_{10} G.

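| T | 119 | 54 | 92 | 25 | 442 | 340 | 9 | 261 |
|---|---|---|---|---|---|---|---|---|
| G | 50 | 15 | 25 | 12 | 129 | 22 | 8 | 21 |
| \log_{10} T | 2.0755 | 1.7324 | 1.9638 | 1.3979 | 2.6454 | 2.5315 | 0.9542 | 2.4166 |
| \log_{10} G | 1.6990 | 1.1761 | 1.3979 | 1.0792 | 2.1106 | 1.3424 | 0.9031 | 1.3222 |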

The researcher suspects that T and G are related in one of two ways:

T = aG^m \quad \text{or} \quad T = bp^G

where a, b, m and p are constants.

By calculating the product moment correlation coefficient for two appropriate coded variable pairs from the table, decide which of the two models better represents the relationship between T and G.
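The choice of coded pairs follows from taking logarithms of each model: T = aG^m gives \log_{10} T = \log_{10} a + m \log_{10} G (so pair \log_{10} G with \log_{10} T), while T = bp^G gives \log_{10} T = \log_{10} b + G \log_{10} p (so pair G with \log_{10} T). A sketch of the comparison, using the tabulated values and an illustrative `pmcc` helper:

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


G = [50, 15, 25, 12, 129, 22, 8, 21]
log_T = [2.0755, 1.7324, 1.9638, 1.3979, 2.6454, 2.5315, 0.9542, 2.4166]
log_G = [1.6990, 1.1761, 1.3979, 1.0792, 2.1106, 1.3424, 0.9031, 1.3222]

r_power = pmcc(log_G, log_T)        # tests T = a * G**m
r_exponential = pmcc(G, log_T)      # tests T = b * p**G

# The model whose coded data are closer to a straight line has the larger |r|.
better = "T = aG^m" if abs(r_power) > abs(r_exponential) else "T = bp^G"

print(f"r (log G, log T) = {r_power:.4f}")
print(f"r (G, log T)     = {r_exponential:.4f}")
print(f"Stronger linear correlation: {better}")
```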

5a
Sme Calculator
2 marks

Charlie is interested to find out if there is positive correlation between the number of letters in someone's name, l, and the time t seconds (rounded to the nearest 5 seconds) it takes her six-year-old sister to correctly guess the spelling of the name. She looks at a random sample of names and records the results below.

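| Letters, l | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time, t | 0 | 5 | 5 | 10 | 15 | 5 | 15 | 25 | 60 | 80 |
| Frequency | 2 | 16 | 4 | x | 17 | 3 | 29 | 17 | 4 | 1 |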

Charlie carries out a one-tailed test at the 1% level of significance and finds that the critical value for this test is 0.2324.

Find the total number of names in Charlie's sample and hence find the value of x.

5b
Sme Calculator
4 marks

Carry out a suitable test at the 1% level of significance to investigate whether there is positive linear correlation between the number of letters in a name and the time it takes Charlie's sister to guess the correct spelling.

You should:

  • state your hypotheses clearly

  • state the PMCC

5c
2 marks

Charlie calculates the equation of the regression line of t on l to be

t = 11.2l - 33.4

(i) Give an interpretation of the value of the gradient of this regression line.

(ii) Explain why Charlie should not use this equation to estimate the number of letters in someone's name if it took her sister 70 seconds to guess the spelling.

6a
Sme Calculator
5 marks

When brewing beer, the temperature at which beer is stored during fermentation, T °C, affects the alcohol content, A%, at the end of the process. A group of brewers model the relationship as A = bp^T, where b and p are unknown constants.

They plot the regression line of y = \ln A on x = T and find it has a gradient of 0.0392 and passes through the point (0,\ 0.811).

Calculate the values of b and p, giving your answers to 3 significant figures.
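Because the coding here uses natural logarithms, A = bp^T becomes \ln A = \ln b + T \ln p, so the y-intercept is \ln b and the gradient is \ln p. A brief sketch of the calculation, with illustrative variable names:

```python
import math

intercept = 0.811    # ln(b): the line passes through (0, 0.811)
gradient = 0.0392    # ln(p): gradient of the regression line

b = math.exp(intercept)
p = math.exp(gradient)

print(f"b = {b:.2f}, p = {p:.2f}")
```

Note that `math.exp` is used rather than a power of 10 because the brewers coded with \ln, not \log_{10}.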

6b
Sme Calculator
5 marks

In the data collected by the brewers, the range of values for T was 15 and the range of values for A was 4. The minimum alcohol content occurred at the minimum temperature and the maximum alcohol content occurred at the maximum temperature.

Find estimates for the minimum values of T and A, giving your answers to 2 significant figures.

6c
1 mark

Hence explain why it would not be appropriate to use the model to predict the alcohol content of beer when the temperature during fermentation is 50°C.