Syllabus Edition
First teaching 2025
First exams 2027
Descriptive Statistics & Outliers (DP IB Psychology): Revision Note
Descriptive statistics & measures of central tendency
Descriptive statistics summarise a data set and include measures of central tendency, which describe the central or typical value of that data set
Measures of central tendency are used to summarise large amounts of data into typical mid-point scores
The mean
This is the arithmetic average of all the scores in a data set
The mean indicates the average score a researcher would expect to find if they were to replicate the procedure of a given study
The mean is calculated by adding together all the values in the data set and dividing the total by the number of values in that set
E.g., 4 + 6 + 7 + 9 = 26
26 ÷ 4 = 6.5
mean = 6.5
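The worked example can be checked with a short Python sketch. It repeats the manual method (total divided by the number of values) and compares the result with `mean` from Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

scores = [4, 6, 7, 9]

# Manual method: add all the values, then divide by how many there are
total = sum(scores)                 # 4 + 6 + 7 + 9 = 26
manual_mean = total / len(scores)   # 26 / 4 = 6.5

print(manual_mean)   # 6.5
print(mean(scores))  # 6.5 (standard-library equivalent)
```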
The median
This is the middle value of a data set (the positional average)
The data has to be arranged into numerical order first (with the lowest score at the beginning of the list)
E.g., 20, 43, 56, 78, 92, 67, 48 is ordered into 20, 43, 48, 56, 67, 78, 92
Median = 56 as this is the value at the halfway point in the set
When a data set contains an even number of values, there are two middle numbers
E.g., 15, 16, 18, 19, 22, 24
The halfway point is between 18 and 19
In this case, add the two middle values (18 + 19 = 37)
Divide the total by 2 (37 divided by 2 = 18.5)
Thus, the median = 18.5
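The two median examples can be reproduced with a minimal sketch. `median_of` is a hypothetical helper written to mirror the steps in this note (order the data, then take the middle value or average the two middle values); Python's `statistics.median` is shown alongside as a cross-check.

```python
from statistics import median

def median_of(values):
    """Middle value of the ordered data (the positional average)."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)   # numerical order, lowest score first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:             # odd number of values: a single middle value
        return ordered[middle]
    # even number of values: add the two middle values and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_set = [20, 43, 56, 78, 92, 67, 48]
even_set = [15, 16, 18, 19, 22, 24]

print(median_of(odd_set))                 # 56
print(median_of(even_set))                # 18.5
print(median(odd_set), median(even_set))  # 56 18.5
```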
The mode
This is the most frequently occurring score in a data set
Some data sets may have:
no mode
two modes (known as bi-modal)
more than two modes (known as multi-modal)
The mode is used when the researcher cannot use the mean or the median, typically because the data are nominal (frequency counts of categories)
E.g., a researcher wishes to measure how many times litter is dropped in a naturalistic observation
E.g., with the data set 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8, identify the value that occurs most often (6 occurs four times)
Thus, the mode = 6
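A short sketch can find the mode and also handle the "no mode", bimodal and multi-modal cases. `modes` is a hypothetical helper; it assumes that a data set in which no value repeats has no mode, which is why it does not simply call `statistics.multimode` (that function returns every value when none repeat). The second and third data sets are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return every most frequent value; an empty list means 'no mode'."""
    if not values:
        raise ValueError("modes() needs at least one value")
    counts = Counter(values)        # how often each value occurs
    highest = max(counts.values())
    if highest == 1:
        return []                   # assumption: nothing repeats, so no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8]))  # [6]
print(modes([1, 2, 3, 4]))                           # [] (no mode)
print(modes([2, 2, 5, 5, 7]))                        # [2, 5] (bimodal)
```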
Descriptive statistics & measures of dispersion
Measures of dispersion describe the spread of scores: how much the scores vary and how far they lie from the mean or median
A data set with low dispersion will have scores that cluster around the measure of central tendency (the mean or median)
A data set with high dispersion will have scores that are spread apart from the central measure with much variation among them
If every participant scored exactly the same (e.g., everyone scored 15 out of 20 on a memory test), every measure of dispersion would be zero, as there would be no variation in the scores at all, and the mean, median and mode would all be identical (15)
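The zero-dispersion case can be confirmed with Python's `statistics` module. The data set is hypothetical: ten participants who all scored 15 out of 20.

```python
from statistics import mean, median, mode, stdev

# Hypothetical data: ten participants who all scored 15 out of 20
scores = [15] * 10

spread = max(scores) - min(scores)   # the range
print(spread)          # 0
print(stdev(scores))   # 0.0
print(mean(scores) == median(scores) == mode(scores) == 15)  # True
```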
The range
This is the difference between the lowest and the highest scores in a data set, so it shows the size of the gap between them
To calculate the range, subtract the lowest value from the highest value in the data set, e.g.,
to calculate the range of 4, 4, 6, 7, 9, 9, subtract the lowest number (4) from the highest number (9): 9 - 4 = 5
Thus, the range = 5
When dealing with data that have been rounded, 1 is added to the range to account for any rounding up or down applied to the original scores
(9 - 4) + 1 = 6
Thus, the range = 6
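Both versions of the range calculation fit in one small function. `score_range` and its `rounded` flag are hypothetical names; setting `rounded=True` applies the +1 convention this note describes for rounded data.

```python
def score_range(values, rounded=False):
    """Highest value minus lowest value.

    rounded=True adds 1, following the convention for rounded data
    (the parameter name is illustrative, not a standard API).
    """
    if not values:
        raise ValueError("score_range() needs at least one value")
    spread = max(values) - min(values)
    if rounded:
        spread += 1
    return spread

data = [4, 4, 6, 7, 9, 9]
print(score_range(data))                # 5  (9 - 4)
print(score_range(data, rounded=True))  # 6  ((9 - 4) + 1)
```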
Standard deviation
This measures how far, on average, the scores in a data set deviate from the mean
Standard deviation provides insight into how clustered or spread out the scores are from the mean
A low standard deviation indicates that the scores are clustered tightly around the mean, which suggests the data are consistent (and therefore more reliable)
A high standard deviation indicates that the scores are more spread out from the mean, which suggests the data are less consistent (and therefore less reliable)
In a normal distribution, scores cluster around the mean: approximately 68% of scores fall within one standard deviation of the mean and approximately 95% fall within two, whatever the size of the standard deviation
There are six steps to calculating the standard deviation
Calculate the mean
Subtract the mean from each score in the data set
Square each of the differences calculated in the previous step
Add all of the squared differences together
Divide this total by the number of scores minus 1 (N - 1); the result is the variance
Work out the square root of the variance (using a calculator); the result is the standard deviation
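The six steps can be traced one line at a time in a minimal sketch. `sample_sd` is a hypothetical helper; the data set reuses the scores from the range example, and `statistics.stdev` (which also divides by N - 1) is used as a cross-check.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Standard deviation following the six steps listed in this note."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(values) / n                      # step 1: calculate the mean
    differences = [x - m for x in values]    # step 2: subtract the mean from each score
    squared = [d ** 2 for d in differences]  # step 3: square each difference
    total = sum(squared)                     # step 4: add the squared differences
    variance = total / (n - 1)               # step 5: divide by N - 1 (the variance)
    return math.sqrt(variance)               # step 6: square root of the variance

data = [4, 4, 6, 7, 9, 9]   # reuses the data set from the range example
print(round(sample_sd(data), 2))  # 2.26
print(round(stdev(data), 2))      # 2.26
```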
The effect of outliers
An outlier is a score or value that falls far beyond the other values in a data set
These extreme values can be caused by:
variability within the data
E.g., two people in a sample of 50 have abnormally good memory
novel data
E.g., people self-report the number of times they look at their fitness score on their smart watch
errors in how the data has been collected
E.g., some participants' memory scores were mistakenly not added to the statistical analysis
Outliers can distort the mean considerably, because every value, including an extreme one, contributes to the total
In a data set with outliers, the median is preferred over the mean, as it is far less affected by extreme values
E.g., for a data set comprising scores of 4, 6, 3, 7, 16, 2, 9, 4, the median would be used rather than the mean, because the value 16 is much higher than the other values and would pull the mean upwards
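The effect of the outlier can be seen with Python's `statistics` module. The sketch uses the example scores, then replaces 16 with a hypothetical, far more extreme value (160): the mean jumps while the median does not move at all.

```python
from statistics import mean, median

scores = [4, 6, 3, 7, 16, 2, 9, 4]
print(mean(scores), median(scores))   # 6.375 5.0

# Hypothetical: make the outlier far more extreme (16 replaced by 160)
more_extreme = [160 if x == 16 else x for x in scores]
print(mean(more_extreme), median(more_extreme))   # 24.375 5.0
```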