Compression - Huffman Coding (AQA GCSE Computer Science): Revision Note
Exam code: 8525
Huffman Coding
What is Huffman coding?
Huffman coding is a method of lossless compression primarily used on text based data (documents)
A Huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state
Example (step 1)
A text file contains the following characters
Text file | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
I |
| L | O | V | E |
| C | O | M | P | U | T | E | R | S |
The text file contain 16 characters including spaces
A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's
Frequency | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
I | space | L | O | V | E | C | M | P | U | T | R | S |
1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Huffman Trees
What is a Huffman tree?
A Huffman tree is a type of binary tree used in Huffman coding to achieve lossless compression of text data
Each character is represented as a leaf in the tree
Internal nodes always have two children, while leaves have none
The tree is built based on the frequency of characters in the text, so that more frequent characters are given shorter codes
Binary trees are covered in much more detail at A-Level
Example (step 2)
List the frequency of each character and order them from least to most frequent:

Combine the two least frequent characters (I + L = 2) into a new node, then reorder:

Update the frequency data to show the combined characters

Repeat until all characters are combined into a single tree

Assigning bit patterns
Follow each branch of the tree, writing 0 on left branches and 1 on right branches
Each character ends up with a unique binary code

More frequent characters get shorter codes
Character | Bit pattern | Times used | Total bits |
---|---|---|---|
space |
| 2 | 3*2 = 6 |
O |
| 2 | 3*2 = 6 |
E |
| 2 | 3*2 = 6 |
I |
| 1 | 4*1 = 4 |
L |
| 1 | 4*1 = 4 |
V |
| 1 | 4*1 = 4 |
C |
| 1 | 4*1 = 4 |
M |
| 1 | 4*1 = 4 |
P |
| 1 | 4*1 = 4 |
U |
| 1 | 4*1 = 4 |
T |
| 1 | 4*1 = 4 |
R |
| 1 | 4*1 = 4 |
S |
| 1 | 4*1 = 4 |
|
| Total number of bits | 58 |
Compression comparison
Using ASCII: 16 characters × 7 bits = 112 bits
Using Huffman coding: 58 bits
In this example, Huffman coding reduces the file size from 112 bits to 58 bits, a saving of about 48%, which is almost half the original space
Examiner Tips and Tricks
Huffman trees are not unique; different trees can produce slightly different codes but the same compression ratio
Unlock more, it's free!
Did this page help you?