Compression - Huffman Coding (AQA GCSE Computer Science): Revision Note

Exam code: 8525

Robert Hampton

Written by: Robert Hampton

Reviewed by: James Woodhouse

Updated on

Huffman Coding

What is Huffman coding?

  • Huffman coding is a method of lossless compression primarily used on text based data (documents)

  • A Huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state

Example (step 1)

  • A text file contains the following characters

Text file

I

 

L

O

V

E

 

C

O

M

P

U

T

E

R

S

  • The text file contain 16 characters including spaces

  • A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's

Frequency

I

space

L

O

V

E

C

M

P

U

T

R

S

1

2

1

2

1

2

1

1

1

1

1

1

1

Huffman Trees

What is a Huffman tree?

  • A Huffman tree is a type of binary tree used in Huffman coding to achieve lossless compression of text data

  • Each character is represented as a leaf in the tree

  • Internal nodes always have two children, while leaves have none

  • The tree is built based on the frequency of characters in the text, so that more frequent characters are given shorter codes

  • Binary trees are covered in much more detail at A-Level 

Example (step 2)

  • List the frequency of each character and order them from least to most frequent:

Table with two rows; the first row has the letters "I L V C M P U T R S" and "space O E," and the second row numbers "1 1 1 1 1 1 1 1 1 1 2 2 2".
  • Combine the two least frequent characters (I + L = 2) into a new node, then reorder:

huffman-coding-image1
  • Update the frequency data to show the combined characters

Table with letters "V C M P U T R S I L space O E" in the first row and corresponding numbers "1 1 1 1 1 1 1 1 2 2 2 2 2" in the second row.
  • Repeat until all characters are combined into a single tree

huffman-coding-image2
  • Assigning bit patterns

    • Follow each branch of the tree, writing 0 on left branches and 1 on right branches

    • Each character ends up with a unique binary code

huffman-coding-image3
  • More frequent characters get shorter codes

Character

Bit pattern

Times used

Total bits

space

000

2

3*2 = 6

O

001

2

3*2 = 6

E

010

2

3*2 = 6

I

0110

1

4*1 = 4

L

0111

1

4*1 = 4

V

1000

1

4*1 = 4

C

1001

1

4*1 = 4

M

1010

1

4*1 = 4

P

1011

1

4*1 = 4

U

1100

1

4*1 = 4

T

1101

1

4*1 = 4

R

1110

1

4*1 = 4

S

1111

1

4*1 = 4

 

 

Total number of bits

58

  • Compression comparison

    • Using ASCII: 16 characters × 7 bits = 112 bits

    • Using Huffman coding: 58 bits

  • In this example, Huffman coding reduces the file size from 112 bits to 58 bits, a saving of about 48%, which is almost half the original space

Examiner Tips and Tricks

Huffman trees are not unique; different trees can produce slightly different codes but the same compression ratio

Unlock more, it's free!

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Robert Hampton

Author: Robert Hampton

Expertise: Computer Science Content Creator

Rob has over 16 years' experience teaching Computer Science and ICT at KS3 & GCSE levels. Rob has demonstrated strong leadership as Head of Department since 2012 and previously supported teacher development as a Specialist Leader of Education, empowering departments to excel in Computer Science. Beyond his tech expertise, Robert embraces the virtual world as an avid gamer, conquering digital battlefields when he's not coding.

James Woodhouse

Reviewer: James Woodhouse

Expertise: Computer Science & English Subject Lead

James graduated from the University of Sunderland with a degree in ICT and Computing education. He has over 14 years of experience both teaching and leading in Computer Science, specialising in teaching GCSE and A-level. James has held various leadership roles, including Head of Computer Science and coordinator positions for Key Stage 3 and Key Stage 4. James has a keen interest in networking security and technologies aimed at preventing security breaches.