Compression Algorithms (College Board AP® Computer Science Principles): Revision Note
Compression fundamentals
What is data compression?
Compression is the process of reducing the number of bits needed to represent data
Compression decreases data size, which reduces the amount of storage space required and speeds up transmission over a network
Compression works by identifying patterns and redundancy in the data, or by removing detail that is unlikely to be noticed
The goal is to reduce size while preserving as much useful information as possible
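One simple way to see "identifying patterns and redundancy" in action is run-length encoding, which replaces each run of repeated characters with the character and a count. The sketch below is an illustration only: the function names `rle_encode` and `rle_decode` and the sample string are made up for this note, and real compression formats use more sophisticated techniques.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (character, run_length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original string by repeating each character run_length times."""
    return "".join(ch * count for ch, count in pairs)

original = "AAAAAABBBCCCCCCCCD"   # hypothetical sample data: 18 characters
encoded = rle_encode(original)

print(encoded)                          # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
print(rle_decode(encoded) == original)  # True
```

Eighteen characters are described by four pairs, and decoding gives back exactly the original string. Data with few repeated runs would gain little from this approach.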
Why compression matters
Smaller files use less storage on devices and servers
Compressed data transfers faster over networks, reducing download and upload times
Without compression, many everyday tasks (streaming music, sending images, loading web pages) would require significantly more time and bandwidth
How much can compression reduce file size?
The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied
Fewer bits does not necessarily mean less information: a good compression algorithm can reduce the number of bits needed without losing any of the original meaning
Data that contains a lot of repetition (for example, a text file where one word appears hundreds of times) can often be compressed more effectively than data that is already dense and non-repetitive
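The effect of redundancy can be checked with a short experiment. The sketch below uses Python's standard `zlib` module (a lossless compressor) on two inputs of the same length: one made of a single repeated word and one made of random bytes. The variable names and the choice of word are assumptions for illustration; exact compressed sizes will vary between runs and library versions, so none are quoted here.

```python
import os
import zlib

repetitive = b"compression " * 500           # 6000 bytes: one word repeated 500 times
random_like = os.urandom(len(repetitive))    # 6000 random bytes with no pattern to exploit

small = zlib.compress(repetitive)
large = zlib.compress(random_like)

# Print original size -> compressed size for each input
print(len(repetitive), "->", len(small))
print(len(random_like), "->", len(large))
```

The repetitive input should shrink to a small fraction of its original size, while the random input stays at roughly its original size because there is little redundancy to remove.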
Lossless vs lossy compression
What is the difference between lossless and lossy compression?
Lossless compression can usually reduce file size without losing any data: the original data can be perfectly reconstructed from the compressed version
Lossy compression reduces file size by permanently removing some data: the result is an approximation of the original that cannot be fully restored
Lossy compression can usually reduce file size more than lossless compression
| Feature | Lossless | Lossy |
|---|---|---|
| Data after decompression | Identical to original; fully preserved | An approximation; some detail is permanently lost |
| Typical file size reduction | Moderate | Significant |
| Best used for | Text, code, spreadsheets, medical images; where every detail matters | Audio, images, video; where small losses are not noticeable to humans |
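The difference in what comes back after decompression can be sketched in a few lines. The lossless half uses Python's standard `zlib` module. The lossy half is a toy stand-in, not how real audio or image formats work: it simply rounds each value to a coarser level, and the helper names `quantize` and `dequantize`, the step size of 16, and the sample values are all assumptions made for this example. On an input this short, `zlib` will not actually save space; the point here is only what can be reconstructed.

```python
import zlib

samples = [12, 13, 13, 14, 200, 201, 199, 57]   # made-up 8-bit values
raw = bytes(samples)

# Lossless: decompressing returns exactly the bytes that went in.
restored = zlib.decompress(zlib.compress(raw))
print(list(restored) == samples)   # True

# Lossy (toy): store only which block of 16 each value falls in,
# so each value needs 4 bits instead of 8.
def quantize(values, step=16):
    return [v // step for v in values]

def dequantize(codes, step=16):
    # Use the midpoint of each block as the reconstructed value.
    return [c * step + step // 2 for c in codes]

approx = dequantize(quantize(samples))
print(approx)              # [8, 8, 8, 8, 200, 200, 200, 56]
print(approx == samples)   # False: the fine detail cannot be recovered
```

The lossy version keeps the overall shape of the data (low values stay low, high values stay high) but the small differences between neighboring values are gone for good.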
Compression trade-offs & selection
How do you choose between lossless and lossy compression?
Choosing a compression method involves weighing trade-offs between file size and quality
Lossless is the right choice when preserving the original data exactly is the priority (for example, compressing a program's source code or a legal document)
Lossy is the right choice when reducing file size is more important than keeping every detail (for example, compressing a photograph for a website where a small loss in quality is acceptable)
The decision depends on the purpose of the data and how it will be used
Factors that influence the choice
Purpose: will the data need to be reconstructed exactly, or is an approximation acceptable?
Storage constraints: is storage space limited, making aggressive compression necessary?
Transmission speed: does the file need to transfer quickly over a slow network?
Audience: will the end user notice the difference in quality?
Examiner Tips and Tricks
When the AP exam describes a scenario and asks which compression type is appropriate, identify what matters most: if accuracy and full reconstruction are essential, the answer is lossless. If reducing file size is the priority and minor quality loss is acceptable, the answer is lossy. Common exam distractors describe lossy compression as "losing the file". Lossy means that some detail is permanently removed, not that the whole file is lost or corrupted.
For the AP Create Performance Task, if your program uses images, audio, or video, the file formats you choose will affect storage size and quality — understanding the difference between lossless and lossy formats helps you make informed decisions about the media you include
Worked Example
Which of the following best describes what happens when a file is compressed using a lossless algorithm?
(A) Some information is removed to reduce the file size, so the original cannot be reconstructed
(B) The file size is reduced, and less information is stored than in the original
(C) The file size is reduced, but the original data can be perfectly reconstructed from the compressed version
(D) The file size stays the same, but the data is reorganized
[1]
Answer:
(C) The file size is reduced, but the original data can be perfectly reconstructed from the compressed version [1 mark]
Lossless compression reduces the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data. Fewer bits does not necessarily mean less information — the original data is preserved even though the compressed file is smaller.