Character Sets (OCR A Level Computer Science)
Revision Note
Author
James WoodhouseExpertise
Computer Science
Character Sets
How are characters represented?
Computers only understand binary and therefore we need to represent characters using binary codes
For example, the letter 'A' might be represented as 01000001 in binary
Character sets
A character set is a list of all of the characters and their associated binary code
Character sets standardise the binary codes for each character
Without a character set, one system might interpret 01000001 differently from another
Two common character sets are:
American Standard Code for Information Interchange (ASCII)
UNICODE
ASCII
ASCII uses 7-bits to encode each character, providing for 128 distinct characters
For example, 'A' is represented as 65 in decimal, which is 1000001 in binary
ASCII was created to provide a common standard for encoding characters, which was necessary for compatibility among various types of hardware and software
An extended version of ASCII exists which encodes each character using 8-bits creating 256 characters
ASCII table
The ASCII table shows the relationship between characters that humans recognise and the denary values that represent them in the system
The denary values can then be converted to binary, representing the original character as binary
ASCII Table
Limitations of ASCII
1. It has a limited number of characters
ASCII is limited to 128 characters, which include English alphabets, numerals, and some special and control characters.
A, B, C, ..., Z
a, b, c, ..., z
0, 1, ..., 9
!, @, #, ...
2. It is not suitable for multilingual text
ASCII cannot represent characters from languages other than English, limiting its applicability globally.
No representation for: 'α', 'ö', 'ñ',
3. There is no provision for modern symbols
ASCII does not include modern symbols or emoji's common in today's digital communication.
Unicode
UNICODE was created to be a solution to the limitations of ASCII
UNICODE uses a much larger bit range, up to 32-bits (depending on the encoding method), allowing for a wide variety of characters from different languages and scripts
Example: The Greek character Lambda 'λ' is represented as U+03BB
U+03BB breaks down to:
U+, meaning this is a Unicode character
03BB, meaning character 03BB in the UNICODE set
Impact on storage
ASCII is more storage-efficient, with characters requiring only 7-bits
UNICODE characters can require up to 32-bits, thus potentially using more storage space
Comparison
| ASCII | UNICODE |
---|---|---|
Encoding system | 7-Bits | 16-bits or 32-bits |
Number of characters | 128 characters | 65,536 characters (16-bit) |
Uses | Used to represent characters in the English language. | Used to represent characters across the world. |
Benefits | It uses a lot less storage space than UNICODE. | It can represent more characters than ASCII. It can support all common characters across the world. It can represent special characters such as emoji's. |
Drawbacks | It can only represent 128 characters. It cannot store special characters such as emoji's. | It uses a lot more storage space than ASCII. |
You've read 0 of your 0 free revision notes
Get unlimited access
to absolutely everything:
- Downloadable PDFs
- Unlimited Revision Notes
- Topic Questions
- Past Papers
- Model Answers
- Videos (Maths and Science)
Did this page help you?