Character Sets (OCR A Level Computer Science)

Revision Note

James Woodhouse

Expertise

Computer Science

Character Sets

How are characters represented?

  • Computers store and process data only as binary, so every character must be represented by a binary code

  • For example, the letter 'A' might be represented as 01000001 in binary
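As a minimal sketch in Python (the helper name `to_binary` is made up for illustration), the built-in `ord` gives the numeric code of a character and `format` writes that number as a fixed-width binary string:

```python
def to_binary(char: str, bits: int = 8) -> str:
    """Return the character's numeric code as a zero-padded binary string."""
    return format(ord(char), f"0{bits}b")


print(to_binary("A"))  # 01000001
```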

Character sets

  • A character set is a defined list of characters, each paired with its own unique binary code

  • Character sets standardise the binary codes for each character

  • Without a character set, one system might interpret 01000001 differently from another

  • Two common character sets are:

    • American Standard Code for Information Interchange (ASCII)

    • UNICODE
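To illustrate why an agreed character set matters, the sketch below encodes the same letter with two different character sets. It assumes Python's built-in `ascii` and `cp037` codecs; `cp037` is an EBCDIC code page used here only as a contrasting example and is not covered in these notes. The helper name `code_for` is hypothetical.

```python
def code_for(char: str, charset: str) -> str:
    """Encode one character with the named character set and return its byte in binary."""
    (byte,) = char.encode(charset)  # assumes the character fits in a single byte
    return format(byte, "08b")


print(code_for("A", "ascii"))  # 01000001
print(code_for("A", "cp037"))  # 11000001
```

The two systems disagree about which pattern means 'A', so they could only exchange text correctly by agreeing on one character set first.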

ASCII

  • ASCII uses 7 bits to encode each character, giving 2^7 = 128 distinct characters

  • For example, 'A' is represented as 65 in decimal, which is 1000001 in binary

  • ASCII was created to provide a common standard for encoding characters, which was necessary for compatibility among various types of hardware and software

  • An extended version of ASCII encodes each character using 8 bits, giving 2^8 = 256 characters
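The character counts follow directly from the number of bits: n bits give 2^n different patterns. A short sketch (the function name `characters_available` is an illustrative assumption):

```python
def characters_available(bits: int) -> int:
    """Number of distinct codes that can be written with the given number of bits."""
    return 2 ** bits


print(characters_available(7))  # 128 (standard ASCII)
print(characters_available(8))  # 256 (extended ASCII)
print(format(ord("A"), "07b"))  # 1000001, the 7-bit code for 'A' (65 in denary)
```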

ASCII table

  • The ASCII table shows the relationship between characters that humans recognise and the denary values that represent them in the system

  • The denary values can then be converted to binary, representing the original character as binary

[Figure: ASCII table]
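A few rows of such a table can be generated rather than memorised. The sketch below is one possible approach, with a hypothetical helper `ascii_rows` that lists each character alongside its denary value and 7-bit binary code:

```python
def ascii_rows(first: str, last: str) -> list[tuple[str, int, str]]:
    """Build (character, denary, 7-bit binary) rows for a range of ASCII characters."""
    return [
        (chr(code), code, format(code, "07b"))
        for code in range(ord(first), ord(last) + 1)
    ]


for char, denary, binary in ascii_rows("A", "E"):
    print(char, denary, binary)
```

Because the letters are numbered consecutively, knowing that 'A' is 65 is enough to work out that 'C' is 67.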

Limitations of ASCII

1. It has a limited number of characters

ASCII is limited to 128 characters, which include the upper- and lower-case letters of the English alphabet, the digits 0 to 9, and some special and control characters.

A, B, C, ..., Z
a, b, c, ..., z
0, 1, ..., 9
!, @, #, ...

2. It is not suitable for multilingual text

ASCII cannot represent characters from languages other than English, limiting its applicability globally.

No representation for: 'α', 'ö', 'ñ'

3. There is no provision for modern symbols

ASCII does not include modern symbols or emojis common in today's digital communication.
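These limitations can be checked directly. In the sketch below (the helper name `fits_in_ascii` is made up for illustration), Python's built-in `ascii` codec raises `UnicodeEncodeError` for any character outside the 128 that ASCII defines:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True only if every character in the text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# "\U0001F600" is the grinning-face emoji, written as an escape sequence
for sample in ["A", "ñ", "α", "\U0001F600"]:
    print(repr(sample), fits_in_ascii(sample))
```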

Unicode

  • UNICODE was created to be a solution to the limitations of ASCII

  • UNICODE uses a much larger bit range, up to 32 bits per character (depending on the encoding method), allowing for a wide variety of characters from different languages and scripts

    • Example: The Greek character Lambda 'λ' is represented as U+03BB

    • U+03BB breaks down to:

      • U+, meaning this is a Unicode character

      • 03BB, meaning character 03BB in the UNICODE set
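The U+ notation is simply the character's number written in hexadecimal. A minimal sketch, using a hypothetical helper `code_point_label` built on Python's `ord` and `chr`:

```python
def code_point_label(char: str) -> str:
    """Format a character's Unicode number in U+XXXX notation (at least 4 hex digits)."""
    return f"U+{ord(char):04X}"


print(code_point_label("λ"))  # U+03BB
print(chr(0x03BB))            # λ
print(code_point_label("A"))  # U+0041, the same value (65) that ASCII uses for 'A'
```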

Impact on storage

  • ASCII is more storage-efficient: each character needs only 7 bits (in practice it is usually stored in one 8-bit byte)

  • UNICODE characters can require up to 32 bits, thus potentially using more storage space
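The storage difference can be measured by counting the bytes each character occupies once encoded. The sketch below assumes two Unicode encoding methods that these notes do not name, UTF-8 (variable length) and UTF-32 (fixed 32 bits), and a hypothetical helper `bytes_used`:

```python
def bytes_used(char: str, encoding: str) -> int:
    """Number of bytes needed to store one character in the given encoding."""
    return len(char.encode(encoding))


# "\U0001F600" is the grinning-face emoji, written as an escape sequence
for char in ["A", "λ", "\U0001F600"]:
    print(repr(char), bytes_used(char, "utf-8"), bytes_used(char, "utf-32-be"))
```

With a fixed 32-bit encoding every character costs 4 bytes, including plain English letters that ASCII would fit in one.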

Comparison

| | ASCII | UNICODE |
|---|---|---|
| Encoding system | 7 bits | 16 bits or 32 bits |
| Number of characters | 128 characters | 65,536 characters (16-bit) |
| Uses | Used to represent characters in the English language. | Used to represent characters across the world. |
| Benefits | It uses a lot less storage space than UNICODE. | It can represent more characters than ASCII. It can support all common characters across the world. It can represent special characters such as emojis. |
| Drawbacks | It can only represent 128 characters. It cannot store special characters such as emojis. | It uses a lot more storage space than ASCII. |


Author: James Woodhouse

James graduated from the University of Sunderland with a degree in ICT and Computing education. He has over 14 years of experience both teaching and leading in Computer Science, specialising in teaching GCSE and A-level. James has held various leadership roles, including Head of Computer Science and coordinator positions for Key Stage 3 and Key Stage 4. James has a keen interest in networking security and technologies aimed at preventing security breaches.