Hash Tables (OCR A Level Computer Science)

Revision Note

Author

Jennifer Page

Expertise

Computer Science

Hash Tables

What is a Hash Table?

A hash table is an associative array which is coupled with a hash function. The function takes in data and releases an output. The role of the hash function is to map the key to an index in the hash table.
Each piece of data is mapped to a unique value using the hash function
It is sometimes possible for two inputs to result in the same hashed value. This is known as a collision
A good hashing algorithm should have a low probability of collisions occurring but if it does occur, the item is typically placed in the next available location. This is called collision resolution
Hash tables are normally used for indexing as they provide fast access to data due to keys having a unique 1-1 relationship with the address at which they are stored.
Hash tables enable very efficient searching. In the best case, data can be retrieved from a hash table in constant time

A Character Hash Table

Not all key values are simple integers so if it’s a character then the hash function will need to do more processing.
An alphanumeric key that is made up of any four letters or numbers. An example would be C6IA
To hash the code, the characters can be converted to their ASCII value codes:

Character	ASCII code
C	67
6	54
I	73
A	65

Then add the values to get a total: 67 + 54 + 73 + 65 = 259
Then we apply the MOD function: 259 MOD 97 = 65
This then means the data associated with C6IA will be stored in position 65 in the hash table

Collisions in a Hash Table

A collision occurs when the hash function produces the same hash value for two or more keys
B7MF produces the same hash value (74) as A5RD

Collisions are a problem when using hash tables because you can’t store two sets of data at the same location.
There are two ways to handle collisions Linear probing or chaining

Linear Probing

When there is a collision the data is placed in the next free position in the hash table
Either it can probe sequentially or use an interval (e.g check every 3rd slot) until an empty bucket is found

In the diagram the data with the key A5RD has already been inserted into the location 74 of the table. When you want to insert B7MF the same hash value is produced
So because the space is already occupied by A5RD it checks the next location which is 75, as this is occupied it checks the next location which is 76 which is empty so the data will be placed in this location
If there are lots of collisions the has table can end up very messy

Retrieving Data:

When there has been a collision the data will not be in the correct location. The process of retrieving data from the table where collisions have been dealt with by linear probing is as follows:
- Hash the key to generate the index value
- Examine the indexed position to see if the key (which is stored together with the data) matches the key of the data you are looking for
- If there is no match then each location that follows is checked (at the appropriate interval) until the matching record is found
- If you reach an empty bucket without finding a matching key then the data is not in the hash table
This significantly reduces the efficiency of using a hash table for searching.
Hash tables should always be designed to have extra capacity

Chaining

Another way of dealing with collisions is to use linked lists. Instead of storing the actual data in the hash table, you store a pointer to a location where the data is stored. Each data item is stored as a node with 3 attributes:
- The key value
- The data
- A pointer to the next node
The first value to be stored will have a Null pointer as it is the only item in the list
When there is a collision the pointer for the first node location will be updated to point to the new node
This will create a chain of nodes whose keys resolve to the same hash value

Retrieving data

When collisions have been dealt with using chaining the data will always be accessed through a single location in the hash table. The process for retrieving it is as follows:
- Hash the key to generate the index value
- Examine the key of the node to see if it matches the key that you are looking for
- If it is not the node you are looking for follow the linked list until you find the node
- If you reach the end of the list without finding a matching key (or the head pointer was null), the data is not in the hash table
In the worse case (where every key hashes to the same value), you could still end up carrying out a linear search
It is important that the hash function is designed and tested to minimise the number of possible collisions

Rehashing

Items will be added and deleted from the hash table. If the table starts to fill up, or a large number of items are in the wrong place, the performance of the hash table will degrade. Rehashing is used to address this problem
Rehashing created a new table (larger if needed). The key for each item in the existing table will be rehashed and the item will be inserted into the new table
If the new table is larger the hashing algorithm will need to be modified as it will need to generate a larger range of hash values
Rehashing is also an opportunity to review the overall effectiveness of the hash function and to review how the collisions are handled

Hash Tables: Adding, Removing & Traversing Data

How do you add data to a Hash Table?

To insert data into a hash table you use the hash function to generate the index of the position in the array that will be used to store the data
The key must be stored as part of or alongside the data
C6IA will be stored at position 65

Hash Key	Hashing Function	Hash Value
C6IA	Applying hashing function	Hash value = 65

To be sure this is possible, position 65 will be checked to see if it is empty. If it is, the new data item is inserted into this position
If position 65 already contains data the overflow table will be used
- This would take place by checking the first position in the overflow table to see if it is empty.
- If it is not empty, it will iterate through the table until an empty position is found.

Algorithm for Adding to a Hash Table

Below is the pseudocode for the algorithm.

FUNCTION hash_function(key, table_size):
    // Compute the hash value for the given key
    hash_value = key MOD table_size
    RETURN hash_value

FUNCTION add_to_hash_table(hash_table, key, value, table_size):
    // Compute the hash value for the key using the hash function
    hash_value = hash_function(key, table_size)

    // Create a new entry with the key and value
    entry = (key, value)

    // Check if the hash value already exists in the hash table
    if hash_table[hash_value] is empty:
        // If the slot is empty, add the entry directly
        hash_table[hash_value] = [entry]
    else:
        // If the slot is not empty, handle collisions using a separate chaining approach
        // Append the entry to the existing list in the slot
        hash_table[hash_value].append(entry)

ENDFUNCTION

Removing Data from a Hash Table

How do you remove data from a hash table?

To retrieve data from a hash table, the hash function is applied to the key to generate the index for the position within the array
The key to B4YJ gives the hash value of 87
The data can be retrieved in a single operation. You don’t need to use a binary or linear search, you can go to the correct position because an array allows direct access by index
If the data item is found it is removed (the memory address is marked as available)
If the data item is not found, then the overflow table is to be searched using the same linear approach mentioned above

Hash Key	Hashing Function	Hash Value
B4YJ	Applying hashing function	Hash value = 87

FUNCTION removeFromHashTable(hashTable, key):
    // Calculate the hash value for the given key
    hashValue = hashFunction(key)

    // Get the list of entries at the computed hash value
    entryList = hashTable[hashValue]

    // Find the entry with the matching key and remove it
    FOR entry IN entryList:
        IF entry.key == key:
            entryList.remove(entry)
            RETURN

    // Key not found
    PRINT "Key not found in the hash table"

ENDFUNCTION

Traversing a Hash Table

How do you traverse a Hash Table?

Traversing, Adding and Removing data from a hash table are all done in a similar fashion.
A hash value is generated in the same manner as before.
- If the item being searched for is found, it is output
- If the item being searched for is not found then the overflow table must be checked in a linear manner.
The main advantage of a hash table is that long searches are not necessary - additional searches are only required if there are any collisions

You've read 0 of your 0 free revision notes

Get unlimited access

to absolutely everything:

Downloadable PDFs
Unlimited Revision Notes
Topic Questions
Past Papers
Model Answers
Videos (Maths and Science)

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Next topic

Did this page help you?