Hash Tables (OCR A Level Computer Science)
Revision Note
Author
Jennifer PageExpertise
Computer Science
Hash Tables
What is a Hash Table?
A hash table is an associative array which is coupled with a hash function. The function takes in data and releases an output. The role of the hash function is to map the key to an index in the hash table.
Each piece of data is mapped to a unique value using the hash function
It is sometimes possible for two inputs to result in the same hashed value. This is known as a collision
A good hashing algorithm should have a low probability of collisions occurring but if it does occur, the item is typically placed in the next available location. This is called collision resolution
Hash tables are normally used for indexing as they provide fast access to data due to keys having a unique 1-1 relationship with the address at which they are stored.
Hash tables enable very efficient searching. In the best case, data can be retrieved from a hash table in constant time
A Character Hash Table
Not all key values are simple integers so if it’s a character then the hash function will need to do more processing.
An alphanumeric key that is made up of any four letters or numbers. An example would be C6IA
To hash the code, the characters can be converted to their ASCII value codes:
Character | ASCII code |
---|---|
C | 67 |
6 | 54 |
I | 73 |
A | 65 |
Then add the values to get a total: 67 + 54 + 73 + 65 = 259
Then we apply the MOD function: 259 MOD 97 = 65
This then means the data associated with C6IA will be stored in position 65 in the hash table
Collisions in a Hash Table
A collision occurs when the hash function produces the same hash value for two or more keys
B7MF produces the same hash value (74) as A5RD
Collisions are a problem when using hash tables because you can’t store two sets of data at the same location.
There are two ways to handle collisions Linear probing or chaining
Linear Probing
When there is a collision the data is placed in the next free position in the hash table
Either it can probe sequentially or use an interval (e.g check every 3rd slot) until an empty bucket is found
In the diagram the data with the key A5RD has already been inserted into the location 74 of the table. When you want to insert B7MF the same hash value is produced
So because the space is already occupied by A5RD it checks the next location which is 75, as this is occupied it checks the next location which is 76 which is empty so the data will be placed in this location
If there are lots of collisions the has table can end up very messy
Retrieving Data:
When there has been a collision the data will not be in the correct location. The process of retrieving data from the table where collisions have been dealt with by linear probing is as follows:
Hash the key to generate the index value
Examine the indexed position to see if the key (which is stored together with the data) matches the key of the data you are looking for
If there is no match then each location that follows is checked (at the appropriate interval) until the matching record is found
If you reach an empty bucket without finding a matching key then the data is not in the hash table
This significantly reduces the efficiency of using a hash table for searching.
Hash tables should always be designed to have extra capacity
Chaining
Another way of dealing with collisions is to use linked lists. Instead of storing the actual data in the hash table, you store a pointer to a location where the data is stored. Each data item is stored as a node with 3 attributes:
The key value
The data
A pointer to the next node
The first value to be stored will have a Null pointer as it is the only item in the list
When there is a collision the pointer for the first node location will be updated to point to the new node
This will create a chain of nodes whose keys resolve to the same hash value
Retrieving data
When collisions have been dealt with using chaining the data will always be accessed through a single location in the hash table. The process for retrieving it is as follows:
Hash the key to generate the index value
Examine the key of the node to see if it matches the key that you are looking for
If it is not the node you are looking for follow the linked list until you find the node
If you reach the end of the list without finding a matching key (or the head pointer was null), the data is not in the hash table
In the worse case (where every key hashes to the same value), you could still end up carrying out a linear search
It is important that the hash function is designed and tested to minimise the number of possible collisions
Rehashing
Items will be added and deleted from the hash table. If the table starts to fill up, or a large number of items are in the wrong place, the performance of the hash table will degrade. Rehashing is used to address this problem
Rehashing created a new table (larger if needed). The key for each item in the existing table will be rehashed and the item will be inserted into the new table
If the new table is larger the hashing algorithm will need to be modified as it will need to generate a larger range of hash values
Rehashing is also an opportunity to review the overall effectiveness of the hash function and to review how the collisions are handled
Hash Tables: Adding, Removing & Traversing Data
How do you add data to a Hash Table?
To insert data into a hash table you use the hash function to generate the index of the position in the array that will be used to store the data
The key must be stored as part of or alongside the data
C6IA will be stored at position 65
Hash Key | Hashing Function | Hash Value |
---|---|---|
C6IA | Applying hashing function | Hash value = 65 |
To be sure this is possible, position 65 will be checked to see if it is empty. If it is, the new data item is inserted into this position
If position 65 already contains data the overflow table will be used
This would take place by checking the first position in the overflow table to see if it is empty.
If it is not empty, it will iterate through the table until an empty position is found.
Algorithm for Adding to a Hash Table
Below is the pseudocode for the algorithm.
FUNCTION hash_function(key, table_size):
// Compute the hash value for the given key
hash_value = key MOD table_size
RETURN hash_value
FUNCTION add_to_hash_table(hash_table, key, value, table_size):
// Compute the hash value for the key using the hash function
hash_value = hash_function(key, table_size)
// Create a new entry with the key and value
entry = (key, value)
// Check if the hash value already exists in the hash table
if hash_table[hash_value] is empty:
// If the slot is empty, add the entry directly
hash_table[hash_value] = [entry]
else:
// If the slot is not empty, handle collisions using a separate chaining approach
// Append the entry to the existing list in the slot
hash_table[hash_value].append(entry)
ENDFUNCTION
Removing Data from a Hash Table
How do you remove data from a hash table?
To retrieve data from a hash table, the hash function is applied to the key to generate the index for the position within the array
The key to B4YJ gives the hash value of 87
The data can be retrieved in a single operation. You don’t need to use a binary or linear search, you can go to the correct position because an array allows direct access by index
If the data item is found it is removed (the memory address is marked as available)
If the data item is not found, then the overflow table is to be searched using the same linear approach mentioned above
Hash Key | Hashing Function | Hash Value |
B4YJ | Applying hashing function | Hash value = 87 |
FUNCTION removeFromHashTable(hashTable, key):
// Calculate the hash value for the given key
hashValue = hashFunction(key)
// Get the list of entries at the computed hash value
entryList = hashTable[hashValue]
// Find the entry with the matching key and remove it
FOR entry IN entryList:
IF entry.key == key:
entryList.remove(entry)
RETURN
// Key not found
PRINT "Key not found in the hash table"
ENDFUNCTION
Traversing a Hash Table
How do you traverse a Hash Table?
Traversing, Adding and Removing data from a hash table are all done in a similar fashion.
A hash value is generated in the same manner as before.
If the item being searched for is found, it is output
If the item being searched for is not found then the overflow table must be checked in a linear manner.
The main advantage of a hash table is that long searches are not necessary - additional searches are only required if there are any collisions
You've read 0 of your 0 free revision notes
Get unlimited access
to absolutely everything:
- Downloadable PDFs
- Unlimited Revision Notes
- Topic Questions
- Past Papers
- Model Answers
- Videos (Maths and Science)
Did this page help you?