Data Structure and Complexity part 6 : Hash tables

Gourine ayoub
7 min readFeb 18, 2021

Definition :

  • a hash table is a data structure that lets you map keys to a values in other words it will associate a given value to a given key and by that the data structure will be constructed.
  • to access a value in a hash-table you need to know it’s key.
  • a key must be unique but the value can be duplicated.
  • the values can be basically any type but for the keys we need to be sure that the key is hash-able with the hash function that we are using.
  • a hash table uses a hash function in order to calculate the indices and each time we do a lookup using the key the function will hash the key and get the appropriate index to it in order to give you the value associated to that key. (this is the best case ever where we suppose that the hash function will never give the same result for any given value)
  • since a hash function result can be the same for two entries for the same hash function (collision problem) the hash tables stores the key-value pairs in buckets so each time a lookup happens the hashing result of the key will give you the bucket where the value is stored.
  • The hash table in it self uses another data structure to hold the data and the type of that data structure my be different from one implementation to another arrays,list …etc
  • A good Hash Table will always find information at the same speed , no matter how much data is put in.
  • a bucket can be any data structures is used to store value as a part of the hash table implementation.

why we use hash-tables ?

  • The most big advantage for hash-tables is the fast access to the data.
  • It used to count items frequency ex : how many time a word appeared in a paragraph.

what is a hash function ?

A hash function is a function that takes a set of inputs and return a data in a known size range and that’s known as hashing in other worlds it maps the input that you give it to an index in a data structure.

Ex : H(x) = X2 + 8x+10 mod 10.

here the range is 10.

as you can notice there can be different inputs with the same output (collision).

Hash function are not only for simple types like “int” but also you can create hash functions for complex objects.

An example of hash function for string is a function that sums up the ASCI code for all the characters that it map it to a size range.

Example of hash function for a complex data type :

  • sometimes we have to do hashes for different and complex object or objects that represent a real world entity like a person that has a name a second name and an age (just keeping it simple) as a hash maybe we can set a hash function like this : length of name + length of second name * age then we do a modulo with the number of the persons that we assume that we assume that we will have so it will be like this :
  • personHash(x) = (x.name.length + x.secondName.length * x.age ) mod size.
  • this is just a simple function you must test it and aim for less hash collision.

proprieties of a hash-function ?

  • If H(x) = H(y) where x and y are different inputs this means that that there is a possibility that x=y but if the x != y then we are sure that H(x) != H(y), this can help you in comparing object so you first compare the hashes if they are different the you are sure that x and y are different else you go and check x and y. for example if we are treating big files and we have already calculated the hash for each file , if you want to check if the files contain the same elements than first compare the hashes then go to the file char by char in case the hashes are the same.

Note : fore files there is different and advanced types of hash like cryptographic hash functions and checksum.

  • a hash function must be deterministic that means that if H(y) = m , each time we execute H() with the input ‘y’ we must get ‘m’ as a result.
  • a hash collision is where tow different inputs ‘x,y‘ and the same hash function H() give as a result the same output ‘m’.
  • you have to aim for the hash function to be uniform so the output is unique for different inputs and there is no hash collision.
  • to make sure that function is deterministic most programming languages force that the key used are immutable it means you can change the keys.
  • the advantage of hash tables that is constant time access is only guaranty as long as your hash function is uniform and there is no collision.

How the hash tables work ?

In this example we suppose that the keys are integers and the values are strings and the hash function is name H(x) and we assume that the size range is 50. the process will be the next steps :

1- hash the input (the key) and get the hashing result.

2- the result of H(key) will give us the position where to place the value from (0–50) , the collision is treated differently from one implementation to another.

3- to check what is the value of the key x we need to :

3.1 — calculate the hash of the key.

3.2 — access to the array with index from the hash result.

Note : a hash collision can appear if two different key has the same index as a result.

Collision problem and possible solutions :

To solve this problem we can use many different approaches like :

Separate chaining :

in this solution instead of having only a one value per key now the key is associated to a data structure like a linked list that contain all the elements that has the same key hashing result.

  • once your linked list get really long and you want a faster access or a constant access time you can create another hash table larger then the one before and rehash all the values so the range will be bigger and linked lists smaller. (this can cost more time complexity but the appearance of this operation is only when you reach the size limitation so it’s not every time we add a new item).
  • one thing you need to know is that if we resize the bucket we need to rehash the values. so the linked list will have less values since some values of the same hash will go to different places cause the size has changed and there is more space.

Open Addressing :

In this solution we try to find another place for the value that we found that it’s key hash already exists by offsetting it’s index and finding another place for it.

Example : in this example Jack williams and Andrew Wilson has the same hash result so we changed one of theme by offsetting it and giving it a new place.

Complexity :

for the insertion , deletion and search all this operation has a bets case of O(1) it means constant time when our hash function has no collision (uniform hash functions) and O(n) at max when when there is more than value per key.

note : i am not doing a an implementation for this article data structure since it has many variation in each language and many implementations and we had talked about the theory and how it works you can know implements it base don what you have learned in this article and create your own hash algorithms (but it;s better to use predefined packages since there hash functions are better ). as an example in java the hash data structure is represented with : hash-Maps , hash-Linked List …etc and each one has it’s own characteristics.

Conclusion :

  • hash tables is a must know structure that will help you a lot in optimizing you solution in terms of complexity.
  • the problem of hash collision is a famous problem consider take a look at it cause it’s it self a wide topic and there is many different solutions that we need to discuss it in it’s own article if we want to go deep in it.
  • note : the illustrations and figures are not my work.

hope you learn had a good reading time and learned something new if you want more about data structures you can check the previous article talking about different data structures.

--

--