Data structure and Complexity Part 4 : Priority Queues and Heaps

13 min readNov 21, 2020

Priority Queues

1- Definition :

Priority queue is an abstract data structure that has the same mechanics like a normal queue but each element has priority that will be used when we want to serve an element it means that when we want to get an element out of it (serve or offer an element) we will not use the order of entering the element in the queue as a mechanism but rather use the priority that is attached to the elements in a way that the element with the highest priority will be served first. And in order to maintain that logic we must be sure that the data that we want to store is comparable so we can know which one has a bigger or lower priority based on a condition(invariant) .

Example :

if we say that we have a priority queue that has this numbers = {9,100,2,1,3,5,8,6,0} :

in this example the priority is based on the value of that integer in a way that the smaller the number is the the higher priority it has. In that way if we want to get an element from this PQ (priority queue ) with the poll() method the result will be : -1 then with another poll we will have :2 then 3 …. etc but if it was a normal Queue then it will follow the FIFO(First in first out) principal in a way that the first poll() will return as a result 9 then 100 then 2 …etc.

2- Why we use PQ ?

used to know the next best element to serve based on a condition.
used in some djikstra shortest path algorithm.
used on the minimum spanning tree algorithms (MST).
used in the A* algorithm and Best search algorithms and Artificial Intelligence.
used in a lot of algorithms like the prime algorithm .
used in Operating systems to for loading.

3- How a PQ can be implemented ?

A priority queue is implemented with the help of another data structure called heaps so in order to implement the PQ data structure we need to understand what is a heap and how to use it, and In our implementation we will use binary heaps .

Note : there is other methods to implement the PQ with linked lists and dynamic arrays but with heaps we will get the best time complexity.

Heaps :

1- Definition :

A heap data structure is a tree that have an invariant or property that need’s to be respected always, it means when ever you check the heap DS the invariant must always be true as an example of an invariant :

Each parent node value must be grater then the value of all it’s children node.

Note : heaps are based on tree but for the sake of using heaps in the implementation of priority queues we have to know them but we will discuss trees in another article with it’s different types, operation and more details.

Note : in trees we have nodes and each nodes has it’s children (no specified numbers it changes from a tree to another based on the tree type) in case when there is no children then we call that node a leaf also the top node is called the root of the tree , Each tree can be divided to sub trees.

In our case we will talk about binary heaps which means each parent node has at max tow children and we will have this tow types of binary heaps :

max heap : where each parent node must have a value that is grater then all it’s children in a way that the root node has the biggest value in the tree and the leaf nodes has the smallest values in the tree.

Min heap : where each parent node must have a value that is smaller then all it’s children in a way that the root node has the smallest value in the tree and the leaf nodes has the biggest values in the tree.

Note : a complete binary tree is a tree where each node has two children.

2- How we can implement heaps ?

Heaps can be represented with arrays or objects and pointers but in our case we will use arrays since it’s most used and more easy in manipulation and also very fast because of the access time in arrays is constant ,

we have two arrays one is for the nodes value and one is for the nodes priority in a way that if we have in the first array the value 5 as the first element that implies that if we want to know the priority of that node we will go the second array the first element and we will find it.

3- Benefits of using arrays as an implementation in the binary heap :

when you read from the array it will be always be BFS(breadth first search) read automatically, it means you print with a loop the elements of the array the result will be breadth first search result.
the parent node and it’s two child has a relation so that :

if the index of the parent node is I then :

left child index = 2*I +1

right child index = 2*I +2

example : if we have a cretin node with the index 6 then we have :

left child index = 2*6 + 1 = 13 and right child index = 2*16 +2 = 17.

3- How to make sure that the invariant is always respected after the heap manipulation ?

3.1- After adding elements to the heap :

add the new element at the bottom of the tree
perform cretin process that will make sure that the invariant is respected (the process can be called : bubbling up/sifting up/swimming ) , so the node will go up the tree until it reaches it’s propriety place in the tree based on the invariant.
Example : in this case we are adding the value 3 to a min heap.

3.2- After removing an element from the heap :

here is two types of removing :

The first is polling poll() because in the heap we are most of the time interested in the element with the max priority and that will be the root node.

-1/ first we need to make the root in the bottom so we swap it with the last leaf node and since it’s represented with an array we know that the root has an index of 0 and the last leaf will have and index n(n is the size of the heap) then we remove it.

2/ since the root now is the previous last leaf , we are sure that the invariant is not satisfied so we need to fix it by doing the opposite of bubbling up (bubbling down/sink) until we find the place that satisfies the invariant.

the second one is normal removing based on a value that we need to search for it first then remove it :

1/ search for the value in the tree.

2/ after finding the node with the value that we want to remove we swap that node with the last leaf node

3/ delete the node.

4/ re-balance the heap with either a bubble up or a bubble down depends on the value on the last leaf node that we swapped with.

Example : in this example we are deleting the value 1 from a min heap.

4- Removing nodes but with less Time complexity (using hash tables) :

note: hash table is another data structure that we will discuss in details in the next articles just for now you should know that it helps us to access data in constant O(1) time complexity due to the hashing mechanism , although you will access those value not by indexes but by things called keys.

The removing algorithm will always take more time then the adding because we have to search for the value first so we have the complexity if the search step plus the complexity of the balancing of the heap but there is a way to eliminate the first step and get know where is the node that we want to delete without searching and that is with using has tables.

Hash table is a data structure that has two main element or proprieties :

key : which is more like an index and it can be any type.
Value : which is a value that is associated with a given key.

So what hash tables do is that it maps the key to a value so that we don’t have to search for the value associated to that key each time and if we consider the key to be value that we want to remove than the value of it will be the index of the node that we want to remove based.

In case where there is more then one node with the same value , the hash table now will store as a value a set of values instead of a one value so the key will be associated to a set of values and when we want to delete we will take the first one of them (this problem is called hash collision and will be discussed in details in the next articles that re about hash data structures)

note : it does note matter which node of the set we remove if we have multiple repeated nodes because we are going to balance the heap after that so it will always respect the heap invariant.

5- Turning a min PQ to a Max PQ :

most of the libraries has a min PQ only so if you need a max PQ you need to write something that will convert a min PQ to a max PQ.

To do the converting we have a couple of ways that we will talk about :

the first method will be the negation of the invariant as an example if we have the invariant says that if we have tow nodes X and Y if X<=Y then X has a higher priority and it should get out first if we want to convert this then we do the negation of the invariant and our new invariant will be X>=Y in this case we know that y will have a higher priority then the first invariant .
the second method will be the modification of the nodes value in the adding and the polling process in a way that each time we want to add a value we add the negation of it and in the polling we return the negation of it and this will have the same effect as the negation of the invariant.

4- Implementation of the PQ using the binary heaps :

for the sake of good understanding of the implementation we will divide this part into 4 section :

PQ representation implementation : how we can represent the PQ and what DS we use.
Hash-map implementation : methods that manipulates the hash-map that we are using to keep track of the nodes values .
Heap balancing methods : methods that will help us balance the heap each time we change something in the heap(after add/remove operations).
General PQ methods : methods pf the PQ like add and remove using the previous implemented methods (hash-map methods and the balancing methods).

4.1- PQ representation implementation :

note : the implementation file is in this git-hub link : https://github.com/GourineAyoubPrf/Priority-Queues

In The representation of the priority queue we will use as we talked in the previous section this two data structures :

a list to represent the heap and store the heap nodes.
a hash data structure in our case a hash-map that will store for each value all the node indexes of the heap that has that value.
the comparable interface will help us to compare tow objects .

4.2- Hash-map implementation :

In our code we need to manipulate the hash-map that contains the values with the indexes each time we add or remove a node in order to keep the heap balanced we need to make some changes in the hash-map also , the changes that we need to do can be represented in this operations :

addToMap method : it’s a method that will allow us to add the value to the hash map. when we add a new node to the heap we need to keep the two DS synchronized, what we do here is we just check if there is already a tree-Set of the index that we are about to add if the answer is yes then we just add the new value to it ,if not we create a new Tree-Set add the new index to it then add the Tree-Set to the hash-map.

getFromMap method : simply it’s a method that will give us the index of the value passed as a parameter to it in a constant time since it’s a hash data structure and this is why we used it.

removeFromMap methde : this method will allow us to remove the index of a node in case that we removed that node from the heap to keep the two data structures synchronized.

mapSwapping method : this method is used when we are in the process of balancing the heap because we swap the nodes until we make sure that the heap invariant is respected and the heap is balanced. so when we swap the nodes in the heap we need to swap there indexes in the hash-map.

4.3- Heap balancing methods :

This methods are used when ever we need to re-balance the heap after some changes caused by the add/remove operations :

swap method : a method used to swap tow nodes in the heap and respectively call the mapSwapiing method to make sure that every thing is synchronized .

isSmaller method : a method used to test if a node value is smaller the other since we extended the comparable interface the comparing will be possible between the two nodes.

sink method : a method used to balance the heap , giving it a node index it will compare the value with child's of that nodes and swap the nodes until the heap is balanced and the node is in the proper place(used when we delete a value).

swim method : a method used to balance the heap, giving it a node index it will compare it with the node parent and swap the nodes until the heap is balanced and the node is in the proper place (used when we add a new node).

4.4- General PQ methods :

This 4 methods will use all the previous methods in order to keep the heap balanced when they get executed.

add method : this method will add the value passed by parameters to the heap and to the map using the methods defined earlier and since the adding is happening in the and of the heap we will do a swim operation in order to balance the heap and find the proper place for the new added value.

peek method : this method will return the root element of the heap it mean the element with the highest priority in the PQ.

remove method : this method will remove the node that has the value that is passed by parameters , because the removing happens after we swap the last element with the element that we want to remove we have to re-balance the heap and since the value of the last node that we swapped with can either be smaller or greater then the parent node we need to test both the swim and sink method in order to make sure that the heap will be balanced after the removing.

poll method : this method will return the element with the highest priority which is the root and it will remove it using the remove method.

Conclusion :

knowing heaps and priority queues can be a big advantage in some problems specially when you have a serve mechanism based on a condition that implies the comparing of the nodes.
although each data structure is special in it;s way of working some of the data structures depends on other to use in it’s implementation.
each data structure you learn give you a new approach and a new view to see the problems that you want to solve.

note : This series of article are inspired from some searching in wikipedia websites , geeksForGeeks website … etc and the crash course of William Fiset about data structures.