Data Structures And Complexity Part 5: trees
Definition:
a tree is non-linear data structure that has a number of nodes connected in hierarchical order you can also call it and undirected graph , each nod holds a value , a tree has the flowing aspect :
- the first node (the top node) is called the root and has no parents .
- each node can have from 0 to n children , each children has a parent node.
- nodes that has no children's are called leafs.
- no cycles in the tree because each tow nodes are connected with only one arc(edge) it means there is only one path from a node to another.
- each tree can be divided into sub-trees.
- each tree has levels (in depth and breadth).
- the node it self is a data structure that has a value(key) and pointers to the children's of it.
- the height of the tree is the number of levels.
- nodes of the same level are called siblings.
- there is many types of trees: binary trees, binary search trees, AVL , black red …etc
Note : a tree is a special case of a graph.
why we use trees ?
- we use trees so we can structure our elements in a hierarchical way so that some operation can be done in a lower time complexity such as search in binary search tree.
- create a relation between nodes (parent and children's) that can help you to model some of real word situations.
- it can easily mode real word situations.
binary trees:
definition:
a binary tree is a tree where each node has at max two children's called left node/child and right node/child , a complete binary tree is where every node in the tree has 2 children's.
binary search trees:
definition :
a binary search tree is the same as a binary tree except this time we have an invariant that we need to respect in order to say that our binary tree is a binary search tree and that’s is :
- the left sub-tree of a node must contains only values that are smaller than the value of the node itself.
- the right sub-tree of a node must contains only values that are greater than the value of the node itself.
- The elements stored in a BST must bee comparable in order to make an invariant and compare the elements to balance the tree.
Note : duplicate in the BST(binary search tree) is a choice of implementation it means there is no rule that says that a BTS should have no duplicates.
Why we use BST :
- they are used in other tree types implementation like Red Black trees, Splay,AVL … etc.
- maps is on of the things that uses BST.
- implementation of the binary heaps.
- used in other data structures like a Treap.
- used in compiler and calculators to generate the syntax tree.
binary search tree vs heaps :
- in heaps you can always access the max at constant time complexity (O(1)) but don’t guaranty that al the elements are sorted so the search for other values will generally take o(n) , where in the BST all values are sorted (from left to right) so the search time will always b O(logn) for any value . And the BST can be modified to keep track of the large value in it .
- if you want something that is based on priority and you want always to get the element with the highest priority go with heaps but if you want all the values sorted then go with BST.
Note : to know more about heaps you can read this article : https://gourineayoub-prf.medium.com/data-structure-and-complexity-part-4-priority-queues-and-heaps-7fffee1f1377
Complexity :
all operation (add/delete/search) in a BST will take in average O(log n) and at worst O(n) time complexity.
Operation :
adding :
to add(insert) an element we need to put it in the correct place so in the BST the invariant is respected , we do that by comparing the element we want to add and with the tree nodes and based on we decide we ether go to search for a spot in the right sub-tree, left sub-tree or create the new node , also the duplicates handling in case that you want duplicates in your BST.
To add an new element you can follow the next steps :
- compare the new element with the root and then based on the comparison decide to go to left sub-tree or right sub-tree.
- continue comparing and descending as long as the node that you are comparing with has children's (not a leaf).
- when you reach a leaf node you create a new node and add it to the leaf node to right or left based on the new element value compared with the leaf value.
- in case you want duplicates you have to consider reaching a value as the same one that you want to add also ass a stop point and then add node to right or left (does not matter because the have the same value).
- in case you don’t want duplicates then when you reach a value already exists then just end the process.
- this process will take O(log n) most of the time but there is some worst cases when all your values are inserted in a sorted way so each new value is bigger /smaller then the previous one your tree will be like a diagonal line and each time you have to reach the leaf node passing by all the tree elements to add the new node so it will be O(n) time complexity. This problem is solved by another trees called balanced BST.
in this example we will add the value 95 to the tree :
Removing:
to remove an element you have to first find the node where the value is and then remove the node.
in the searching part we start from the root and we compare the value that we want to remove with the node value and decide to go to left sub-tree or right sub-tree based in the result of comparison , if you reach the null it means the value does not exists.
after you reach the node that you want to delete you will have 4 cases :
- the node is a leaf node :
remove it normally because it will note effect the tree and it will still be a BST.
- the node has a left sub-tree and no right sub-tree or the reverse (no left sub-tree but has a right sub-tree) :
in this case the child of that node will be in place of the node that we want to remove.
- the case where the node has both left and right sub-tree:
in this case what we do first is we swap the node that we want to remove with the smallest value in the right sub-tree or the largest value in the left sub-tree after that we have a duplicate value since we swapped the value so we have to remove the previous node of the value that we swapped with and for that we need to do the same process recursively until we reach one of the 3 first case (leaf node / right sub-tree with no left sub-tree/ left sub-tree with no right sub-tree)., the search will be O(n) at worst.
searching :
we just go from the root and decide where to go based on the value that we are searching until we either reach it and return true or we reach null pointer and return false.,This process will be O(log n) (at worst o(n)).
tree traversals methods :
each method of traversal is implemented using resistivity because we have to go through all the nodes and we already know that all the nodes are connected through a parent-child relation that we can use to go over all the nodes from the root to the leafs , the different is when do we print the value after/before/between the recursive call for the parent node children.
We will have this tree as an example and see the result of each traversal method:
- Pre-order : for this case we print the value of the parent node before we do the recursive call for both left and right child's , in this way the start by printing the node then we go always left until we reach the leaf at the most left then we go up and we repeat the process.
- In-order: for this case we print the value of the parent node between the recursive call for both left and right childs , in this way the first node that will ge printed is the leaf at the most left then we go up. By the inrder traversal you can get the nodes printed in an increasing order .
Note :in In-Order the nodes will be printed in an increasing order because we are working with BST .
- Post-Order: for this case we print the value of the parent node after we do the recursive call for both left and right child’s , in this way the first value that gets printed is the leaf at the most left then the other child if exists then it goes up and repeat the same thing.
- Level order: simply in a level order what we do is we print all the values of the level starting form the root then we go to the next level until we reach the leafs. This is possible by doing a search called BFS using the queues data structure .
Note : to know more about queues you can read this article : https://gourineayoub-prf.medium.com/data-structures-and-complexity-part-3-linear-data-structures-stacks-and-queues-5e8e1e72a041
Implementation :
Basic BST implementation :
based on what we said in the first part each tree has a value plus two pointer one to left child and the other to the right child.
Add new Elements :
to add new elements we have to find the place to add then we add the new element .
we use another method to help us find the place to add our new element.
remove an Element :
to remove an element we have to respect the BST invariant and for that we have to see the case that we have after removing and we will face one of the four cases that we talked about earlier and based on that we decide what to do.
we used some methods to get the largest element in the left or the smallest in the right to swap with:
search for an element :
i think the search is the most simple method here because you just have to go through the elements and decide which sub-tree you go based on the value that you are searching for until you reach what you want or you hit the null pointer.
Conclusion :
- trees are used so much in modeling real world situations and in your development career you will find many different types of trees , make sure you understand the basic so you can understand all the other types.
- as you can see different problems require different data structures to get the best solution , even you can tweak this basic data structures if it’s best for your solution in terms of time and space complexity.
- this article talked only about few types there is more known and used types like AVL , black/red …etc that we will talk about in next articles.
note : pictures in this article are not original .
- hope you had a good reading and find some useful information , stay tuned for the next articles and a new data structure in this article series.
- you can check other previous articles if you want to start from the basic data structure :
- data structure definition and complexity basics : https://gourineayoub-prf.medium.com/data-structure-and-complexity-a-programmer-best-tow-weapons-part-1-introduction-a941cda2f557
- arrays and linked lists : https://gourineayoub-prf.medium.com/data-structures-and-complexity-part2-linear-data-part1-arrays-and-linked-lists-713a53360e64