Tree – Data Structure
A tree data structure is a hierarchical structure that is used to represent and
organize data in a way that is easy to navigate and search. It is a collection
of nodes that are connected by edges and has a hierarchical relationship
between the nodes.
The topmost node of the tree is called the root, and the nodes below it are
called the child nodes. Each node can have multiple child nodes, and these
child nodes can also have their own child nodes, forming a recursive
structure.
This data structure is a specialized method to organize and store data in the
computer to be used more effectively. It consists of a central node, structural
nodes, and sub-nodes, which are connected via edges. We can also say that
tree data structure has roots, branches, and leaves connected with one
another.
Introduction to Tree – Data Structure and Algorithm Tutorials
Basic Terminologies In Tree Data Structure:
Parent Node: The node which is a predecessor of a node is called the
parent node of that node. {B} is the parent node of {D, E}.
Child Node: The node which is the immediate successor of a node is called
the child node of that node. Examples: {D, E} are the child nodes of {B}.
Root Node: The topmost node of a tree or the node which does not have
any parent node is called the root node. {A } is the root node of the tree. A
non-empty tree must contain exactly one root node and exactly one path
from the root to all other nodes of the tree.
Leaf Node or External Node: The nodes which do not have any child nodes
are called leaf nodes. {K, L, M, N, O, P} are the leaf nodes of the tree.
Ancestor of a Node: Any predecessor nodes on the path of the root to that
node are called Ancestors of that node. {A,B} are the ancestor nodes of
the node {E}
Descendant: Any successor node on the path from the leaf node to that
node. {E,I} are the descendants of the node {B}.
Sibling: Children of the same parent node are called siblings. {D,E} are
called siblings.
Level of a node: The count of edges on the path from the root node to that
node. The root node has level 0.
Internal node: A node with at least one child is called Internal Node.
Neighbour of a Node: Parent or child nodes of that node are called
neighbors of that node.
Subtree: Any node of the tree along with its descendant.
Representation of Tree Data Structure:
A tree consists of a root, and zero or more subtrees T 1, T2, … , Tk such that
there is an edge from the root of the tree to the root of each subtree.
Properties of Tree Data Structure
In computer science, a tree is a widely used data structure that
simulates a hierarchical tree structure with a set of linked nodes. A tree
data structure has the following properties:
1. Recursive data structure
2. Number of edges
3. Depth of node x
4. Height of node x
Read on to learn more about each of these properties of tree data
structure in detail!
1. Recursive Data Structure: A tree is a recursive data structure because
it has a root node, and every node has zero or more child nodes. The root
node is the topmost node in the tree, and the child nodes are located
below the root node. If each node in the tree has zero or more child nodes,
then the tree is said to be an n-aryl tree.
2. Number of Edges: The number of edges in a tree is always one less than
the number of nodes. This is because there is always one fewer edge than
there are nodes in any given path from the root to any leaf node.
3. Depth of Node x: The depth of a node is defined as the length of the
shortest path from the root to that node. In other words, it is simply the
number of edges on the path from the root to that particular node.
4. Height of Node x: The height of a node is expressed as the length of the
longest path from the node to any leaf node. In other words, it is simply
the number of edges on the path from that particular node to the deepest
leaf node
Applications
In computer science, the tree data structure can be used to store
hierarchical data. A tree traversal is a process of visiting each node in a
tree. This can be done in different ways like pre-order, post-order, or in-
order. Trees are also used to store data that naturally have hierarchical
relationships, like the file system on our computers. Besides that, trees
are also used in several applications like heaps, tries, and suffix trees.
Let's take a look at some of these applications:
1) Storing Naturally Hierarchical Data
One of the main applications of tree data structure is to store
hierarchical data. A lot of real-world data falls into this category. For
instance, think about the file system on your computer. The files and
folders are stored in a hierarchical fashion with a root folder (usually
denoted by /). Each subfolder can further have more subfolders and so
on. So when you want to store such data, a tree data structure is the
most intuitive way to do it.
2) Organize Data
Trees can also be used as an organizational tool. For instance, a family
tree is one such example where family relationships are represented
using a tree-like structure. Similarly, trees can also be used to represent
geographical features like states and cities in the USA or Countries and
Continents in the world etc.
3) Trie
Trie is an efficient information retrieval data structure that is based on
the key concept of words being prefixes of other words. It's also known
as a radix or prefix tree. A Trie has three main properties – keys have
consistent length.
This makes it suitable for dictionary operations like autocomplete or
spell check etc., which have become very popular these days with
internet users all over the world.
4) Heap
A heap is a special type of binary tree where every parent node has
either two child nodes or no child nodes, and every node satisfies one
heap property – min heap or max heap
property software programming. The Min heap property states that
every parent node must have a value less than or equal to its child
node, while the max heap property specifies that the parent node's
value must be greater than or equal to the value of its child nodes (in
the case of two children).
So depending upon which property we need to enforce upon our nodes,
we call it min heap or max heap accordingly. Since heaps are complete
binary trees, they provide a good performance guarantee for insertion
and deletion operation with the time complexity of O(logN), where N
number of elements currently present inside our heap data structure.
Heaps also play an important role behind recommendation algorithms
like the "People Also Watched" section on Netflix or Amazon Prime.
5) B-Tree and T-Tree
B-trees and T-trees are two types of trees in the data structure that are
used to efficiently store large amounts of data. These trees are often
used in databases because they allow for quick insertion and deletion of
records while still maintaining fast access times.
6) Routing Table
A routing table maintains information about routes to particular network
destinations, possibly via multiple network links/routers, thereby
allowing efficient route discovery based on partial match instead of
comparison against all known routes leading up to the destination,
reducing routing overhead considerably, especially in high volume traffic
networks.
Binary Tree Data Structure
Binary Tree is defined as a tree data structure where each node has at most 2
children. Since each element in a binary tree can have only 2 children, we typically
name them the left and right child.
Binary Tree Representation
A Binary tree is represented by a pointer to the topmost node (commonly
known as the “root”) of the tree. If the tree is empty, then the value of the
root is NULL. Each node of a Binary Tree contains the following parts:
1. Data
2. Pointer to left child
3. Pointer to right child
Basic Operation On Binary Tree:
Inserting an element.
Removing an element.
Searching for an element.
Traversing the tree.
Auxiliary Operation On Binary Tree:
Finding the height of the tree
Find the level of a node of the tree
Finding the size of the entire tree.
Binary Tree (Array implementation)
Given an array that represents a tree in such a way that array indexes are
values in tree nodes and array values give the parent node of that particular
index (or node). The value of the root node index would always be -1 as
there is no parent for root. Construct the standard linked representation of
given Binary Tree from this given representation. Do refer in order to
understand how to construct binary tree from given parent array
representation.
Ways to represent:
Trees can be represented in two ways as listed below:
1. Dynamic Node Representation (Linked Representation).
2. Array Representation (Sequential Representation).
Now, we are going to talk about the sequential representation of the trees.
In order to represent a tree using an array, the numbering of nodes can start
either from 0–(n-1) consider the below illustration as follows:
Illustration:
A(0)
/ \
B(1) C(2)
/ \ \
D(3) E(4) F(6)
OR,
A(1)
/ \
B(2) C(3)
/ \ \
D(4) E(5) F(7)
Procedure:
Note: father, left_son and right_son are the values of indices of the array.
Case 1: (0—n-1)
if (say)father=p;
then left_son=(2*p)+1;
and right_son=(2*p)+2;
Case 2: 1—n
if (say)father=p;
then left_son=(2*p);
and right_son=(2*p)+1;
Difference between B tree and B+ tree
B-Tree: B-Tree is known as a self-balancing tree as its nodes are sorted in
the inorder traversal. In B-tree, a node can have more than two children. B-
tree has a height of logM N (Where ‘M’ is the order of tree and N is the
number of nodes). And the height is adjusted automatically at each update.
In the B-tree data is sorted in a specific order, with the lowest value on the
left and the highest value on the right. To insert the data or key in B-tree is
more complicated than a binary tree. Some conditions must be held by the
B-Tree:
All the leaf nodes of the B-tree must be at the same level.
Above the leaf nodes of the B-tree, there should be no empty sub-trees.
B- tree’s height should lie as low as possible.
B+ Tree B+ tree eliminates the drawback B-tree used for indexing by storing
data pointers only at the leaf nodes of the tree. Thus, the structure of leaf
nodes of a B+ tree is quite different from the structure of internal nodes of
the B tree. It may be noted here that, since data pointers are present only at
the leaf nodes, the leaf nodes must necessarily store all the key values along
with their corresponding data pointers to the disk file block, to access them.
Moreover, the leaf nodes are linked to providing ordered access to the
records. The leaf nodes, therefore form the first level of the index, with the
internal nodes forming the other levels of a multilevel index. Some of the key
values of the leaf nodes also appear in the internal nodes, to simply act as a
medium to control the searching of a record.
Basis of
Comparison B tree B+ tree
All internal and leaf nodes have Only leaf nodes have
Pointers
data pointers data pointers
All keys are at leaf
Since all keys are not available at
nodes, hence search is
Search leaf, search often takes more
faster and more
time.
accurate.
Duplicate of keys are
Redundant No duplicate of keys is maintained and all
Keys maintained in the tree. nodes are present at the
leaf.
Insertion is easier and
Insertion takes more time and it
Insertion the results are always
is not predictable sometimes.
the same.
Deletion of the internal node is Deletion of any node is
Deletion very complex and the tree has to easy because all node
undergo a lot of transformations. are found at leaf.
Basis of
Comparison B tree B+ tree
Leaf nodes are not stored as Leaf nodes are stored
Leaf Nodes
structural linked list. as structural linked list.
Sequential access is
Sequential access to nodes is not
Access possible just like linked
possible
list
Height is lesser than B
For a particular number nodes
Height tree for the same
height is larger
number of nodes
B+ Trees used in
B-Trees used in Databases,
Application Multilevel Indexing,
Search engines
Database indexing
Each intermediary node
Number of Number of nodes at any
can have n/2 to n
Nodes intermediary level ‘l’ is 2 l.
children.