Unit 1
The following are some fundamental terminologies used whenever the data structures are involved:
1. Data: We can define data as an elementary value or a collection of values. For example, the
Employee's name and ID are the data related to the Employee.
2. Data Items: A Single unit of value is known as Data Item.
3. Group Items: Data Items that have subordinate data items are known as Group Items. For
example, an employee's name can have a first, middle, and last name.
4. Elementary Items: Data Items that cannot be divided into sub-items are known as
Elementary Items. For example, the ID of an Employee.
For example, an Employee entity may have the Attributes (ID, Name, Gender, Job Title) with the
corresponding Values (1234, Stacey M. Hill, Female, Software Developer).
Entities with similar attributes form an Entity Set. Each attribute of an entity set has a range of
values, the set of all possible values that could be assigned to the specific attribute.
The term "information" is sometimes used for meaningful or processed data, i.e., data together
with its given attributes.
1. Field: A single elementary unit of information symbolizing the Attribute of an Entity is
known as Field.
2. Record: A collection of different data items is known as a Record. For example, if we talk
about the employee entity, then its name, ID, address, and job title can be grouped to form
the record for the employee.
3. File: A collection of different Records of one entity type is known as a File. For example, if
there are 100 employees, there will be 100 records in the related file containing data about
each employee.
1. One-Dimensional Array: An Array with only one row of data elements is known as a One-
Dimensional Array. Its elements are stored in consecutive, ascending storage locations.
2. Two-Dimensional Array: An Array consisting of multiple rows and columns of data
elements is called a Two-Dimensional Array. It is also known as a Matrix.
3. Multidimensional Array: We can define a Multidimensional Array as an Array of Arrays.
Multidimensional Arrays are not bounded to two indices or two dimensions; they can
include as many indices as needed.
2. LINKED LISTS:
A Linked List is another example of a linear data structure used to store a collection of data
elements dynamically.
Data elements in this data structure are represented by the Nodes, connected using links or
pointers.
Each node contains two fields: the information field holds the actual data, and the
pointer field holds the address of the subsequent node in the list.
The pointer of the last node of the linked list consists of a null pointer, as it points to
nothing. Unlike the Arrays, the user can dynamically adjust the size of a Linked List as per
the requirements.
1. Singly Linked List: A Singly Linked List is the most common type of Linked List. Each
node has data and a pointer field containing an address to the next node.
2. Doubly Linked List: A Doubly Linked List consists of an information field and two pointer
fields. The information field contains the data. The first pointer field contains an address of
the previous node, whereas another pointer field contains a reference to the next node. Thus,
we can go in both directions (backward as well as forward).
3. Circular Linked List: The Circular Linked List is similar to the Singly Linked List. The
only key difference is that the last node contains the address of the first node, forming a
circular loop in the Circular Linked List
3. STACK:
Stack is a linear data structure which follows a particular order in which the operations are
performed. The order may be LIFO (Last In First Out) or FILO (First In Last Out). In a stack, all
insertions and deletions are permitted at only one end of the list.
Stack Operations:
push(): When this operation is performed, an element is inserted into the stack.
pop(): When this operation is performed, an element is removed from the top of the stack and is
returned.
top(): This operation will return the last inserted element that is at the top without removing it.
size(): This operation will return the size of the stack i.e. the total number of elements present in
the stack.
isEmpty(): This operation indicates whether the stack is empty or not.
4. QUEUE:
Like Stack, Queue is a linear structure which follows a particular order in which the operations are
performed. The order is First In First Out (FIFO). In the queue, items are inserted at one end and
deleted from the other end. A good example of the queue is any queue of consumers for a resource
where the consumer that came first is served first. The difference between stacks and queues is in
removing. In a stack we remove the item the most recently added; in a queue, we remove the item
the least recently added.
Queue Operations:
Enqueue(): Adds (or stores) an element at the end of the queue.
Dequeue(): Removes an element from the front of the queue.
Peek() or front(): Acquires the data element available at the front node of the queue without
deleting it.
rear(): This operation returns the element at the rear end without removing it.
isFull(): Validates if the queue is full.
isNull(): Checks if the queue is empty.
The following is the list of Non-Linear Data Structures that we generally use:
1. TREES
A Tree is a Non-Linear Data Structure and a hierarchy containing a collection of nodes such that
each node of the tree stores a value and a list of references to other nodes (the "children").
The Tree data structure is a specialized method to arrange and collect data in the computer to be
utilized more effectively. It contains a central node, structural nodes, and sub-nodes connected via
edges. We can also say that the tree data structure consists of roots, branches, and leaves connected.
Trees can be classified into different types:
1. Binary Tree: A Tree data structure where each parent node can have at most two children is
termed a Binary Tree.
2. Binary Search Tree: A Binary Search Tree is a Tree data structure where we can easily
maintain a sorted list of numbers.
3. AVL Tree: An AVL Tree is a self-balancing Binary Search Tree where each node
maintains extra information known as a Balance Factor whose value is either -1, 0, or +1.
4. B-Tree: A B-Tree is a special type of self-balancing Binary Search Tree where each node
consists of multiple keys and can have more than two children.
2. GRAPH:
A Graph is another example of a Non-Linear Data Structure comprising a finite number of nodes or
vertices and the edges connecting them. The Graphs are utilized to address problems of the real
world in which it denotes the problem area as a network such as social networks, circuit networks,
and telephone networks. For instance, the nodes or vertices of a Graph can represent a single user in
a telephone network, while the edges represent the link between them via telephone.
The Graph data structure, G is considered a mathematical structure comprised of a set of vertices, V
and a set of edges, E as shown below:
G = (V,E)
The above figure represents a Graph having seven vertices A, B, C, D, E, F, G, and ten edges [A,
B], [A, C], [B, C], [B, D], [B, E], [C, D], [D, E], [D, F], [E, F], and [E, G].
Depending upon the position of the vertices and edges, the Graphs can be classified into
different types:
1. Null Graph: A Graph with an empty set of edges is termed a Null Graph.
2. Trivial Graph: A Graph having only one vertex is termed a Trivial Graph.
3. Simple Graph: A Graph with neither self-loops nor multiple edges is known as a Simple
Graph.
4. Multi Graph: A Graph is said to be a Multi Graph if it consists of multiple edges but no self-loops.
5. Pseudo Graph: A Graph with self-loops and multiple edges is termed a Pseudo Graph.
6. Non-Directed Graph: A Graph consisting of non-directed edges is known as a Non-
Directed Graph.
7. Directed Graph: A Graph consisting of the directed edges between the vertices is known as
a Directed Graph.
8. Connected Graph: A Graph with at least a single path between every pair of vertices is
termed a Connected Graph.
9. Disconnected Graph: A Graph where there does not exist any path between at least one
pair of vertices is termed a Disconnected Graph.
10. Regular Graph: A Graph where all vertices have the same degree is termed a Regular
Graph.
11. Complete Graph: A Graph in which all vertices have an edge between every pair of
vertices is known as a Complete Graph.
12. Cycle Graph: A Graph is said to be a Cycle if it has at least three vertices and edges that
form a cycle.
13. Cyclic Graph: A Graph is said to be Cyclic if and only if at least one cycle exists.
14. Acyclic Graph: A Graph having zero cycles is termed an Acyclic Graph.
15. Finite Graph: A Graph with a finite number of vertices and edges is known as a Finite
Graph.
16. Infinite Graph: A Graph with an infinite number of vertices and edges is known as an
Infinite Graph.
17. Bipartite Graph: A Graph where the vertices can be divided into independent sets A and B,
and all the vertices of set A should only be connected to the vertices present in set B with
some edges is termed a Bipartite Graph.
18. Planar Graph: A Graph is said to be Planar if we can draw it in a single plane without any
two edges intersecting each other.
19. Euler Graph: A Graph is said to be Euler if and only if all of its vertices have even degree.
20. Hamiltonian Graph: A Connected Graph consisting of a Hamiltonian circuit is known as a
Hamiltonian Graph.
ABSTRACT DATA TYPES (ADT):
The user of a data type does not need to know how that data type is implemented. For
example, we have been using primitive data types like int, float, and char with only the
knowledge of the operations that can be performed on them, without any idea of how
they are implemented.
FEATURES OF ADT:
Abstract data types (ADTs) are a way of encapsulating data and operations on that data
into a single unit. Some of the key features of ADTs include:
Abstraction: The user does not need to know the implementation of the data structure only
essentials are provided.
Better Conceptualization: ADT gives us a better conceptualization of the real world.
Robust: The program is robust and has the ability to catch errors.
Encapsulation: ADTs hide the internal details of the data and provide a public interface for
users to interact with the data. This allows for easier maintenance and modification of the data
structure.
Data Abstraction: ADTs provide a level of abstraction from the implementation details of the
data. Users only need to know the operations that can be performed on the data, not how those
operations are implemented.
Data Structure Independence: ADTs can be implemented using different data structures,
such as arrays or linked lists, without affecting the functionality of the ADT.
Information Hiding: ADTs can protect the integrity of the data by allowing access only to
authorized users and operations. This helps prevent errors and misuse of the data.
Modularity: ADTs can be combined with other ADTs to form larger, more complex data
structures. This allows for greater flexibility and modularity in programming.
ADVANTAGES:
Encapsulation: ADTs provide a way to encapsulate data and operations into a single unit,
making it easier to manage and modify the data structure.
Abstraction: ADTs allow users to work with data structures without having to know the
implementation details, which can simplify programming and reduce errors.
Data Structure Independence: ADTs can be implemented using different data structures,
which can make it easier to adapt to changing needs and requirements.
Information Hiding: ADTs can protect the integrity of data by controlling access and
preventing unauthorized modifications.
Modularity: ADTs can be combined with other ADTs to form more complex data structures,
which can increase flexibility and modularity in programming.
DISADVANTAGES:
Overhead: Implementing ADTs can add overhead in terms of memory and processing, which
can affect performance.
Complexity: ADTs can be complex to implement, especially for large and complex data
structures.
Learning Curve: Using ADTs requires knowledge of their implementation and usage, which
can take time and effort to learn.
Limited Flexibility: Some ADTs may be limited in their functionality or may not be suitable
for all types of data structures.
Cost: Implementing ADTs may require additional resources and investment, which can
increase the cost of development.
Types of Abstract Data Types (ADTs)
1. List ADT
The data is generally stored in key sequence in a list which has a head structure consisting
of count, pointers and address of compare function needed to compare the data in the list.
The data node contains the pointer to a data structure and a self-referential pointer which
points to the next node in the list.
The List ADT functions are given below:
get() – Return an element from the list at any given position.
insert() – Insert an element at any position of the list.
remove() – Remove the first occurrence of any element from a non-empty list.
removeAt() – Remove the element at a specified location from a non-empty list.
replace() – Replace an element at any position by another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
2. Stack ADT
In Stack ADT Implementation instead of data being stored in each node, the pointer to data is
stored.
The program allocates memory for the data and address is passed to the stack ADT.
The head node and the data nodes are encapsulated in the ADT. The calling function can only
see the pointer to the stack.
The stack head structure also contains a pointer to top and count of number of entries
currently in stack.
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return the element at the top of the stack without removing it, if the stack is not
empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
3. Queue ADT
View of Queue
The queue abstract data type (ADT) follows the basic design of the stack abstract data type.
Each node contains a void pointer to the data and the link pointer to the next element in the
queue. The program’s responsibility is to allocate memory for storing the data.
enqueue() – Insert an element at the end of the queue.
dequeue() – Remove and return the first element of the queue, if the queue is not empty.
peek() – Return the element of the queue without removing it, if the queue is not empty.
size() – Return the number of elements in the queue.
isEmpty() – Return true if the queue is empty, otherwise return false.
isFull() – Return true if the queue is full, otherwise return false.
CHARACTERISTICS OF AN ALGORITHM:
Clear and Unambiguous: The algorithm should be unambiguous. Each of its steps should be
clear in all aspects and must lead to only one meaning.
Well-Defined Inputs: If an algorithm says to take inputs, it should be well-defined inputs. It
may or may not take input.
Well-Defined Outputs: The algorithm must clearly define what output will be yielded and it
should be well-defined as well. It should produce at least 1 output.
Finite-ness: The algorithm must be finite, i.e. it should terminate after a finite time.
Feasible: The algorithm must be simple, generic, and practical, such that it can be executed
with the available resources. It must not rely on any future technology.
Language Independent: The Algorithm designed must be language-independent, i.e. it must
be just plain instructions that can be implemented in any language, and yet the output will be
the same, as expected.
Input: An algorithm has zero or more inputs. Each instruction that contains a fundamental
operator must accept zero or more inputs.
Output: An algorithm produces at least one output. Every instruction that contains a
fundamental operator must produce one or more outputs.
Definiteness: All instructions in an algorithm must be unambiguous, precise, and easy to
interpret. By referring to any of the instructions in an algorithm one can clearly understand
what is to be done. Every fundamental operator in instruction must be defined without any
ambiguity.
Finiteness: An algorithm must terminate after a finite number of steps in all test cases. Every
instruction which contains a fundamental operator must be terminated within a finite amount
of time. Infinite loops or recursive functions without base conditions do not possess finiteness.
Effectiveness: An algorithm must be developed by using very basic, simple, and feasible
operations so that one can trace it out by using just paper and pencil.
Properties of Algorithm:
It should terminate after a finite time.
It should produce at least one output.
It should take zero or more input.
It should be deterministic means giving the same output for the same input case.
Every step in the algorithm must be effective i.e. every step should do some work.
Examples of Algorithms
Below are some examples of algorithms:
Sorting algorithms: Merge sort, Quick sort, Heap sort
Searching algorithms: Linear search, Binary search, Hashing
Graph algorithms: Dijkstra's algorithm, Prim's algorithm, Floyd-Warshall algorithm
String matching algorithms: Knuth-Morris-Pratt algorithm, Boyer-Moore algorithm
1. Theta Notation (Θ-Notation):
Theta notation encloses a function from above and below. It represents both the upper and the
lower bound of the running time of an algorithm, and is used when the bounds coincide.
Examples:
{ 100, log(2000), 10^4 } belongs to Θ(1)
{ (n/4), (2n+3), (n/100 + log(n)) } belongs to Θ(n)
{ (n^2+n), (2n^2), (n^2+log(n)) } belongs to Θ(n^2)
2. Big-O Notation (O-notation):
Big-O notation represents the upper bound of the running time of an algorithm.
Therefore, it gives the worst-case complexity of an algorithm.
It is the most widely used notation for Asymptotic analysis.
It specifies the upper bound of a function.
The maximum time required by an algorithm or the worst-case time complexity.
It returns the highest possible output value(big-O) for a given input.
Big-O(Worst Case) It is defined as the condition that allows an algorithm to
complete statement execution in the longest amount of time possible.
If f(n) describes the running time of an algorithm, f(n) is O(g(n)) if there exist positive
constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0.
The execution time serves as an upper bound on the algorithm’s time complexity.
Mathematical Representation of Big-O Notation:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
For example, consider the case of Insertion Sort. It takes linear time in the best case and
quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort
is O(n^2).
Note: O(n^2) also covers linear time.
Examples :
{ 100, log(2000), 10^4 } belongs to O(1)
{ (n/4), (2n+3), (n/100 + log(n)) } belongs to O(n)
{ (n^2+n), (2n^2), (n^2+log(n)) } belongs to O(n^2)
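As a worked check of the definition above (the constants are chosen for illustration), consider f(n) = 3n + 2 and g(n) = n. Taking c = 4 and n0 = 2, for all n ≥ 2 we have 0 ≤ 3n + 2 ≤ 3n + n = 4n = c·g(n), so f(n) = 3n + 2 is O(n).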
3. Omega Notation (Ω-Notation):
Omega notation represents the lower bound of the running time of an algorithm. Thus, it
provides the best case complexity of an algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity.
It is defined as the condition that allows an algorithm to complete statement execution in
the shortest amount of time.
Let g and f be functions from the set of natural numbers to itself. The function f is said
to be Ω(g) if there is a constant c > 0 and a natural number n0 such that c·g(n) ≤ f(n) for
all n ≥ n0.
EFFICIENCY OF AN ALGORITHM:
Computer resources are limited and should be utilized efficiently. The efficiency of an algorithm is
defined in terms of the computational resources used by the algorithm. An algorithm must be
analyzed to determine its resource usage, and its efficiency can be measured based on the usage
of different resources.
For maximum efficiency of an algorithm we wish to minimize resource usage. Since resources
such as time and space cannot be compared directly, time complexity and space complexity are
each considered separately when judging algorithmic efficiency.
The efficiency of an algorithm depends on how efficiently it uses time and memory space.
The time efficiency of an algorithm is measured by different factors. For example, write a program
for a defined algorithm, execute it by using any programming language, and measure the total time
it takes to run.
The execution time that you measure in this case would depend on a number of factors such
as the speed of the machine, the compiler used, the operating system, and the other programs
running at the same time.
Space-Time tradeoff
A space-time or time-memory tradeoff is a way of solving in less time by using more storage
space or by solving a given algorithm in very little space by spending more time.
To solve a given programming problem, many different algorithms may be used. Some of these
algorithms may be extremely time-efficient and others extremely space-efficient.
Time/space trade off refers to a situation where you can reduce the use of memory at the cost of
slower program execution, or reduce the running time at the cost of increased memory usage.
Asymptotic Notations
Asymptotic notations are mathematical notations used to make meaningful statements about the
time and space complexity of algorithms. The following three asymptotic notations are mostly
used to represent the time complexity of algorithms:
(i) Big O
(ii) Big Ω
Big Omega is the reverse of Big O: if Big O is used to describe the upper bound (worst case) of an
asymptotic function, Big Omega is used to describe the lower bound (best case).
(iii) Big Θ
When an algorithm has a complexity whose lower bound equals its upper bound, say a
complexity of O(n log n) and Ω(n log n), it actually has the complexity Θ(n log n), which means the
running time of that algorithm always falls in n log n in both the best case and the worst case.
BEST, WORST, AND AVERAGE CASE EFFICIENCY
1. Best-case complexity (O(best)): This represents the minimum time required for an
algorithm to complete when given the optimal input. It denotes an algorithm operating at its
peak efficiency under ideal circumstances.
2. Worst-case complexity (O(worst)): This denotes the maximum time an algorithm will take
to finish for any given input. It represents the scenario where the algorithm encounters the
most unfavourable input.
3. Average-case complexity (O(average)): This estimates the typical running time of an
algorithm when averaged over all possible inputs. It provides a more realistic evaluation of
an algorithm's performance.
#include <iostream>
#include <vector>
using namespace std;
// Linear search (as described above): index of x in arr, or -1 if absent
int search(const vector<int>& arr, int x) {
    for (int i = 0; i < (int)arr.size(); i++)
        if (arr[i] == x) return i;
    return -1;
}
int main() {
    vector<int> arr = {1, 10, 30, 15};
    int x = 30;
    cout << search(arr, x);
    return 0;
}
Output
2
Best Case: Constant Time irrespective of input size. This will take place if the element to be
searched is on the first index of the given list. So, the number of comparisons, in this case, is 1.
Average Case: Linear Time. On average, the element to be searched is found around the middle
index of the given list.
Worst Case: Linear Time. The element to be searched is not present in the list, so all n elements
are compared.
The average-case efficiency of the algorithm can be obtained by finding the average number of
comparisons over the n possible positions:
average number of comparisons = (1 + 2 + … + n) / n = (n + 1) / 2
which is still O(n).
1. SPACE COMPLEXITY:
The space complexity of an algorithm refers to the amount of memory required by the algorithm
to store the variables and get the result. This can be for inputs, temporary operations, or outputs.
How to calculate Space Complexity?
The space complexity of an algorithm is calculated by determining the following 2 components:
Fixed Part: This refers to the space required by the algorithm that does not depend on the
input. For example, input variables, output variables, program size, etc.
Variable Part: This refers to the space that can differ based on the implementation of
the algorithm. For example, temporary variables, dynamic memory allocation, recursion stack
space, etc.
Therefore the space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed
part and SP(I) is the variable part of the algorithm, which depends on instance characteristic I.
Example: Addition of two scalar variables
Algorithm ADD_SCALAR(A, B)
// Description: Perform arithmetic addition of two numbers
// Input: Two scalar variables A and B
// Output: Variable C, which holds the addition of A and B
C ← A + B
return C
The addition of two scalar numbers requires one extra memory location to hold the result.
Thus the space complexity of this algorithm is constant, hence S(n) = O(1).
2.TIME COMPLEXITY:
The time complexity of an algorithm refers to the amount of time required by the algorithm to
execute and get the result. This can be for normal operations, conditional if-else statements, loop
statements, etc.
How to Calculate Time Complexity?
The time complexity of an algorithm is also calculated by determining the following 2
components:
Constant time part: Any instruction that is executed just once comes in this part. For
example, input, output, if-else, switch, arithmetic operations, etc.
Variable Time Part: Any instruction that is executed more than once, say n times, comes in
this part. For example, loops, recursion, etc.
Therefore Time complexity T(P) of any algorithm P is T(P) = C + TP(I), where C is the
constant time part and TP(I) is the variable part of the algorithm, which depends on the
instance characteristic I.
Example: In the algorithm of Linear Search above, the time complexity is calculated as
follows:
Step 1: –Constant Time
Step 2: — Variable Time (Taking n inputs)
Step 3: –Variable Time (Till the length of the Array (n) or the index of the found element)
Step 4: –Constant Time
Step 5: –Constant Time
Step 6: –Constant Time
Hence, T(P) = 1 + n + n(1 + 1) + 1 = 2 + 3n, which can be said as T(n).
#include <iostream>
using namespace std;
int sum(int a, int b) { return a + b; }  // one arithmetic operation, one return
int main() {
    int a = 5, b = 6;
    cout << sum(a, b) << endl;
    return 0;
}
Time Complexity:
The above code will take 2 units of time (constant):
o one for the arithmetic operation and
o one for the return (as per the above conventions).
Therefore total cost to perform sum operation (Tsum) = 1 + 1 = 2
Time Complexity = O(2) = O(1), since 2 is constant