Unit 1
Unit 1
Data Structures
 Data structure means organizing the data in the memory.
 Data Structure can be defined as the group of data elements which provides an efficient way of storing
  and organizing data in the computer so that it can be used efficiently.
Data items that are divided into sub-items are called Group items. Ex: An Employee Name may be
divided into three subitems- first name, middle name, and last name.
Data items that are not able to divide into sub-items are called Elementary items. Ex: SSN
Entity: An entity is something that has certain attributes or properties which may be assigned values.
       The values may be either numeric or non-numeric.
Entities with similar attributes form an entity set. Each attribute of an entity set has a range of values, the
set of all possible values that could be assigned to the particular attribute. The term “information” is
sometimes used for data with given attributes, of, in other words meaningful or processed data.
                                                         1
Processor speed: To handle very large amount of data, high speed processing is required, but as the
data is growing day by day to the billions of files per entity, processor may fail to deal with that much
amount of data.
Data Search: Consider an inventory size of 106 items in a store; If our application needs to search for a
particular item, it needs to traverse 106 items every time, results in slowing down the search process.
Multiple requests: If thousands of users are searching the data simultaneously on a web server, then
there are the chances that a very large server can be failed during that process.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data structure, we
can use it at any other place. Implementation of data structures can be compiled into libraries which can
be used by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of abstraction. The client
program uses the data structure through interface only, without getting into the implementation details.
                                                     2
Difference between Sequential Organization and Linked Organization:
3. Allows for fast insertions and deletions Allows for fast traversal and access
       Can be used for implementing data              Can be used for implementing data
5.
       structures like linked lists and trees         structures like arrays and stacks
7. Can be used for dynamic data structures Suitable for static data structures
       Pointers may be pointing to null or non-       All elements are stored in contiguous
10.
       existent memory in case of broken links        memory
                                                  3
Types of Data Structure
Basically, data structures are divided into two categories:
      Primitive data structure is a fundamental type of data structure that stores the data of only one type
       (can hold a single type of value).
   o   Integer: The integer data type contains the numeric values. It contains the whole numbers that can
       be either negative or positive. When the range of integer data type is not large enough then in that
       case, we can use long.
   o   Float: The float is a data type that can hold decimal values. When the precision of decimal value
       increases then the Double data type is used.
   o   Boolean: It is a data type that can hold either a True or a False value. It is mainly used for checking
       the condition.
   o   Character: It is a data type that can hold a single character value both uppercase and lowercase
       such as 'A' or 'a'.
                                                      4
      The non-primitive data structure is derived from the primitive data structure and that is more
       complicated to create. It is responsible for structuring of a group of homogeneous and
       heterogeneous data type. This is used to process large amount of data.
      Non-primitive data structure is a type of data structure that can store the data of more than one
       type (Can hold multiple values).
      In case of linear data structure, the data is stored in a sequence, i.e., one data after another data.
       When we access the data from the linear data structure, we just need to start from one place and
       will find other data in a sequence.
      Linear data structures are easy to implement because computer memory is arranged in a linear
       way. Its examples are array, stack, queue, linked list, etc.
                      Linear                                               Non-linear
In a linear data structure, data elements are
arranged in a linear order where each and In a non-linear data structure, data elements
every element is attached to its previous and are attached in hierarchically manner.
next adjacent.
                                                    Whereas in non-linear data structure, multiple
In linear data structure, single level is involved.
                                                    levels are involved.
Its implementation is easy in comparison to While its implementation is complex in
non-linear data structure.                          comparison to linear data structure.
                                                    While in non-linear data structure, data
In linear data structure, data elements can be
                                                    elements can’t be traversed in a single run
traversed in a single run only.
                                                    only.
In a linear data structure, memory is not While in a non-linear data structure, memory is
utilized in an efficient way.                       utilized in an efficient way.
Its examples are: array, stack, queue, linked
                                                    While its examples are: trees and graphs.
list, etc.
Applications of linear data structures are Applications of non-linear data structures are
mainly in application software development.         in Artificial Intelligence and image processing.
                                                      5
    1. Array Data Structure
     An Array is a collection of items of same data type stored at contiguous memory locations.
Representation of Array
Types of arrays:
There are majorly two types of arrays:
   One-dimensional array (1-D arrays): You can imagine a 1d array as a row, where elements
    are stored one after another.
                                                6
Three-dimensional array: A 3-D Multidimensional array contains three dimensions, so it can
be considered an array of two-dimensional arrays.
   Arrays allow random access to elements. This makes accessing elements by position faster.
   Arrays have better cache locality which makes a pretty big difference in performance.
   Arrays represent multiple data items of the same type using a single name.
   Arrays store multiple data of similar types with the same name.
   Array data structures are used to implement the other data structures like linked lists, stacks,
    queues, trees, graphs, etc.
Disadvantages of Array:
                                                  7
   As arrays have a fixed size, once the memory is allocated to them, it cannot be increased or
    decreased, making it impossible to store extra data if required. An array of fixed size is
    referred to as a static array.
   Allocating less memory than required to an array leads to loss of data.
    An array is homogeneous in nature so, a single array cannot store values of different data
    types.
   Arrays store data in contiguous memory locations, which makes deletion and insertion very
    difficult to implement. This problem is overcome by implementing linked lists, which allow
    elements to be accessed sequentially.
Application of Array:
   They are used in the implementation of other data structures such as array lists, heaps, hash
    tables, vectors, and matrices.
   Database records are usually implemented as arrays.
   It is used in lookup tables by computer.
   It is used for different sorting algorithms such as bubble sort insertion sort, merge sort, and
    quick sort.
     Stack: Stack is a linear list in which insertion and deletions are allowed only at one end,
      called top.
     A stack is an abstract data type (ADT), can be implemented in most of the programming
      languages. It is named as stack because it behaves like a real-world stack, for example: -
      piles of plates or deck of cards etc.
                                                  8
Various Applications of Stack in Data Structure:
  o   Evaluation of Arithmetic Expressions: An arithmetic expression consists of operands and
      operators.
  o   Backtracking: Backtracking is another application of Stack. It is a recursive algorithm that is used
      for solving the optimization problem.
  o   Delimiter Checking: delimiter checking, i.e., parsing that involves analyzing a source program
      syntactically. It is also called parenthesis checking. When the compiler translates a source program
      written in some programming language such as C, C++ to a machine language, it parses the
      program into multiple individual parts such as variable names, keywords, etc. By scanning from left
      to right.
  o   Reverse a Data: To reverse a given set of data, we need to reorder the data so that the first and
      last elements are exchanged, the second and second last element are exchanged, and so on for
      all other elements.
  o   Processing Function Calls
   Queue: Queue is a linear list in which elements can be inserted only at one end
    called rear and deleted only at the other end called front.
   It is an abstract data structure, similar to stack. Queue is opened at both end therefore it
    follows First-In-First-Out (FIFO) methodology for storing the data items.
     Managing requests on a single shared resource such as CPU scheduling and disk
      scheduling
     Queues are used in asynchronous transfer of data (where data is not being transferred at
      the same rate between two processes) for eg. pipes, file IO, sockets.
     Handling hardware or real-time systems interrupts
                                                    9
     Handling website traffic
     Routers and switches in networking
     Queues are used to maintain the play list in media players in order to add and remove the
      songs from the play-list.
     Queues are used as buffers in most of the applications like MP3 media player, CD player,
      etc.
  4. Linked List
   Linked list is a linear data structure which is used to maintain a list in the memory. It can be
    seen as the collection of nodes stored at non-contiguous memory locations. Each node of
    the list contains a pointer to its adjacent node.
   Implementing advanced data structures: We can implement data structures like stacks
    and queues with the help of a linked list.
   Graph Adjacency list Representation: Linked List helps in storing the adjacent vertices of
    a graph in adjacency list representation.
                                                10
Non-linear data structure
    Data structures where data elements are not arranged sequentially or linearly are
       called non-linear data structures. In a non-linear data structure, single level is not
       involved. Therefore, we can’t traverse all the elements in single run only. Non-linear data
       structures are not easy to implement in comparison to linear data structure.
   1. Trees:
    Trees are multilevel data structures with a hierarchical relationship among its elements
     known as nodes. The bottommost nodes in the hierarchy are called leaf node while the
     topmost node is called root node. Each node contains pointers to point adjacent nodes.
    Tree data structure is based on the parent-child relationship among the nodes. Each node
     in the tree can have more than one children except the leaf nodes whereas each node can
     have atmost one parent except the root node.
      Store hierarchical data, like folder structure, organization structure, XML/HTML data.
      Binary Search Tree is a tree that allows fast search, insert, delete on a sorted data. It also allows
       finding closest item
      Heap is a tree data structure which is implemented using arrays and used to implement priority
       queues.
                                                     11
     B-Tree and B+ Tree : They are used to implement indexing in databases.
     Syntax Tree: Scanning, parsing , generation of code and evaluation of arithmetic expressions in
      Compiler design.
  2. Graph:
   Graphs can be defined as the pictorial representation of the set of elements (represented
    by vertices) connected by the links known as edges. A graph is different from tree in the
    sense that a graph can have cycle while the tree can not have the one.
     Used in Google maps for building transportation systems. In google maps, the intersection of two
      or more roads represents the node while the road connecting two nodes represents an edge.
      Google maps algorithm uses graphs to calculate the shortest distance between two vertices.
     Operating Systems use Resource Allocation Graph where every process and resource acts as a
      node while edges are drawn from resources to the allocated process.
 Used in the World Wide Web where the web pages represent the nodes.
     Blockchains also use graphs. The nodes are blocks that store many transactions while the edges
      connect subsequent blocks.
                                                  12
Data Structure operations
 1. Traversing:
   Every data structure contains the set of data elements. Traversing the data structure
    means visiting each element of the data structure in order to perform some specific
    operation like searching or sorting.
   To visit or process each data exactly once in the data structure
2. Insertion:
   Insertion can be defined as the process of adding the elements to the data structure at any
    location.
   To add a new value to the data structure
   If the size of data structure is n then we can only insert n-1 data elements into it.
3. Deletion:
   The process of removing an element from the data structure is called Deletion. We can
    delete an element from the data structure at any random location.
   To remove a value from the data structure
   If we try to delete an element from an empty data structure then underflow occurs.
4. Searching:
   The process of finding the location of an element within the data structure is called
    Searching. There are two algorithms to perform searching, Linear Search and Binary
    Search.
   To search for a particular value in the data structure for the given key value.
  5. Sorting:
   The process of arranging the data structure in a specific order is known as Sorting. There
     are many algorithms that can be used to perform sorting, for example, insertion sort,
     selection sort, bubble sort, etc.
   To arrange the values in the data structure in a particular order.
  6. Merging:
   When two lists List A and List B of size M and N respectively, of similar type of elements,
     clubbed or joined to produce the third list, List C of size (M+N), then this process is called
     merging.
   To join two same type of data structure values
                                               13
Algorithm
    An algorithm is defined as a set of rules or a step-by-step procedure that are to be
     executed in a specific order to get the desired output.
    The formal definition of an algorithm is a finite set of steps carried out in a specific time for
     specific problem-solving operations, especially by a Computer. So basically, it is not the
     complete code; rather it is the logic that can be implemented to solve a particular problem.
Note: Pseudocode is nothing but the description of the logic or the code in plain, simple
language.
Characteristics of an Algorithm
   o   Input: An algorithm has some input values. We can pass 0 or some input value to an algorithm.
   o   Output: We will get 1 or more output at the end of an algorithm.
   o   Unambiguity: An algorithm should be unambiguous which means that the instructions in an
       algorithm should be clear and simple.
   o   Finiteness: An algorithm should have finiteness. Here, finiteness means that the algorithm should
       contain a limited number of instructions, i.e., the instructions should be countable.
   o   Effectiveness: An algorithm should be effective as each instruction in an algorithm affects the
       overall process.
   o   Language independent: An algorithm must be language-independent so that the instructions in
       an algorithm can be implemented in any of the languages with the same output.
Need of Algorithms
We need algorithms because of the following reasons:
                                                      14
   o   Scalability: It helps us to understand the scalability. When we have a big real-world
       problem, we need to scale it down into small-small steps to easily analyze the problem.
   o   Performance: The real-world is not easily broken down into smaller steps. If the problem
       can be easily broken into smaller steps means that the problem is feasible.
Example of an algorithm
       The following are the steps required to add two numbers entered by the user:
   o   Step 1: Start
   o   Step 2: Declare three variables a, b, and sum.
   o   Step 3: Enter the values of a and b.
   o   Step 4: Add the values of a and b and store the result in the sum variable, i.e., sum=a+b.
   o   Step 5: Print sum
   o   Step 6: Stop
Algorithm Analysis
Algorithm analysis is an important part of computational complexity theory, which provides
theoretical estimation for the required resources of an algorithm to solve a specific computational
problem. Analysis of algorithms is the determination of the amount of time and space resources
required to execute it.
   1. Best case
   2. Worst case
   3. Average case
Best case: Define the input for which algorithm takes less time or minimum time. In the best case
calculate the lower bound of an algorithm. Example: In the linear search when search data is
present at the first location of large data then the best case occurs.
Worst Case: Define the input for which algorithm takes a long time or maximum time. In the
worst calculate the upper bound of an algorithm. Example: In the linear search when search data
is not present at all then the worst case occurs.
Average case: In the average case take all random inputs and calculate the computation time for
all inputs. And then we divide it by the total number of inputs.
Let's take the example of searching for an item sequentially within a list of unsorted items.
    If the searched item is always the first one, then complexity is O(1);
                                                    15
    if it's always the last one, then complexity is O(n).
    We can also calculate the average complexity, which will turn out to be O(n).The
     term "complexity" normally refers to worst-case complexity.
The analysis of an algorithm is done base on its efficiency. The two important terms used for the
analysis of an algorithm is “Priori Analysis” and “Posterior Analysis”.
Priori Analysis: It is done before the actual implementation of the algorithm when the algorithm
is written in the general theoretical language. In this, the efficiency of the algorithm is calculated
base on its complexity. It is just an approximate analysis.
Posterior Analysis: It is done after the actual implementation and execution of the algorithm
using any programming language like C, C++, Java, or Python. It is the actual analysis in which
space and time complexity of the algorithm is calculated more correctly.
Algorithm Complexity
    Complexity in algorithms refers to the amount of resources (such as time or memory)
     required to solve a problem or perform a task
It is difficult to calculate the exact space and running time of all algorithms because it depends on
several factors like which language you are using, which compiler, or which machine you are
using for the implementation of the algorithm.
Time Complexity:
Time complexity is the amount of time required by an algorithm for the execution and providing
the desired output.
The time complexity of an algorithm is the amount of time required to complete the execution. The
time complexity of an algorithm is denoted by the big O notation. Here, big O notation is the
asymptotic notation to represent the time complexity. The time complexity is mainly calculated by
counting the number of steps to finish the execution.
                                                  16
An algorithm is said to have a constant time complexity when the time taken by the algorithm
remains constant and does not depend upon the number of inputs.
main()
{
int a=5;
a=a+5;
printf(“%d”,a);
}
The time complexity of an algorithm is said to be linear when the number of operations performed
is directly proportional to the size of the input.
int a[100],n;
printf(“Enter n value”);
scanf(“%d”,&n);
for(i=0;i<n;i++)
    scanf(“%d,&a[i]);
An algorithm has quadratic time complexity if the time to execute it is proportional to the square of
the input size.
    for(i=0;i<r;i++)
     {
        for(j=0;j<c;j++)
          {
       scanf(“%d”,&a[i][j]);
        }
}
                                                 17
An algorithm is said to have a logarithmic time complexity when it reduces the size of the input
data in each step by 50 percent or the number of operations gets reduced by 50 percent as the
input size increases. This is because the algorithm divides the working area in half with each
iteration. this type of time complexity is to search for strategies like divide and conquer directly
affecting the number of operations.
Ex.
       Mid=(low high)/2
       total time=n/2
It should be quite clear from the notation itself, it is a combination of Linear and Logarithmic Time
Complexities. The time taken by this is slightly less than the linear time complexity but not as slow
as the quadratic time complexity.
Heap Sort and Quick Sort are some examples of Linearithmic Time Complexity
When we see exponential growth in the number of operations performed by the algorithm with the
increase in the size of the input, we can say that that algorithm has exponential time complexity.
The best example to explain this kind of time complexity is the Fibonacci Series.
Space Complexity:
    Space complexity refers to the amount of memory used by an algorithm to solve a problem.
     It measures the amount of memory space required to store the data and structures used by
     an algorithm.
An algorithm's space complexity is the amount of space required to solve a problem and produce
an output. Similar to the time complexity, space complexity is also expressed in big O notation.
                                                      18
Auxiliary space: The extra space required by the algorithm, excluding the input size, is known as
an auxiliary space. The space complexity considers both the spaces, i.e., auxiliary space, and
space used by the input.
So,
Example
int main()
{
   int a = 10;
   float b = 20.5;
   char c = 'A';
   int d[10];
    return 0;
}
To calculate the complexity of this algorithm, we need to determine the amount of memory used by each
of the variables. In this case:
                                                     19
Types of Space Complexity
Stack
     A stack can be defined as a container in which insertion and deletion can be done from the
      one end known as the top of the stack.
 A Stack is a linear data structure that follows the LIFO (Last-In-First-Out) principle.
  o   push(): When we insert an element in a stack then the operation is known as a push. If the stack is
      full then the overflow condition occurs.
  o   pop(): When we delete an element from the stack, the operation is known as a pop. If the stack is
      empty means that no element exists in the stack, this state is known as an underflow state.
  o   isEmpty(): It determines whether the stack is empty or not.
  o   isFull(): It determines whether the stack is full or not.'
  o   peek(): It returns the element at the given position.
  o   count(): It returns the total number of elements available in a stack.
  o   change(): It changes the element at the given position.
  o   display(): It prints all the elements available in the stack.
                                                       20
Array implementation of Stack (Static Stack Representation)
     In array implementation, the stack is formed by using the array. All the operations regarding the
      stack are performed using arrays.
Algorithm:
Algorithm:
Step -1:         begin
Step -2:         if top = 0 then stack empty;      // top == -1
Step -3:         item := stack (top);
Step -4:         top = top - 1;
Step -5:         end;
int pop ()
{
    if ( top == 0)                      // top == -1
                                                          22
    {
            printf("Underflow");
            return 0;
    }
    else
    {
    Item=stack [top];
    top=top-1;
    return item;
}
}
            In linked list implementation of stack, the nodes are maintained non-contiguously in the memory.
             Each node contains a pointer to its immediate successor node in the stack. Stack is said to be
             overflown if the space left in the memory heap is not enough to create a node.
                                                           23
struct stack *temp;
temp=allocate memory
temp->data=item;
temp->next=top;
top=temp;
}
24