Data Structures and Algorithms
1. Computer Science
Computer science is the study of problems, problem-solving, and the solutions that come out of the problem-
solving process. Given a problem, a computer scientist’s goal is to develop an algorithm, a step-by-step list of
instructions for solving any instance of the problem that might arise. Algorithms are finite processes that, if
followed, will solve the problem. Algorithms are solutions.
What is a Problem?
  • A mapping/relation between a set of input instances (domain) and an output set (range)
Problem Specification
   • Specify what a typical input instance is
   • Specify what the output should be in terms of the input instance
Example: Sorting
Input: A sequence of n numbers a1, a2, …, an
Output: A permutation (reordering) a'1, a'2, …, a'n of the input sequence such that a'1 ≤ a'2 ≤ … ≤ a'n.
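To make this concrete, here is a minimal Python sketch (the function name and test values are illustrative) that checks whether a candidate output satisfies the sorting specification:

from collections import Counter

def satisfies_sorting_spec(input_seq, output_seq):
    # The output must be a permutation of the input (same values, same counts) ...
    is_permutation = Counter(input_seq) == Counter(output_seq)
    # ... and its values must appear in non-decreasing order.
    is_ordered = all(output_seq[i] <= output_seq[i + 1]
                     for i in range(len(output_seq) - 1))
    return is_permutation and is_ordered

print(satisfies_sorting_spec([3, 1, 2], [1, 2, 3]))   # True
print(satisfies_sorting_spec([3, 1, 2], [1, 2, 2]))   # False: not a permutation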
Data structures and algorithms
At the heart of virtually every computer program are its algorithms and its data structures. It is hard to
separate these two items, for data structures are meaningless without algorithms to create and manipulate
them, and algorithms are usually trivial unless there are data structures on which to operate.
Algorithms
An algorithm is a finite sequence of steps for accomplishing some computational task.
An algorithm is like a recipe, process, method, technique, procedure, or routine, and must meet the following
requirements:
    • Finiteness - it terminates after a finite number of steps
    • Definiteness - it is rigorously and unambiguously specified
    • Clearly specified input - valid inputs are clearly specified
    • Clearly specified/expected output - it can be proved to produce the correct output given a valid input
    • Effectiveness - its steps are sufficiently simple and basic
Properties of an Algorithm
An algorithm possesses the following properties (a short example illustrating them follows the list).
    1. It must be correct. In other words, it must compute the desired function, converting each input to the
        correct output. Note that every algorithm implements some function.
    2. It is composed of a series of concrete steps. Concrete means that the action described by that step is
        completely understood — and doable — by the person or machine that must perform the algorithm.
        Each step must also be doable in a finite amount of time. Thus, the algorithm gives us a “recipe” for
        solving the problem by performing a series of steps.
    3. There can be no ambiguity as to which step will be performed next.
    4. It must be composed of a finite number of steps. If the description for the algorithm were made up of
        an infinite number of steps, we could never hope to write it down, nor implement it as a computer
        program. Most languages for describing algorithms (including English and “pseudocode”) provide
        some way to perform repeated actions, known as iteration. Examples of iteration in programming
        languages include the while and for loop constructs of Python. Iteration allows for short descriptions,
        with the number of steps actually performed controlled by the input.
    5. It must terminate. In other words, it may not go into an infinite loop.
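To make these properties concrete, here is a short sequential search in Python (an illustrative example, not prescribed by the text); each property can be checked against it:

def sequential_search(data, target):
    # Input and output are clearly specified: a list and a target value go in;
    # the index of the target, or -1 if it is absent, comes out.
    for i in range(len(data)):    # iteration: at most len(data) concrete steps
        if data[i] == target:     # an unambiguous, effective step
            return i              # terminates with the correct output
    return -1                     # terminates even when the target is absent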
Abstract Data Type
An abstract data type (ADT) is a logical description of how we view the data and the operations that are
allowed without regard to how they will be implemented. This means that we are concerned only with what the
data is representing and not with how it will eventually be constructed.
An abstract data type (ADT) is a set of values (the carrier set), and operations on those values. Here are some
examples of ADTs:
Boolean—The carrier set of the Boolean ADT is the set { true, false }. The operations on these values are
negation, conjunction, disjunction, conditional, is equal to, and perhaps some others.
Integer—The carrier set of the Integer ADT is the set { ..., -2, -1, 0, 1, 2, ... }, and the operations on these
values are addition, subtraction, multiplication, division, remainder, is equal to, is less than, is greater than, and
so on. Note that while some of these operations yield other Integer values, some yield values from other
ADTs (like true and false); all of them, however, have at least one Integer argument.
String—The carrier set of the String ADT is the set of all finite sequences of characters from some alphabet,
including the empty sequence (the empty string). Operations on string values include concatenation, length of,
substring, index of, and so forth.
Bit String—The carrier set of the Bit String ADT is the set of all finite sequences of bits, including the empty
string of bits, which we denote λ: { λ, 0, 1, 00, 01, 10, 11, 000, ... }. Operations on bit strings include
complement (which flips every bit), shifts (which move the bits left or right), conjunction and
disjunction (which combine bits at corresponding locations in the strings), and concatenation and truncation.
Data structure or Data type
   • A data structure is the implementation of an abstract data type.
   • A data structure is a systematic way of organizing and accessing data
   • A data structure is a particular way of storing and organizing data in a computer so that it can be used
        efficiently.
   • A data structure is an arrangement of data in memory locations to represent values of the carrier set of
        an abstract data type.
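As an illustration of the first point, here is a minimal sketch (illustrative, and deliberately partial) of a data structure implementing part of the Bit String ADT described earlier, using a Python list as the arrangement of data:

class BitString:
    # A partial implementation of the Bit String ADT, backed by a Python list.
    def __init__(self, bits=()):
        self._bits = list(bits)             # the underlying arrangement of data

    def complement(self):
        # Flip every bit (0 -> 1, 1 -> 0), as in the ADT description.
        return BitString(1 - b for b in self._bits)

    def concatenate(self, other):
        return BitString(self._bits + other._bits)

    def __repr__(self):
        return "".join(map(str, self._bits)) or "λ"   # λ is the empty bit string

print(BitString([1, 0, 1]).complement())    # prints 010

The same ADT could just as well be implemented with a different arrangement of data (for example, a Python int used as a bit field); the ADT stays the same while the data structure changes.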
Why Study Algorithms?
Computer scientists learn by experience. We learn by seeing others solve problems and by solving problems
by ourselves. Being exposed to different problem-solving techniques and seeing how different algorithms are
designed helps us to take on the next challenging problem that we are given. By considering a number of
different algorithms, we can begin to develop pattern recognition so that the next time a similar problem arises,
we are better able to solve it.
As we study algorithms, we can learn analysis techniques that allow us to compare and contrast solutions
based solely on their own characteristics, not the characteristics of the program or computer used to
implement them.
Expressing Algorithms
   • Implementations
   • Pseudo-code
   • English
The main concern here is not the specific language used but the clarity of your expression.
Algorithm Analysis
Algorithm analysis: The process of determining, as precisely as possible, how many resources (such as time
and memory) an algorithm consumes when it executes.
Algorithm analysis can be used to compare two or more algorithms for the same problem on their efficiency.
Types of Analysis:
    1. Experimental Analysis
    2. Theoretical Analysis
Experimental Analysis
involves:
    •   Writing a program implementing the algorithm
    •   Running the program with inputs of varying size and composition
    •   Using an appropriate method to get an accurate measure of the actual running time. In Python, we
        can use the time() function, which returns the number of seconds, or fractions thereof, that have
        elapsed since a benchmark time known as the epoch.
    •   Plotting the results
Example:
from time import time
start_time = time()                                          # record the starting time
run_algorithm()                                              # placeholder for the algorithm being measured
end_time = time()                                            # record the ending time
elapsed = end_time - start_time                              # compute the elapsed time
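Putting the pieces together, here is a self-contained sketch (summation is used as a stand-in for the algorithm under study; the function name is illustrative):

from time import time

def sum_list(data):
    # Stand-in algorithm: add up the numbers in a list from first principles.
    total = 0
    for value in data:
        total = total + value
    return total

data = list(range(1000000))        # an input instance of size n = 1,000,000
start_time = time()                # record the starting time
sum_list(data)                     # run the algorithm
end_time = time()                  # record the ending time
elapsed = end_time - start_time    # compute the elapsed time
print("elapsed:", elapsed, "seconds")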
Challenges of Experimental Analysis
    1. Experimental running times of two algorithms are difficult to directly compare unless the experiments
       are performed in the same hardware and software environments.
    2. Experiments can be done only on a limited set of test inputs; hence, they leave out the running times
       of inputs not included in the experiment (and these inputs may be important).
    3. An algorithm must be fully implemented in order to execute it to study its running time experimentally.
Tutorial Questions:
    1. For each of the problems below, design an algorithm and implement it using Python; then, using the
       experimental analysis approach, determine and print to the console screen the total run-time of your
       algorithm:
            1. From first principles (without using built-in functions) search for a number x from a list of
               numbers
            2. From first principles (without using built-in functions) add numbers in a list of numbers
    2. On the same axes, plot graphs of the following functions for 0 <= n <= 10 million:
           y = 1, y = n, y = log n, y = n², y = n!, y = 2ⁿ, y = n log n, y = n³
Theoretical Analysis
An approach to analyzing the efficiency of algorithms that:
    1. Allows us to evaluate the relative efficiency of any two algorithms in a way that is independent of the
       hardware and software environment.
    2. Is performed by studying a high-level description of the algorithm without need for implementation.
    3. Takes into account all possible inputs.
Computational complexity: The time (and perhaps the space) requirements of an algorithm.
Complexity C(n): The number of basic operations performed by an algorithm as a function of the size of its
input n when this value is the same for any input of size n.
Counting Primitive Operations
To analyze the running time of an algorithm without performing experiments, perform an analysis directly on a
high-level description of the algorithm (either in the form of an actual code fragment, or language-independent
pseudo-code). Below is a set of primitive operations:
    1. Assigning an identifier to an object
    2. Determining the object associated with an identifier
    3. Performing an arithmetic operation (for example, multiplying two numbers)
    4. Comparing two numbers
    5. Accessing a single element of a Python list by index
    6. Calling a function (excluding operations executed within the function)
    7. Returning from a function.
Formally, a primitive operation corresponds to a low-level instruction with an execution time that is constant.
Instead of trying to determine the specific execution time of each primitive operation, we will simply count how
many primitive operations are executed, and use this number t as a measure of the running time of the
algorithm.
This operation count will correlate to an actual running time in a specific computer, for each primitive operation
corresponds to a constant number of instructions, and there are only a fixed number of primitive operations.
The implicit assumption in this approach is that the running times of different primitive operations will be fairly
similar. Thus, the number, t, of primitive operations an algorithm performs will be proportional to the actual
running time of that algorithm.
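As an illustration (an informal count, not an exact machine-level tally), consider the primitive operations performed by a function that finds the maximum of a list of n numbers:

def find_max(data):
    biggest = data[0]        # 1 list access + 1 assignment
    for val in data:         # n assignments to the loop variable
        if val > biggest:    # n comparisons
            biggest = val    # between 0 and n assignments
    return biggest           # 1 return

In the worst case (a list in increasing order), t = 1 + 1 + n + n + n + 1 = 3n + 3 primitive operations.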
Measuring Operations as a Function of Input Size
To capture the order of growth of an algorithm’s running time, we will associate, with each algorithm, a function
f (n) that characterizes the number of primitive operations that are performed as a function of the input size n.
Focusing on the Worst-Case Input
The worst case running time of an algorithm/worst case complexity W(n): The maximum number of basic
operations performed by an algorithm for any input of size n.
The best case running time of an algorithm /best case complexity B(n): The minimum number of basic
operations performed by an algorithm for any input of size n.
The average-case running time of an algorithm /Average case complexity A(n): The average number of
basic operations performed by an algorithm for all inputs of size n, given assumptions about the
characteristics of inputs of size n. An average-case analysis usually requires that we calculate expected
running times based on a given input distribution, which usually involves sophisticated probability theory.
Therefore, in this course, unless we specify otherwise, we will characterize running times in terms of the worst
case, as a function of the input size, n, of the algorithm.
[Figure: running times of an algorithm on different input instances A-G, ranging from 1 ms to 5 ms, with the
best-case, average-case, and worst-case running times marked.]
Rules for determining the number of operations
       1. We assume an arbitrary time unit.
       2. Execution of one of the following operations takes time 1:
              •    assignment operation
              •    single I/O operations
              •    single Boolean operations, numeric comparisons
              •    single arithmetic operations
              •    function return
              •    array index operations, pointer dereferences
               A sequence of operations:
                    count = count + 1          Cost: c1
                    sum = sum + count          Cost: c2
                    Total cost = c1 + c2
3. Running time of a selection statement (if, switch) is the time for the condition evaluation + the
   maximum of the running times for the individual clauses in the selection.
 Example: Simple If-Statement
                            Cost          Times
    if n < 0:               c1              1
        absval = -n         c2              1
    else:
        absval = n          c3              1
 Total cost <= c1 + max(c2, c3)
4. The execution time of a loop is the sum, over all iterations performed, of the time for the loop body plus
   the time for the loop check and update operations, plus the time for the loop setup.
        • Always assume that the loop executes the maximum number of iterations possible
    5. Running time of a function call is 1 for setup + the time for any parameter calculations + the time
       required for the execution of the function body
General rules for estimation
    1. Loops: The running time of a loop is at most the running time of the statements inside of that loop
       times the number of iterations.
    2. Nested Loops: The running time of a nested loop containing a statement in the innermost loop is the
       running time of the statement multiplied by the product of the sizes of all the loops.
    3. Consecutive Statements: Just add the running times of those consecutive statements.
    4. If/Else: Never more than the running time of the test plus the larger of the running times of the two
       branches, S1 and S2. (A sketch illustrating rules 1-3 follows.)
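For instance (an illustrative function, not from the text), the statement in the innermost loop below executes n × n times, so by rules 1-3 the running time grows as n²:

def count_pairs(n):
    count = 0                  # consecutive statement: constant cost
    for i in range(n):         # outer loop: n iterations
        for j in range(n):     # inner loop: n iterations per outer iteration
            count = count + 1  # innermost statement: executed n * n times
    return count               # total running time proportional to n²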
Examples:
Problem                                      Input size measure                Basic operation
Search for a key in a list of n items        Number of items in the list, n    Key comparison
Multiply two matrices of floating-point      Dimensions of the matrices        Floating-point multiplication
numbers
Compute aⁿ                                   n                                 Floating-point multiplication
Graph problem                                Number of vertices and/or edges   Visiting a vertex or traversing an edge
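For the "Compute aⁿ" row, for example, a straightforward sketch (illustrative) performs exactly n floating-point multiplications, which is the basic operation being counted:

def power(a, n):
    # Compute aⁿ by repeated multiplication.
    result = 1.0
    for _ in range(n):         # n iterations
        result = result * a    # one floating-point multiplication per iteration
    return result              # basic-operation count: exactly n multiplications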
How do we compare algorithms?
We need to define a number of objective measures.
    1. Compare execution times?
        Not good: times are specific to a particular computer.
    2. Count the number of statements executed?
        Not good: the number of statements varies with the programming language as well as with the
        style of the individual programmer.
    3. Express running time as a function of the input size n (i.e., f(n)).
        Ideal solution: compare the different functions corresponding to the running times. Such an analysis
        is independent of machine time, programming style, etc.
Asymptotic analysis
Asymptotic analysis is a method to describe the limiting behavior and the performance of algorithms when
applied to very large input datasets.
To compare two algorithms with running times f(n) and g(n), we need a rough measure that characterizes
how fast each function grows. Therefore, we use the rate of growth, i.e., we compare the functions in the limit,
that is, asymptotically (for large values of n).
Motivation for Asymptotic analysis
    •   An exact computation of worst-case running time can be difficult because a function may have many
        terms, for example: 4n² − 3n log n + 17.5n − 43n^(2/3) + 75
    •   An exact computation of worst-case running time is unnecessary because we are already
        approximating running time by using the RAM model [each simple operation (+, −, *, /, call) or
        memory access takes exactly one step].
Rate of Growth
In comparing the efficiency of algorithms, we are more interested in big differences that manifest themselves
as the size of the input becomes large than we are in small differences in running times that vary by a constant
or a multiple for inputs of all sizes. The theory of the asymptotic growth rate of functions, also called the order
of growth of functions, provides a basis for partitioning algorithms into groups with equivalent efficiency. The
low order terms in a function are relatively insignificant for large n. For example:
                    n⁴ + 100n² + 10n + 50     ~     n⁴
i.e., we say that n⁴ + 100n² + 10n + 50 and n⁴ have the same rate of growth.
Asymptotic Notation
    1. The “Big-Oh” Notation
           Let f (n) and g(n) be functions mapping positive integers to positive real numbers.
           We say that f (n) is O(g(n)) if there is a real constant c > 0 and an integer constant n0 ≥ 1 such that
           f (n) ≤ cg(n), for n ≥ n0 .
           This definition is often referred to as the “big-Oh” notation, for it is sometimes pronounced as “ f (n)
           is big-Oh of g(n).”
The function f (n) is O(g(n)), since f (n) ≤ c · g(n) whenever n ≥ n0.
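A worked example (with illustrative constants): 8n + 5 is O(n), since taking c = 9 and n0 = 5 gives 8n + 5 ≤ 9n for all n ≥ 5.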
    2. Big-Omega
Let f (n) and g(n) be functions mapping positive integers to positive real numbers. We say that f (n) is Ω(g(n)),
pronounced “ f (n) is big-Omega of g(n),” if g(n) is O( f (n)), that is, there is a real constant c > 0 and an integer
constant n0 ≥ 1 such that
f (n) ≥ cg(n), for n ≥ n0 .
This definition allows us to say asymptotically that one function is greater than or equal to another, up to a
constant factor.
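A worked example: 3n² − 2n is Ω(n²), since taking c = 1 and n0 = 1 gives 3n² − 2n ≥ n² for all n ≥ 1.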
    3. Big-Theta
Big-Theta is a notation that allows us to say that two functions grow at the same rate, up to constant factors.
We say that
f (n) is Θ(g(n)), pronounced "f (n) is big-Theta of g(n)," if f (n) is O(g(n)) and f (n) is Ω(g(n)), that is, there are
real constants c' > 0 and c'' > 0, and an integer constant n0 ≥ 1 such that c'g(n) ≤ f (n) ≤ c''g(n), for n ≥ n0.
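A worked example: 3n log n + 4n is Θ(n log n). For n ≥ 2 we have log n ≥ 1, so 3n log n ≤ 3n log n + 4n ≤ 7n log n; take c' = 3, c'' = 7, and n0 = 2.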
                Relations Between Θ, O, and Ω
Establishing the Order of Growth of a Function
When confronted with the question of whether some function g is in O(f ), we can use the definition directly to
decide, but there is an easier way embodied in the following theorem.
Theorem: g ∈ O(f ) if lim n→∞ g(n)/f(n) = c, for some constant c ≥ 0.
For example, to show that 3n² + 2n − 1 ∈ O(n²) we need to consider lim n→∞ (3n² + 2n − 1)/n²:
lim n→∞ (3n² + 2n − 1)/n² = lim n→∞ 3n²/n² + lim n→∞ 2n/n² − lim n→∞ 1/n²
= lim n→∞ 3 + lim n→∞ 2/n − lim n→∞ 1/n² = 3
Because this limit is not infinite, 3n² + 2n − 1 ∈ O(n²).
Another theorem that is also very useful in solving limit problems is L'Hôpital's Rule:
Theorem: If lim n→∞ f(n) = lim n→∞ g(n) = ∞, and the derivatives f′ and g′ exist,
then lim n→∞ f(n)/g(n) = lim n→∞ f′(n)/g′(n).
To illustrate the use of L'Hôpital's Rule, let's determine whether n² ∈ O(n lg n). First note
that lim n→∞ n² = lim n→∞ n lg n = ∞, so L'Hôpital's Rule applies.
lim n→∞ n²/(n lg n) = lim n→∞ n/(lg n)
= (using L'Hôpital's Rule) lim n→∞ 1/((lg e)/n)
= lim n→∞ n/(lg e) = ∞
Because this limit is infinite, we know that n² ∉ O(n lg n), that is, we know that n² grows faster than n lg n.
It is easy to use L'Hôpital's Rule to show that n lg n ∈ O(n²).
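A quick numerical check in Python (suggestive only, not a proof) is consistent with this: the ratio n²/(n lg n) keeps growing as n increases.

import math

for n in [10, 10**3, 10**6, 10**9]:
    # the ratio n² / (n lg n) grows without bound
    print(n, n**2 / (n * math.log2(n)))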
Tutorial Questions
    1. Show that n³ + n − 4 ∉ O(2n² − 3).
    2. Show that lg² n ∈ O(n).
    3. Show that n lg n ∈ O(n²).
    4. Show that (ln n)² ∈ O(n).
    5. Show that if a, b ≥ 0 and a ≤ b, then nᵃ ∈ O(nᵇ).