Practical No.
1
 Aim: -Introduction of Artificial Intelligence and its application.
 Artificial Intelligence: - “The science and engineering of making intelligent machines,
 especially Intelligent computer programs”. Artificial Intelligence is an approach to make a
 computer, a robot, or a product to think how smart human think.AI is a study of how human brain think,
 learn, decide and work, when it tries to solve problems. And finally, this study outputs intelligent
 software systems. The aim of AI is to improve computer functions which are related to human
 knowledge, for example, reasoning, learning, and problem-solving. The intelligence is intangible. It is
 composed of
         • Reasoning
         • Learning
         • Problem Solving
         • Perception
         • Linguistic Intelligence
 The objectives of AI research are reasoning, knowledge representation, planning, learning,
 natural language processing, realization, and ability to move and manipulate objects. There are
 long-term goals in the general intelligence sector.
 Approaches include statistical methods, computational intelligence, and traditional coding AI.
 During the AI research related to search and mathematical optimization, artificial neural
 networks and methods based on statistics, probability, and economics, we use many tools.
 Computer science attracts AI in the field of science, mathematics, psychology, linguistics,
 philosophy and so on.
 Goals of AI:
   •   To Create Expert Systems − The systems which exhibit intelligent behavior,
       learn, demonstrate, explain, and advice its users.
   •   To Implement Human Intelligence in Machines − Creating systems that understand,
       think, learn ,and behave like humans.
Applications of AI :
         ➢ Gaming − AI plays important role for machine to think of large number of possible
             positions based on deep knowledge in strategic games. for example, chess,river
             crossing, N-queens problems and etc.
B.Tech CSE 6th Semester                             CTIEMT                                  Page No. 1
        ➢ Natural Language Processing − Interact with the computer that understands
            natural language spoken by humans.
        ➢ Expert Systems − Machine or software provide explanation and advice to the users.
        ➢ Expert Systems − Machine or software provide explanation and advice to the users.
        ➢ Speech Recognition − There are some AI based speech recognition systems have
            ability to hear and express as sentences and understand their meanings while a
            person talks to it. For example Siri and Google assistant
        ➢   Handwriting Recognition   − The handwriting recognition software reads the text
            written on paper and recognize the shapes of the letters and convert it into editable
            text
        ➢ Intelligent Robots − Robots are able to perform the instructions given by a human
B.Tech CSE 6th Semester                         CTIEMT                               Page No. 2
                                Practical No. 2
  Aim: Implementation of Depth-First Search (Uninformed Search Strategy)
  DFS is also an important type of uniform search. DFS visits all the vertices in the graph. This
  type of algorithm always chooses to go deeper into the graph. After DFS visited all the
  reachable vertices from a particular sources vertices it chooses one of the remaining
  undiscovered vertices and continues the search. DFS reminds the space limitation of breath
  first search by always generating next a child of the deepest unexpanded nodded. The data
  structure stack or (LIFO) is used for DFS. One interesting property of DFS isthat, the discover
  and finish time of each vertex from a parenthesis structure. If we use one open parenthesis
  when a vertex is finishedthen the result is properly nested set of parenthesis.
  Algorithm of Depth-First Search :
  1. PUSH the starting node into the stack.
  2. If the stack is empty then stop and return failure.
  3. If the top node of the stack is the goal node, then stop and return success.
  4. Else POP the top node from the stack and process it. Find all its neighbors that
  are in ready state andPUSH them into the stack in any order.
  5. Go to step 3.
  6. Exit.
  Advantages of Depth First Search:
   •    Memory requirement is only linear with respect to the search graph. This is in contrast
        with breadth-first search which requires more space. The reason is that the algorithm
        only needs to store a stack of nodes on the path from the root to the current node.
   •    The time complexity of a depth-first Search to depth d and branching factor b (the
        number of children at each node, the outdegree) is O(bd) since it generates the same set
        of nodes as breadth-first search, but simply in a different order. Thus practically depth-
        first search is time-limited rather than space-limited.
    Disadvantages of Depth First Search:
    •   Depth-First Search is not guaranteed to find the solution.
    •   And there is no guarantee to find a minimal solution, if more than one solution.
B.Tech CSE 6th Semester                           CTIEMT                              Page No. 3
 CODE:
 graph   =   {
   '5'   :   ['3','7'],
   '3'   :   ['2', '4'],
   '7'   :   ['8'],
   '2'   :   [],
   '4'   :   ['8'],
   '8'   :   []
 }
 visited = set() # Set to keep track of visited nodes of graph.
 def dfs(visited, graph, node): #function for dfs
     if node not in
        visited: print
        (node)
        visited.add(node
        )
        for neighbour in graph[node]:
            dfs(visited, graph,
            neighbour)
 # Driver Code
 print("Following is the Depth-First
 Search") dfs(visited, graph, '5')
 Output:
B.Tech CSE 6th Semester           CTIEMT                    Page No. 4
                                 Practical No. 3
  Aim: Implementation of Breadth-First Search(Uninformed Search Strategy)
  Breadth-first search (BFS) :- is an algorithm for traversing or searching tree or graph data
  structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to
  as a 'search key'[1]), and explores all of the neighbor nodes at the present depth prior to moving
  on to the nodes at the next depth level.
  Advantages of Breadth First Search:
    •   BFS will never get trapped exploring the useful path forever.
    •   If there is a solution, BFS will definitely find it.
    •   If there is more than one solution then BFS can find the minimal one that requires less
        number of steps.
    •   Low storage requirement – linear with depth.
    •   Easily programmable.
  Disadvantages of Breadth First Search:
    The main drawback of BFS is its memory requirement. Since each level of the graph must
    be saved in order to generate the next level and the amount of memory is proportional to
    the number of nodes stored the space complexity of BFS is O(bd ), where b is the branching
    factor(the number of children at each node, the outdegree) and d is the depth. As a result,
    BFS is severely space-bound in practice so will exhaust the memory available on typical
    computers in a matter of minutes.
  Algorithm of Breadth-First Search :
  Step 1: SET STATUS = 1 (ready state) for each node in G
  Step 2: Enqueue the starting node A and set its STATUS = 2 (waiting state)
  Step 3: Repeat Steps 4 and 5 until QUEUE is empty
  Step 4: Dequeue a node N. Process it and set its STATUS = 3 (processed state).
  Step 5: Enqueue all the neighbours of N that are in the ready state (whose STATUS = 1) and
  Set their STATUS = 2
  (waiting state )
  [ END OF LOOP]
B.Tech CSE 6th Semester                            CTIEMT                               Page No. 5
  CODE:
  graph = {
      'A' : ['B','C'],
      'B' : ['D', 'E'],
      'C' : ['F'],
      'D' : [],
      'E' : ['F'],
      'F' : []
  }
  visited = [] # List to keep track of visited nodes.
  queue = []         #Initialize a queue
  def bfs(visited, graph, node):
      visited.append(node)
      queue.append(node)
      while queue:
       s = queue.pop(0)
       print (s, end = " ")
       for neighbour in graph[s]:
         if neighbour not in visited:
           visited.append(neighbour)
           queue.append(neighbour)
  # Driver Code
  print("bfs traversal is as follow:")
  bfs(visited, graph, 'A')
  OUTPUT:
B.Tech CSE 6th Semester                           CTIEMT   Page No. 6
                                     Practical No. 04
 Aim: Implementation of A* Algorithm (Informed Search Strategy).
 A* Search algorithm is one of the best and popular technique used in path-finding and graph
 traversals.
 A * algorithm has 3 paramters:
  •    g(n): The actual cost of traversal from initial state to the current state.
  •    h(n): The estimated cost of traversal from the current state to the goal state.
  •    f(n): The actual cost of traversal from the initial state to the goal state.
       f(n) = g(n) + h(n)
 Advantages of A* Algorithm:
        •   It is optimal search algorithm in terms of heuristics.
        •   It is one of the best heuristic search techniques.
        •   It is used to solve complex search problems.
        •   There is no other optimal algorithm guaranteed to expand fewer nodes than A*.
 Disadvantages of A* Algorithm:
        •    This algorithm is complete if the branching factor is finite and every action has fixed
             cost.
        •   The performance of A* search is dependant on accuracy of heuristic algorithm used
            to compute thefunction h(n).
 Algorithm:
 // A* Search Algorithm
 1. Initialize the open list
 2. Initialize the closed list put the starting node on the open list (you can leave its f at zero)
 3. while the open list is not empty
      a) find the node with the least f on
        the open list, call it "q"
      b) pop q off the open list
B.Tech CSE 6th Semester                               CTIEMT                             Page No. 7
   c) generate q's 8 successors and set their
     parents to q
   d) for each successor
     i) if successor is the goal, stop search
     ii) else, compute both g and h for successor
       successor.g = q.g + distance between successor and q
       successor.h = distance from goal to successor (This can be done using many
       ways, we will discuss three heuristics- Manhattan, Diagonal and Euclidean
       Heuristics)
       successor.f = successor.g + successor.h
     iii) if a node with the same position as successor is in the OPEN list which has a
       lower f than successor, skip this successor
     iV) if a node with the same position as successor is in the CLOSED list which has
          a lower f than successor, skip this successor otherwise, add the node to the open list
          end (for loop)
   e) push q on the closed list
   end (while loop)
 CODE:
 from collections import dequeclass
 Graph:
   # example of adjacency list (or rather map)#
   adjacency_list = {
   # 'A': [('B', 1), ('C', 3), ('D', 7)],# 'B':
   [('D', 5)],
   # 'C': [('D', 12)]
B.Tech CSE 6th Semester                           CTIEMT                               Page No. 8
    #}
   def      init     (self, adjacency_list):
      self.adjacency_list = adjacency_list
   def get_neighbors(self, v): return
      self.adjacency_list[v]
   # heuristic function with equal values for all nodesdef
   h(self, n):
      H={
           'A': 1,
           'B': 1,
           'C': 1,
           'D': 1
         return H[n]
   def a_star_algorithm(self, start_node, stop_node):
      # open_list is a list of nodes which have been visited, but who's neighbors# haven't all
      been inspected, starts off with the start node
      # closed_list is a list of nodes which have been visited# and
      who's neighbors have been inspected
      open_list = set([start_node])
      closed_list = set([])
      # g contains current distances from start_node to all other nodes# the
      default value (if it's not found in the map) is +infinity
      g = {}
      g[start_node] = 0
      # parents contains an adjacency map of all nodesparents
B.Tech CSE 6th Semester                          CTIEMT                             Page No. 9
      = {}
      parents[start_node] = start_nodewhile
      len(open_list) > 0:
         n = None
         # find a node with the lowest value of f() - evaluation functionfor v in
         open_list:
             if n == None or g[v] + self.h(v) < g[n] + self.h(n):n = v;
         if n == None:
             print('Path does not exist!')
             return None
         # if the current node is the stop_node
         # then we begin reconstructin the path from it to the start_nodeif n ==
         stop_node:
             reconst_path = [] while
             parents[n] != n:
                reconst_path.append(n)
             n = parents[n]
             reconst_path.append(start_node)
             reconst_path.reverse()
             print('Path found: {}'.format(reconst_path))return
             reconst_path
         # for all neighbors of the current node dofor (m,
         weight) in self.get_neighbors(n):
             # if the current node isn't in both open_list and closed_list# add it
             to open_list and note n as it's parent
             if m not in open_list and m not in closed_list:
                open_list.add(m)
B.Tech CSE 6th Semester                           CTIEMT                             Page No. 10
                   parents[m] = n
                   g[m] = g[n] + weight
               # otherwise, check if it's quicker to first visit n, then m# and if
               it is, update parent data and g data
               # and if the node was in the closed_list, move it to open_listelse:
                   if g[m] > g[n] + weight:g[m]
                        =      g[n]    +    weight
                        parents[m] = n
                        if     m      in    closed_list:
                             closed_list.remove(m)
                             open_list.add(m)
            # remove n from the open_list, and add it to closed_list#
            because all of his neighbors were inspected open_list.remove(n)
            closed_list.add(n)
        print('Path does not exist!')
        return None
 adjacency_list = {
     'A': [('B', 1), ('C', 3), ('D', 7)],
     'B': [('D', 5)],
     'C': [('D', 12)]
 graph1 = Graph(adjacency_list)
 graph1.a_star_algorithm('A', 'D')
 OUTPUT:
B.Tech CSE 6th Semester                                    CTIEMT                    Page No. 11
                                  Practical No. 5
 Aim: Write a program to implement towers of Hanoi.
 Tower of Hanoi :
  The solution to the Towers of Hanoi puzzle is a classic example of recursion. The ancient
  puzzle of the Towers of Hanoi consists of a number of wooden disks mounted on three poles,
  which are in turn attached toa baseboard. The disks each have different diameters and a hole
  in the middle large enough for the poles to pass through. In the beginning, all the disks are on
  the left pole. The object of the puzzle is to move all the disks over to the right pole, one at a
  time, so that they end up in the original order on that pole. You can use the middle pole as a
  temporary resting place for disks, but at no time is a larger disk to be on top of a smallerone.
  It's easyto solve the Towers of Hanoi with two or three disks, but the process becomes more
  difficult with four or more disks.
 Algorithm:
     •    Identify the base case: If there is only one disk, simply move it from the source rod to
          the destination rod.
     •    Otherwise, recursively solve the problem for n-1 disks, moving them from the source
          rod to the auxiliary rod.
     •    Move the largest disk from the source rod to the destination rod.
     • Recursively solve the problem for n-1 disks on the auxiliary rod, moving them to the
          destination rod.
 CODE:
 def hanoi(n, f, to, via):if n
    == 1:
         print("Move disk 1 from",f,"to",to);else:
         hanoi(n-1, f, via, to)
         print("Move disk",n,"from",f,"to",to);
         hanoi(n-1, via, to, f)
 n=3
B.Tech CSE 6th Semester                              CTIEMT                            Page No. 12
 f = 'A'
 to = 'B' via
 = 'C'
 hanoi(n, f, via, to)
 OUTPUT:
B.Tech CSE 6th Semester   CTIEMT   Page No. 13
                                                   Practical No. 6
 Aim: Implementation of Tic-Tac-Toe game
 Rules of the Game :
      1. The game is played on a grid that's 3 squares by 3 squares.
      2. You are X , your friend (or the computer in this case) is O . Players take turns putting
         their marks in empty squares.
      3. The first player to get 3 of her marks in a row (up, down, across, or diagonally) is the
         winner.
      4. When all 9 squares are full, the game is over. If no player has 3 marks in a row, the
         game ends in a tie.
 CODE:
 import os
 import time
 board = [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
 player = 1
 # Win Flags
 Win = 1
 Draw = -1
 Running = 0
 Stop = 1
 Game = Running
 Mark = 'X'
 def DrawBoard():
    print(" %c | %c | %c " % (board[1], board[2], board[3]))
    print("          |      |      ")
    print(" %c | %c | %c " % (board[4], board[5], board[6]))
    print("          |      |      ")
    print(" %c | %c | %c " % (board[7], board[8], board[9]))
    print(" | | ")
 def CheckPosition(x):
     if board[x] == ' ':
         return True
B.Tech CSE 6th Semester                                       CTIEMT                  Page No. 14
   else:
      return False
 def CheckWin():
   global Game
   if board[1] == board[2] and board[2] == board[3] and board[1] != ' ':
      Game = Win
   elif board[4] == board[5] and board[5] == board[6] and board[4] != ' ':
      Game = Win
   elif board[7] == board[8] and board[8] == board[9] and board[7] != ' ':
      Game = Win
   elif board[1] == board[4] and board[4] == board[7] and board[1] != ' ':
      Game = Win
   elif board[2] == board[5] and board[5] == board[8] and board[2] != ' ':
      Game = Win
   elif board[3] == board[6] and board[6] == board[9] and board[3] != ' ':
      Game = Win
   elif board[1] == board[5] and board[5] == board[9] and board[5] != ' ':
      Game = Win
   elif board[3] == board[5] and board[5] == board[7] and board[5] != ' ':
      Game = Win
   elif board[1] != ' ' and board[2] != ' ' and board[3] != ' ' and \
           board[4] != ' ' and board[5] != ' ' and board[6] != ' ' and \
           board[7] != ' ' and board[8] != ' ' and board[9] != ' ':
      Game = Draw
   else:
      Game = Running
 print("Tic-Tac-Toe Game Designed By Sourabh Somani")
 print("Player 1 [X] --- Player 2 [O]\n")
 print()
 print()
 print("Please Wait...")
B.Tech CSE 6th Semester                               CTIEMT                 Page No. 15
 time.sleep(3)
 while Game == Running:
   os.system('cls')
   DrawBoard()
   if player % 2 != 0:
     print("Player 1's chance")
     Mark = 'X'
   else:
     print("Player 2's chance")
     Mark = 'O'
   choice = int(input("Enter the position between [1-9] where you want to mark: "))
   if CheckPosition(choice):
     board[choice] = Mark
     player += 1
     CheckWin()
   os.system('cls')
   DrawBoard()
   if Game == Draw:
     print("Game Draw")
   elif Game == Win:
     player -= 1
     if player % 2 != 0:
           print("Player 1 Won")
     else:
           print("Player 2 Won")
 OUTPUT:
B.Tech CSE 6th Semester                       CTIEMT                              Page No. 16
                                 Practical No. 7
  Aim: Write a program to implement water jug problem.
  Statement :We are given 2 jugs, a 4 liter one and a 3- liter one. Neither has any measuring
  markers on it.There is a pump that can be used to fill the jugs with water. How can we get
  exactly 2 liters of water in tothe 4-liter jugs?
 Solution:-
    1. Start with the initial state where both jugs are empty.
    2. Create a queue. Next, add the initial state to it.
    3. While the queue is not empty, opt for the following:
            o   Pop the front state from the queue.
            o   Apply all possible production rules to generate new states.
            o   Check if any of these new states match the goal state.
            o   If a goal state is found, the problem is solved.
            o   If not, add the new states to the queue for further exploration.
    4. BFS ensures that you find the shortest path to the goal state, which is efficient for
        solving the Water Jug Problem.
    CODE:
        def gcd(a, b):
             if b == 0:
                return a
             else:
                 return gcd(b, a % b)
       def can_measure_water(jug1_cap, jug2_cap, target):
          if target > jug1_cap + jug2_cap:
          return False
B.Tech CSE 6th Semester                              CTIEMT                        Page No. 17
   if jug1_cap == 0 or jug2_cap == 0:
      return target == 0 or target == jug1_cap + jug2_cap
   return target % gcd(jug1_cap, jug2_cap) == 0
 def measure_water(jug1_cap, jug2_cap, target):
   if not can_measure_water(jug1_cap, jug2_cap, target):
      return []
   gcd_val = gcd(jug1_cap, jug2_cap)
   a, b = jug1_cap // gcd_val, jug2_cap // gcd_val
   if a > b:
      a, b = b, a
      jug1_cap, jug2_cap = jug2_cap, jug1_cap
   q2 = target // gcd_val
   while q2 > 0:
      q1 = (target - b * q2) // a
      yield ('fill', 1)
      yield ('pour', 1, 2)
      yield ('empty', 2)
      yield ('pour', 1, 2)
      yield ('fill', 1)
      yield ('pour', 1, 2)
      q2 -= 1
      target -= a
 jug1_cap = 4
 jug2_cap = 3
 target = 2
B.Tech CSE 6th Semester                      CTIEMT         Page No. 18
 if can_measure_water(jug1_cap, jug2_cap, target):
   print(list(measure_water(jug1_cap, jug2_cap, target)))
 else:
   print("Cannot measure the desired amount of water.")
 OUTPUT:
B.Tech CSE 6th Semester                       CTIEMT        Page No. 19
                                  Practical No. 8
  Aim: Write a program to construct a Bayesian network from given data.
  Theory:
  A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional
  dependency, and each node corresponds to a unique random variable. The Bayesian network
  consists of two major parts: a directed acyclic graph and a set of conditional probability
  distributions
  1) The directed acyclic graph is a set of random variables represented by nodes.
  2) A node's conditional probability distribution (random variable) is
     defined for every possible outcome of the preceding causal node(s).
  Code:
     import numpy as np
      import csv
     import pandas as pd
      from pgmpy.models import Bayesian Model
     from pgmpy.estimators import Maximum Likelihood Estimator
     from pgmpy.Inference import Variable Elimination
      # Read Cleveland Heart Disease data
     heartDisease = pd.read_csv('heart.csv')
     heartDisease = heartDisease.replace('?',np.nan)
      #display the data print('Few examples from the dataset are given below')
     print(heartDisease.head())
      #Model Bayesian Network
      Model=BayesianModel([('age','trestbps'),('age','fbs'),
     ('sex','trestbps'),('exang','trestbps'),('trestbps','heartdise
     ase'),('fbs','heartdisease'),('heartdisease','restecg'),
     ('heartdisease','thalach'),('heartdisease','chol')])
      #Learning CPDs using Maximum Likelihood Estimators
     print('\n Learning CPD using Maximum likelihood estimators')
B.Tech CSE 6th Semester                              CTIEMT                          Page No. 20
     # Inferencing with Bayesian Network
     print('\n Inferencing with Bayesian Network:')
      HeartDisease_infer = VariableElimination(model)
      # Computing the Probability of Heart Disease given Age
      print('\n 1. Probability of HeartDisease given Age=30')
     q=HeartDisease_infer.query(variables=['heartdisease'],evidence ={'age':28})
     print(q['heartdisease'])
     #computing the Probability of HeartDisease given cholesterol
      print('\n 2. Probability of HeartDisease given cholesterol=100')
     q=HeartDisease_infer.query(variables=['heartdisease'],evidence ={'chol':100})
     print(q['heartdisease'])
  Output:
B.Tech CSE 6th Semester                         CTIEMT                             Page No. 21
                                 Practical No. 9
  Aim: Write a program to infer from Bayesian Network.
  Theory- Inference over a Bayesian network can come in two forms. The first is simply
  evaluating the joint probability of a particular assignment of values for each variable (or a
  subset ) in the network .The second, more interesting inference task ,is to find P(x|e),or, to find
  the probability of some assignment of a subset of the variables (x) given assignments of other
  variables (our evidence ,e).
  Code:
    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    import networkx as nx
    import pylab as plt
    # Defining Bayesian Structure
    model = BayesianNetwork([('Guest', 'Host'), ('Price', 'Host')])
    # Defining the CPDs:
    cpd_guest = TabularCPD('Guest', 3, [[0.33], [0.33], [0.33]])
    cpd_price = TabularCPD('Price', 3, [[0.33], [0.33], [0.33]])
    cpd_host = TabularCPD('Host', 3, [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
                      [0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
                      [0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]],
                evidence=['Guest', 'Price'], evidence_card=[3, 3])
    # Associating the CPDs with the network structure.
    model.add_cpds(cpd_guest, cpd_price, cpd_host)
    # Infering the posterior probability
    from pgmpy.inference import VariableElimination
    infer = VariableElimination(model)
    posterior_p = infer.query(['Host'], evidence={'Guest': 2, 'Price': 2})
    print(posterior_p)
B.Tech CSE 6th Semester                             CTIEMT                             Page No. 22
    nx.draw(model, with_labels=True)
    plt.savefig('model.png')
    plt.close()
  Output:
B.Tech CSE 6th Semester                CTIEMT   Page No. 23
                                 Practical No. 10
  Aim: Write a program to run value and policy iteration in a grid world.
 Theory:
    1. Policy Iteration:
        Policy iteration first starts with some (non-optimal) policy, such as a random policy, and
        then calculates the value of each state of the MDP given that policy — this step is
        called policy evaluation. It then updates the policy itself for every state by calculating
        the expected reward of each action applicable from that state.
        Code:
        from tabular_policy import TabularPolicy
        from tabular_value_function import TabularValueFunction
        from qtable import QTable
        class PolicyIteration:
           def __init__(self, mdp, policy):
             self.mdp = mdp
             self.policy = policy
           def policy_evaluation(self, policy, values, theta=0.001):
             while True:
                delta = 0.0
                new_values = TabularValueFunction()
                for state in self.mdp.get_states():
                  # Calculate the value of V(s)
                  actions = self.mdp.get_actions(state)
                  old_value = values.get_value(state)
                  new_value = values.get_q_value(
                      self.mdp, state, policy.select_action(state, actions)
                  )
                  values.update(state, new_value)
                  delta = max(delta, abs(old_value - new_value))
                if delta < theta:
B.Tech CSE 6th Semester                           CTIEMT                             Page No. 24
                  break
             return values
            def policy_iteration(self, max_iterations=100, theta=0.001):
             # create a value function to hold details
             values = TabularValueFunction()
             for i in range(1, max_iterations + 1):
               policy_changed = False
               values = self.policy_evaluation(self.policy, values, theta)
               for state in self.mdp.get_states():
                  actions = self.mdp.get_actions(state)
                  old_action = self.policy.select_action(state, actions)
                  q_values = QTable()
                  for action in self.mdp.get_actions(state):
                      # Calculate the value of Q(s,a)
                      new_value = values.get_q_value(self.mdp, state, action)
                      q_values.update(state, action, new_value)
                  # V(s) = argmax_a Q(s,a)
                  (new_action, _) = q_values.get_max_q(state, self.mdp.get_actions(state))
                  self.policy.update(state, new_action)
                  policy_changed = (
                      True if new_action is not old_action else policy_changed
                  )
        Output:
B.Tech CSE 6th Semester                          CTIEMT                           Page No. 25
    2. Value Iteration:
        Value iteration is a method of computing an optimal MDP policy and its value. Value
        iteration starts at the "end" and then works backward, refining an estimate of either Q* or
        V*. There is really no end, so it uses an arbitrary end point.
        Code:
        from tabular_value_function import TabularValueFunction
        from qtable import QTable
        class ValueIteration:
           def __init__(self, mdp, values):
             self.mdp = mdp
             self.values = values
           def value_iteration(self, max_iterations=100, theta=0.001):
             for i in range(max_iterations):
                delta = 0.0
                new_values = TabularValueFunction()
                for state in self.mdp.get_states():
                  qtable = QTable()
                  for action in self.mdp.get_actions(state):
                     # Calculate the value of Q(s,a)
                     new_value = 0.0
                     for (new_state, probability) in self.mdp.get_transitions(
                          state, action
                     ):
                          reward = self.mdp.get_reward(state, action, new_state)
                          new_value += probability * (
                            reward
                            +(
                               self.mdp.get_discount_factor()
                               * self.values.get_value(new_state)
B.Tech CSE 6th Semester                          CTIEMT                               Page No. 26
                              )
                    qtable.update(state, action, new_value)
                  # V(s) = max_a Q(sa)
                  (_, max_q) = qtable.get_max_q(state, self.mdp.get_actions(state))
                  delta = max(delta, abs(self.values.get_value(state) - max_q))
                  new_values.update(state, max_q)
               self.values.merge(new_values)
               # Terminate if the value function has converged
               if delta < theta:
                  return i
        Output:
B.Tech CSE 6th Semester                        CTIEMT                             Page No. 27
                                  Practical No. 11
  Aim: Write a program to do reinforcement learning in the grid world.
  Theory:
  This is a toy environment called Gridworld that is often used as a toy model in the
  Reinforcement Learning literature. In this particular case: State space: GridWorld has 10x10 =
  100 distinct states. The start state is the top left cell. The gray cells are walls and cannot be
  moved to.
  Code:
  # global variables
  BOARD_ROWS = 3
  BOARD_COLS = 4
  WIN_STATE = (0, 3)
  LOSE_STATE = (1, 3)
  START = (2, 0)
  DETERMINISTIC = True
  def giveReward(self):
    if self.state == WIN_STATE:
       return 1
    elif self.state == LOSE_STATE:
       return -1
    else:
       return 0
  def nxtPosition(self, action):
    if self.determine:
       if action == "up":
            nxtState = (self.state[0] - 1, self.state[1])
       elif action == "down":
B.Tech CSE 6th Semester                               CTIEMT                          Page No. 28
          nxtState = (self.state[0] + 1, self.state[1])
       elif action == "left":
          nxtState = (self.state[0], self.state[1] - 1)
       else:
          nxtState = (self.state[0], self.state[1] + 1)
  def play(self, rounds=10):
    i=0
    while i < rounds:
       if self.State.isEnd:
          # back propagate reward
          reward = self.State.giveReward()
          # explicitly assign end state to reward values
          self.state_values[self.State.state] = reward # this is optional
          print("Game End Reward", reward)
          for s in reversed(self.states):
               reward = self.state_values[s] + self.lr * (reward - self.state_values[s])
               self.state_values[s] = round(reward, 3)
          self.reset()
  Output:
B.Tech CSE 6th Semester                             CTIEMT                                 Page No. 29