Reflex Actions and AI System Analysis
1. Are reflex actions (such as flinching from a hot stove) rational? Are they intelligent?
   → Reflex actions, such as flinching from a hot stove, are not typically considered rational or
   intelligent in the traditional sense.
       -       Rationality: Reflex actions are automatic and do not involve conscious reasoning or
           deliberation. Rationality usually implies a level of thought and decision-making. Reflexes
           are quick, involuntary responses to stimuli designed to protect the body from harm, but
           they are not driven by conscious thought processes or reasoning.
       -      Intelligence: Intelligence often involves the application of knowledge, reasoning, and
           problem-solving. Reflex actions do not require these cognitive processes. Instead, they
           are governed by the nervous system's automatic responses, bypassing the brain's higher
           functions. Therefore, while reflexes are adaptive and effective for survival, they are not
           considered intelligent because they do not involve the conscious application of
           knowledge or thought.
2. To what extent are the following computer systems instances of artificial intelligence:
       a. Supermarket barcode scanners.
            → AI Involvement: Minimal
           → Reason: Supermarket bar code scanners are designed to read and decode barcodes to
           retrieve product information and prices from a database. These scanners are primarily
           concerned with pattern recognition and database access. The technology involved does
           not utilize advanced AI techniques such as machine learning or natural language
           processing. The functionality is based on straightforward, rule-based systems for
           recognizing bar codes and performing basic database lookups.
       b. Web search engines.
           → AI Involvement: Significant
           → Reason: Web search engines leverage AI extensively. They utilize algorithms optimized
           with machine learning and natural language processing to enhance the efficiency of
           search functions. AI helps in ranking web pages based on relevance, quality, and user
           intent. Search engines continuously learn from user interactions and feedback to improve
           search accuracy and relevance, reflecting a deep integration of AI principles.
       c. Voice-activated telephone menus.
           → AI Involvement: Moderate
           → Reason: Voice-activated telephone menus use speech recognition technology to
           interpret user commands and perform actions accordingly. While this involves some AI,
           particularly in terms of recognizing and processing voice inputs, these systems typically
           operate based on predefined rules and commands. They do not adapt or learn beyond
           their programmed instructions, making their AI involvement more rule-based compared
           to more adaptive AI systems.
       d. Spelling and grammar correction features in word processing programs
           → AI Involvement: Significant
    → Reason: Basic spelling correction can be handled by dictionary lookup, but grammar
    checking requires natural language processing: the software must parse sentences,
    resolve ambiguity, and propose corrections that preserve the writer's meaning. Modern
    word processors increasingly back these features with statistical and machine-learned
    language models that catch contextual errors rule-based checkers miss, making this a
    genuine, if narrow, application of AI.
e. Internet routing algorithms that respond dynamically to the state of the network
   → AI Involvement: Extensive
    → Reason: Dynamic routing algorithms continuously monitor the state of the network
    (congestion, link failures, latency) and adapt their routing decisions in response. This
    sense-decide-act loop in a changing, partially observable environment is characteristic
    of a rational agent optimizing a performance measure (throughput, delay). Although
    classical routing protocols are largely rule-based rather than learning-based, their
    autonomous, adaptive decision-making is a substantial everyday application of AI
    principles.
LAB 02
 1. For each of the following agents, specify the sensors, actuators, and environment:
        a. Microwave oven
           -       Sensors:
           + Temperature sensors (to detect internal temperature)
           + Door switch (to detect if the door is closed)
           + Timer (to monitor cooking time)
           -       Actuators:
           + Magnetron (to generate microwave radiation)
           + Turntable motor (to rotate food for even cooking)
           + Control panel (buttons and display)
           -      Environment: The kitchen or any area where the microwave is placed, including
           the food being heated.
        b. Chess program
           -       Sensors:
           + User input (keyboard/mouse for moves)
           + Game state (board position and rules)
           -       Actuators:
           + Display (showing the chess board and moves)
           + Output (making suggested moves or announcing game status)
           -     Environment: The virtual chessboard and the rules of chess, including the
           opponent's moves.
        c. Autonomous supply delivery plane
           -       Sensors:
           + GPS (for navigation and positioning)
           + Cameras and LIDAR (for obstacle detection)
           + IMU (Inertial Measurement Unit for orientation and motion)
           -       Actuators:
           + Propellers (for flight)
           + Control surfaces (ailerons, elevators, rudders for steering)
           + Delivery mechanism (to release packages)
           -     Environment: The outdoor airspace environment, including weather conditions,
           geographical obstacles, and designated delivery zones.
 2. For each of the following activities, give a PEAS description of the task environment and
    characterize it in terms of the properties:
     ● Fully Observable vs Partially Observable
     ● Deterministic vs Nondeterministic vs Stochastic
     ● Competitive vs Collaborative
     ● Single-agent vs Multi-agent
     ● Dynamic vs Static
     ● Discrete vs Continuous
     ● Episodic vs Sequential
     ● Known vs Unknown
    a. Playing soccer.
       -      PEAS description:
       + Performance Measure: Goals scored, successful passes, tackles, ball possession,
       teamwork, and overall game outcome (win, loss, or draw).
       + Environment: A soccer field, including goals, players (both teams), referees, and
       spectators. The physical attributes of the field (size, surface) and external conditions
       (weather, time of day) also play a role.
       + Actuators: Player actions (kicking, passing, dribbling, running, defending),
       communication signals (verbal and non-verbal cues), and game strategies.
       + Sensors: Players’ vision (to see the ball and other players), tactile feedback (feeling the
       ball and contact with other players), and auditory cues (hearing teammates and referee
       whistles).
       -      Characterization of the Soccer environment:
       + Partially Observable: Players cannot see the entire field or the positions of all players at
       all times. Information is incomplete due to obstructions and the dynamic nature of the
       game.
       + Stochastic: The outcome of actions can be uncertain due to various factors (e.g., player
       skill, opponent actions, and environmental conditions).
       + Competitive: The two teams compete against each other to win the game. Each team
       aims to score while preventing the other from doing so.
       + Multi-Agent: Multiple agents (players from both teams, referees) interact in the
       environment. Each player makes decisions based on their strategies and the actions of
       others.
       + Dynamic: The environment is constantly changing due to the actions of players and the
       movement of the ball. The state of the game evolves in real-time.
       + Continuous: The actions taken (e.g., running, passing) occur in a continuous space. The
       position of the ball and players can vary infinitely within the boundaries of the field.
       + Sequential: The outcome of current actions affects future states. Each play leads to
       further developments in the game, influencing strategies and decisions.
       + Unknown: Players may not have complete knowledge of the opponents’ strategies,
       abilities, or tactics, making the environment uncertain.
b. Exploring the subsurface oceans of Titan.
   -      PEAS description:
   + Performance Measure: Successful identification and analysis of subsurface ocean
   properties, detection of potential biosignatures, collection of geological data, and the
   effective operation of exploration equipment.
   + Environment: Titan’s subsurface ocean and ice crust, including the complex chemical
   environment, extreme temperatures, and geological features.
   + Actuators: Robotic arms for sampling, drills for penetrating ice, communication systems
   for data transmission, propulsion systems for mobility, and scientific instruments for
   analysis.
   + Sensors: Cameras for imaging, spectrometers for chemical analysis, temperature
   sensors, pressure sensors for depth measurements, and environmental sensors to
   monitor conditions.
   -      Characterization of the Exploration environment:
   + Partially Observable: The environment is not fully observable due to the thick ice crust
   and the inability to see the subsurface ocean directly. The exact conditions and
   composition remain uncertain until explored.
   + Nondeterministic: The environment exhibits unpredictable elements. The reactions of
   materials and potential geological activity can be uncertain, affecting mission outcomes.
   + Collaborative: If multiple missions or robotic systems are involved, they would need to
   work together to gather comprehensive data about Titan. However, individual missions
   can also be designed to operate independently.
   + Single-Agent: Typically, a single spacecraft or probe would be responsible for
   exploration, although multiple probes could be deployed in a mission.
   + Dynamic: The environment is dynamic due to ongoing geological processes, changing
   conditions in the subsurface, and potential movement of liquid.
   + Continuous: The environment involves continuous variables, such as temperature,
   pressure, and chemical concentrations, which change smoothly rather than in discrete
   steps.
   + Sequential: The exploration process is sequential, where each action (e.g., drilling,
   sampling) affects subsequent actions and decisions, requiring careful planning based on
   prior results.
   + Unknown: Many aspects of Titan’s subsurface environment remain unknown, including
   the specific composition of the oceans, the thickness of the ice crust, and potential
   biological processes.
c. Shopping for used AI books on the Internet.
   -      PEAS description:
   + Performance Measure: Successful identification and purchase of quality used AI books
   at a reasonable price, user satisfaction with the selection, and timely delivery of the
   purchased books.
   + Environment: Online marketplaces (e.g., Amazon, eBay, specialized bookstores),
   including product listings, seller ratings, reviews, and search functionalities.
   + Actuators: Actions taken by the shopper, such as searching, filtering results, adding
   items to the cart, purchasing, and communicating with sellers.
   + Sensors: Inputs from the user interface, such as search queries, filters (genre, condition,
   price), seller ratings, and reviews from other buyers.
   -      Characterization of the Shopping environment:
   + Partially Observable: Not all information about the condition of the books, seller
   reliability, or shipping times is available. Some details may depend on user reviews or
   seller disclosures.
   + Nondeterministic: The outcomes of actions (e.g., whether a book will be available at the
   desired price or in the expected condition) can be unpredictable and influenced by
   external factors like demand.
   + Competitive: Multiple buyers are competing for the same items, particularly popular or
   rare books. Sellers may also compete for buyers' attention.
   + Single-Agent: The shopper operates as an individual agent making decisions based on
   their preferences. However, the environment contains multiple agents (other shoppers
   and sellers).
   + Dynamic: The online marketplace is continuously changing, with new listings, changing
   prices, and varying availability of books based on demand and inventory.
   + Discrete: The actions and decisions (e.g., selecting specific books, making purchases) are
   discrete steps rather than continuous changes.
   + Sequential: The shopping process is sequential, as each decision (searching, filtering,
   choosing) influences the next steps in the purchasing process
   + Unknown: There are uncertainties regarding the condition of used books, reliability of
   sellers, and shipping times. Additionally, market dynamics can change rapidly.
d. Playing a tennis match.
   -      PEAS description:
   + Performance Measure: Points scored, games won, sets won, overall match outcome
   (win/loss), consistency in performance, and sportsmanship.
   + Environment: A tennis court (either indoor or outdoor), including the net, boundary
   lines, surface type (clay, grass, hard court), weather conditions (for outdoor matches), and
   audience presence.
   + Actuators: Player actions, including serving, hitting the ball (forehand, backhand,
   volleys), moving around the court, and strategic decision-making during play.
   + Sensors: Visual perception (seeing the ball, court, opponent's movements), auditory
   cues (hearing the ball hit the racket and the opponent’s calls), and proprioception
   (awareness of body position and movement).
    -      Characterization of the Tennis match environment:
   + Partially Observable: Players cannot see the entire court and may miss aspects of their
   opponent’s strategy or positioning due to obstructions or their own focus on the ball.
   + Nondeterministic: The outcome of each shot is uncertain and can be influenced by
   factors such as player skill, spin on the ball, and unpredictable opponent responses.
   + Competitive: Each player competes against the other to win points, games, and
   ultimately the match, trying to outmaneuver and outplay their opponent.
   + Multi-Agent: A tennis match involves two agents (the players), each making
   independent decisions that influence the game dynamics.
   + Dynamic: The state of the game changes continuously as the ball is hit back and forth,
   and players move in response to each other's actions
   + Continuous: The motion of the ball and the players is continuous, and various actions
   can be performed at any moment, rather than in distinct steps.
   + Sequential: Each point in the match influences future points. Decisions made in one part
   of the match can affect the outcome of subsequent points and games.
   + Unknown: Players may not have complete knowledge of their opponent’s strategies,
   physical condition, or psychological state, making it difficult to predict actions.
e. Practicing tennis against a wall.
         -   PEAS description:
         + Performance Measure: Accuracy of shots, consistency (number of successful
         returns), improvement in technique, duration of practice session, and overall skill
         development.
         + Environment: A tennis court or designated practice area with a wall, including the
         court boundaries and possibly a net. The wall acts as a rebound surface for the ball.
         + Actuators: Player actions, such as serving, hitting the ball (forehand, backhand,
         volleys), and movement to position for each shot.
         + Sensors: Visual perception (tracking the ball's trajectory and bounce off the wall),
         auditory cues (sound of the ball hitting the wall and racket), and proprioception
         (awareness of body positioning and movement).
          -   Characterization of the Wall-practice environment:
         + Fully Observable: The player has complete visibility of the wall, the ball, and the
         playing area, allowing them to make informed decisions based on what they see.
         + Deterministic: The outcome of hitting the ball against the wall is predictable based
         on the angle and force applied, assuming consistent conditions (e.g., wall surface,
         ball condition).
         + Collaborative: The practice session is self-directed and focuses on personal
         improvement rather than competing against another player, though it could be
         collaborative if practicing with a partner for feedback.
         + Single-Agent: The practice primarily involves one player (the agent) performing
         actions without direct interaction with other agents. However, it could involve a
         coach or partner providing guidance.
         + Static: The environment remains relatively static during practice; the wall and
        court do not change, though the player's actions create dynamic responses (ball
        movement).
        + Continuous: The action of hitting the ball and its trajectory is continuous, allowing
        for smooth and fluid movements rather than distinct steps.
        + Sequential: Each hit affects the next; players must adjust their positioning and
        technique based on the previous shot's outcome, leading to a sequence of actions
        that build skills.
        + Known: The player has a good understanding of the practice environment (the
        wall's distance, bounce characteristics, etc.), making it a known setting for skill
        development.
f. Performing a high jump.
        -   PEAS description:
        + Performance Measure: Height cleared, technique (form during the jump), number
        of successful attempts, consistency in performance, and overall score in a
        competitive setting.
        + Environment: A high jump area, including the high jump bar, landing mat, runway,
        and surrounding track or field. Environmental factors like wind and surface type may
        also play a role.
        + Actuators: The athlete’s physical actions, including running, take-off, and jumping
        movements, as well as body positioning and technique during the jump.
        + Sensors: Visual perception (observing the bar and runway), proprioception
        (awareness of body movement and position), and sometimes feedback from
        coaches or trainers.
         -   Characterization of the High jump environment:
        + Partially Observable: While the athlete can see the bar and landing area, factors
        like their own fatigue or technique may not be fully known at the moment of the
        jump.
        + Nondeterministic: The outcome of a jump can be unpredictable based on
        numerous factors, including technique, physical condition, and environmental
        conditions.
        + Competitive: The high jump event is competitive, as athletes aim to outperform
        each other by clearing higher heights.
        + Single-Agent: Each athlete performs as an individual agent during their jump,
        though they may be part of a larger event with multiple competitors.
        + Dynamic: The environment is dynamic; the athlete’s performance can change with
        each jump, and the conditions may vary (e.g., surface conditions, weather).
        + Continuous: The jump and its trajectory involve continuous motion, and the
        athlete can adjust their movements fluidly based on real-time feedback.
        + Sequential: Each jump builds on previous attempts, with the athlete adjusting
        technique and strategy based on earlier results in the competition.
        + Known: Athletes typically have a good understanding of their own abilities and the
        setup of the high jump environment, although some variables (like sudden fatigue
        or pressure) may introduce uncertainty.
g. Knitting a sweater.
   -      PEAS description:
   + Performance Measure: Quality of the finished sweater (fit, design, craftsmanship), time
   taken to complete the project, and adherence to the desired pattern or design
   specifications.
   + Environment: The workspace, including knitting materials (yarn, needles), patterns,
   tools (scissors, measuring tape), and lighting conditions.
   + Actuators: The knitter’s physical actions, including manipulating the yarn and needles,
   following the pattern, and making adjustments as needed.
   + Sensors: Visual feedback (seeing stitches and pattern), tactile feedback (feeling the
   texture of the yarn and stitches), and sometimes auditory feedback (sounds from the yarn
   or needles).
    -      Characterization of the Knitting environment:
   + Partially Observable: The knitter can see the work in progress and the pattern but may
   not always perceive subtle issues (e.g., tension variations) until they become more
   pronounced.
   + Nondeterministic: The outcome can be uncertain based on various factors, such as the
   knitter's skill level, yarn behavior, or pattern complexity, which may lead to unexpected
   results.
   + Collaborative: While knitting is often a personal task, it can be collaborative if done in
   groups (e.g., knitting circles) where knitters share ideas and techniques.
   + Single-Agent: Typically, the task is performed by a single knitter, although there may be
   interactions with others in a social setting.
   + Static: The environment itself is static, but the project evolves dynamically as the knitter
   adds stitches and progresses through the pattern.
   + Discrete: The knitting process involves discrete actions (e.g., casting on, knitting rows),
   with specific steps that can be counted.
   + Sequential: Each row of knitting builds on the previous ones, requiring the knitter to
   remember patterns and make adjustments throughout the process.
   + Known: Most aspects of the knitting process and materials are known to the knitter,
   especially if following a specific pattern, though some challenges may arise unexpectedly.
h. Bidding on an item at an auction.
   -      PEAS description:
   + Performance Measure: Winning the item at the lowest possible price, maximizing value
   (item worth vs. final bid), and strategic bidding effectiveness (timing and amount).
   + Environment: The auction setting, which can be physical (in-person) or digital (online
   auction platforms), including the item being auctioned, competing bidders, and auction
          rules.
          + Actuators: Actions taken by the bidder, such as placing bids, adjusting bidding strategies,
          and potentially communicating with other bidders or auctioneers.
          + Sensors: Information gathered from the auction platform or environment, including
          current bid amounts, the number of bidders, item details, and any auction
          announcements.
           -        Characterization of the Auction environment:
          + Partially Observable: Bidders may not have complete information about the other
          bidders’ maximum bids, strategies, or intentions, leading to uncertainty in
          decision-making.
          + Nondeterministic: The outcomes of bids can be unpredictable due to the actions of
          other bidders, which can change rapidly and influence the bidding dynamics.
          + Competitive: The auction is inherently competitive, with bidders vying against each
          other to win the item, often driving the price higher.
          + Multi-Agent: Multiple bidders (agents) participate in the auction, each with their own
          strategies and decision-making processes.
          + Dynamic: The auction environment is dynamic, with changing bid amounts and the
          actions of bidders influencing the flow of the auction in real-time.
          + Discrete: Bidding occurs in discrete steps (specific bid amounts), with defined
          increments set by the auction rules.
          + Sequential: Each bid influences future bids; bidders may adjust their strategies based on
          previous bids and outcomes throughout the auction.
          + Unknown: Certain aspects, such as competitors’ maximum willingness to pay and their
          bidding strategies, are unknown, adding to the uncertainty of the auction process.
3. Consider a simple thermostat that turns on a furnace when the temperature is at least 3
   degrees below the setting, and turns off a furnace when the temperature is at least 3 degrees
   above the setting. Is a thermostat an instance of a simple reflex agent, a model-based reflex
   agent, or a goal-based agent? Explain your choice.
    → The thermostat is a model-based reflex agent.
    → Reason: A model-based reflex agent maintains internal state that lets it act on more than
    the current percept alone. The thermostat's percept is the current temperature, but when the
    temperature lies within 3 degrees of the setpoint, the correct action depends on whether the
    furnace is currently running: a furnace that switched on at setpoint - 3 keeps running until
    setpoint + 3, while one that switched off keeps idling. A simple reflex agent, which maps the
    current percept directly to an action, cannot implement this hysteresis; the thermostat must
    remember the furnace's on/off state (its internal model) together with the setpoint. It is not
    goal-based, since it does not reason about future states: it simply applies condition-action
    rules to the state it tracks.
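    To make the hysteresis argument concrete, here is a minimal Python sketch (the names are
    illustrative, not from the problem) of the thermostat as a model-based reflex agent; the
    remembered furnace_on flag is exactly the internal state a simple reflex agent would lack.

        class Thermostat:
            def __init__(self, setpoint):
                self.setpoint = setpoint
                self.furnace_on = False  # internal state, remembered between percepts

            def act(self, temperature):
                """Return 'on' or 'off' given the current temperature percept."""
                if temperature <= self.setpoint - 3:
                    self.furnace_on = True
                elif temperature >= self.setpoint + 3:
                    self.furnace_on = False
                # Inside the 3-degree band, keep doing whatever we were doing
                # (hysteresis): the percept alone does not determine the action.
                return "on" if self.furnace_on else "off"

        t = Thermostat(setpoint=20)
        print([t.act(temp) for temp in (16, 18, 21, 23, 21)])
        # ['on', 'on', 'on', 'off', 'off']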
LAB 03
 1. For the following graph:
        a. What is the order of visits of the nodes and the path returned by BFS?
            → Queue: S / A B C (extend S) / D (extend A) / E G (extend B) / F (extend D) / H (extend E) /
           Order of visit: S → A → B → C → D → E → G
           The path returned: S → B → G
        b. What is the order of visits of the nodes and the path returned by DFS?
            → Stack: S / C B A / D (extend A) / F (extend D) / G E (extend F) / H (extend E) /
           Order of visit: S → A → D → F → E → H → G
           The path returned: S → A → D → F → G
        c. What is the order of visits of the nodes and the path returned by UCS?
           → Priority Queue:     S0 C2 A3 B6 (extend S)
                                 S0 C2 A3 E3 B6 (extend C)
                                 S0 C2 A3 E3 B6 D6 (extend A)
                                 S0 C2 A3 E3 B5 D6 H8 F9 (extend E)
                                 S0 C2 A3 E3 B5 D6 H8 F9 G14 (extend B)
           Order of visit: S → C → A → E → B → D → H → F → G
           The path returned: S → C → E → B → G
 2. Consider a state space where the start state is number 1 and each state k has two successors:
    numbers 2k and 2k + 1.
        a. Draw the portion of the state space for states 1 to 15.
   →
b. Suppose the goal state is 11. List the order in which nodes will be visited for breadth
   first search, depth-limited search with limit 3, and iterative deepening search
   → BFS order: 1 2 3 4 5 6 7 8 9 10 11(Goal found)
    → Depth-limited search (limit 3) order: 1 2 4 8 9 5 10 11 (Goal found)
   → IDS:
       ● Depth-limit 0: 1
       ● Depth-limit 1: 1 2 3
       ● Depth-limit 2: 1 2 4 5 3 6 7
       ● Depth-limit 3: 1 2 4 8 9 5 10 11(Goal found)
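    As a sanity check, the depth-limited and iterative deepening traces above can be reproduced
    with a minimal Python sketch (the successor function k → {2k, 2k+1} is from the problem; the
    function names are illustrative):

        def dls(state, goal, limit, visited):
            # Depth-limited DFS that records the order in which nodes are visited.
            visited.append(state)
            if state == goal:
                return True
            if limit == 0:
                return False
            return any(dls(child, goal, limit - 1, visited)
                       for child in (2 * state, 2 * state + 1))

        def ids(goal, max_depth=10):
            # Iterative deepening: run DLS with increasing depth limits.
            for limit in range(max_depth + 1):
                visited = []
                if dls(1, goal, limit, visited):
                    return limit, visited

        print(ids(11))  # (3, [1, 2, 4, 8, 9, 5, 10, 11])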
c. Call the action going from k to 2k Left, and the action going to 2k + 1 Right. Can you find
   an algorithm that outputs the solution to this problem without any search at all?
   → Given the binary tree structure where:
       ● Left action (from k to 2k) corresponds to a binary digit of 0.
       ● Right action (from k to 2k+1) corresponds to a binary digit of 1.
   → Algorithm:
        ● Convert the goal state to binary: the goal state 11 in binary is 1011.
        ● Interpret the binary representation:
               ○ Start at the root node 1. The leading 1 of the binary string corresponds to
                 the root itself, so skip it.
               ○ Read each remaining digit from left to right:
                       ■ If the digit is 0, move Left (to 2k).
                       ■ If the digit is 1, move Right (to 2k+1).
        ● Follow the remaining digits of 1011, namely 011:
               ○ 1st digit (0): move Left to state 2.
               ○ 2nd digit (1): move Right to state 5.
               ○ 3rd digit (1): move Right to state 11.
        ● The sequence of states visited is: 1 → 2 → 5 → 11
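    The procedure is easy to express in code; a minimal Python sketch (the function name is
    illustrative):

        def path_to(goal):
            # The binary representation of the goal encodes the action sequence.
            bits = bin(goal)[2:]              # e.g. 11 -> '1011'
            state, states, actions = 1, [1], []
            for bit in bits[1:]:              # skip the leading 1: it is the root
                state = 2 * state if bit == '0' else 2 * state + 1
                states.append(state)
                actions.append('Left' if bit == '0' else 'Right')
            return states, actions

        print(path_to(11))  # ([1, 2, 5, 11], ['Left', 'Right', 'Right'])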
3. You’re taking a class with 𝑛 students, and you want to find a 4-person project group (you and 3
   other students). You decide to formulate your problem as a search problem, as follows:
● State space: An unordered set of students in the group so far. For example, {you, Inky, Blinky} and
  {you, Blinky, Inky} are equivalent states.
● Successor function: You pick someone from the class (who is not already in your group), and add
  them to your group. If your group already has 4 students, there are no further successor states.
● Start state: The project group is just you, and nobody else.
● Goal test: Your group has exactly 4 students, and everyone in the group gets along. (You can
  assume that the goal test is somehow able to determine whether a group gets along.)
   Consider drawing the search graph for this problem.
       a. How many nodes are in the search graph?
           →
                ●   There is exactly 1 possible 1-person group: the group with just you.
                ●   There are C(n-1, 1) = n - 1 possible 2-person groups: you, plus any one of
                    the n - 1 other people in class.
                ●   There are C(n-1, 2) possible 3-person groups: you, plus some (unordered)
                    pair among the n - 1 others in class.
                ●   There are C(n-1, 3) possible 4-person groups: you, plus some (unordered)
                    group of 3 among the n - 1 others in class.
                ●   Note that there are no states with more than 4 people; the successor
                    function will not generate them.
                ●   Total number of nodes: 1 + (n - 1) + C(n-1, 2) + C(n-1, 3).
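         A quick Python check of this count using math.comb (the class size n = 10 is a made-up
         example value):

             from math import comb

             def num_nodes(n):
                 # 1 + C(n-1,1) + C(n-1,2) + C(n-1,3): you plus 0..3 classmates.
                 return sum(comb(n - 1, k) for k in range(4))

             print(num_nodes(10))  # 1 + 9 + 36 + 84 = 130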
        The questions on this page refer to the graph above, where the start state is A and the goal
        state is G. The number on an edge corresponds to the cost of traversing that edge.
State            A  B  C  D  E  F  G  H  I  J
Heuristic value  5  3  2  3  2  0  0  5  4  0
        Assume that each algorithm re-generates states that are not yet expanded, does not re-expand
        states, breaks ties in alphabetical ordering, and terminates after expanding the goal state.
        Note: These assumptions may differ with the operations of some of the algorithms in the
        textbook.
           a. What is the order of state expansions if you used Breadth-First Search?
              → Fringe:       A
              Expand A:       A→B
                              A→C
              Expand B:       A→B→D
                              A→B→E
               Expand C:       A → C → F
                              A→C→G
              Expand D:       A→B→D→H
                              A→B→D→I
   Expand E:
   Expand F:     A → C → F→ J
   Expand G:     Goal
   → Order of visit: A → B → C → D → E → F → G
   → Path: A → C → G
b. What is the order of state expansions if you used Depth-First Search? What path would
   the algorithm return?
   → Fringe:     A
   Expand A:     A→C
                 A→B
   Expand B:     A→B→E
                 A→B→D
                 A→B→C
   Expand C:     A→B→C→G
                 A→B→C→F
   Expand F:     A → B → C→ F →J
                 A→B→C→F→G
                 A→B→C→F→D
    Expand D:     A → B → C → F → D → I
                  A → B → C → F → D → H
   Expand H:
   Expand I
   Expand G:     Goal
   → Order of visit: A → B → C → F → D → H → I → G
   → Path: A → B → C → F → G
c. What is the order of state expansions if you used Uniform-Cost Search? What path
   would the algorithm return?
   → Fringe:     A
   Expand A:     A → B8
                 A → C6
   Expand C:     A → C6 → F9
                 A → C6 → G14
   Expand B:     A → B8 → D11
                 A → B8 → E13
    Expand F:     A → C6 → F9 → G10
                        ⇒ (replace G14: the path S → C → F → G costs 10 < the path S → C → G costing 14)
                  A → C6 → F9 → J11
                  Do not add F → D (path cost through F is 14 > path cost through B is 11)
   Expand G:     Goal
   → Order of visit: A → C → B → F → G
   → Path: A → C → F → G
d. What is the order of state expansions if you used Greedy Best-First Search? What path
   would the algorithm return?
    → Fringe (priority queue ordered by h):     A h(A)=5
   Expand A:     A → C h(C)=2
                 A → B h(B)=3
    Expand C:     A → C → G h(G)=0
                 A → C → F h(F)=0
   Expand F:     A → C → F → J h(J)=0
                 A → C → F → G h(G)=0
                 A → C → F → D h(D)=3
   Expand G:     Goal
   → Order of visit: A → C → F → G
   → Path: A → C → F → G
e. What is the order of state expansions if you used A* Search? What path would the
   algorithm return?
    → Fringe (priority queue ordered by f = g + h):     A
   Expand A:     A → C f(C) = g(A to C) + h(C) = 6 + 2 = 8
                 A → B f(B) = g(A to B) + h(B) = 8 + 3 = 11
    Expand C:     A → C → G f(G) = g(G) + h(G) = (6 + 8) + 0 = 14
                  A → C → F f(F) = g(F) + h(F) = (6 + 3) + 0 = 9
                 A → C → B f(B) = 11
   Expand F:     A → C → F → J f(J) = 11
                 A → C → F → G f(G) = 10
                 A → C → F → D f(D) = 17
   Expand G:     Goal
   → Order of visit: A → C → F → G
   → Path: A → C → F → G
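 For reference, a generic A* sketch reproduces the trace above. The edges below are
 reconstructed from the costs visible in the expansions (edges whose costs never appear,
 such as D's edges to H and I, are omitted; they are not needed in this run):

     import heapq

     def a_star(graph, h, start, goal):
         fringe = [(h[start], 0, start, [start])]   # (f, g, state, path)
         expanded, closed = [], set()
         while fringe:
             f, g, state, path = heapq.heappop(fringe)
             if state in closed:
                 continue                            # never re-expand a state
             closed.add(state)
             expanded.append(state)
             if state == goal:
                 return expanded, path
             for nxt, cost in graph.get(state, []):
                 if nxt not in closed:
                     heapq.heappush(fringe,
                                    (g + cost + h[nxt], g + cost, nxt, path + [nxt]))
         return expanded, None

     graph = {'A': [('B', 8), ('C', 6)], 'B': [('D', 3), ('E', 5)],
              'C': [('F', 3), ('G', 8)], 'F': [('D', 5), ('G', 1), ('J', 2)]}
     h = {'A': 5, 'B': 3, 'C': 2, 'D': 3, 'E': 2,
          'F': 0, 'G': 0, 'H': 5, 'I': 4, 'J': 0}

     print(a_star(graph, h, 'A', 'G'))
     # (['A', 'C', 'F', 'G'], ['A', 'C', 'F', 'G'])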
LAB 05
 1. Informed Search.
 Search problem graph: S is the start state and G is the goal state.
 Tie-break in alphabetical order.
 ℎ(𝐵) is unknown and will be determined in the subquestions.
 a. In this question, refer to the graph above where the optimal path is 𝑆 → 𝐵 → 𝐷 → 𝐺. For each
    of the following subparts, you will be asked to write ranges of ℎ(𝐵). You should represent
    ranges as ___≤ ℎ(𝐵) ≤___ . Heuristic values can be any number including ±∞. For responses of
    ±∞, you can treat the provided inequalities as a strict inequality. If you believe that there is no
     possible range, please write "None" in the left-hand box and leave the right box empty.
       i.    What is the range for the heuristic to be admissible?
                            0 ≤ ℎ(𝐵) ≤ h*(B) = cost(B → D → G) = 2 + 3 = 5
                           → 0 ≤ ℎ(𝐵) ≤ 5
      ii.    What is the range for the heuristic to be consistent?
                            Consistency requires h(S) - h(B) ≤ cost(S, B) = 1, i.e. h(B) ≥ h(S) - 1 = 5,
                            and h(B) - h(D) ≤ cost(B, D) = 2, i.e. h(B) ≤ h(D) + 2 = 4.
                            Since h(B) ≥ 5 and h(B) ≤ 4 cannot both hold, there is no possible range.
                            → None
      iii.   Regardless of whether the heuristic is consistent, what range of heuristic values for B
             would allow A* tree search to still return the optimal path (𝑆 → 𝐵 → 𝐷 → 𝐺)?
                            When D is expanded via S → A → D: f(D) = g(D) + h(D) = (2 + 3) + 2 = 7
                            When B is on the fringe via S → B: f(B) = g(B) + h(B) = 1 + h(B)
                            For the optimal path S → B → D → G to be returned, B must be expanded
                            before this copy of D:
                                     f(B) ≤ f(D) → 1 + h(B) ≤ 7 → h(B) ≤ 6
                           → -∞ ≤ ℎ(𝐵) ≤ 6
      iv.   Regardless of whether the heuristic is consistent, what range of heuristic values for B
            would allow A* graph search to still return the optimal path (𝑆 → 𝐵 → 𝐷 → 𝐺)?
                           → -∞ ≤ ℎ(𝐵) ≤ 6
2. Search and heuristics. Imagine a car-like agent wishes to exit a maze like the one shown below:
● The agent is directional and at all times faces some direction d ∈ (N, S, E, W).
● With a single action, the agent can either move forward at an adjustable velocity v or turn. The
  turning actions are left and right, which change the agent’s direction by 90 degrees. Turning is
  only permitted when the velocity is zero (and leaves it at zero).
● The moving actions are fast and slow. Fast increments the velocity by 1 and slow decrements the
  velocity by 1; in both cases the agent then moves a number of squares equal to its NEW adjusted
  velocity (see example below).
● A consequence of this formulation is that it is impossible for the agent to move in the same
  nonzero velocity for two consecutive timesteps.
● Any action that would result in a collision with a wall or reduce v below 0/above a maximum
  speed Vmax is illegal.
● The agent’s goal is to find a plan which parks it (stationary) on the exit square using as few
  actions (time steps) as possible.
As an example: if at timestep t the agent’s current velocity is 2, by taking the fast action, the agent
first increases the velocity to 3 and move 3 squares forward, such that at timestep t + 1 the agent’s
current velocity will be 3 and will be 3 squares away from where it was at timestep t. If instead the
agent takes the slow action, it first decreases its velocity to 1 and then moves 1 square forward,
such that at timestep t + 1 the agent’s current velocity will be 1 and will be 1 square away from
where it was at timestep t. If, with an instantaneous velocity of 1 at timestep t + 1, it takes the
slow action again, the agent’s current velocity will become 0, and it will not move at timestep t + 1.
Then at timestep t + 2, it will be free to turn if it wishes. Note that the agent could not have turned
at timestep t + 1 when it had a current velocity of 1, because it has to be stationary to turn.
       a. If the grid is M by N, what is the size of the state space? Justify your answer. You should
          assume that all configurations are reachable from the start state.
           → The agent’s position on the grid, which can be any square in an M x N grid. There are M
           x N possible positions.
               ● The agent’s direction, which can be North, South, East, or West, giving 4 possible
                 directions.
               ● The agent’s velocity, which ranges from 0 to Vmax inclusive. If the maximum
                 velocity is Vmax, there are Vmax + 1 possible velocity values (since velocity can be 0).
                ● Thus, the total number of states S is given by: S = M × N × 4 × (Vmax + 1)
               ● This accounts for the agent’s position, direction, and velocity. Each configuration is
                 a unique state, and the assumption is that all configurations are reachable from
                 the start.
b. Is the Manhattan distance from the agent’s location to the exit’s location admissible?
   Why or why not?
      → The Manhattan distance is not admissible in this context because it can overestimate
      the number of actions required. An admissible heuristic must never overestimate the
      true cost, and here the agent can cover more than one square per action: with velocity
      v, a single move covers v squares. For example, a cell 10 squares away in a straight line
      can be reached and parked on in 8 actions (velocities 1, 2, 1, 2, 1, 2, 1, followed by a
      final slow to stop), fewer than the 10 actions the Manhattan distance predicts.
c.   State and justify a non-trivial admissible heuristic for this problem which is not the
     Manhattan distance to the exit.
      → A non-trivial admissible heuristic must never overestimate the number of remaining
      actions. Since each action moves the agent at most Vmax squares, a valid choice is:
         h(state) = ⌈(Manhattan distance from the agent's position to the exit) / Vmax⌉
      → Justification: even moving at top speed in a straight line toward the exit, the agent
      covers at most Vmax squares per timestep, so at least ⌈d / Vmax⌉ actions are needed to
      close a Manhattan distance of d. The true cost is usually higher, because the agent must
      accelerate from zero, decelerate to park, and stop completely before every turn; the
      heuristic therefore never overestimates, and it is admissible. It is also non-trivial: unlike
      h = 0, it grows with the distance to the exit. See the sketch below.
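      A minimal sketch of this heuristic in Python (the state layout follows the formulation in
      part (a); the names are illustrative):

          import math

          def h(state, exit_pos, v_max):
              (x, y), direction, velocity = state      # position, facing, speed
              ex, ey = exit_pos
              manhattan = abs(x - ex) + abs(y - ey)
              # Each action moves at most v_max squares, so this never overestimates.
              return math.ceil(manhattan / v_max)

          print(h(((0, 0), 'E', 0), (4, 6), v_max=3))  # ceil(10 / 3) = 4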
d. If we used an inadmissible heuristic in A* graph search, would the search be complete?
   Would it be optimal?
     → Completeness: The search will still be complete, meaning it will always find a solution
     if one exists. A* will continue searching all possible states, even if the heuristic is
     inadmissible.
     → Optimality: The search would not necessarily be optimal. An inadmissible heuristic can
     overestimate the true cost to reach the goal, potentially causing A* to miss the optimal
     solution by focusing on suboptimal paths first.
e. If we used an admissible heuristic in A* graph search, is it guaranteed to return an
   optimal solution? What if the heuristic was consistent? What if we were using A* tree
   search instead of A* graph search?
      → A* Tree Search with Admissible Heuristic: guaranteed to return an optimal solution.
      Tree search may re-generate a state along several paths, so an admissible heuristic
      suffices: the goal cannot be expanded at a suboptimal cost while a cheaper partial path
      is still on the fringe, since that path's f-value would be smaller.
      → A* Graph Search with Admissible Heuristic: not guaranteed to be optimal. Graph
      search never re-expands a state, so if a state is first expanded via a suboptimal path
      (which a merely admissible, inconsistent heuristic permits), the cheaper path through
      that state is discarded forever.
      → A* Graph Search with Consistent Heuristic: guaranteed optimal. Consistency implies
      admissibility and makes f-values non-decreasing along any path, so every state is first
      expanded via its cheapest path and never needs re-expansion.
f. Give a possible advantage that an inadmissible heuristic might have over an admissible
   one.
   → An inadmissible heuristic can sometimes make the search faster because it may guide
   the search more aggressively toward the goal by overestimating costs. This can lead to
   fewer nodes being expanded, reducing computation time, though at the cost of
   potentially missing the optimal solution. Essentially, an inadmissible heuristic can improve
   the efficiency of finding a solution (though not necessarily the best one), especially in
   scenarios where speed is prioritized over optimality.
LAB 06
 1. Game.
      a. Consider the zero-sum game tree shown below. Triangles that point up, such as at the
         top node (root), represent choices for the maximizing player; triangles that point down
         represent choices for the minimizing player. Assuming both players act optimally, fill in
         the minimax value of each node:
           → The game involves two players: Kirby, who is the maximizing player, and King Dedede,
           who is the minimizing player.
            Fill in the minimax values of the game tree shown:
        ● Door 1:
               ○ Bottles A and B: values 7 and 3 → Kirby picks the fuller bottle: max(7, 3) = 7.
               ○ Bottles C and D: values 9 and 4 → Kirby picks max(9, 4) = 9.
               ○ Dedede (the minimizer) hands Kirby the worse box: min(7, 9) = 7.
        ● Door 2:
               ○ Bottles E and F: values 6 and 2 → Kirby picks max(6, 2) = 6.
               ○ Bottles G and H: values 1 and 7.5 → Kirby picks max(1, 7.5) = 7.5.
               ○ Dedede picks min(6, 7.5) = 6.
        ● Door 3:
               ○ Bottles I and J: values 5 and 6 → Kirby picks max(5, 6) = 6.
               ○ Bottles K and L: values 0 and 8 → Kirby picks max(0, 8) = 8.
               ○ Dedede picks min(6, 8) = 6.
        ● Final Minimax Values:
               ○ Door 1: minimax value = 7
               ○ Door 2: minimax value = 6
               ○ Door 3: minimax value = 6
            → Kirby should choose Door 1 to maximize his juice intake, which gives a value of 7.
        b. Which terminal nodes are never explored as a consequence of pruning?
           → With alpha-beta exploring doors left to right, Door 1 establishes the root value 7.
           Inside Door 1, once bottle C shows 9 ≥ 7 at the maximizer, bottle D (4) is pruned. In
           Door 2, the first box evaluates to 6 < 7, so the second box, bottles G (1) and H (7.5),
           is pruned. In Door 3, the first box evaluates to 6 < 7, so bottles K (0) and L (8) are
           pruned. The terminal nodes never explored are D, G, H, K, and L, as the sketch below
           confirms.
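        A minimal alpha-beta sketch (generic, not taken from any particular library) confirms
        which bottles go unexplored on this tree:

            import math

            def alphabeta(node, is_max, alpha, beta, seen):
                if not isinstance(node, list):           # leaf: a bottle value
                    seen.append(node)
                    return node
                best = -math.inf if is_max else math.inf
                for child in node:
                    v = alphabeta(child, not is_max, alpha, beta, seen)
                    if is_max:
                        best, alpha = max(best, v), max(alpha, v)
                    else:
                        best, beta = min(best, v), min(beta, v)
                    if alpha >= beta:
                        break                             # prune remaining children
                return best

            # root MAX (doors) -> MIN (Dedede picks box) -> MAX (Kirby picks bottle)
            tree = [[[7, 3], [9, 4]], [[6, 2], [1, 7.5]], [[5, 6], [0, 8]]]
            seen = []
            print(alphabeta(tree, True, -math.inf, math.inf, seen))  # 7
            print(seen)  # [7, 3, 9, 6, 2, 5, 6] -> D, G, H, K, L never evaluated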
3. Sliding Puzzle.
   Consider the sliding puzzle game as a search problem. The sliding puzzle game consists of a
   three-by-three board with eight-numbered tiles and a blank space. At each turn, only a tile
adjacent to the blank space (left, right, above, below) can be moved into the blank space. The
objective is to reconfigure any generic starting state to the goal state (displayed on the right).
   a. What is the size of the state space?
        (A) 9            (B) 9*9          (C) 9!
        → (C) 9!. The state is an arrangement of the 8 tiles and the blank over 9 cells, giving
        9! configurations (only half of these are reachable from any given start, but 9! is the
        size of the space of boards).
   b. Does every board have a unique sequence of moves to reach the goal state?
       (A) Yes          (B) No
        → (B) No. Not every board has a unique sequence of moves to reach the goal state:
        moves can be reordered or undone, so many different move sequences (and paths) lead
        to the same configuration.
   c. For the following heuristics, mark whether or not they are only admissible (admissible
      but not consistent), only consistent (consistent but not admissible), both, or neither.
           i.     h1(n) = Total number of misplaced tiles.
        (A) Only admissible        (B) Only consistent   (C) Both    (D) Neither
        → (C) Both. This heuristic never overestimates the number of moves required to reach
        the goal (each misplaced tile needs at least one move), and it is consistent (a single
        move changes the count of misplaced tiles by at most 1, never more than the step cost).
          ii.     h2(n)
        (A) Only admissible (B) Only consistent          (C) Both    (D) Neither
        → (A) Only admissible. This heuristic does not overestimate the cost, but it is not
        consistent: its value can drop across a single move by more than the actual cost of that
        move.
        iii.     h3(n) = Sum of Manhattan distances for each individual tile to their goal
                 location.
       (A) Only admissible        (B) Only consistent   (C) Both    (D) Neither
        → (C) Both. Each tile must travel at least its Manhattan distance to reach its goal cell,
        so the sum never overestimates; and a single move changes exactly one tile's Manhattan
        distance by exactly 1, so the heuristic is consistent.
         iv.      h4(n) = max(h1(n), h3(n)).
        (A) Only admissible (B) Only consistent          (C) Both    (D) Neither
        → (C) Both. The maximum of two admissible heuristics is admissible, and the maximum
        of two consistent heuristics is consistent.
          v.      h5(n) = min(h1(n), h3(n)).
        (A) Only admissible        (B) Only consistent   (C) Both    (D) Neither
        → (C) Both. The same argument applies to the minimum of two admissible, consistent
        heuristics.
         vi.      h6(n) = h1(n) + h3(n)
        (A) Only admissible        (B) Only consistent   (C) Both    (D) Neither
        → (D) Neither. The sum of two admissible heuristics can overestimate: a board with a
        single tile one step from its goal cell has h1 = 1 and h3 = 1, so h6 = 2 while the true
        cost is 1. Since h6 is 0 at the goal but inadmissible, it cannot be consistent either.
        vii.     Comparing the heuristics above that you marked as admissible (or both), which
                 one is best (dominates the others)? If multiple heuristics are tied mark them all.
          (A) h1             (B) h2          (C) h3     (D) h4        (E) h5         (F) h6
           → Among the admissible heuristics, the best is the one that gives the tightest lower
           bound. Every misplaced tile has a Manhattan distance of at least 1, so h1(n) ≤ h3(n)
           for all n, which means h4 = max(h1, h3) = h3. Therefore (C) h3 and (D) h4 are tied as
           the dominant heuristics.
4. State Spaces.
   Pacman finds himself in an N by M grid again. He starts in the top left corner of the grid and his
   goal is to reach the bottom right corner. However there are G ghosts in fixed locations around
   the grid that Pacman needs to kill before he reaches the goal. Pacman accomplishes this using
   his new car, the Pacmobile.
   Every turn Pacman chooses a radius value, r, which can be any integer from 0 to R inclusive.
   This shocks and kills all ghosts within r grids of him (by Manhattan distance). However, the
   Pacmobile has a limited battery that contains E (a positive integer) units of charge. Whenever it
   produces an electric field of radius r, the Pacmobile loses r units of charge. The Pacmobile has 5
   moving actions: left, right, up, down, and stop. All of those choices will cost 1 unit of charge as
   well. If the Pacmobile runs out of charge Pacman can no longer do anything and the game ends
   in a loss.
   To further clarify, on each turn Pacman chooses a movement action and radius value r. He then
   shocks all squares in a radius r around him, performs the movement, and loses r + 1 units of
   charge. Recall he starts with E units of charge and he has to kill G ghosts in fixed locations
    before reaching the bottom right square to win the game. Tie-break in alphabetical order.
      a. Mark true or false for the following subparts (Assume at least one solution exists for the
         search problem)
            i.     Running Breadth First Search on this search problem will return the solution that
                   takes the smallest amount of turns.
                   A. True            B. False
           → A. True. BFS explores states level by level, so the first solution it finds uses the
           fewest actions (turns).
            ii.    Running Breadth First Search on this search problem will return the solution that
                   uses the smallest amount of charge.
                   A. True                   B. False
           → B. False. BFS minimizes the number of turns, not the amount of charge used: a
           solution with fewer turns may spend more charge (larger radii), and BFS would still
           return it.
           iii.    Running Breadth First Search on this search problem will return the solution
                   where Pacman travels the least amount of grid distance.
                   A. True                   B. False
           → B. False. BFS minimizes the number of turns, not the distance traveled; it may
           return a plan that covers more grid distance as long as it uses the fewest turns.
      b. Write your answers to the following two subparts in terms of N, M, G, R, E and any
         scalars you need.
      i.       What is the size of the state space, X, using a minimal state representation?
    → With a minimal state representation, the state must record:
            ●   Pacman's position on the grid: N × M possible positions.
            ●   The charge left in the Pacmobile: E + 1 possible values (0 through E).
            ●   Which ghosts are still alive: 2^G possible subsets (each ghost alive or dead).
    The radius r is chosen fresh on every turn and does not constrain future choices, so it
    does not belong in a minimal state.
    Thus, the size of the state space is: X = (N × M) × (E + 1) × 2^G
     ii.       What is the maximum possible branching factor from any state?
   → For each turn, Pacman has:
           ●   5 movement actions (left, right, up, down, and stop).
           ● R+1 choices for the radius r.
   Each action also consumes some charge, but that doesn't affect the number of possible
   actions Pacman can take. Therefore, the maximum branching factor is: Branching factor =
   5 * (R + 1)
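    A quick sketch of both formulas (the example values are made up):

        def state_space_size(N, M, G, E):
            # position x charge x alive-ghost subsets (minimal representation)
            return N * M * (E + 1) * 2 ** G

        def branching_factor(R):
            # 5 movement actions x (R + 1) radius choices
            return 5 * (R + 1)

        print(state_space_size(N=4, M=5, G=2, E=10))  # 4 * 5 * 11 * 4 = 880
        print(branching_factor(R=3))                  # 20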
c. Now we are going to investigate how the size of the state space changes with certain
   rule changes. Write your answers to the following two subparts in terms of X (the size of
   the original state space) and the terms N, M, G, R, E and any scalars. Each subpart is
   independent of the other.
      i.       The ghosts are now more resilient so they have H health points. This means each
               ghost needs to spend H turns in the electric field before they die. What is the size
               of the state space with this adjustment?
    → If each ghost has H health points, each ghost's state is its remaining health, which
    takes H + 1 values (0 through H, where 0 means the ghost is dead). The 2^G ghost factor
    in X is replaced by (H + 1)^G.
    Thus, the new state space size is: X × (H + 1)^G / 2^G
     ii.       The Pacmobile can no longer choose any radius value at every turn because it
               now takes time to increase and decrease the radius. If pacman is using a radius r
               on one turn, on the next turn he must choose radius values from r - 1, r, or r + 1.
    → The radius used on the previous turn now determines which radius values are legal on
    the next turn, so r must be added to the state, contributing a factor of R + 1.
    Hence, the new state space size is: X × (R + 1)
d. Mark whether the following heuristics are admissible or not. Following the original
   problem description.
      i.       h1(n) = The amount of charge remaining.
               A. Admissible         B. Not Admissible
    → B. Not Admissible. The remaining charge does not estimate the cost to reach the goal:
    Pacman can have far more charge left than the number of turns he still needs, so this
    heuristic can overestimate the actual remaining cost.
  ii.   h2(n) = The number of ghosts still alive
        A. Admissible         B. Not Admissible
→ B. Not Admissible. A single shock can kill several ghosts at once (every ghost within
radius r), so the number of ghosts still alive can exceed the number of turns actually
needed, overestimating the remaining cost.
 iii.   h3(n) = The Manhattan distance between the Pacmobile and the bottom right
        corner.
        A. Admissible         B. Not Admissible
→ A. Admissible. The Pacmobile moves at most one square per turn, so the Manhattan
distance to the bottom right corner is a valid lower bound on the number of remaining
turns; it never overestimates.
LAB 07
 1. Kirby is participating on a game show with his enemy, King Dedede! First, Kirby chooses one of
    three doors, behind each of which sit two boxes. King Dedede chooses one of the two boxes to
    give to Kirby. Each box contains two transparent juice bottles, of which Kirby chooses one to
    enjoy for himself.
    The bottles are filled up to different amounts, ranging from 0 (completely empty) to 10
    (completely full), inclusive.
        a. King Dedede changes the rules, and now Kirby must be blindfolded immediately after
           choosing a door. As a result, Kirby chooses a bottle randomly from any box with uniform
           probability. The resulting game tree is almost exactly the same as before, except that
           the bottom-most layer of maximizer nodes are now replaced with chance nodes.
             i.     Fill out the values on the game tree. Which door should Kirby choose?
            → Expected utility calculations for each box (50% chance per bottle):
                   ● Door 1:
                        ○ Bottles A and B: (7 + 3) / 2 = 5
                        ○ Bottles C and D: (9 + 4) / 2 = 6.5
                     → Dedede hands over the worse box, so Door 1 is worth min(5, 6.5) = 5
                   ● Door 2:
                        ○ Bottles E and F: (6 + 2) / 2 = 4
                        ○ Bottles G and H: (1 + 7.5) / 2 = 4.25
                     → Door 2 is worth min(4, 4.25) = 4
                   ● Door 3:
                        ○ Bottles I and J: (5 + 6) / 2 = 5.5
                        ○ Bottles K and L: (0 + 8) / 2 = 4
                     → Door 3 is worth min(5.5, 4) = 4
            → Kirby should choose Door 1, which has the highest expected utility (5), as the
            sketch below shows.
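            A minimal Python sketch of this max-min-expectation evaluation:

                doors = {1: [[7, 3], [9, 4]], 2: [[6, 2], [1, 7.5]], 3: [[5, 6], [0, 8]]}

                def door_value(boxes):
                    # Dedede (min) picks the box; blindfolded Kirby then draws a
                    # bottle uniformly at random (expectation).
                    return min(sum(box) / len(box) for box in boxes)

                values = {d: door_value(b) for d, b in doors.items()}
                print(values)                        # {1: 5.0, 2: 4.0, 3: 4.0}
                print(max(values, key=values.get))   # 1 -> Kirby chooses Door 1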
              ii.    Which terminal nodes are never explored as a consequence of pruning?
                     → After Door 1 fixes the best value at 5: in Door 2, the first box evaluates
                     to 4 < 5, so bottles G and H are never explored. In Door 3, the first box
                     evaluates to 5.5, and after seeing K = 0 the second box can average at most
                     (0 + 10) / 2 = 5, which cannot beat 5, so bottle L can also be pruned (when
                     pruning on equality is allowed).
     b. King Dedede wants to guarantee that, assuming Kirby plays the game optimally, his
        expected juice utility is minimized. How might King Dedede rearrange the juice bottles
        in the boxes to ensure this? Indicate your answer by filling in the terminal nodes of the
        empty game tree below (for your convenience, a couple nodes have already been filled
        out for you).
           Leaf node values (copied from previous question’s leaf nodes): 7, 3, 9, 4, 6, 2, 1, 7.5, 5,
           6, 0, 8. What is Kirby’s new expected utility after the rearrangement of the bottles?
            → Each box is a chance node whose value is the average of its two bottles; Dedede
            wants every door's smaller box-average to be as low as possible.
            Take the six smallest values 0, 1, 2, 3, 4, 5 and pair them as (0, 5), (1, 4), (2, 3), one
            pair per door; each of these boxes averages 2.5.
            Place the remaining values 6, 6, 7, 7.5, 8, 9 into the other box of each door; those
            boxes all average above 2.5, so Dedede always hands Kirby the 2.5 box.
            → Kirby's new expected utility after the rearrangement is 2.5.
2. Game.
     a. Consider the following game tree where all leaf nodes are between 0 and 10 (inclusive)
        and both expectation nodes choose the left branch with chance 0.8.
 i.    Determine the values of 𝑀, 𝑁, and the root node.
       → Min(A, B) = Min(5, 3) = 3
       Min(C, D) = Min(9, 10) = 9
       ⇒ M = 3 × 0.8 + 9 × 0.2 = 4.2
       Min(E, F) = Min(5, 0) = 0
       Min(G, H) = Min(8, 5) = 5
       ⇒ N = 0 × 0.8 + 5 × 0.2 = 1
       ⇒ Root = Max(M, N) = Max(4.2, 1) = 4.2
ii.   Given the above expectimax problem, which node(s) can be pruned in
      alpha-beta pruning?
→ After evaluating the left subtree, the maximizer's best value so far is M = 4.2. Once Min(E, F) = 0 is known, the most N could reach is 0 × 0.8 + 10 × 0.2 = 2 (leaf values are at most 10), which cannot exceed 4.2, so G and H can be pruned (see the sketch after part iv).
iii.   Now consider the general case where leaf values can take on any real value
       between 0 and 10 (inclusive). Do not prune on equality.
       What is the maximum number of nodes that can be pruned?
iv.   Now, let p be the probability that each expectation node chooses the left branch,
      and assume p cannot be 0 or 1. For the game tree shown below (note
      this is different from the tree in part 1), determine the bounds on p such that the
      maximum number of leaf nodes can be pruned. Recall that probabilities must be
      between 0 and 1.
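For part (i), the expectimax computation can be reproduced directly; the following minimal sketch uses the leaf values from the solution and p = 0.8 for every chance node's left branch.

```python
# Expectimax for part (i): min nodes over leaf pairs, chance nodes taking
# the left branch with probability 0.8, and a max node at the root.
p = 0.8

def chance(left, right):
    return p * left + (1 - p) * right

M = chance(min(5, 3), min(9, 10))  # 0.8 * 3 + 0.2 * 9 = 4.2
N = chance(min(5, 0), min(8, 5))   # 0.8 * 0 + 0.2 * 5 = 1.0
print(M, N, max(M, N))             # 4.2 1.0 4.2

# Pruning check for part (ii): once min(E, F) = 0 is known, the best N
# could be is 0.8 * 0 + 0.2 * 10 = 2.0 < 4.2, so G and H are never needed.
print(chance(0, 10))               # 2.0
```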
              i.      Find the optimal state values for Square 2 and Square 3. Your answer can be in
                      terms of x.
                      V*(2) = ?                     V*(3) = ?
                      → Step 1. Optimal value for V*(2)
                          ● From Square 2, Indiana only has one action available: Up, which moves him
                            to Square 4 with a reward of -10
● The Bellman equation for V*(2) is V*(2) = R(2, Up, 4) + γ·V*(4)
● Since R(2, Up, 4) = -10 and γ = 1, we get V*(2) = -10 + V*(4)
      → Step 2. Optimal value for V*(3)
         ● From Square 3, Indiana has two options:
                ○ Right which takes him to Square 4 with a reward of -10
○ Up, where he has a chance to slip and end up in Square 5 instead
                  of Square 4
                        ■ With probability 1- x, he goes to Square 4 with a reward of
                          -10
                        ■ With probability x, he slips and goes to Square 5 with a
                          reward of -4
● The Bellman equation for V*(3) takes the maximum over the two actions:
       ○ Right action: Right = R(3, Right, 4) + γ·V*(4) = -10 + V*(4)
       ○ Up action: Up = (1 - x)·(-10 + V*(4)) + x·(-4 + V*(5))
       ○ So we have:
         V*(3) = max[-10 + V*(4), (1 - x)·(-10 + V*(4)) + x·(-4 + V*(5))]
                        ■ From the problem statement, Square 5 is a terminal state
                          with an Exit reward of 100. Since Indiana exits the grid once
                          he reaches Square 5, we set: V*(5) = 100
■ From Square 4, the only action Indiana can take is Up, which
  moves him deterministically to Square 5 with a reward of -10:
● V*(4) = R(4, Up, 5) + γ·V*(5) = -10 + 1 × 100 = 90
⇒ We have: V*(2) = 80; V*(3) = max(80, 80 + 16x); V*(4) = 90; V*(5) = 100
ii.   For which values of x would Indiana prefer to go right on Square 1?
      → At Square 1, Indiana has two options:
         ● Up: This action moves Indiana to Square 2 with a reward of +5
                  V* (1, up)=5 + V* (2)
         ● Right: This action moves Indiana to Square 3 with a reward of -10
                   V* (1, right)=-10 + V* (3)
         ● To find the values of x for which Indiana would prefer going Right, we
           need:
V*(1, right) > V*(1, up), i.e., -10 + V*(3) > 5 + V*(2)
          ● Substituting V*(2) = 80 and V*(3) = max(80, 80 + 16x) from above:
                   -10 + max(80, 80 + 16x) > 5 + 80 = 85
○ Case 1: If 80 + 16x ≤ 80 (only possible when x = 0), then max(80, 80 + 16x) = 80. In this case:
  -10 + 80 > 85          ⇒ 70 > 85
  ⇒ This is false, so Indiana would not prefer to go Right in this case
                ○ Case 2: If 80 + 16x > 80 then max(80, 80 + 16x) = 80 + 16x. In this
                  case:
-10 + (80 + 16x) > 85 ⇒ 70 + 16x > 85 ⇒ x > 15/16
⇒ This means Indiana will only choose to go Right if the probability x of
slipping from Square 3 to Square 5 (skipping Square 4) is greater than
15/16 = 0.9375
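The derivation can also be checked numerically. The sketch below hard-codes V*(5) = 100 and V*(4) = 90 from the solution text (γ = 1) and tests the Right-versus-Up comparison at Square 1 for sample values of x.

```python
# Numeric check of the Indiana derivation; V*(5) = 100 and V*(4) = 90 are
# taken from the solution text, with gamma = 1.
V5, V4 = 100, 90
V2 = -10 + V4                                  # only action from Square 2: 80

def V3(x):
    right = -10 + V4                           # 80
    up = (1 - x) * (-10 + V4) + x * (-4 + V5)  # 80 + 16x
    return max(right, up)

def prefers_right(x):
    # Q(1, Right) > Q(1, Up)  <=>  -10 + V*(3) > 5 + V*(2)
    return -10 + V3(x) > 5 + V2

print(prefers_right(0.90))  # False: 0.90 < 15/16
print(prefers_right(0.95))  # True:  0.95 > 15/16 = 0.9375
```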
LAB 09
 1. MDPs: Grid-World Water Park
    Consider the MDP drawn below. The state space consists of all squares in a grid-world water
    park. There is a single water slide that is composed of two ladder squares and two slide
    squares (marked with vertical bars and squiggly lines respectively). An agent in this water park
    can move from any square to any neighboring square, unless the current square is a slide in
    which case it must move forward one square along the slide. The actions are denoted by
    arrows between squares on the map and all deterministically move the agent in the given
    direction. The agent cannot stand still: it must move on each time step. Rewards are also
    shown below: the agent feels great pleasure as it slides down the water slide (+2), a certain
    amount of discomfort as it climbs the rungs of the ladder (-1), and receives rewards of 0
    otherwise. The time horizon is infinite; this MDP goes on forever.
       a. How many (deterministic) policies are possible for this MDP?
→ 3 squares offer only 1 direction, contributing 1^3 = 1
The other 11 squares offer 2 directions each, contributing 2^11
⇒ 1^3 × 2^11 = 2^11 = 2048 deterministic policies
       b. Fill in the blank cells of this table with values that are correct for the corresponding
          function, discount, and state. Hint: You should not need to do a substantial calculation
          here. (Note: Vt(s) is the time-limited value of state s, if our Markov Decision Process
          were to terminate after t timesteps.)
       c. Fill in the blank cells of this table with the Q-values that result from applying the
          Q-update for the transition specified on each row. You may leave Q-values that are
          unaffected by the current update blank. Use discount = 1.0 and learning rate α = 0.5.
Assume all Q-values are initialized to 0. (Note: the specified transitions would not arise
from a single episode.)
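Part (c) applies the standard tabular Q-update, Q(s, a) ← (1 − α)·Q(s, a) + α·(r + γ·max over a′ of Q(s′, a′)). A minimal sketch with the stated settings follows; the transition fed to it is a placeholder, since the row-by-row transition table is not reproduced here.

```python
from collections import defaultdict

# Tabular Q-update with the stated settings: gamma = 1.0, alpha = 0.5,
# all Q-values initialized to 0. The sample transition is a placeholder.
gamma, alpha = 1.0, 0.5
Q = defaultdict(float)
actions = ["left", "right"]

def q_update(s, a, r, s_next):
    sample = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

q_update("A", "right", -1, "B")  # hypothetical (s, a, r, s') sample
print(Q[("A", "right")])         # 0.5 * 0 + 0.5 * (-1 + 0) = -0.5
```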
LAB 10
 1. Quagmire
   Consider an unknown MDP with three states (A, B and C) and two actions (← and →). Suppose
   the agent chooses actions according to some policy π in the unknown MDP, collecting a
dataset consisting of samples (s, a, s′, r), each representing taking action a in state s, resulting in a
transition to state s′ and a reward of r.
          Assume that all Q-values are initialized to 0, and use a learning rate of α = 1/2
            i.    Run Q-learning on the above experience table and fill in the following Q-values:
                  Q(A, →) = ?                   Q(B, →) = ?
            ii.   After running Q-learning and producing the above Q-values, you construct a
                  policy πQ that maximizes the Q-value in a given state:
BFS (fringe is a queue)
Fringe entries: S→A, S→A→B, S→A→C
Expansion order: S, A, G
Returned path: S → G
b. What path would uniform cost graph search return for this search problem?
UCS (fringe is a priority queue ordered by path cost)
Fringe entries:
S
S → G (12)
S → A → B (1 + 3 = 4)
S → A → C (1 + 1 = 2)
S → A → C → D (1 + 1 + 1 = 3)
S → A → C → G (1 + 1 + 2 = 4)
S → A → C → D → G (1 + 1 + 1 + 3 = 6)
S → A → B → D (1 + 3 + 3 = 7)
Expansion order: S, A, C, D, B, G
Returned path: S → A → C → G (cost 4)
       c. What path would depth-first graph search return for this search problem?
DFS (fringe is a stack)
Fringe entries pushed: S, S→A, S→A→C, S→A→B, S→A→B→D, S→A→B→D→G
Expansion order: S, A, B, D, G
Returned path: S → A → B → D → G
d. What path would A* graph search, using a consistent heuristic, return for this search
   problem?
A* (fringe is a priority queue ordered by f = g + h)
Fringe entries:
S → A → B (1 + 3 + 6 = 10)
S → A → C (1 + 1 + 2 = 4) (iii)
S → A → C → D (1 + 1 + 1 + 3 = 6)
S → A → C → G (1 + 1 + 2 + 0 = 4) (iv)
S → A → C → D → G (1 + 1 + 1 + 3 = 6)
S → A → B → D (1 + 3 + 3 = 7)
Returned path: S → A → C → G (f = 4)
e. Consider the heuristics for this problem shown in the table below.
               i.    Is h1 admissible?     (i) Yes (ii) No
              ii.    Is h1 consistent?     (i) Yes (ii) No
              iii.   Is h2 admissible?     (i) Yes (ii) No
              iv.    Is h2 consistent?     (i) Yes (ii) No
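The heuristic table and graph are not reproduced here, so parts i–iv are left as stated. As a general aid, the following minimal sketch checks admissibility (h(n) ≤ h*(n), with h* computed by Dijkstra) and consistency (h(n) ≤ c(n, n′) + h(n′)) on a small placeholder graph.

```python
import heapq

# Placeholder graph and heuristic (not the lab's actual figure).
graph = {"S": {"A": 1, "G": 12}, "A": {"B": 3, "C": 1},
         "B": {}, "C": {"G": 2}, "G": {}}
h = {"S": 4, "A": 3, "B": 10, "C": 2, "G": 0}

def optimal_costs(graph, goal="G"):
    """h*(n): cheapest cost from each node to the goal, via Dijkstra."""
    h_star = {}
    for start in graph:
        dist, pq = {start: 0}, [(0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        h_star[start] = dist.get(goal, float("inf"))
    return h_star

h_star = optimal_costs(graph)
admissible = all(h[n] <= h_star[n] for n in graph)
consistent = all(h[u] <= w + h[v] for u in graph for v, w in graph[u].items())
print(admissible, consistent)  # True True for this placeholder
```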
2. Search.
   For the following questions, consider the search problem shown on the left. It has only three
   states, and three directed edges. A is the start node and G is the goal node. To the right, four
   different heuristic functions are defined, numbered I through IV.
     a. In the blanks below, fill in the labels (not the numerical values) of the balls selected for
        nodes A, B1, B2 and B3. For example, if Alice’s optimal move is to select the box B1, and
        Bob selects the ball C, then A=C, B1=C.
           A=E                          B1 = E             B2 = H       B3 = K
           → Bob is a minimizer so he will choose E, H, K for B1, B2, B3 respectively. Alice is a
           maximizer, so she will choose the maximum among E, H, K which is E
     b. Let’s apply alpha-beta pruning for the tree. Please choose all the child nodes of the
        branches that could be pruned in the search tree.
           (a) B1              (b) B2            (c) B3    (d) C        (e) D        (f) E
           (g) F               (h) G             (i) H     (j) I        (k) J        (l) K
→ We cannot prune anything in the B2 branch because min(F, G) = 8 > min(C, D, E) = 6, so the
search must continue to H. We can prune K because after visiting J we know that the value at
B3 ≤ 3, which is smaller than the value at B1 (= 6), so B3 will never get picked by Alice.
      c. Assume that you can reallocate all the balls to different boxes as long as in the end,
         each box has exactly 3 balls in it. Could you find a way to re-allocate the balls so that
         when we apply alpha-beta pruning, the right branch of B2 as well as the middle and the
         right branch of B3 will be pruned?
         Please list all balls in the order of their new positions in your solution. For example, if
         your solution is not to move any ball, then your answer is “C,D,E,F,G,H,I,J,K”. When
         multiple solutions exist, break ties alphabetically. For example, if there are two
         solutions starting with “C,D, ...” and “D,C, ...”, then your answer should be “C,D, ...”. If
         there is no solution, please write down “None”.
→ C, D, E, F, H, G, I, J, K
The reasoning is similar to the previous part. After visiting C, D, E, we know the value at
B1 = min(C, D, E) = 6. Then, after visiting H = 1 < 6, we can prune G, and after visiting
J = 3 < 6, we can prune I and K. Among all feasible solutions, this one comes first
alphabetically, since any earlier solution would need to start with C, D, E, F, G, … or
C, D, E, F, H, G, I, …, neither of which satisfies the pruning requirement.
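Both pruning arguments can be reproduced with a small alpha-beta routine over the two-level max/min tree. The ball values below are placeholders chosen to mimic the bounds in the discussion (min(C, D, E) = 6, a low ball in B2, a low first ball in B3), since the full table is not shown here.

```python
# Alpha-beta over a two-level tree: Alice maximizes over boxes, Bob
# minimizes over the balls in each box. Ball values are placeholders.
def alphabeta_root(boxes):
    visited, best = [], float("-inf")  # best = Alice's running alpha
    for _name, balls in boxes:
        node = float("inf")
        for label, v in balls:
            visited.append(label)
            node = min(node, v)
            if node <= best:           # min node can't beat alpha: prune
                break
        best = max(best, node)
    return best, visited

boxes = [("B1", [("C", 8), ("D", 7), ("E", 6)]),
         ("B2", [("F", 9), ("G", 8), ("H", 1)]),
         ("B3", [("I", 3), ("J", 5), ("K", 2)])]
value, visited = alphabeta_root(boxes)
print(value)    # 6
print(visited)  # ['C','D','E','F','G','H','I']: J and K are pruned
```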
4. Zero-sum game tree.
      a. Consider the following zero-sum game tree. Triangles that point up represent choices
         for the maximizing player; triangles that point down represent choices for the
         minimizing player. Assuming both players act optimally, fill in the minimax value of
         each node:
      b. Using alpha beta pruning, which leaf nodes or branches can we prune? Assume that
         branches are visited in left to right order.
c. If the branches of every minimizer node are reordered such that we prune the maximum
   number of leaf nodes, which leaf nodes can be pruned? Assume that children of
   maximizer nodes are visited from left to right and that we are not pruning on equality.
   Remember to label the order of visit for every branch of all minimizer nodes.
→ The subtree on the right determines the root value, so it should be visited first; the
remaining branches can then be cut off against a strong bound.
Maximum number of leaves pruned = 5
d. Assume that we are not pruning on equality. In this part, we have a fixed traversal
   order from left to right. We start with the tree in the previous part, and shuffle the
   values of all the leaf nodes such that we check as few nodes as possible without
   changing the structure of the tree. The initial ordering (as in the tree presented in the
previous part) is 6, 10, 8, 7, 5, 4, 5, 6, 3, 7, 2, and the new ordering is A, B, C, D, E, F, G, H, I, J, K.
   (Note: this question is challenging!)
             i.   For all possible new orderings, which of the results (root values) are possible?
                  → Given the leaf values 6, 10, 8, 7, 5, 4, 5, 6, 3, 7, 2, the root can take any value in
                  the range [2, 10], depending on the reordering.
ii.   Suppose that for the new ordering, the value of the root is 4. Which of the following leaf
      nodes are guaranteed to have value ⩽ 5?
                  (a) A          (b) B           (c) C           (d) D
                  (e) E          (f) F           (g) G           (h) H
                  (i) I          (j) J           (k) K           (l) None of the above
iii.   Suppose that for the new ordering, the value of the root is 5. Which of the following leaf
       nodes are guaranteed to have value > 5?
                  (a) A          (b) B           (c) C           (d) D
                  (e) E          (f) F           (g) G           (h) H
                  (i) I          (j) J           (k) K           (l) None of the above
5. Demolition Pacman.
   Pacman just bought a new square (S) in an M by N grid world that contains some rough terrain.
   He wants to clear a path from his new home to the closest grocery store, which is another
   square (G) on the grid world. However, there are R squares on the grid that are impassable
because there are rocks in the way. Fortunately, Pacman is carrying D - 1 sticks of dynamite
   with him. On any turn, Pacman can throw a stick of dynamite at one of eight neighboring
   squares (diagonal throws are allowed) if it contains rocks, consuming a stick and removing the
   rocks in that location. If he chooses not to throw dynamite, Pacman can also drive his truck
   onto one of eight neighboring squares (diagonal moves are allowed) if there is nothing blocking
him. However, his truck only has F - 1 units of fuel and whenever Pacman moves to a square,
   one unit of fuel is consumed. Once Pacman runs out of fuel and dynamite, he has no more
   actions available. Assume R, D, and F are all greater than 3 and 2 * D < R.
      a. Complete the expression below so that X evaluates to the size of the minimal state space.
         You can use integers as well as any of the variables mentioned in the problem.
   a=1                    b=1                  c=0
   d=0                    f=1                  u=R
→ We have M*N possible values for the location and F possible values for the amount of fuel
left. We also need a Boolean for each of the R rocks recording whether it has been destroyed,
which contributes a factor of 2^R (2 raised to the power R; we don't need R as a base). We
don't need to keep track of how much dynamite is left: every destroyed rock consumed exactly
one stick, so the remaining dynamite can be recovered from the rock Booleans. The minimal
state-space size is therefore X = M × N × F × 2^R (see the sketch below).
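Putting part (a) together, the minimal state-space size is the product of the three factors; a one-line sketch with the problem's variable names:

```python
# Minimal state space from part (a): location (M*N), remaining fuel
# (F values), and one destroyed/intact Boolean per rock (2**R). Remaining
# dynamite is implied by how many rocks have been destroyed.
def state_space_size(M, N, F, R):
    return M * N * F * 2 ** R

print(state_space_size(4, 4, 5, 4))  # e.g. 4x4 grid, F = 5, R = 4 -> 1280
```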
b. What is the maximum possible branching factor?
   Maximum branching factor = 8
→ There are 8 neighboring squares. If a neighboring square contains rocks, we can throw a
stick of dynamite at it but cannot move into it; if it is clear, we can move into it but cannot
throw dynamite at it. Either way, each neighbor offers exactly one action.
c. For each subpart below, select all algorithms that can be used to find the specified
   sequence of actions. The algorithm should be able to find such a sequence on any grid
   world that contains that sequence without failure. Assume you are allowed to define
   costs between states however you want.
     i.    Any sequence of action that results in Pacman reaching the Grocery store.
           (i) DFS Tree Search   (ii) DFS Graph Search       (iii) BFS
           (iv) UCS              (v) None of these
→ DFS Graph Search, BFS and UCS are complete, so they will always find a
solution. DFS Tree Search is also complete in this problem because it cannot
recurse infinitely: dynamite or fuel is consumed on every turn.
     ii.   The sequence of actions that results in Pacman reaching the Grocery store while
           using the least amount of dynamite possible.
           (i) DFS Tree Search   (ii) DFS Graph Search       (iii) BFS
           (iv) UCS              (v) None of these
→ We must use UCS with a positive edge cost (e.g., 1) for transitions that use
dynamite and an edge cost of 0 for transitions that use fuel.
    iii.   The sequence of actions that results in Pacman reaching the Grocery store while
           using the most amount of dynamite possible.
           (i) DFS Tree Search   (ii) DFS Graph Search       (iii) BFS
           (iv) UCS              (v) None of these
→ We can’t use UCS (or any of the other algorithms) to find a path with the
“most” transitions of a given type. Assigning a positive edge cost to transitions
that don’t use dynamite would find a path that minimizes non-dynamite moves,
which is not the same as maximizing the amount of dynamite used.
    iv.    The sequence of actions that results in Pacman reaching the Grocery store while
           having the lowest sum of fuel consumed plus dynamite consumed.
           (i) DFS Tree Search   (ii) DFS Graph Search       (iii) BFS
           (iv) UCS              (v) None of these
→ This is equivalent to finding the path with the fewest turns, since either a
stick of dynamite or a unit of fuel is consumed on every turn. UCS and BFS can
both find this minimum-turn path.
d. Each subpart below describes a change to the above problem. All subparts are
independent of each other, so consider each question as if it were the only change to the
   above description. Note, X is equal to the size of the state space from the original
   problem. You are essentially finding a scaling factor that should be multiplied by the
   original state space size.
     i.     Pacman upgraded his truck so that he can now throw dynamite towards the
           north for free. It still costs him dynamite to throw in any of the other seven
           directions.
           Complete the expression below so that it evaluates to the size of the new
           minimal state space. You can use integers as well as any of the variables
           mentioned in the problem.
        b. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
           no nodes can be pruned, explain why not. Assume the search goes from left to right;
           when choosing which child to visit first, choose the left-most unvisited child.
        c. Again, consider the same zero-sum game tree, except that now, instead of a minimizing
           player, we have a chance node that will select one of the three values uniformly at
           random. Fill in the expectimax value of each node. The game tree is redrawn below for
           your convenience.
      d. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
         no nodes can be pruned, explain why not.
→ No nodes can be pruned. There is always the possibility that an as-yet-unvisited
leaf of the current chance node has a very high value, which would raise that node's
overall average. For example, when we see that leaf 4 has a value of 2, far below the
left chance node's value of 7, we still cannot say whether the middle chance node will
end up above or below the left one; as it turns out, leaf 5 has a value of 15, which
brings the middle chance node's expected value to 8, greater than the left chance node's.
If there is a known upper bound on leaf values, however, pruning becomes possible.
Suppose an upper bound of +10 applies only to the children of the rightmost chance node.
After seeing that leaf 7 has a value of 6 and leaf 8 has a value of 5, the best possible
value the rightmost chance node can take on is (6 + 5 + 10)/3 = 7, which is less than 8,
the value of the middle chance node. Leaf 9 can therefore be pruned (see the sketch below).
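That bounded-leaf test can be made concrete. In the sketch below, the +10 cap and the (6 + 5 + 10)/3 = 7 calculation come from the example above; the third leaf value in the code is arbitrary, since it is never examined.

```python
# Pruning at a chance node when leaf values have a known upper bound: stop
# as soon as even the most optimistic remaining leaves cannot lift the
# node's average above the best value seen so far.
def chance_value_with_pruning(leaves, best_so_far, upper_bound=10):
    total, n = 0.0, len(leaves)
    for i, v in enumerate(leaves):
        total += v
        optimistic = (total + (n - i - 1) * upper_bound) / n
        if optimistic < best_so_far:
            return None, i + 1  # pruned after examining i+1 leaves
    return total / n, n

# After leaves 6 and 5, the best achievable average is (6 + 5 + 10)/3 = 7,
# below the middle chance node's 8, so the last leaf is pruned.
print(chance_value_with_pruning([6, 5, 0], best_so_far=8))  # (None, 2)
```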
2. Search.
Admissible: 0 ≤ h(n) ≤ h*(n)
   h(n): estimated cost to reach a goal from n
   h*(n): optimal cost to reach a goal from n
A* Search (fringe is a priority queue ordered by f = g + h)
Fringe entries:
S (i)
S → B : f = 1 + 7 = 8 (iii)
S → A → B : f = 3 + 2 + 7 = 12 (*)
S → A → C : f = 3 + 2 + 4 = 9
S → B → C : f = 1 + 3 + 4 = 8 (iv)
S → B → C → D : f = 1 + 3 + 4 + 1 = 9
S → B → C → G : f = 1 + 3 + 4 + 0 = 8 (v)
Expansion order: S, A, B, C, G
Returned path: S → B → C → G (f = 8)
Second fringe (entries ordered by h only):
S → B : h = 7
S → A → C : h = 4 (iii)
S → A → C → D : h = 1
S → A → C → G : h = 0 (iv)
Returned path: S → A → C → G
       c. Now, consider another game tree which has the same structure as the original game
          tree shown above. This modified game tree can take on any values at the leaf nodes.
          What is the minimum and the maximum number of leaf nodes that can be pruned after
          running alpha-beta pruning?
           → For any game tree with the same structure:
● Minimum leaf nodes pruned: 0 (when the leaf values force every branch to be explored).
● Maximum leaf nodes pruned: 6 (when the leaf values are arranged so that pruning
  eliminates the largest possible subtrees).
5. Non Zero-sum Games.
      a. Let’s look at a non-zero-sum version of a game. In this formulation, player A’s utility will
         be represented as the first of the two leaf numbers, and player B’s utility will be
represented as the second of the two leaf numbers. Fill in this non-zero-sum game tree
assuming each player is acting optimally.
      b. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
         no nodes can be pruned, explain why not.
          No nodes can be pruned. Because this game is non-zero-sum, there can exist a leaf node
          anywhere in the tree that is good for both player A and player B.
6. MDPs Value Iteration
   Assume: V0(s) = 0, γ = 1/2
      a. Suppose that we are performing value iteration on the grid world MDP below.
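The grid itself is not reproduced here, so the following is only a generic sketch of the value-iteration backup under the stated assumptions (V0(s) = 0, γ = 1/2), applied to a placeholder two-state MDP.

```python
# Generic value iteration with V0(s) = 0 and gamma = 1/2. The two-state
# MDP below is a placeholder, not the lab's grid world.
gamma = 0.5
# transitions[s][a] = list of (probability, next_state, reward) triples
transitions = {
    "A": {"stay": [(1.0, "A", 1)], "go": [(1.0, "B", 0)]},
    "B": {"exit": [(1.0, "B", 10)]},
}

V = {s: 0.0 for s in transitions}  # V0(s) = 0
for _ in range(20):                # Bellman backup until (near) convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s].values())
         for s in transitions}
print(V)  # approaches V(A) = 10.0, V(B) = 20.0
```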