Reflex Actions and AI System Analysis

LAB 01

1. Are reflex actions (such as flinching from a hot stove) rational? Are they intelligent?
→ Reflex actions, such as flinching from a hot stove, are not typically considered rational or
intelligent in the traditional sense.
- Rationality: Reflex actions are automatic and do not involve conscious reasoning or
deliberation. Rationality usually implies a level of thought and decision-making. Reflexes
are quick, involuntary responses to stimuli designed to protect the body from harm, but
they are not driven by conscious thought processes or reasoning.
- Intelligence: Intelligence often involves the application of knowledge, reasoning, and
problem-solving. Reflex actions do not require these cognitive processes. Instead, they
are governed by the nervous system's automatic responses, bypassing the brain's higher
functions. Therefore, while reflexes are adaptive and effective for survival, they are not
considered intelligent because they do not involve the conscious application of
knowledge or thought.
2. To what extent are the following computer systems instances of artificial intelligence:
a. Supermarket barcode scanners.
→ AI Involvement: Minimal
→ Reason: Supermarket barcode scanners are designed to read and decode barcodes to
retrieve product information and prices from a database. These scanners are primarily
concerned with pattern recognition and database access. The technology involved does
not utilize advanced AI techniques such as machine learning or natural language
processing. The functionality is based on straightforward, rule-based systems for
recognizing barcodes and performing basic database lookups.
b. Web search engines.
→ AI Involvement: Significant
→ Reason: Web search engines leverage AI extensively. They utilize algorithms optimized
with machine learning and natural language processing to enhance the efficiency of
search functions. AI helps in ranking web pages based on relevance, quality, and user
intent. Search engines continuously learn from user interactions and feedback to improve
search accuracy and relevance, reflecting a deep integration of AI principles.
c. Voice-activated telephone menus.
→ AI Involvement: Moderate
→ Reason: Voice-activated telephone menus use speech recognition technology to
interpret user commands and perform actions accordingly. While this involves some AI,
particularly in terms of recognizing and processing voice inputs, these systems typically
operate based on predefined rules and commands. They do not adapt or learn beyond
their programmed instructions, making their AI involvement more rule-based compared
to more adaptive AI systems.
d. Spelling and grammar correction features in word processing programs
→ AI Involvement: Significant
→ Reason: Modern spelling and grammar checkers go well beyond dictionary lookup. They
apply natural language processing to parse sentence structure, detect context-dependent
errors (e.g., "their" vs. "there"), and suggest rephrasings. Many current tools use
statistical language models or neural networks trained on large text corpora, so their
corrections adapt to context rather than relying solely on fixed rules, demonstrating a
significant application of AI.
e. Internet routing algorithms that respond dynamically to the state of the network
→ AI Involvement: Extensive
→ Reason: Dynamic routing algorithms continuously sense the state of the network
(congestion, link failures, latency) and re-plan packet routes to maintain performance.
This sense-plan-act loop in a changing, partially observable environment is a hallmark of
intelligent agent behavior: the system adapts its decisions to current conditions rather
than following a fixed routing table. Techniques from search, optimization, and
increasingly machine learning are used to predict traffic and select routes, showcasing an
extensive use of AI principles.
LAB 02
1. For each of the following agents, specify the sensors, actuators, and environment:
a. Microwave oven
- Sensors:
+ Temperature sensors (to detect internal temperature)
+ Door switch (to detect if the door is closed)
+ Timer (to monitor cooking time)
- Actuators:
+ Magnetron (to generate microwave radiation)
+ Turntable motor (to rotate food for even cooking)
+ Display (to show status to the user; the panel's buttons act as sensors)
- Environment: The kitchen or any area where the microwave is placed, including
the food being heated.
b. Chess program
- Sensors:
+ User input (keyboard/mouse for moves)
+ Game state (board position and rules)
- Actuators:
+ Display (showing the chess board and moves)
+ Output (making suggested moves or announcing game status)
- Environment: The virtual chessboard and the rules of chess, including the
opponent's moves.
c. Autonomous supply delivery plane
- Sensors:
+ GPS (for navigation and positioning)
+ Cameras and LIDAR (for obstacle detection)
+ IMU (Inertial Measurement Unit for orientation and motion)
- Actuators:
+ Propellers (for flight)
+ Control surfaces (ailerons, elevators, rudders for steering)
+ Delivery mechanism (to release packages)
- Environment: The outdoor airspace environment, including weather conditions,
geographical obstacles, and designated delivery zones.
2. For each of the following activities, give a PEAS description of the task environment and
characterize it in terms of the properties:
● Fully Observable vs Partially Observable
● Deterministic vs Nondeterministic vs Stochastic
● Known vs Unknown
● Competitive vs Collaborative
● Single-agent vs Multi-agent
● Dynamic vs Static
● Discrete vs Continuous
● Episodic vs Sequential

a. Playing soccer.
- PEAS description:
+ Performance Measure: Goals scored, successful passes, tackles, ball possession,
teamwork, and overall game outcome (win, loss, or draw).
+ Environment: A soccer field, including goals, players (both teams), referees, and
spectators. The physical attributes of the field (size, surface) and external conditions
(weather, time of day) also play a role.
+ Actuators: Player actions (kicking, passing, dribbling, running, defending),
communication signals (verbal and non-verbal cues), and game strategies.
+ Sensors: Players’ vision (to see the ball and other players), tactile feedback (feeling the
ball and contact with other players), and auditory cues (hearing teammates and referee
whistles).
- Characterization of the Soccer environment:
+ Partially Observable: Players cannot see the entire field or the positions of all players at
all times. Information is incomplete due to obstructions and the dynamic nature of the
game.
+ Stochastic: The outcome of actions can be uncertain due to various factors (e.g., player
skill, opponent actions, and environmental conditions).
+ Competitive: The two teams compete against each other to win the game. Each team
aims to score while preventing the other from doing so.
+ Multi-Agent: Multiple agents (players from both teams, referees) interact in the
environment. Each player makes decisions based on their strategies and the actions of
others.
+ Dynamic: The environment is constantly changing due to the actions of players and the
movement of the ball. The state of the game evolves in real-time.
+ Continuous: The actions taken (e.g., running, passing) occur in a continuous space. The
position of the ball and players can vary infinitely within the boundaries of the field.
+ Sequential: The outcome of current actions affects future states. Each play leads to
further developments in the game, influencing strategies and decisions.
+ Unknown: Players may not have complete knowledge of the opponents’ strategies,
abilities, or tactics, making the environment uncertain.
b. Exploring the subsurface oceans of Titan.
- PEAS description:
+ Performance Measure: Successful identification and analysis of subsurface ocean
properties, detection of potential biosignatures, collection of geological data, and the
effective operation of exploration equipment.
+ Environment: Titan’s subsurface ocean and ice crust, including the complex chemical
environment, extreme temperatures, and geological features.
+ Actuators: Robotic arms for sampling, drills for penetrating ice, communication systems
for data transmission, propulsion systems for mobility, and scientific instruments for
analysis.
+ Sensors: Cameras for imaging, spectrometers for chemical analysis, temperature
sensors, pressure sensors for depth measurements, and environmental sensors to
monitor conditions.
- Characterization of the Exploration environment:
+ Partially Observable: The environment is not fully observable due to the thick ice crust
and the inability to see the subsurface ocean directly. The exact conditions and
composition remain uncertain until explored.
+ Nondeterministic: The environment exhibits unpredictable elements. The reactions of
materials and potential geological activity can be uncertain, affecting mission outcomes.
+ Collaborative: If multiple missions or robotic systems are involved, they would need to
work together to gather comprehensive data about Titan. However, individual missions
can also be designed to operate independently.
+ Single-Agent: Typically, a single spacecraft or probe would be responsible for
exploration, although multiple probes could be deployed in a mission.
+ Dynamic: The environment is dynamic due to ongoing geological processes, changing
conditions in the subsurface, and potential movement of liquid.
+ Continuous: The environment involves continuous variables, such as temperature,
pressure, and chemical concentrations, which change smoothly rather than in discrete
steps.
+ Sequential: The exploration process is sequential, where each action (e.g., drilling,
sampling) affects subsequent actions and decisions, requiring careful planning based on
prior results.
+ Unknown: Many aspects of Titan’s subsurface environment remain unknown, including
the specific composition of the oceans, the thickness of the ice crust, and potential
biological processes.
c. Shopping for used AI books on the Internet.
- PEAS description:
+ Performance Measure: Successful identification and purchase of quality used AI books
at a reasonable price, user satisfaction with the selection, and timely delivery of the
purchased books.
+ Environment: Online marketplaces (e.g., Amazon, eBay, specialized bookstores),
including product listings, seller ratings, reviews, and search functionalities.
+ Actuators: Actions taken by the shopper, such as searching, filtering results, adding
items to the cart, purchasing, and communicating with sellers.
+ Sensors: Inputs from the user interface, such as search queries, filters (genre, condition,
price), seller ratings, and reviews from other buyers.
- Characterization of the Shopping environment:
+ Partially Observable: Not all information about the condition of the books, seller
reliability, or shipping times is available. Some details may depend on user reviews or
seller disclosures.
+ Nondeterministic: The outcomes of actions (e.g., whether a book will be available at the
desired price or in the expected condition) can be unpredictable and influenced by
external factors like demand.
+ Competitive: Multiple buyers are competing for the same items, particularly popular or
rare books. Sellers may also compete for buyers' attention.
+ Single-Agent: The shopper operates as an individual agent making decisions based on
their preferences. However, the environment contains multiple agents (other shoppers
and sellers).
+ Dynamic: The online marketplace is continuously changing, with new listings, changing
prices, and varying availability of books based on demand and inventory.
+ Discrete: The actions and decisions (e.g., selecting specific books, making purchases) are
discrete steps rather than continuous changes.
+ Sequential: The shopping process is sequential, as each decision (searching, filtering,
choosing) influences the next steps in the purchasing process.
+ Unknown: There are uncertainties regarding the condition of used books, reliability of
sellers, and shipping times. Additionally, market dynamics can change rapidly.
d. Playing a tennis match.
- PEAS description:
+ Performance Measure: Points scored, games won, sets won, overall match outcome
(win/loss), consistency in performance, and sportsmanship.
+ Environment: A tennis court (either indoor or outdoor), including the net, boundary
lines, surface type (clay, grass, hard court), weather conditions (for outdoor matches), and
audience presence.
+ Actuators: Player actions, including serving, hitting the ball (forehand, backhand,
volleys), moving around the court, and strategic decision-making during play.
+ Sensors: Visual perception (seeing the ball, court, opponent's movements), auditory
cues (hearing the ball hit the racket and the opponent’s calls), and proprioception
(awareness of body position and movement).
- Characterization of the Tennis Match environment:
+ Partially Observable: Players cannot see the entire court and may miss aspects of their
opponent’s strategy or positioning due to obstructions or their own focus on the ball.
+ Nondeterministic: The outcome of each shot is uncertain and can be influenced by
factors such as player skill, spin on the ball, and unpredictable opponent responses.
+ Competitive: Each player competes against the other to win points, games, and
ultimately the match, trying to outmaneuver and outplay their opponent.
+ Multi-Agent: A tennis match involves two agents (the players), each making
independent decisions that influence the game dynamics.
+ Dynamic: The state of the game changes continuously as the ball is hit back and forth,
and players move in response to each other's actions.
+ Continuous: The motion of the ball and the players is continuous, and various actions
can be performed at any moment, rather than in distinct steps.
+ Sequential: Each point in the match influences future points. Decisions made in one part
of the match can affect the outcome of subsequent points and games.
+ Unknown: Players may not have complete knowledge of their opponent’s strategies,
physical condition, or psychological state, making it difficult to predict actions.
e. Practicing tennis against a wall.
- PEAS description:
+ Performance Measure: Accuracy of shots, consistency (number of successful
returns), improvement in technique, duration of practice session, and overall skill
development.
+ Environment: A tennis court or designated practice area with a wall, including the
court boundaries and possibly a net. The wall acts as a rebound surface for the ball.
+ Actuators: Player actions, such as serving, hitting the ball (forehand, backhand,
volleys), and movement to position for each shot.
+ Sensors: Visual perception (tracking the ball's trajectory and bounce off the wall),
auditory cues (sound of the ball hitting the wall and racket), and proprioception
(awareness of body positioning and movement).
- Characterization of the Wall Practice environment:
+ Fully Observable: The player has complete visibility of the wall, the ball, and the
playing area, allowing them to make informed decisions based on what they see.
+ Deterministic: The outcome of hitting the ball against the wall is predictable based
on the angle and force applied, assuming consistent conditions (e.g., wall surface,
ball condition).
+ Collaborative: The practice session is self-directed and focuses on personal
improvement rather than competing against another player, though it could be
collaborative if practicing with a partner for feedback.
+ Single-Agent: The practice primarily involves one player (the agent) performing
actions without direct interaction with other agents. However, it could involve a
coach or partner providing guidance.
+ Static: The environment remains relatively static during practice; the wall and
court do not change, though the player's actions create dynamic responses (ball
movement).
+ Continuous: The action of hitting the ball and its trajectory is continuous, allowing
for smooth and fluid movements rather than distinct steps.
+ Sequential: Each hit affects the next; players must adjust their positioning and
technique based on the previous shot's outcome, leading to a sequence of actions
that build skills.
+ Known: The player has a good understanding of the practice environment (the
wall's distance, bounce characteristics, etc.), making it a known setting for skill
development.
f. Performing a high jump.
- PEAS description:
+ Performance Measure: Height cleared, technique (form during the jump), number
of successful attempts, consistency in performance, and overall score in a
competitive setting.
+ Environment: A high jump area, including the high jump bar, landing mat, runway,
and surrounding track or field. Environmental factors like wind and surface type may
also play a role.
+ Actuators: The athlete’s physical actions, including running, take-off, and jumping
movements, as well as body positioning and technique during the jump.
+ Sensors: Visual perception (observing the bar and runway), proprioception
(awareness of body movement and position), and sometimes feedback from
coaches or trainers.
- Characterization of the High Jump environment:
+ Partially Observable: While the athlete can see the bar and landing area, factors
like their own fatigue or technique may not be fully known at the moment of the
jump.
+ Nondeterministic: The outcome of a jump can be unpredictable based on
numerous factors, including technique, physical condition, and environmental
conditions.
+ Competitive: The high jump event is competitive, as athletes aim to outperform
each other by clearing higher heights.
+ Single-Agent: Each athlete performs as an individual agent during their jump,
though they may be part of a larger event with multiple competitors.
+ Dynamic: The environment is dynamic; the athlete’s performance can change with
each jump, and the conditions may vary (e.g., surface conditions, weather).
+ Continuous: The jump and its trajectory involve continuous motion, and the
athlete can adjust their movements fluidly based on real-time feedback.
+ Sequential: Each jump builds on previous attempts, with the athlete adjusting
technique and strategy based on earlier results in the competition.
+ Known: Athletes typically have a good understanding of their own abilities and the
setup of the high jump environment, although some variables (like sudden fatigue
or pressure) may introduce uncertainty.
g. Knitting a sweater.
- PEAS description:
+ Performance Measure: Quality of the finished sweater (fit, design, craftsmanship), time
taken to complete the project, and adherence to the desired pattern or design
specifications.
+ Environment: The workspace, including knitting materials (yarn, needles), patterns,
tools (scissors, measuring tape), and lighting conditions.
+ Actuators: The knitter’s physical actions, including manipulating the yarn and needles,
following the pattern, and making adjustments as needed.
+ Sensors: Visual feedback (seeing stitches and pattern), tactile feedback (feeling the
texture of the yarn and stitches), and sometimes auditory feedback (sounds from the yarn
or needles).
- Characterization of the Knitting environment:
+ Partially Observable: The knitter can see the work in progress and the pattern but may
not always perceive subtle issues (e.g., tension variations) until they become more
pronounced.
+ Nondeterministic: The outcome can be uncertain based on various factors, such as the
knitter's skill level, yarn behavior, or pattern complexity, which may lead to unexpected
results.
+ Collaborative: While knitting is often a personal task, it can be collaborative if done in
groups (e.g., knitting circles) where knitters share ideas and techniques.
+ Single-Agent: Typically, the task is performed by a single knitter, although there may be
interactions with others in a social setting.
+ Static: The environment itself is static, but the project evolves dynamically as the knitter
adds stitches and progresses through the pattern.
+ Discrete: The knitting process involves discrete actions (e.g., casting on, knitting rows),
with specific steps that can be counted.
+ Sequential: Each row of knitting builds on the previous ones, requiring the knitter to
remember patterns and make adjustments throughout the process.
+ Known: Most aspects of the knitting process and materials are known to the knitter,
especially if following a specific pattern, though some challenges may arise unexpectedly.
h. Bidding on an item at an auction.
- PEAS description:
+ Performance Measure: Winning the item at the lowest possible price, maximizing value
(item worth vs. final bid), and strategic bidding effectiveness (timing and amount).
+ Environment: The auction setting, which can be physical (in-person) or digital (online
auction platforms), including the item being auctioned, competing bidders, and auction
rules.
+ Actuators: Actions taken by the bidder, such as placing bids, adjusting bidding strategies,
and potentially communicating with other bidders or auctioneers.
+ Sensors: Information gathered from the auction platform or environment, including
current bid amounts, the number of bidders, item details, and any auction
announcements.
- Characterization of the Auction environment:
+ Partially Observable: Bidders may not have complete information about the other
bidders’ maximum bids, strategies, or intentions, leading to uncertainty in
decision-making.
+ Nondeterministic: The outcomes of bids can be unpredictable due to the actions of
other bidders, which can change rapidly and influence the bidding dynamics.
+ Competitive: The auction is inherently competitive, with bidders vying against each
other to win the item, often driving the price higher.
+ Multi-Agent: Multiple bidders (agents) participate in the auction, each with their own
strategies and decision-making processes.
+ Dynamic: The auction environment is dynamic, with changing bid amounts and the
actions of bidders influencing the flow of the auction in real-time.
+ Discrete: Bidding occurs in discrete steps (specific bid amounts), with defined
increments set by the auction rules.
+ Sequential: Each bid influences future bids; bidders may adjust their strategies based on
previous bids and outcomes throughout the auction.
+ Unknown: Certain aspects, such as competitors’ maximum willingness to pay and their
bidding strategies, are unknown, adding to the uncertainty of the auction process.
3. Consider a simple thermostat that turns on a furnace when the temperature is at least 3
degrees below the setting, and turns off a furnace when the temperature is at least 3 degrees
above the setting. Is a thermostat an instance of a simple reflex agent, a model-based reflex
agent, or a goal-based agent? Explain your choice.
→ The thermostat is a model-based reflex agent.
→ Reason: A model-based reflex agent maintains internal state that summarizes the percept
history, and acts on that state together with the current percept. Because of the 3-degree dead
band, the current temperature alone does not determine the correct action: when the
temperature is within 3 degrees of the setting, the thermostat should keep the furnace on if it
was already on and keep it off if it was already off. It must therefore remember whether the
furnace is currently running, and this internal state is exactly what distinguishes it from a simple
reflex agent. It is not a goal-based agent, since it does not plan over future states; it simply
applies condition-action rules to the current percept plus its internal state.
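As a minimal sketch, the hysteresis behavior is easy to express in code. The Python model below (class and attribute names are ours, not from the problem) keeps the furnace state as the agent's internal model:

```python
class Thermostat:
    """Model-based reflex agent: remembers whether the furnace is on,
    because inside the 6-degree dead band the correct action depends on
    that internal state, not on the current temperature alone."""

    def __init__(self, setpoint):
        self.setpoint = setpoint
        self.furnace_on = False  # internal state (the "model")

    def act(self, temperature):
        if temperature <= self.setpoint - 3:
            self.furnace_on = True    # too cold: turn the furnace on
        elif temperature >= self.setpoint + 3:
            self.furnace_on = False   # too hot: turn the furnace off
        # otherwise keep the previous state (hysteresis)
        return "on" if self.furnace_on else "off"


t = Thermostat(setpoint=20)
print(t.act(16))  # on  (16 <= 17)
print(t.act(19))  # on  (inside the dead band: state persists)
print(t.act(23))  # off (23 >= 23)
```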
LAB 03
1. For the following graph:

a. What is the order of visits of the nodes and the path returned by BFS?
→ Queue: S/ A B C (extend S) /D (extend A) /E G (extend B) /F (extend D) /H (extend E) /
Order of visit: S → A → B → C → D → E → G
The path returned: S → B → G
b. What is the order of visits of the nodes and the path returned by DFS?
→ Stack: S / C B A / D (extend A) / F (extend D) / G E (extend F) / H (extend E) /
Order of visit: S → A → D → F → E → H → G
The path returned: S → A → D → F → G
c. What is the order of visits of the nodes and the path returned by UCS?
→ Priority Queue: S0 C2 A3 B6 (extend S)
S0 C2 A3 E3 B6 (extend C)
S0 C2 A3 E3 B6 D6 (extend A)
S0 C2 A3 E3 B5 D6 H8 F9 (extend E)
S0 C2 A3 E3 B5 D6 H8 F9 G14 (extend B)
Order of visit: S → C → A → E → B → D → H → F → G
The path returned: S → C → E → B → G
2. Consider a state space where the start state is number 1 and each state k has two successors:
numbers 2k and 2k + 1.
a. Draw the portion of the state space for states 1 to 15.

b. Suppose the goal state is 11. List the order in which nodes will be visited for breadth
first search, depth-limited search with limit 3, and iterative deepening search
→ BFS order: 1 2 3 4 5 6 7 8 9 10 11(Goal found)
→ Depth-limited search (limit 3) order: 1 2 4 8 9 5 10 11 (Goal found)
→ IDS:
● Depth-limit 0: 1
● Depth-limit 1: 1 2 3
● Depth-limit 2: 1 2 4 5 3 6 7
● Depth-limit 3: 1 2 4 8 9 5 10 11(Goal found)
c. Call the action going from k to 2k Left, and the action going to 2k + 1 Right. Can you find
an algorithm that outputs the solution to this problem without any search at all?
→ Given the binary tree structure where:
● Left action (from k to 2k) corresponds to a binary digit of 0.
● Right action (from k to 2k+1) corresponds to a binary digit of 1.
→ Algorithm:
● Convert the goal state to binary: the goal state 11 in binary is 1011.
● Interpret the binary representation:
○ The leading 1 corresponds to the root node 1; discard it.
○ Read each remaining digit from left to right:
■ If the digit is 0, move Left (to 2k).
■ If the digit is 1, move Right (to 2k + 1).
● Follow the remaining digits 011 of the goal state:
○ 1st digit (0): Move Left to state 2.
○ 2nd digit (1): Move Right to state 5.
○ 3rd digit (1): Move Right to state 11.
● The sequence of states visited is: 1 → 2 → 5 → 11
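This procedure needs no search at all; a short Python sketch (the function name is ours):

```python
def path_to(goal):
    """Return the action sequence from state 1 to `goal` without search.

    The binary representation of the goal encodes the path: after the
    leading 1 (the root), each 0 bit means Left (k -> 2k) and each
    1 bit means Right (k -> 2k + 1).
    """
    bits = bin(goal)[2:]          # e.g. 11 -> '1011'
    actions, state, states = [], 1, [1]
    for b in bits[1:]:            # skip the leading 1 (the root)
        state = 2 * state + int(b)
        actions.append("Right" if b == "1" else "Left")
        states.append(state)
    return actions, states


print(path_to(11))  # (['Left', 'Right', 'Right'], [1, 2, 5, 11])
```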
3. You’re taking a class with 𝑛 students, and you want to find a 4-person project group (you and 3
other students). You decide to formulate your problem as a search problem, as follows:
● State space: An unordered set of students in the group so far. For example, {you, Inky, Blinky} and
{you, Blinky, Inky} are equivalent states.
● Successor function: You pick someone from the class (who is not already in your group), and add
them to your group. If your group already has 4 students, there are no further successor states.
● Start state: The project group is just you, and nobody else.
● Goal test: Your group has exactly 4 students, and everyone in the group gets along. (You can
assume that the goal test is somehow able to determine whether a group gets along.)
Consider drawing the search graph for this problem.
a. How many nodes are in the search graph?
● Recall that C(n, k) = n! / (k!(n-k)!) is the number of ways to choose k elements from a
set of n elements. Also, C(n, 0) = 1.
● Recall that each node of the search graph represents a state, so this question is asking
how many states are in this problem.
● The state space is a set of people in the group so far. This set could contain 1 person (you),
or 2 people (you and one other), or 3 people (you and two others), or 4 people (you and
three others). Therefore, we need to count up how many possible groups (always
including yourself) of size up to 4 could be made from the class of n students.
● There is C(n-1, 0) = 1 possible 1-person group: it's the group with just you.
● There are C(n-1, 1) = n-1 possible 2-person groups: you, plus any of the n-1 other
people in class.
● There are C(n-1, 2) possible 3-person groups: you, plus some (unordered) group of 2
people among the n-1 others in class.
● There are C(n-1, 3) possible 4-person groups: you, plus some (unordered) group of 3
people among the n-1 others in class.
● Note that there are no states with groups of more than 4 people. The successor
function will not generate groups with more than 4 people.
● Therefore, the answer is: C(n-1, 0) + C(n-1, 1) + C(n-1, 2) + C(n-1, 3)
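The count is easy to sanity-check numerically; a quick sketch using Python's math.comb:

```python
from math import comb

def num_states(n):
    """Number of states: groups of size 1..4 that always include you,
    i.e. C(n-1, 0) + C(n-1, 1) + C(n-1, 2) + C(n-1, 3)."""
    return sum(comb(n - 1, k) for k in range(4))

print(num_states(5))   # 1 + 4 + 6 + 4 = 15
print(num_states(10))  # 1 + 9 + 36 + 84 = 130
```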

b. Does the search graph contain any cycles?


→ No, because it is only possible to move to states with more students in the group
● Each node in the graph represents a state, and each directed edge represents an
action that takes you from one state to another state. In this problem's successor
function, the only possible action is to go from a smaller group to a larger group. It
is not possible to take an action that goes from a larger group back to a smaller
group. In other words, if you took all the nodes (states) in the graph and ordered
them by number of people in the group so far, all the directed edges would point
from left to right.
c. What is the maximum branching factor for this search tree?
→ The answer is (n-1)
● The branching factor represents the number of actions available from a state.
● From the root (just you in the group), there is a choice of n-1 people to add to
your group. Therefore, there are n-1 possible actions from the root, and the
branching factor from the root is n-1.
● Future nodes deeper in the tree will have fewer choices of students to add to your
group. For example, at depth 1, where the group has two people (you and one
other), there is only a choice of n-2 people to add to your group. Therefore, the
branching factors of nodes deeper in the tree will be less than n-1, and the
maximum branching factor is from the node at the root.
d. What is the maximum depth of this search tree, assuming the root node is at depth 0?
→ The answer is 3
● Each extra layer of the tree corresponds to an additional action, so the maximum depth of
the tree corresponds to the maximum number of actions that could be taken in this
problem.
● It takes 3 actions to create a group of 4 people, and after that, there are no more actions
or successor states available.
e. As 𝑛 and 𝑘 grow very large, which search algorithm will expand fewer nodes to find a
solution?
● Depth-first tree search
● Breadth-first tree search
● Not enough information
→ The answer is “Not enough information”
● In this problem, all the solutions to the problem are at depth k-1 in the search tree,
because it takes k-1 actions to build a group of k students.
● Also, any node at depth k-1 is a solution to the problem, because any group of k students
is a valid solution.
● BFS will look through all the nodes at depth 1 (groups of 2), then all the nodes at depth 2
(groups of 3), then all the nodes at depth 3 (groups of 4), etc. before ever looking at a
node at depth k-1 (group of k).
● On the other hand, DFS will repeatedly expand the deepest node, expanding a single
depth-0 node, then a single depth-1 node, then a single depth-2 node, etc. until it pops
the first depth k-1 node off the fringe and returns it as a solution.
f. Approximately how many nodes will be expanded by the search algorithm you selected
before it finds a solution?
● O(𝑘)
● O(𝑛)
● O(𝑛𝑘)
● O(𝑘^𝑛)
● O(𝑛^𝑘)
→ The answer is O(k)
● As discussed in the previous subpart, DFS will expand one node at each depth
before finding a solution. The depth of the tree is O(k), so that's how many nodes
DFS will expand.
LAB 04
1. Search.

The questions on this page refer to the graph above, where the start state is A and the goal
state is G. The number on an edge corresponds to the cost of traversing that edge.

State:           A  B  C  D  E  F  G  H  I  J
Heuristic value: 5  3  2  3  2  0  0  5  4  0

Assume that each algorithm re-generates states that are not yet expanded, does not re-expand
states, breaks ties in alphabetical ordering, and terminates after expanding the goal state.
Note: These assumptions may differ with the operations of some of the algorithms in the
textbook.
a. What is the order of state expansions if you used Breadth-First Search?
→ Fringe: A
Expand A: A→B
A→C
Expand B: A→B→D
A→B→E
Expand C: A→C→F
A→C→G
Expand D: A→B→D→H
A→B→D→I
Expand E:
Expand F: A → C → F→ J
Expand G: Goal
→ Order of visit: A → B → C → D → E → F → G
→ Path: A → C → G
b. What is the order of state expansions if you used Depth-First Search? What path would
the algorithm return?
→ Fringe: A
Expand A: A→C
A→B
Expand B: A→B→E
A→B→D
A→B→C
Expand C: A→B→C→G
A→B→C→F
Expand F: A → B → C→ F →J
A→B→C→F→G
A→B→C→F→D
Expand D: A→B→C→F→D→I
A→B→C→F→D→H
Expand H:
Expand I
Expand G: Goal
→ Order of visit: A → B → C → F → D → H → I → G
→ Path: A → B → C → F → G
c. What is the order of state expansions if you used Uniform-Cost Search? What path
would the algorithm return?
→ Fringe: A
Expand A: A → B8
A → C6
Expand C: A → C6 → F9
A → C6 → G14
Expand B: A → B8 → D11
A → B8 → E13
Expand F: A → C6 → F9 → G10
⇒ (Replace G14: the path cost 10 via F is lower than the path cost 14 via C)
A → C6 → F9 → J11
Skip F → D (path cost 14 via F > path cost 11 via B)
Expand G: Goal
→ Order of visit: A → C → B → F → G
→ Path: A → C → F → G
d. What is the order of state expansions if you used Greedy Best-First Search? What path
would the algorithm return?
→ Fringe (Priority Queue): A h(A)=5
Expand A: A → C h(C)=2
A → B h(B)=3
Expand C: A → C → G h(G)=0
A → C → F h(F)=0
Expand F: A → C → F → J h(J)=0
A → C → F → G h(G)=0
A → C → F → D h(D)=3
Expand G: Goal
→ Order of visit: A → C → F → G
→ Path: A → C → F → G
e. What is the order of state expansions if you used A* Search? What path would the
algorithm return?
→ Fringe (Priority Queue): A f(A) = h(A) = 5
Expand A: A → C f(C) = g(A to C) + h(C) = 6 + 2 = 8
A → B f(B) = g(A to B) + h(B) = 8 + 3 = 11
Expand C: A → C → G f(G) = g(G) + h(G) = (6 + 8) + 0 = 14
A → C → F f(F) = g(F) + h(F) = (6 + 3) + 0 = 9
A → C → B f(B) = 11
Expand F: A → C → F → J f(J) = 11
A → C → F → G f(G) = 10
A → C → F → D f(D) = 17
Expand G: Goal
→ Order of visit: A → C → F → G
→ Path: A → C → F → G
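For reference, a compact A* graph-search sketch in Python that follows the stated assumptions (no re-expansion, alphabetical tie-breaking). The graph literal below is only a partial reconstruction: it contains just the edges whose costs can be read off the traces, and D's edges to H and I are omitted because their costs are not recoverable.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* graph search. `graph[u]` maps neighbors to edge costs and `h`
    is the heuristic. Ties on f break alphabetically via the (f, state)
    tuple ordering; expanded states are never re-expanded."""
    fringe = [(h(start), start, [start], 0)]   # (f, state, path, g)
    expanded = set()
    while fringe:
        f, state, path, g = heapq.heappop(fringe)
        if state in expanded:
            continue
        expanded.add(state)
        if state == goal:
            return path, g
        for nbr, cost in graph[state].items():
            if nbr not in expanded:
                g2 = g + cost
                heapq.heappush(fringe, (g2 + h(nbr), nbr, path + [nbr], g2))
    return None

# Hypothetical graph consistent with the costs appearing in the traces.
graph = {'A': {'B': 8, 'C': 6}, 'B': {'D': 3, 'E': 5},
         'C': {'F': 3, 'G': 8}, 'F': {'G': 1, 'J': 2, 'D': 5},
         'D': {}, 'E': {}, 'G': {}, 'J': {}}
hvals = {'A': 5, 'B': 3, 'C': 2, 'D': 3, 'E': 2, 'F': 0, 'G': 0, 'J': 0}
print(a_star(graph, hvals.get, 'A', 'G'))  # (['A', 'C', 'F', 'G'], 10)
```

On this fragment the expansion order is A, C, F, G, matching the trace above.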
LAB 05
1. Informed Search.

Search problem graph: S is the start state and G is the goal state.
Tie-break in alphabetical order.
ℎ(𝐵) is unknown and will be determined in the subquestions.
a. In this question, refer to the graph above where the optimal path is 𝑆 → 𝐵 → 𝐷 → 𝐺. For each
of the following subparts, you will be asked to write ranges of ℎ(𝐵). You should represent
ranges as ___≤ ℎ(𝐵) ≤___ . Heuristic values can be any number including ±∞. For responses of
±∞, you can treat the provided inequalities as a strict inequality. If you believe that there is no
possible range, please write "None" in the left-hand box and leave the right box empty.What is
the range for the heuristic to be admissible?
i. What is the range for the heuristic to be admissible?
0 ≤ ℎ(𝐵) ≤ h*(B) = cost(B → D → G) = 2 + 3
→ 0 ≤ ℎ(𝐵) ≤ 5
ii. What is the range for the heuristic to be consistent?
h(S) - h(B) ≤ cost(S, B) = 1 ⇒ h(S) - 1 ≤ h(B) ⇒ 5 ≤ h(B)
h(B) - h(D) ≤ cost(B, D) = 2 ⇒ h(B) ≤ 2 + h(D) ⇒ h(B) ≤ 4
→ No value satisfies both 5 ≤ ℎ(𝐵) and ℎ(𝐵) ≤ 4, so the answer is None.
iii. Regardless of whether the heuristic is consistent, what range of heuristic values for B
would allow A* tree search to still return the optimal path (𝑆 → 𝐵 → 𝐷 → 𝐺)?
When expanding D (S → A → D): f(D) = g(D) + h(D) = 2 + 3 + 2 = 7
When expanding D (S → B): f(B) = g(B) + h(B) = 1 + h(B)
Optimal path S → B → D →G, B expands before D:
f(B) ≤ f(D) ⇒ 1 + h(B) ≤ 7 ⇒ h(B) ≤ 6
→ -∞ ≤ ℎ(𝐵) ≤ 6
iv. Now, regardless of whether the heuristic is consistent, what range of heuristic values for B
would allow A* graph search to still return the optimal path (𝑆 → 𝐵 → 𝐷 → 𝐺)?
→ -∞ ≤ ℎ(𝐵) ≤ 6
2. Search and heuristics. Imagine a car-like agent wishes to exit a maze like the one shown below:
● The agent is directional and at all times faces some direction d ∈ {N, S, E, W}.
● With a single action, the agent can either move forward at an adjustable velocity v or turn. The
turning actions are left and right, which change the agent’s direction by 90 degrees. Turning is
only permitted when the velocity is zero (and leaves it at zero).
● The moving actions are fast and slow. Fast increments the velocity by 1 and slow decrements the
velocity by 1; in both cases the agent then moves a number of squares equal to its NEW adjusted
velocity (see example below).
● A consequence of this formulation is that it is impossible for the agent to move in the same
nonzero velocity for two consecutive timesteps.
● Any action that would result in a collision with a wall or reduce v below 0/above a maximum
speed Vmax is illegal.
● The agent’s goal is to find a plan which parks it (stationary) on the exit square using as few
actions (time steps) as possible.
As an example: if at timestep t the agent’s current velocity is 2, by taking the fast action, the agent
first increases the velocity to 3 and move 3 squares forward, such that at timestep t + 1 the agent’s
current velocity will be 3 and will be 3 squares away from where it was at timestep t. If instead the
agent takes the slow action, it first decreases its velocity to 1 and then moves 1 square forward,
such that at timestep t + 1 the agent’s current velocity will be 1 and will be 1 square away from
where it was at timestep t. If, with an instantaneous velocity of 1 at timestep t + 1, it takes the
slow action again, the agent’s current velocity will become 0, and it will not move at timestep t + 1.
Then at timestep t + 2, it will be free to turn if it wishes. Note that the agent could not have turned
at timestep t + 1 when it had a current velocity of 1, because it has to be stationary to turn.
a. If the grid is M by N, what is the size of the state space? Justify your answer. You should
assume that all configurations are reachable from the start state.
→ The agent’s position on the grid, which can be any square in an M x N grid. There are M
x N possible positions.
● The agent’s direction, which can be North, South, East, or West, giving 4 possible
directions.
● The agent’s velocity, which ranges from 0 to Vmax inclusive. If the maximum
velocity is Vmax, there are Vmax + 1 possible velocity values (since velocity can be 0).
● Thus, the total number of states S is given by: S = M x N x 4 x (Vmax + 1)
● This accounts for the agent’s position, direction, and velocity. Each configuration is
a unique state, and the assumption is that all configurations are reachable from
the start.
b. Is the Manhattan distance from the agent’s location to the exit’s location admissible?
Why or why not?
→ The Manhattan distance is not admissible in this context because it can overestimate the
number of actions required. A single action can move the agent up to Vmax squares, so a goal
that is d squares away by Manhattan distance may be reachable in far fewer than d actions.
Since an admissible heuristic must never exceed the true cost (here, the number of actions),
the Manhattan distance is inadmissible whenever Vmax > 1.
c. State and justify a non-trivial admissible heuristic for this problem which is not the
Manhattan distance to the exit.
→ A non-trivial admissible heuristic must never overestimate the number of remaining
actions, so it has to account for the fact that a single action can move the agent up to Vmax
squares.
→ Heuristic: h(n) = ⌈ManhattanDistance(n, exit) / Vmax⌉
● Every action moves the agent at most Vmax squares (turning moves it zero squares), so
reaching a square that is d away by Manhattan distance requires at least ⌈d / Vmax⌉
actions.
● Any legal plan must additionally accelerate, decelerate, and turn, so the true cost can
only be larger. The heuristic is therefore a lower bound on the actual cost and is
admissible, while still being more informative than the trivial heuristic h = 0.
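A sketch of this heuristic in Python; the state layout (position, direction, velocity) and the function name are assumptions for illustration:

```python
from math import ceil

def exit_heuristic(state, exit_pos, v_max):
    """Admissible heuristic: ceil(Manhattan distance / Vmax).
    One action moves the agent at most Vmax squares (turns move zero),
    so at least this many actions remain; it never overestimates."""
    (x, y), direction, velocity = state   # assumed state layout
    ex, ey = exit_pos
    dist = abs(x - ex) + abs(y - ey)
    return ceil(dist / v_max)

# e.g. 7 squares away with Vmax = 3 -> at least ceil(7/3) = 3 actions
print(exit_heuristic(((0, 0), 'N', 0), (3, 4), 3))  # 3
```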
d. If we used an inadmissible heuristic in A* graph search, would the search be complete?
Would it be optimal?
→ Completeness: The search will still be complete, meaning it will always find a solution
if one exists. A* will continue searching all possible states, even if the heuristic is
inadmissible.
→ Optimality: The search would not necessarily be optimal. An inadmissible heuristic can
overestimate the true cost to reach the goal, potentially causing A* to miss the optimal
solution by focusing on suboptimal paths first.
e. If we used an admissible heuristic in A* graph search, is it guaranteed to return an
optimal solution? What if the heuristic was consistent? What if we were using A* tree
search instead of A* graph search?
→ A* Graph Search with Admissible Heuristic: Not guaranteed to be optimal. Graph search
never re-expands a state, and with a heuristic that is admissible but not consistent, a state
can be expanded the first time via a suboptimal path; the cheaper path discovered later is
then discarded.
→ A* Graph Search with Consistent Heuristic: Guaranteed to return an optimal solution.
Consistency implies admissibility, and it ensures that the first time a state is expanded it has
been reached by a cheapest path, so skipping re-expansions is safe.
→ A* Tree Search with Admissible Heuristic: Guaranteed to return an optimal solution. Tree
search may re-expand the same state many times because it does not track explored nodes,
but admissibility guarantees that the first goal node popped from the fringe lies on an
optimal path.
f. Give a possible advantage that an inadmissible heuristic might have over an admissible
one.
→ An inadmissible heuristic can sometimes make the search faster because it may guide
the search more aggressively toward the goal by overestimating costs. This can lead to
fewer nodes being expanded, reducing computation time, though at the cost of
potentially missing the optimal solution. Essentially, an inadmissible heuristic can improve
the efficiency of finding a solution (though not necessarily the best one), especially in
scenarios where speed is prioritized over optimality.
LAB 06
1. Game.
a. Consider the zero-sum game tree shown below. Triangles that point up, such as at the
top node (root), represent choices for the maximizing player; triangles that point down
represent choices for the minimizing player. Assuming both players act optimally, fill in
the minimax value of each node:

→ First level above the leaf nodes (Minimizing player):
● For the first branch with children 10, 8, 3: The minimizing player will choose the
minimum value, which is 3.
● For the second branch with children 2, 15, 7: The minimizing player will choose the
minimum value, which is 2.
● For the third branch with children 6, 5, 4: The minimizing player will choose the
minimum value, which is 4.
→ Root node (Maximizing player): With children 3, 2, 4: The maximizing player will
choose the maximum value, which is 4.
b. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
no nodes can be pruned, explain why not. Assume the search goes from left to right;
when choosing which child to visit first, choose the left-most unvisited child.
→ At the root node (Maximizing player): alpha is initialized to −∞ and beta to +∞.
First branch (Minimizing player):
● Explore the children of the first node:
○ The values 10, 8, and 3 are evaluated.
○ The minimizing player will choose the smallest value, which is 3.
● The maximizing player at the root updates alpha to 3.
Second branch (Minimizing player):
● The children of this branch are 2, 15, and 7.
● Explore the first child (2):
○ Beta is updated to 2 for the minimizing player.
● At this point, beta (2) is less than alpha (3), so the remaining nodes in this branch
(15 and 7) are pruned. These nodes are irrelevant because the minimizing player
will choose 2, which is already less than 3, making further exploration
unnecessary.
Third branch (Minimizing player):
● The children are 6, 5, and 4:
○ The values 6, 5, and 4 are evaluated.
○ The minimizing player selects the smallest value, 4.
● Since 4 > alpha (3), the maximizing player at the root updates alpha to 4.
Final Minimax Value:
● The root node's minimax value is 4, as the maximizing player chooses the best
outcome among the branch values (3, 2, and 4).
→ Pruned nodes: Nodes 15 and 7 in the second branch are pruned. This happens because
after evaluating the first child (2), beta becomes 2, which is less than alpha (3). Therefore,
further exploration of 15 and 7 is unnecessary.
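The same pruning can be reproduced mechanically; a minimal alpha-beta sketch over nested lists (our own representation of this tree) confirms the root value of 4 and skips 15 and 7:

```python
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    """Alpha-beta minimax over a tree given as nested lists; numbers
    are leaf values. Children are visited left to right."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:
                break                    # prune remaining children
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:
            break                        # this prunes 15 and 7 below
    return value

tree = [[10, 8, 3], [2, 15, 7], [6, 5, 4]]
print(alphabeta(tree, True))             # 4
```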
2. Kirby is participating on a game show with his enemy, King Dedede! First, Kirby chooses one of
three doors, behind each of which sit two boxes. King Dedede chooses one of the two boxes to
give to Kirby. Each box contains two transparent juice bottles, of which Kirby chooses one to
enjoy for himself.
The bottles are filled up to different amounts, ranging from 0 (completely empty) to 10
(completely full), inclusive.
a. For this subpart, we assume that Kirby is fully aware of the juice in each bottle, the
bottles in each box, and the boxes behind each door. Shown below is the resulting game
tree:

→ The game involves two players: Kirby, who is the maximizing player, and King Dedede,
who is the minimizing player.
Fill in the minimax values of the game tree shown:
● Door 1:
○ Box with bottles A and B (values 7 and 3): Kirby takes max(7, 3) = 7.
○ Box with bottles C and D (values 9 and 4): Kirby takes max(9, 4) = 9.
○ King Dedede (minimizing) hands Kirby the worse box: min(7, 9) = 7.
● Door 2:
○ Box with bottles E and F (values 6 and 2): Kirby takes max(6, 2) = 6.
○ Box with bottles G and H (values 1 and 7.5): Kirby takes max(1, 7.5) = 7.5.
○ King Dedede (minimizing): min(6, 7.5) = 6.
● Door 3:
○ Box with bottles I and J (values 5 and 6): Kirby takes max(5, 6) = 6.
○ Box with bottles K and L (values 0 and 8): Kirby takes max(0, 8) = 8.
○ King Dedede (minimizing): min(6, 8) = 6.
● Final Minimax Values:
○ Door 1: Minimax value = 7
○ Door 2: Minimax value = 6
○ Door 3: Minimax value = 6
→ The door Kirby should choose to maximize his juice intake is Door 1, which gives a
value of 7.
b. Which terminal nodes are never explored as a consequence of pruning?
→ With left-to-right alpha-beta: after Door 1 is evaluated, alpha = 7 at the root. In Door 1's
second box, once bottle C (9) is seen, that maximizer node's value is already at least 9 > 7
(the box chooser's beta), so bottle D (4) is pruned. In Door 2, the first box evaluates to
6 < 7, so King Dedede's node can be worth at most 6 and the entire second box (bottles G
and H) is pruned. Likewise, in Door 3 the first box evaluates to 6 < 7, so the second box
(bottles K and L) is pruned. Pruned terminal nodes: D, G, H, K, and L.
3. Sliding Puzzle.
Consider the sliding puzzle game as a search problem. The sliding puzzle game consists of a
three-by-three board with eight-numbered tiles and a blank space. At each turn, only a tile
adjacent to the blank space (left, right, above, below) can be moved into the blank space. The
objective is to reconfigure any generic starting state to the goal state (displayed on the right).
a. What is the size of the state space?
(A) 9 (B) 9*9 (C) 9!
→ (C) 9!. Each of the 9 board cells holds one of the 8 tiles or the blank, and every
arrangement is a distinct state, giving 9! = 362,880 states.
b. Does every board have a unique sequence of moves to reach the goal state?
(A) Yes (B) No
→ (B) No. Not every board has a unique sequence of moves to reach the goal state: moves can
be reordered or form loops, so many different paths lead to the same configuration.
c. For the following heuristics, mark whether or not they are only admissible (admissible
but not consistent), only consistent (consistent but not admissible), both, or neither.
i. h1(n) = Total number of misplaced tiles.
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (C) Both. This heuristic is admissible (it never overestimates the number of moves
required to reach the goal, since each misplaced tile needs at least one move) and
consistent (moving one tile changes the count by at most 1, the cost of the move).
ii. h2(n) =
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (A) Only admissible. This heuristic is admissible because it does not overestimate the cost.
However, it is not consistent because the jump between values may not reflect the actual cost
between states.
iii. h3(n) = Sum of Manhattan distances for each individual tile to their goal
location.
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (C) Both. Each tile must travel at least its Manhattan distance, so the sum never
overestimates (admissible), and a single move changes the sum by exactly 1, which matches
the step cost (consistent).
iv. h4(n) = max(h1(n), h3(n)).
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (C) Both. The maximum of two admissible heuristics is admissible, and the maximum of
two consistent heuristics is consistent.
v. h5(n) = min(h1(n), h3(n)).
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (C) Both. The minimum of two admissible heuristics is still a lower bound on the true cost
(admissible), and the minimum of two consistent heuristics is consistent.
vi. h6(n) = h1(n) + h3(n)
(A) Only admissible (B) Only consistent (C) Both (D) Neither
→ (D) Neither. The sum of two admissible heuristics can overestimate: a tile that is one move
from its goal contributes 1 to both h1 and h3, giving h6 = 2 while the true cost is 1.
vii. Comparing the heuristics above that you marked as admissible (or both), which
one is best (dominates the others)? If multiple heuristics are tied mark them all.
(A) h1 (B) h2 (C) h3 (D) h4 (E) h5 (F) h6
→ (C) h3 and (D) h4. Every misplaced tile has a Manhattan distance of at least 1, so
h3(n) ≥ h1(n) for all n. Therefore h4 = max(h1, h3) = h3: the two are tied, and both
dominate h1 and h5 while remaining admissible and consistent.
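The dominance claim (h3(n) ≥ h1(n) on every board) can be spot-checked with a small sketch; the board encoding and function names here are ours:

```python
def h1(board, goal):
    """Misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for i, t in enumerate(board) if t != 0 and t != goal[i])

def h3(board, goal):
    """Sum of Manhattan distances of each tile to its goal cell."""
    total = 0
    for i, t in enumerate(board):
        if t == 0:
            continue
        j = goal.index(t)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
board = (1, 2, 3, 4, 5, 6, 0, 7, 8)
print(h1(board, goal), h3(board, goal))  # 2 2; h3 >= h1 always holds
```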
4. State Spaces.
Pacman finds himself in an N by M grid again. He starts in the top left corner of the grid and his
goal is to reach the bottom right corner. However there are G ghosts in fixed locations around
the grid that Pacman needs to kill before he reaches the goal. Pacman accomplishes this using
his new car, the Pacmobile.
Every turn Pacman chooses a radius value, r, which can be any integer from 0 to R inclusive.
This shocks and kills all ghosts within r grids of him (by Manhattan distance). However, the
Pacmobile has a limited battery that contains E (a positive integer) units of charge. Whenever it
produces an electric field of radius r, the Pacmobile loses r units of charge. The Pacmobile has 5
moving actions: left, right, up, down, and stop. All of those choices will cost 1 unit of charge as
well. If the Pacmobile runs out of charge Pacman can no longer do anything and the game ends
in a loss.
To further clarify, on each turn Pacman chooses a movement action and radius value r. He then
shocks all squares in a radius r around him, performs the movement, and loses r + 1 units of
charge. Recall he starts with E units of charge and he has to kill G ghosts in fixed locations
before reaching the bottom right square to win the game.Tie-break in alphabetical order.
a. Mark true or false for the following subparts (Assume at least one solution exists for the
search problem)
i. Running Breadth First Search on this search problem will return the solution that
takes the smallest amount of turns.
A. True B. False
→ BFS guarantees finding the solution with the least number of actions (turns) since it
explores all states level by level.
ii. Running Breadth First Search on this search problem will return the solution that
uses the smallest amount of charge.
A. True B. False
→ BFS prioritizes the number of moves, not the amount of charge used. There could be
solutions that use more charge but fewer moves, which BFS would still return as the
solution.
iii. Running Breadth First Search on this search problem will return the solution
where Pacman travels the least amount of grid distance.
A. True B. False
→ BFS minimizes the number of turns but not necessarily the grid distance. It could still
choose suboptimal paths if they take fewer turns.
b. Write your answers to the following two subparts in terms of N, M, G, R, E and any
scalars you need.
i. What is the size of the state space, X, using a minimal state representation?
→ The state space is defined by:
● Pacman's position on the grid: N × M possible positions.
● The charge left in the Pacmobile: E + 1 possible values (since charge can range from
0 to E).
● Which ghosts are still alive: 2^G possible combinations (each ghost can be either alive
or dead).
● The radius r is chosen fresh on every turn, so it is part of the action rather than the
state and does not appear in a minimal state representation (consistent with the
branching factor 5 * (R + 1) below).
Thus, the size of the state space is: X = (N * M) * (E + 1) * 2^G
ii. What is the maximum possible branching factor from any state?
→ For each turn, Pacman has:
● 5 movement actions (left, right, up, down, and stop).
● R+1 choices for the radius r.
Each action also consumes some charge, but that doesn't affect the number of possible
actions Pacman can take. Therefore, the maximum branching factor is: Branching factor =
5 * (R + 1)
c. Now we are going to investigate how the size of the state space changes with certain
rule changes. Write your answers to the following two subparts in terms of X (the size of
the original state space) and the terms N, M, G, R, E and any scalars. Each subpart is
independent of the other.
i. The ghosts are now more resilient so they have H health points. This means each
ghost needs to spend H turns in the electric field before they die. What is the size
of the state space with this adjustment?
→ If each ghost has H health points, the number of possible states for each ghost increases
to H + 1 (remaining health from H down to 0, where 0 means the ghost is dead) instead of 2
(alive or dead). For G ghosts, the ghost component of the state grows from 2^G to (H + 1)^G.
Thus, the new state space size is: X * (H + 1)^G / 2^G = X * ((H + 1) / 2)^G
ii. The Pacmobile can no longer choose any radius value at every turn because it
now takes time to increase and decrease the radius. If pacman is using a radius r
on one turn, on the next turn he must choose radius values from r - 1, r, or r + 1.
→ This rule doesn’t increase the number of possible states, but it restricts the number of
transitions between states. Therefore, the size of the state space remains the same, X, but
the branching factor changes. Hence:
d. Mark whether the following heuristics are admissible or not. Following the original
problem description.
i. h1(n) = The amount of charge remaining.
A. Admissible B. Not Admissible
→ B. Not Admissible. The remaining charge does not estimate the cost to reach the goal:
Pacman could be standing next to the goal with all ghosts dead and still have most of his
charge left, in which case the heuristic greatly overestimates the remaining cost.
ii. h2(n) = The number of ghosts still alive
A. Admissible B. Not Admissible
→ B. Not Admissible. A single shock with a large enough radius can kill several ghosts at once,
so with g ghosts alive the remaining number of turns can be smaller than g; the heuristic can
overestimate and is therefore not admissible.
iii. h3(n) = The Manhattan distance between the Pacmobile and the bottom right
corner.
A. Admissible B. Not Admissible
→ A. Admissible. The Manhattan distance is a valid lower bound on the number of turns
required to reach the goal, as Pacman moves at most one square per turn and must reach the
bottom right corner. Hence, it never overestimates and is admissible.
LAB 07
1. Kirby is participating on a game show with his enemy, King Dedede! First, Kirby chooses one of
three doors, behind each of which sit two boxes. King Dedede chooses one of the two boxes to
give to Kirby. Each box contains two transparent juice bottles, of which Kirby chooses one to
enjoy for himself.
The bottles are filled up to different amounts, ranging from 0 (completely empty) to 10
(completely full), inclusive.
a. King Dedede changes the rules, and now Kirby must be blindfolded immediately after
choosing a door. As a result, Kirby chooses a bottle randomly from any box with uniform
probability. The resulting game tree is almost exactly the same as before, except that
the bottom-most layer of maximizer nodes are now replaced with chance nodes.

i. Fill out the values on the game tree. Which door should Kirby choose?
→ Expected utility calculations for each box (50% chance per bottle):
● Door 1:
○ Bottles A and B: (7 + 3) / 2 = 5
○ Bottles C and D: (9 + 4) / 2 = 6.5
→ The lowest expected utility for Door 1 is 5
● Door 2:
○ Bottles E and F: (6 + 2) / 2 = 4
○ Bottles G and H: (1 + 7.5) / 2 = 4.25
→ The lowest expected utility for Door 2 is 4
● Door 3:
○ Bottles I and J: (5 + 6) / 2 = 5.5
○ Bottles K and L: (0 + 8) / 2 = 4
→ The lowest expected utility for Door 3 is 4
→ Kirby should choose Door 1 because Door 1 has the highest expected utility
ii. Which terminal nodes are never explored as a consequence of pruning?
b. King Dedede wants to guarantee that, assuming Kirby plays the game optimally, his
expected juice utility is minimized. How might King Dedede rearrange the juice bottles
in the boxes to ensure this? Indicate your answer by filling in the terminal nodes of the
empty game tree below (for your convenience, a couple nodes have already been filled
out for you).

Leaf node values (copied from previous question’s leaf nodes): 7, 3, 9, 4, 6, 2, 1, 7.5, 5,
6, 0, 8. What is Kirby’s new expected utility after the rearrangement of the bottles?
→ Calculate the average (chance-node value) of the 2 bottles in each box.
Choose the six smallest values 0, 1, 2, 3, 4, 5 and pair them as (0, 5), (1, 4), (2, 3), placing
one such pair in a box behind each door, so every door has a box with expected value 2.5.
Place the remaining values 6, 6, 7, 7.5, 8, 9 into the other boxes; King Dedede will never
hand Kirby those boxes.
→ Each door's minimizing value is then 2.5, so Kirby's new expected utility is
max(2.5, 2.5, 2.5) = 2.5.
2. Game.
a. Consider the following game tree where all leaf nodes are between 0 and 10 (inclusive)
and both expectation nodes choose the left branch with chance 0.8.
i. Determine the values of𝑀,𝑁, and the root node.
→ Min (A, B) = Min (5, 3) = 3
Min (C, D) = Min (9, 10) = 9
⇒ M = 3 x 0.8 + 9 x 0.2 = 4.2
Min (E, F) = Min (5, 0) = 0
Min (G, H) = Min (8, 5) = 5
⇒ N = 0 x 0.8 + 5 x 0.2 = 1
⇒ Root = Max (M, N) = Max (4.2, 1) = 4.2
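The same numbers fall out of a few lines of Python (leaf values A through H read off the tree as in the solution above):

p = 0.8   # probability of the left branch at each chance node

def chance(left, right):
    return p * left + (1 - p) * right

M = chance(min(5, 3), min(9, 10))   # min(A,B) = 3, min(C,D) = 9
N = chance(min(5, 0), min(8, 5))    # min(E,F) = 0, min(G,H) = 5
print(M, N, max(M, N))              # 4.2 1.0 4.2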
ii. Given the above expectimax problem, which node(s) can be pruned in
alpha-beta pruning?
→ The root's running maximum is M = 4.2.
Since min(E, F) = 0 and every leaf is at most 10, the best N can reach is
0 × 0.8 + 10 × 0.2 = 2 < 4.2, so there is no need to consider G and H.
iii. Now consider the general case where leaf values can take on any real value
between 0 and 10 (inclusive). Do not prune on equality.
What is the maximum number of nodes that can be pruned?
iv. Now, let 𝑝 be the probability that each expectation node chooses the left branch,
and assume 𝑝 cannot be 0 or 1. For the following game tree shown below (note
this is different from the tree in part 1.), determine the bounds on 𝑝 such that the
maximum number of leaf nodes can be pruned. Recall that probabilities must be
between 0 and 1.

→ Left branch: 3p + 9(1-p)


Right branch: Max value = 1p + 10(1- p)
3p + 9(1-p) > 1p + 10(1-p)
3p + 9 – 9p > 1p + 10 – 10p
3p > 1
p > 1/3
⇒ 1/3 < p < 1
LAB 08
1. Indiana Jones & the Kingdom of the Crystal Skull
Help Indiana Jones find the crystal skull that was hidden somewhere at the old frozen lake!
For all parts of the problem, assume that value iteration begins with all states initialized to
zero, and γ = 1.
a. Suppose that we are performing value iteration on the grid world MDP below. Indiana
starts at the bottom-left part of the lake labeled 1.
Indiana found an ancient manuscript detailing the rules of this MDP. (All of these rules
are also described in the transition table below.)
● Indiana can only go up or right, unless the action would cause Indiana to move off
the board or into the black square.
● Because the lake is frozen, if Indiana moves up from Square 3, it’s possible (with
probability x) that Indiana slips and actually moves two squares up.
● From Squares 1, 2, and 4, Indiana deterministically moves one square in the
chosen direction.
● Once Indiana reaches Square 5, the only action available is Exit, which gives
reward 100. (There are no actions available after exiting.)
● Square 2 contains ancient money, so any action that moves Indiana into Square 2
gives reward +5.
● All other transitions give reward –4 if Indiana moves two squares, and –10 if
Indiana moves one square.

i. Find the optimal state values for Square 2 and Square 3. Your answer can be in
terms of x.
V*(2) = ? V*(3) = ?
→ Step 1. Optimal value for V*(2)
● From Square 2, Indiana only has one action available: Up, which moves him
to Square 4 with a reward of -10
● The Bellman equation for V*(2) is V*(2) = R(2, Up, 4) + γ·V*(4)
● Since R(2, Up, 4) = -10 and γ = 1, we get V*(2) = -10 + V*(4)
→ Step 2. Optimal value for V*(3)
● From Square 3, Indiana has two options:
○ Right, which takes him to Square 4 with a reward of -10
○ Up, where he has a chance to slip and end up in Square 5 instead
of Square 4:
■ With probability 1 - x, he goes to Square 4 with a reward of
-10
■ With probability x, he slips and goes to Square 5 with a
reward of -4
● The Bellman equation for V*(3) becomes V*(3) = max(Right, Up)
○ Right action: Right = R(3, Right, 4) + γ·V*(4) = -10 + V*(4)
○ Up action: Up = (1 - x)·(-10 + V*(4)) + x·(-4 + V*(5))
○ So we have:
V*(3) = max[-10 + V*(4), (1 - x)·(-10 + V*(4)) + x·(-4 + V*(5))]
■ From the problem statement, Square 5 is a terminal state
with an Exit reward of 100. Since Indiana exits the grid once
he reaches Square 5, we set: V*(5) = 100
■ From Square 4, the only action Indiana can take is Up, which
moves him deterministically to
● Square 5 with a reward of -10: V*(4) = R(4, Up, 5) + γ·V*(5) = -10 + 1·100 = 90
⇒ We have: V*(2) = -10 + 90 = 80; V*(3) = max(80, 80 + 16x); V*(4) = 90 and V*(5) = 100
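These closed-form values can be cross-checked with a short value-iteration sketch; the transition model below is my reading of the rules above, run for a concrete slip probability x:

def indiana_values(x, sweeps=10, gamma=1.0):
    V = {s: 0.0 for s in (1, 2, 3, 4, 5)}
    for _ in range(sweeps):
        V = {
            5: 100.0,                               # Exit
            4: -10 + gamma * V[5],                  # Up: 4 -> 5
            3: max(-10 + gamma * V[4],              # Right: 3 -> 4
                   (1 - x) * (-10 + gamma * V[4])
                   + x * (-4 + gamma * V[5])),      # Up, may slip to 5
            2: -10 + gamma * V[4],                  # Up: 2 -> 4
            1: max(5 + gamma * V[2],                # Up into Square 2: +5
                   -10 + gamma * V[3]),             # Right: 1 -> 3
        }
    return V

print(indiana_values(x=0.5))   # V(2) = 80, V(3) = 88 = 80 + 16*0.5, V(4) = 90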
ii. For which values of x would Indiana prefer to go right on Square 1?
→ At Square 1, Indiana has two options:
● Up: This action moves Indiana to Square 2 with a reward of +5
V* (1, up)=5 + V* (2)
● Right: This action moves Indiana to Square 3 with a reward of -10
V* (1, right)=-10 + V* (3)
● To find the values of x for which Indiana would prefer going Right, we
need:
V* (1,right)> V* (1,up) or -10+V* (3)>5+V* (2)
● Substituting V*(2) and V*(3) from the expressions above, we need:
-10 + max(80, 80 + 16x) > 5 + 80 = 85
○ Case 1: If 80 + 16x ≤80 then max(80, 80 + 16x) = 80. In this case:
-10 + 80 > 85 ⇒ 70 > 85
⇒This is false, so Indiana would not prefer to go Right in this case
○ Case 2: If 80 + 16x > 80 then max(80, 80 + 16x) = 80 + 16x. In this
case:
-10 + (80 + 16x) > 85 ⇒ 70 + 16x > 85 ⇒x > 15/16
⇒ This means that Indiana will only choose to go Right if the
probability x of slipping from Square 3 to Square 5 (skipping Square
4) is greater than 15/16, or approximately 0.9375
LAB 09
1. MDPs: Grid-World Water Park
Consider the MDP drawn below. The state space consists of all squares in a grid-world water
park. There is a single water slide that is composed of two ladder squares and two slide
squares (marked with vertical bars and squiggly lines respectively). An agent in this water park
can move from any square to any neighboring square, unless the current square is a slide in
which case it must move forward one square along the slide. The actions are denoted by
arrows between squares on the map and all deterministically move the agent in the given
direction. The agent cannot stand still: it must move on each time step. Rewards are also
shown below: the agent feels great pleasure as it slides down the water slide (+2), a certain
amount of discomfort as it climbs the rungs of the ladder (-1), and receives rewards of 0
otherwise. The time horizon is infinite; this MDP goes on forever.
a. How many (deterministic) policies are possible for this MDP?
→ (3 squares with only 1 available action) 1^3
(11 squares with 2 available actions) 2^11
⇒ 1^3 × 2^11 = 2^11
b. Fill in the blank cells of this table with values that are correct for the corresponding
function, discount, and state. Hint: You should not need to do a substantial calculation
here. (Note: Vt(s) is the time-limited value of state s, if our Markov Decision Process
were to terminate after t timesteps.)

c. Fill in the blank cells of this table with the Q-values that result from applying the
Q-update for the transition specified on each row. You may leave Q-values that are
unaffected by the current update blank. Use discount = 1.0 and learning rate α = 0.5.
Assume all Q-values are initialized to 0. (Note: the specified transitions would not arise
from a single episode.)
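The update itself is one line; below is a minimal Python sketch with this part's parameters (discount 1.0, learning rate 0.5, all Q-values initialized to 0). The two transitions are placeholders, since the row table is not reproduced here:

from collections import defaultdict

gamma, alpha = 1.0, 0.5
Q = defaultdict(float)   # Q[(state, action)] starts at 0

def q_update(s, a, s_next, r, actions):
    # Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in actions(s_next))
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

acts = lambda s: ["up", "right"]        # hypothetical action sets
q_update("s1", "right", "s2", 2, acts)  # hypothetical transitions
q_update("s2", "up", "s1", -1, acts)
print(dict(Q))   # {('s1', 'right'): 1.0, ('s2', 'up'): 0.0}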
LAB 10
1. Quagmire
Consider an unknown MDP with three states (A, B and C) and two actions (← and →). Suppose
the agent chooses actions according to some policy π in the unknown MDP, collecting a
dataset consisting of samples (s, a, s′, r), each representing taking action a in state s,
resulting in a transition to state s′ and a reward of r.

You may assume a discount factor of γ = 1.


a. Recall the update function of Q-learning is:
Q(s, a) ← (1 − α) · Q(s, a) + α · [r + γ · max_a′ Q(s′, a′)]
Assume that all Q-values are initialized to 0, and use a learning rate of α = 1/2
i. Run Q-learning on the above experience table and fill in the following Q-values:
Q(A, →) = ? Q(B, →) = ?

ii. After running Q-learning and producing the above Q-values, you construct a
policy πQ that maximizes the Q-value in a given state:

What are the actions chosen by the policy in state A and B?


● πQ(A) is equal to:
○ πQ(A) = ←
○ πQ(A) = → [True]
○ πQ(A) = Undefined
● πQ(B) is equal to:
○ πQ(B) = ← [True]
○ πQ(B) = →
○ πQ(B) = Undefined
REVIEW I
1. Search.
Answer the following questions about the search problem shown above. Assume that ties are
broken alphabetically. (For example, a partial plan S → X → A would be expanded before S → X
→ B; similarly, S → A → Z would be expanded before S → B → A.) For the questions that ask for
a path, please give your answers in the form “S – A – D – G.”
a. What path would breadth-first graph search return for this search problem?

BFS (FIFO queue). Order of node expansion: S, A, G.
Fringe (paths in the order pushed):
S
S → A
S → G
S → A → B
S → A → C
Path returned: S → G

b. What path would uniform cost graph search return for this search problem?

UCS (priority queue ordered by path cost g). Order of node expansion: S, A, C, D, B, G.
Fringe (paths with g-costs):
S
S → A (1)
S → G (12)
S → A → B (1 + 3 = 4)
S → A → C (1 + 1 = 2)
S → A → C → D (1 + 1 + 1 = 3)
S → A → C → G (1 + 1 + 2 = 4)
S → A → C → D → G (1 + 1 + 1 + 3 = 6)
S → A → B → D (1 + 3 + 3 = 7)
Path returned: S → A → C → G

c. What path would depth-first graph search return for this search problem?
DFS (LIFO stack). Order of node expansion: S, A, B, D, G.
Fringe (paths in the order pushed):
S
S → G
S → A
S → A → C
S → A → B
S → A → B → D
S → A → B → D → G
Path returned: S → A → B → D → G

d. What path would A* graph search, using a consistent heuristic, return for this search
problem?

A* (priority queue ordered by f = g + h). Order of node expansion: S (1st), A (2nd), C (3rd), G (4th).
Fringe (paths with f-costs):
S
S → A (1 + 3 = 4)
S → G (12 + 0 = 12)
S → A → B (1 + 3 + 6 = 10)
S → A → C (1 + 1 + 2 = 4)
S → A → C → D (1 + 1 + 1 + 3 = 6)
S → A → C → G (1 + 1 + 2 + 0 = 4)
S → A → C → D → G (1 + 1 + 1 + 3 + 0 = 6)
S → A → B → D (1 + 3 + 3 + 3 = 10)
Path returned: S → A → C → G

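For reference, here is a compact A* graph-search sketch over the edge costs and heuristic values implied by the fringe trace above (the graph is reconstructed from those numbers, so treat the costs and h-values as assumptions; h(S) is never compared, so 0 is a safe placeholder):

import heapq

graph = {"S": [("A", 1), ("G", 12)], "A": [("B", 3), ("C", 1)],
         "B": [("D", 3)], "C": [("D", 1), ("G", 2)],
         "D": [("G", 3)], "G": []}
h = {"S": 0, "A": 3, "B": 6, "C": 2, "D": 3, "G": 0}

def astar(start, goal):
    fringe = [(h[start], start, [start], 0)]   # (f, state, path, g)
    closed = set()
    while fringe:
        f, s, path, g = heapq.heappop(fringe)  # lowest f; ties alphabetical
        if s == goal:
            return path
        if s in closed:
            continue
        closed.add(s)
        for nxt, cost in graph[s]:
            heapq.heappush(fringe,
                           (g + cost + h[nxt], nxt, path + [nxt], g + cost))
    return None

print(" - ".join(astar("S", "G")))   # S - A - C - G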
e. Consider the heuristics for this problem shown in the table below.
i. Is h1 admissible? (i) Yes (ii) No
ii. Is h1 consistent? (i) Yes (ii) No
iii. Is h2 admissible? (i) Yes (ii) No
iv. Is h2 consistent? (i) Yes (ii) No
2. Search.
For the following questions, consider the search problem shown on the left. It has only three
states, and three directed edges. A is the start node and G is the goal node. To the right, four
different heuristic functions are defined, numbered I through IV.

a. Admissibility and Consistency


For each heuristic function, circle whether it is admissible and whether it is consistent
with respect to the search problem given above.
i. Is hI admissible? (i) Yes (ii) No
ii. Is hI consistent? (i) Yes (ii) No
iii. Is hII admissible? (i) Yes (ii) No
iv. Is hII consistent? (i) Yes (ii) No
v. Is hIII admissible? (i) Yes (ii) No
vi. Is hIII consistent? (i) Yes (ii) No
vii. Is hIV admissible? (i) Yes (ii) No
viii. Is hIV consistent? (i) Yes (ii) No
b. Function Domination
Recall that domination has a specific meaning when talking about heuristic functions.
Circle all true statements among the following.
i. Heuristic function III dominates IV.
(i) True (ii) False
ii. Heuristic function IV dominates III.
(i) True (ii) False
iii. Heuristic functions III and IV have no dominance relationship.
(i) True (ii) False
iv. Heuristic function I dominates IV.
(i) True (ii) False
v. Heuristic function IV dominates I.
(i) True (ii) False
vi. Heuristic functions I and IV have no dominance relationship.
(i) True (ii) False
3. Game.
Alice is playing a ball game with Bob. There are 3 boxes in front of them, each containing 3
balls, and each ball has a score. Alice first selects a box from the 3 boxes, and then Bob takes a
ball from the box selected by Alice. In Figure 1, nodes B1, B2 or B3 are boxes, and nodes
labeled C through K represent individual balls and their scores. Unless otherwise specified,
Alice’s objective is to maximize the score of the ball that is eventually chosen, and Bob’s
objective is to minimize Alice’s score. Assume both players always act optimally for their goals.

a. In the blanks below, fill in the labels (not the numerical values) of the balls selected for
nodes A, B1, B2 and B3. For example, if Alice’s optimal move is to select the box B1, and
Bob selects the ball C, then A=C, B1=C.
A=E B1 = E B2 = H B3 = K
→ Bob is a minimizer so he will choose E, H, K for B1, B2, B3 respectively. Alice is a
maximizer, so she will choose the maximum among E, H, K which is E
b. Let’s apply alpha-beta pruning for the tree. Please choose all the child nodes of the
branches that could be pruned in the search tree.
(a) B1 (b) B2 (c) B3 (d) C (e) D (f) E
(g) F (h) G (i) H (j) I (k) J (l) K
→ We cannot prune anything in the B2 branch: after visiting F and G we only know
min(F, G) = 8 > min(C, D, E) = 6, so H could still lower B2 below 6 and must be visited.
We can prune K because after visiting J we know that the value at B3 <= 3, which must be
smaller than the value at B1 (= 6), so B3 will never get picked by Alice.
c. Assume that you can reallocate all the balls to different boxes as long as in the end,
each box has exactly 3 balls in it. Could you find a way to re-allocate the balls so that
when we apply alpha-beta pruning, the right branch of B2 as well as the middle and the
right branch of B3 will be pruned?
Please list all balls in the order of their new positions in your solution. For example, if
your solution is not to move any ball, then your answer is “C,D,E,F,G,H,I,J,K”. When
multiple solutions exist, break ties alphabetically. For example, if there are two
solutions starting with “C,D, ...” and “D,C, ...”, then your answer should be “C,D, ...”. If
there is no solution, please write down “None”.
→ C, D, E, F, H, G, J, I, K
The reasoning is similar to that of the previous part. After visiting C, D, E, we know the
value at B1 = min(C, D, E) = 6. Then after visiting H = 1 < 6 we can prune G, and after
visiting J = 3 < 6 we can prune I and K. Among all feasible solutions, this one must
come first in alphabetical order, since an earlier solution would need to start with C, D, E,
F, G, … or C, D, E, F, H, G, I, …, neither of which satisfies the required pruning.
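A generic alpha-beta sketch for this two-level game (max over boxes, min over balls). The ball scores are hypothetical stand-ins chosen to be consistent with the facts used above (min of B1 is 6 with E = 6, min(F, G) = 8, H = 1, J = 3, K below 3):

def alphabeta_root(boxes):
    visited, best = [], float("-inf")
    alpha = float("-inf")
    for name, balls in boxes:
        value = float("inf")            # Bob minimizes within each box
        for label, score in balls:
            visited.append(label)
            value = min(value, score)
            if value <= alpha:          # Bob already forces <= alpha here
                break                   # prune the rest of this box
        best = max(best, value)
        alpha = max(alpha, best)
    return best, visited

boxes = [("B1", [("C", 7), ("D", 8), ("E", 6)]),   # assumed scores
         ("B2", [("F", 9), ("G", 8), ("H", 1)]),
         ("B3", [("I", 9), ("J", 3), ("K", 2)])]
print(alphabeta_root(boxes))   # value 6; K is never visited (pruned)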
4. Zero-sum game tree.
a. Consider the following zero-sum game tree. Triangles that point up represent choices
for the maximizing player; triangles that point down represent choices for the
minimizing player. Assuming both players act optimally, fill in the minimax value of
each node:

b. Using alpha beta pruning, which leaf nodes or branches can we prune? Assume that
branches are visited in left to right order.
c. If the branches of every minimizer node are reordered such that we prune the maximum
number of leaf nodes, which leaf nodes can be pruned? Assume that children of
maximizer nodes are visited from left to right and that we are not pruning on equality.
Remember to label the order of visit for every branch of all minimizer nodes.

→ The right subtree determines the root value, so reorder to visit it first.
Maximum number of leaf nodes pruned = 5
d. Assume that we are not pruning on equality. In this part, we have a fixed traversal
order from left to right. We start with the tree in the previous part, and shuffle the
values of all the leaf nodes such that we check as few nodes as possible without
changing the structure of the tree. The initial ordering (as in the tree presented in the
previous part) is 6, 10, 8, 7, 5, 4, 5, 6, 3, 7, 2, and the new ordering is A, B, C, D, E, F, G, H, I, J, K.
(Note: this question is challenging!)
i. For all possible new orderings, which of the results (root values) are possible?
→ Given the leaf values 6, 10, 8, 7, 5, 4, 5, 6, 3, 7, 2, the root can take on any of the leaf
values in the range [2, 10], depending on the reordering.
ii. Suppose that for new ordering, the value of root is 4. Which of the following leaf
nodes are guaranteed to have value ⩽ 5
(a) A (b) B (c) C (d) D
(e) E (f) F (g) G (h) H
(i) I (j) J (k) K (l) None of the above
iii. Suppose that for new ordering, the value of root is 5. Which of the following leaf
nodes are guaranteed to have value > 5
(a) A (b) B (c) C (d) D
(e) E (f) F (g) G (h) H
(i) I (j) J (k) K (l) None of the above
5. Demolition Pacman.
Pacman just bought a new square (S) in an M by N grid world that contains some rough terrain.
He wants to clear a path from his new home to the closest grocery store, which is another
square (G) on the grid world. However, there are R squares on the grid that are impassable
because there are rocks in the way. Fortunately, Pacman is carrying D – 1 sticks of dynamite
with him. On any turn, Pacman can throw a stick of dynamite at one of eight neighboring
squares (diagonal throws are allowed) if it contains rocks, consuming a stick and removing the
rocks in that location. If he chooses not to throw dynamite, Pacman can also drive his truck
onto one of eight neighboring squares (diagonal moves are allowed) if there is nothing blocking
him. However, his truck only has F – 1 units of fuel and whenever Pacman moves to a square,
one unit of fuel is consumed. Once Pacman runs out of fuel and dynamite, he has no more
actions available. Assume R, D, and F are all greater than 3 and 2 * D < R.
a. Complete the expression below so that X evaluates to the size of the minimal state space.
You can use integers as well as any of the variables mentioned in the problem.
a=1 b=1 c=0
d=0 f=1 u=R
→ We have M*N possible values for the location and F possible values for the amount of
fuel left. We need to store a Boolean for each of the R rocks recording whether it has been
destroyed, so the factor is 2^R (R goes in the exponent, not the base). We don't need to
keep track of how much dynamite is left: every destroyed rock consumed exactly one
stick, so the count can be recovered from the Booleans.
b. What is the maximum possible branching factor?
Maximum branching factor = 8
→ There are 8 neighboring squares. If there is a rock in one, we can throw a dynamite but
we can’t move into it. If there is no rock, we can move into it but we can’t throw
dynamite.
c. For each subpart below, select all algorithms that can be used to find the specified
sequence of actions. The algorithm should be able to find such a sequence on any grid
world that contains that sequence without failure. Assume you are allowed to define
costs between states however you want.
i. Any sequence of action that results in Pacman reaching the Grocery store.
(i) DFS Tree Search (ii) DFS Graph Search (iii) BFS
(iv) UCS (v) None of these
→ DFS Graph Search, BFS and UCS are complete, so they will always find a
solution. DFS Tree Search is also complete in this problem because it cannot recurse
infinitely: dynamite or fuel is consumed on every turn.
ii. The sequence of actions that results in Pacman reaching the Grocery store while
using the least amount of dynamite possible.
(i) DFS Tree Search (ii) DFS Graph Search (iii) BFS
(iv) UCS (v) None of these
→ We must use UCS with an edge cost of 1 for transitions that use dynamite and an
edge cost of 0 for transitions that use fuel.
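A sketch of that cost design (the action encoding is hypothetical). UCS handles zero-cost edges fine, since all step costs remain non-negative:

def step_cost(action):
    # action assumed to be ("throw", direction) or ("move", direction)
    return 1 if action[0] == "throw" else 0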
iii. The sequence of actions that results in Pacman reaching the Grocery store while
using the most amount of dynamite possible.
(i) DFS Tree Search (ii) DFS Graph Search (iii) BFS
(iv) UCS (v) None of these
→ We can’t use UCS (or any of the other algorithms) to find a path with the
“most” amount of transitions. We can’t use a positive edge costs for transitions
that don’t use dynamite because that would find a path that avoids not using
dynamite the most, which is different from using the most dynamite.
iv. The sequence of actions that results in Pacman reaching the Grocery store while
having the lowest sum of fuel consumed plus dynamite consumed.
(i) DFS Tree Search (ii) DFS Graph Search (iii) BFS
(iv) UCS (v) None of these
→ This is equivalent to finding the path with the fewest turns, since a stick of
dynamite or a unit of fuel is consumed on every turn. UCS and BFS can both find this.
d. Each subpart below describes a change to the above problem. All subparts are
independent of each other, so treat each question as if it were the only change to the
above description. Note: X is equal to the size of the state space from the original
problem. You are essentially finding a scaling factor that should be multiplied by the
original state space size.
i. Pacman upgraded his truck so that he can now throw dynamite towards the
north for free. It still costs him dynamite to throw in any of the other seven
directions.
Complete the expression below so that it evaluates to the size of the new
minimal state space. You can use integers as well as any of the variables
mentioned in the problem.

a=0 b=1 c=0


d=0 e=0
→ We now must keep track of how much dynamite was used separately, because
some of the rocks may have been destroyed for free.
Select all that could possibly be true when comparing to the original problem
statement.
(i) The new ruleset will cause BFS to expand less nodes
(ii) The new ruleset will cause BFS to expand the same amount of nodes
(iii) The new ruleset will cause BFS to expand more nodes
→ All answers are possible. Since the grid world setup is not specified, any
outcome can occur. BFS will expand fewer nodes if there is a short path
through rocks to the goal (to the north) that Pacman previously lacked the
dynamite to clear. BFS will expand the same number of nodes if there are no
rocks in the way. BFS will expand more nodes if there are many rocks north of
the starting location that do not lead toward the goal.
ii. Y of the R rocks are larger than others (1 < Y < R). They require two sticks of
dynamite to be removed (Pacman can still only throw one stick of dynamite per
turn).
Complete the expression below so that it evaluates to the size of the new
minimal state space. You can use integers as well as any of the variables
mentioned in the problem.

a=0 b=0 c=0


d = -Y e=Y f=0
→ We now have 3 possible states instead of 2 for Y of the rocks so the state space
is scaled by a factor of (3^Y)/(2^Y)
Select all that could possibly be true when comparing to the original problem
statement.
(i) The new ruleset will cause BFS to expand less nodes
(ii) The new ruleset will cause BFS to expand the same amount of nodes
(iii) The new ruleset will cause BFS to expand more nodes
→ All answers are possible. Since the grid world setup is not specified, any outcome
can occur. BFS will expand more nodes if there is a short path through rocks to the
goal that Pacman no longer has enough dynamite to clear. BFS will expand the same
number of nodes if there are no rocks in the way. BFS will expand fewer nodes if
there are many rocks around the starting location that do not lead toward the goal
and that Pacman can no longer break.
iii. Y of the R rocks are now too large for Pacman to destroy (1 < Y < R). Dynamite
does nothing to them, so Pacman will have to move around them.
Complete the expression below so that it evaluates to the size of the new
minimal state space. You can use integers as well as any of the variables
mentioned in the problem.

a=0 b=0 c=0


d = -Y e=0 f=0
→ We no longer have to keep track of Y of the rocks so the state space is scaled by
1/(2^Y)
Select all that could possibly be true when comparing to the original problem
statement.
(i) The new ruleset will cause BFS to expand less nodes
(ii) The new ruleset will cause BFS to expand the same amount of nodes
(iii) The new ruleset will cause BFS to expand more nodes
→ All answers are possible. Since the grid world setup is not specified, any outcome
can occur. BFS will expand more nodes if there is a short path through rocks to the
goal that Pacman can no longer destroy. BFS will expand the same number of nodes
if there are no rocks in the way. BFS will expand fewer nodes if there are many rocks
around the starting location that do not lead toward the goal, since the reachable
space shrinks.
REVIEW II
1. Games.
a. Consider the zero-sum game tree shown below. Triangles that point up, such as at the
top node (root), represent choices for the maximizing player; triangles that point down
represent choices for the minimizing player. Assuming both players act optimally, fill in
the minimax value of each node.

b. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
no nodes can be pruned, explain why not. Assume the search goes from left to right;
when choosing which child to visit first, choose the left-most unvisited child.

c. Again, consider the same zero-sum game tree, except that now, instead of a minimizing
player, we have a chance node that will select one of the three values uniformly at
random. Fill in the expectimax value of each node. The game tree is redrawn below for
your convenience.
d. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
no nodes can be pruned, explain why not.
→ No nodes can be pruned. There will always be the possibility that an as-yet-unvisited
leaf of the current parent chance node will have a very high value, which increases the
overall average value for that chance node. For example, when we see that leaf 4 has a
value of 2, which is much less than the value of the left chance node, 7, at this point we
cannot make any assumptions about how the value of the middle chance node will
ultimately be more or less in value than the left chance node. As it turns out, the leaf 5
has a value of 15, which brings the expected value of the middle chance node to 8, which
is greater than the value of the left chance node. In the case where there is an upper
bound to the value of a leaf node, there is a possibility of pruning: suppose that an upper
bound of +10 applies only to the children of the rightmost chance node. In this case, after
seeing that leaf 7 has a value of 6 and leaf 8 has a value of 5, the best possible value that
the rightmost chance node can take on is (6+5+10)/3 = 7, which is less than 8, the value of
the middle chance node. Therefore, it is possible to prune leaf 9 in this case.
2. Search.
Admissible: 0 <= h(n) <= h*(n)
h(n): cost to reach a goal from n as estimated by h
h*(n): optimal cost to reach a goal from n

Consistent h(A) - h(B) <= Cost (A → B)

S→A h(S) - h(A) = 7 - 5 = 2 <= Cost (S → A) = 3 True

S→B h(S) - h(B) = 7 - 7 = 0 <= Cost (S → B) = 1 True

A→C h(A) - h(C) = 5 - 4 = 1 <= Cost (A → C) = 2 True

B→C h(B) - h(C) = 7 - 4 = 3 <= Cost (B → C) = 3 True

C→D h(C) - h(D) = 4 - 1 = 3 <= Cost (C → D) = 6 True

C→G h(C) - h(G) = 4 - 0 = 4 <= Cost (C → G) = 4 True

D→G h(D) - h(G) = 1 - 0 = 1 <= Cost (D → G) = 1 True

→ (i) admissible, consistent
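The table's rows can be checked mechanically; a small Python sketch with the edge costs and h-values copied from the rows above:

edges = {("S", "A"): 3, ("S", "B"): 1, ("A", "C"): 2, ("B", "C"): 3,
         ("C", "D"): 6, ("C", "G"): 4, ("D", "G"): 1}
h = {"S": 7, "A": 5, "B": 7, "C": 4, "D": 1, "G": 0}

# consistency: h(A) - h(B) <= cost(A -> B) on every edge
for (u, v), cost in edges.items():
    assert h[u] - h[v] <= cost, f"inconsistent on {u} -> {v}"
print("consistent; admissibility follows since h(G) = 0")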

A* Search (priority queue ordered by f = g + h). Order of node expansion: S (1st), A (2nd), B (3rd), C (4th), G (5th).
Fringe (paths with f-costs):
S
S → A : f = 3 + 5 = 8
S → B : f = 1 + 7 = 8
S → A → B : f = 3 + 2 + 7 = 12 (never expanded)
S → A → C : f = 3 + 2 + 4 = 9
S → B → C : f = 1 + 3 + 4 = 8
S → B → C → D : f = 1 + 3 + 4 + 1 = 9
S → B → C → G : f = 1 + 3 + 4 + 0 = 8
Path returned: S → B → C → G

→ (ii) S – 1; A – 2; B – 3; C – 4; D – Not Expanded; G – 5


→ (iii) S → B → C → G

Greedy Graph Search (expand lowest h; visited nodes are not re-expanded). Order of node expansion: S (1st), A (2nd), C (3rd), G (4th).
Fringe (paths with h-values):
S
S → A : h = 5
S → B : h = 7
S → A → C : h = 4
S → A → C → D : h = 1
S → A → C → G : h = 0
Path returned: S → A → C → G

→ (iv) S – 1; A – 2; B – Not Expanded; C – 3; D – Not Expanded; G – 4


→ (v) S → A → C → G
3. Games.
Consider the following game tree where upward triangle nodes are max nodes, and downward
triangle nodes are min nodes:

a. If 𝑌 = 4, what is the minimax value at the root node 𝐴?


→ Start evaluating from leaves upward:
● D (min): min(4, 5) = 4
● E (min): min(3) = 3
● F (min): min(6, 14) = 6
→ Moving to max nodes:
● B (max): max(4, 3) = 4
● C (max): max(6) = 6
→ Finally A (min): min(4,6) = 4
b. What range of values for 𝑌 would cause the minimax value at the root 𝐴 to be equal to
𝑌? Your answer should be an inequality using ≤, such as 𝑌 ≤ 188, or 188 ≤ 𝑌 ≤ 288, or
388 ≤ 𝑌 ≤ 388, or “impossible” (if there are no such values for 𝑌 ).
→ A (min): min(B, C) = Y
B and C values are as follows:
+ B = max(min(Y, 5), 3)
+ C = max(min(6, 14)) = 6
Focus on B:
+ D = min(Y, 5) equals Y only if Y <= 5
+ B = max(D, 3) equals D = Y only if Y >= 3
Focus on A:
+ A = min(B, C) equals B = Y only if Y <= C = 6
Combining the conditions: 3 <= Y <= 5
c. Assume that 𝑌 = 4. If we run minimax with alpha-beta pruning, visiting nodes from left
to right, which terminal nodes will we not visit because of pruning? Select all that apply.
(i) Y (ii) 5 (iii) 3 (iv) 6
(v) 14 (vi) None of the above (we visit all terminal nodes)

→ Nodes not visited: Terminal node 14 under F is pruned


d. When is the statement 𝐴 ≤ max(𝑌 , 14) true?
→ A (min): min(B, C), where
+ B = max(min(Y, 5), 3)
+ C = max(min(6, 14)) = 6
Case 1. Y <= 5, so D = Y.
● If Y <= 3, then B = 3
⇒ A = min(B, C) = min(3, 6) = 3 <= max(Y, 14). True
● If 3 <= Y <= 5, then B = Y
⇒ A = min(B, C) = min(Y, 6) = Y <= max(Y, 14). True
Case 2. Y > 5, so D = 5, B = 5, and A = min(5, 6) = 5 < max(Y, 14). True
→ Always true
e. Suppose the max nodes 𝐵 and 𝐶 are replaced with chance nodes that select actions
uniformly at random. What range of values for 𝑌 would cause the value at the root 𝐴 to
be equal to 𝑌 ? Your answer should be an inequality using ≤, such as 𝑌 ≤ 188, or 188 ≤ 𝑌
≤ 288, or 388 ≤ 𝑌 ≤ 388, or “impossible” (if there are no such values for 𝑌 ).
→ A (min): min(B, C) = Y
With B and C now chance nodes:
+ B = (D + E) / 2, where D = min(Y, 5) and E = 3
+ C = 6 (its only child is min(6, 14) = 6)
Case 1. Y <= 5, so D = Y.
● B = (Y + 3) / 2, and A = min(B, 6) = B since B <= 6 here
● Setting A = Y gives Y = (Y + 3) / 2 ⇒ Y = 3
Case 2. Y > 5, so D = 5 and B = (5 + 3) / 2 = 4, giving A = min(4, 6) = 4 ≠ Y.
Never happens.
→ So, 3 <= Y <= 3, i.e., Y = 3
4. Games. Consider the following game tree.

a. What is the minimax value at node A?


→ Level 4 (Min Nodes):
● H: Min(3, 17) → 3
● I: Min(2, 12) → 2
● J: Min(15) → 15
● K: Min(25, 0) → 0
● L: Min(2, 5) → 2
● M: Min(3) → 3
● N: Min(2, 14) → 2
Level 3 (Max Nodes):
● D: Max(3, 2) → 3
● E: Max(15, 0) → 15
● F: Max(2, 3) → 3
● G: Max(2) → 2
Level 2 (Min Nodes):
● B: Min(3, 15) → 3
● C: Min(3, 2) → 2
Level 1 (Max Node):
● A: Max(3, 2) → 3
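The same bottom-up pass can be written as a short recursion; the tree below encodes the leaf values listed in the solution, with levels alternating max/min:

def minimax(node, is_max):
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# A (max) -> B, C (min) -> D..G (max) -> H..N (min) -> leaves
tree = [
    [[[3, 17], [2, 12]],    # B: D = max(H, I)
     [[15], [25, 0]]],      #    E = max(J, K)
    [[[2, 5], [3]],         # C: F = max(L, M)
     [[2, 14]]],            #    G = max(N)
]
print(minimax(tree, True))  # 3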
b. Which branches will be pruned after running minimax search with alpha-beta pruning?
For instance, if the edge between node A and node B is pruned, write A – B. If the edge
between H and 3 is pruned, write H – 3. List the pruned branches from left to right. If a
branch from an upper level is pruned, you don’t have to list the branches below that.

c. Now, consider another game tree which has the same structure as the original game
tree shown above. This modified game tree can take on any values at the leaf nodes.
What is the minimum and the maximum number of leaf nodes that can be pruned after
running alpha-beta pruning?
→ For any game tree with the same structure:
● Minimum Leaf Nodes Pruned: 0 (if all branches must be explored).
● Maximum Leaf Nodes Pruned: 6 (depends on the best and worst-case values,
where pruning eliminates the largest possible subtrees).
5. Non Zero-sum Games.
a. Let’s look at a non-zero-sum version of a game. In this formulation, player A’s utility will
be represented as the first of the two leaf numbers, and player B’s utility will be
represented as the second of the two leaf numbers. Fill in this non-zero game tree
assuming each player is acting optimally.

b. Which nodes can be pruned from the game tree above through alpha-beta pruning? If
no nodes can be pruned, explain why not.
No nodes can be pruned. Because this game is non-zero-sum, there can exist a leaf node
anywhere in the tree that is good for both player A and player B.
6. MDPs Value Iteration
Assume: V0(s) = 0, γ = 1/2
a. Suppose that we are performing value iteration on the grid world MDP below.

i. What is the optimal value for A and B


→ V*(A) = 100 · γ^2 = 100 · (1/2)^2 = 25
→ V*(B) = max(1 · γ^3, 100 · γ^5) = max(1/8, 100/32) = max(1/8, 25/8)
⇒ V*(B) = 25/8
ii. After how many iterations 𝑘 will we have 𝑉𝑘(𝑠) = 𝑉*(𝑠) for all states 𝑠? If it never
occurs, write “never”.
→ k = 6: after 6 iterations, V6(s) = V*(s) for all states s
b. For the following problem, we add a new state in which we can take the 𝐸𝑋𝐼𝑇 action
with a reward of +𝑥.
i. For what values of 𝑥 is it guaranteed that our optimal policy 𝜋* has 𝜋*(𝐶) = ←?
Write ∞ and −∞ if there is no upper or lower bound, respectively. Write the
upper and lower bounds in each respective box.
→ Q(C, →) = 100 · γ^4 = 100/16
Q(C, ←) = x · γ^3 = x/8
For π*(C) = ← we need Q(C, ←) > Q(C, →) ⇒ x/8 > 100/16 ⇒ x > 50
ii. For what values of 𝑥 does value iteration take the minimum number of iterations
𝑘 to converge to 𝑉* for all states? Write ∞ and −∞ if there is no upper or lower
bound, respectively. Write the upper and lower bounds in each respective box.
→ 50 <= x <= 200
iii. What is the value 𝑘, the minimum number of iterations until 𝑉𝑘 has converged to
𝑉* for all states.
→k=4
NOTES
Graph Search vs. Tree Search?
Tree Search: repeated states are allowed (no visited set).
Graph Search: keeps a closed set of already-expanded states, so repeats are not allowed.