Introduction to
Artificial Intelligence
Lecture: Introduction to AI
Outline
● What is Artificial Intelligence (AI)?
● The Foundations of Artificial Intelligence
● The History of Artificial Intelligence
● The State of the Art
2
AI: Dreams for everyone
3
AI: Sophia Robot
4
AI: Deep Blue – AlphaGo
5
Intelligence vs. Artificial Intelligence
● Intelligence includes the capacity for logic, understanding,
learning, reasoning, creativity, problem solving, and more.
● Artificial intelligence (AI) attempts not just to understand but
also to build intelligent entities.
John McCarthy (1927 – 2011), Marvin Minsky (1927 – 2016), Allen Newell (1927 – 1992), Arthur Samuel (1901 – 1990), Herbert Simon (1916 – 2001)
6
The fields of AI
● AI research aims to build intelligent entities that are capable of
simulating humans in different aspects.
○ Thinking: learning, planning, knowledge refinement
○ Perception: see, hear, feel, etc.
○ Communication in natural languages
○ Manipulation and moving objects
7
What is AI?
8
Acting humanly
● The Turing Test approach (Alan Turing, 1950):
○ A computer passes the test if a human interrogator, after
posing some written questions, cannot tell whether the
written responses come from a person or from a computer.
9
Acting humanly
● Another “Turing Test”: Chihuahua or muffin?
10
Thinking humanly
● Cognitive modeling approach
● The program’s input-output behavior matches corresponding
human behavior.
● We need to get inside the actual workings of human minds
○ Introspection: trying to catch our own thoughts as they go
by
○ Psychological experiments: observing a person in action
○ Brain imaging: observing the brain in action
11
Thinking humanly
● General Problem Solver – GPS (Newell and Simon, 1961)
○ Not merely solve problems correctly
○ Compare the trace of its reasoning steps to traces of
human subjects solving the same problems
● Cognitive Science
○ Computer models from AI
○ Experimental techniques from psychology
● Cognitive science is now distinct from AI
○ The two fields share available theories, but neither yet fully
explains human intelligence
○ They still share a principal direction
12
Thinking rationally
● The laws of thought approach
● “Right thinking” = irrefutable reasoning processes
○ Example of Aristotle (384 BC – 322 BC)
All men are mortal. ∀x.man(x)⇒mortal(x)
Socrates is a man. man(Socrates)
Therefore, Socrates is mortal. mortal(Socrates)
● Obstacles:
○ Not all intelligent behavior is mediated by logical reasoning
○ Solving a problem “in principle” is different from solving it in
practice.
13
Acting rationally
● The rational agent approach
● Rational behavior = “doing the right thing”
○ The “right thing” is what is expected to maximize goal
achievement given the available information.
● An agent is just something that perceives and then acts
f: P* → A
● A rational agent acts to achieve the best outcome or, when
there is uncertainty, the best expected outcome.
14
Goals of AI
● AI studies the intelligent aspects of human behavior and
reproduces those actions using computers.
● Make computers more useful by letting them take over
dangerous or tedious tasks from humans
● Understand principles of human intelligence
15
Related research fields
Field – Description
● Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning, language, rationality
● Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability
● Economics: utility, decision theory, rational economic agents
● Neuroscience: neurons as information processing units
● Psychology/Cognitive Science: how people behave, perceive, process information, and represent knowledge
● Computer Engineering: building fast computers
● Control Theory: designing systems that maximize an objective function over time
● Linguistics: knowledge representation, grammar
16
Common topics in AI
● Search (includes Game Playing)
● Representing knowledge and reasoning with it
● Planning
● Learning
● Natural language processing
● Expert systems
● Interacting with the Environment
○ Vision, Speech recognition, Robotics, etc.
17
Pros and Cons of AI
● Pros:
○ More powerful and more useful computers
○ New and improved interfaces
○ Solve new problems
○ Better handling of information
○ Relieve information overload
○ Conversion of information into knowledge
● Cons:
○ Increased costs
○ Difficulty with software development - slow and expensive
○ Few experienced programmers
18
Solving problems by searching
● Search is the fundamental technique of AI, either “uninformed”
or “informed”.
● Search for the first answer that satisfies our goal or keep
searching until we find the best answer.
19
Knowledge and reasoning
● To act rationally in our environment, we must have some
way of describing that environment and drawing inferences
from that representation.
○ How do we describe what we know about the world?
○ How do we describe it concisely?
○ How do we describe it so that we can get hold of the right
piece of knowledge when we need it?
○ How do we generate new pieces of knowledge?
○ How do we deal with uncertain knowledge?
21
Knowledge and reasoning
● Propositional logic and predicate logic
● Inference techniques: forward chaining, backward chaining,
and resolution
● Uncertain knowledge and reasoning
22
Machine Learning
● If a system is going to act truly appropriately, then it must be
able to change its actions in the light of experience.
○ How do we generate new facts from old ?
○ How do we generate new concepts ?
○ How do we learn to distinguish different situations in new
environments?
23
History of AI
● 1940-1950: Early days
○ 1943: McCulloch & Pitts: Boolean circuit model of brain
○ 1950: Turing's “Computing Machinery and Intelligence”
● 1950—70: Excitement: Look, Ma, no hands!
○ 1950s: Early AI programs, including Samuel's checkers
program, Newell & Simon's Logic Theorist, Gelernter's
Geometry Engine
○ 1956: Dartmouth meeting: “Artificial Intelligence” adopted
○ 1965: Robinson's complete algorithm for logical reasoning
● 1970—90: Knowledge-based approaches
○ 1969—79: Early development of knowledge-based systems
○ 1980—88: Expert systems industry booms
○ 1988—93: Expert systems industry busts: “AI Winter”
24
History of AI
● 1990—: Statistical approaches
○ Resurgence of probability, focus on uncertainty
○ General increase in technical depth
○ Agents and learning systems... an “AI Spring”?
● 2000—: Where are we now?
25
Recent advancements
26
Recent advancements
27
Recent advancements
28
Homework
● Find three recent advancements in AI.
● Briefly introduce the selected advancements.
● Identify, as specifically as possible, the AI subfields they belong to.
29
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
30
Introduction to
Artificial Intelligence
Lecture: Intelligent Agents
Outline
● Agents and Environments
● The Concept of Rationality
● The Nature of Environments
● The Structures of Agents
2
What is an agent?
● AI studies how to make computers do things that, at present,
people do better.
● Such systems are called Agents.
● An agent perceives its environment through sensors and acts
upon that environment through actuators.
3
Examples of Agents
Agent Sensors Actuators
Human agent eyes, ears, and other hands, legs, vocal tract,
organs. etc.
Robotic agent cameras, infrared range levels, motors, etc.
finders, etc.
Software agent keystrokes, file displaying on screen,
contents, network writing files, sending
packets, etc. network packets, etc.
4
Agent’s Behavior
● Percept: the agent’s perceptual inputs at any given instant
● Percept sequence: the complete history of everything the agent
has ever perceived
● An agent’s behavior is described by the agent function that
maps any given percept sequence to an action.
f: P* → A
5
Example: Vacuum Agent
● Percepts: location and contents, e.g., [A,Dirty]
● Actions: Left, Right, Suck, Do Nothing
6
Vacuum Agent
Percept sequence Action
[A, Clean] Right
[A, Dirty] Suck
[B, Clean] Left
[B, Dirty] Suck
… …
[A, Clean], [A, Clean], [A, Clean] Right
[A, Clean], [A, Clean], [A, Dirty] Suck
7
Vacuum Agent
function VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left
8
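As a minimal sketch, the table-driven behavior above can be written directly in Python; the percept format (location, status) and the location names 'A'/'B' follow the example percepts on the previous slides, not a prescribed interface.

def vacuum_agent(percept):
    # Simple reflex agent: the action depends only on the current percept.
    location, status = percept            # e.g., ('A', 'Dirty')
    if status == 'Dirty':
        return 'Suck'
    elif location == 'A':
        return 'Right'
    elif location == 'B':
        return 'Left'

print(vacuum_agent(('A', 'Dirty')))       # Suck
print(vacuum_agent(('B', 'Clean')))       # Left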
Importance of agents
● A tool for analyzing systems
● All areas of engineering can be seen as designing artifacts that
interact with the world.
● AI designs artifacts that have significant computational
resources and operate in task environments that require
nontrivial decision making.
9
Rational agents
● A rational agent is one that does the right thing.
● We need ways to measure success → Performance measure.
● Performance measure evaluates any given sequence of
environment states.
● General rule: Design performance measures according to
○ What one actually wants in the environment
○ Not how one thinks the agent should behave
● Example: vacuum agent
○ Bad: the amount of dirt cleaned up in a single eight-hour shift
(the agent could dump dirt and clean it up again)
○ Good: a clean floor, no matter how the agent behaves
10
Rationality
● Performance measure
○ Define the criterion of success
● Prior knowledge
○ What the agent knows about the environment
● Percept sequence
○ The agent’s percept to date
● Actions
○ What the agent can perform
11
Definition of rational agents
● For each possible percept sequence, a rational agent should
select an action that is expected to maximize its performance
measure, given the evidence provided by the percept sequence
and whatever built-in knowledge the agent has.
12
Example: Vacuum machine
● Performance measure
○ Award one point for each clean square at each time step,
over 10000 timesteps
● Prior knowledge about the environment
○ The geography of the environment (2 squares)
○ The effect of the actions
● Actions the agent can perform
○ Left, Right, Suck and Do Nothing
● Percept sequences
○ Where the agent is
○ Whether the location contains dirt
● The agent is rational.
13
Omniscience, Learning, Autonomy
● Omniscience: knows the actual outcome of its actions in advance; no other possible outcomes; impossible in the real world
● Learning: a rational agent must learn as much as possible from what it perceives
● Autonomy: a rational agent should be autonomous, learning what it can to compensate for partial or incorrect prior knowledge
14
Examples
● Hide and seek
● Apple collector
● https://youtu.be/shccS5kJvtQ
● https://youtu.be/FIz8ycRCSw0
15
Task environment
● Task environments are essentially the “problems” to which
rational agents are the “solutions”.
● PEAS:
○ Performance measure
○ Environment
○ Agent’s Actuators
○ Agent’s Sensors
● In designing an agent, the first step must always be to specify
the task environment (PEAS) as fully as possible.
16
Example: Automated taxi driver
● Agent type: taxi driver
● Performance measure: safe, fast, legal, comfortable trip,
maximize profits
● Environment: roads, other traffic, pedestrians, passengers.
● Sensors: cameras, sonar, speedometer, GPS, odometer,
accelerometer, engine sensors, etc.
● Actuators: steering, accelerator, brake, signal, horn, display.
17
Properties of Task Environment
● Fully observable vs. Partially observable
● Single agent vs. Multiagent
● Deterministic vs. Stochastic
● Episodic vs. Sequential
● Static vs. Dynamic
● Discrete vs. Continuous
● Known vs. Unknown
18
Properties of Task Environment
● Fully observable: The agent’s sensors give it access to the
complete state of the environment.
○ The agent need not maintain internal state to keep track of
the world.
● Partially observable
○ Noisy and inaccurate sensors
○ Parts of the state are simply missing from the sensor data,
e.g., a vacuum agent with only a local dirt sensor cannot tell
whether there is dirt in other squares
● Unobservable: The agent has no sensors at all
19
Properties of Task Environment
● Single agent: An agent operates by itself in an environment.
○ E.g., solving crossword → single-agent,
○ playing chess → two-agent
● Competitive vs. Cooperative multiagent environment
○ E.g., playing chess → competitive,
○ driving on road → cooperative
20
Properties of Task Environment
● Deterministic: The next state of the environment is completely
determined by the current state and the action executed by the
agent.
○ E.g., the vacuum world → deterministic,
○ driving on road → stochastic
● Most real situations are so complex that they must be treated
as stochastic.
21
Properties of Task Environment
● Episodic: The agent’s experience is divided into atomic
episodes, in each of which the agent receives a percept and
then performs a single action.
○ Quality of action depends just on the episode itself
○ Do not need to think ahead
● Sequential: A current decision could affect future decisions.
○ E.g., spotting defective parts on an assembly line vs.
playing chess
22
Properties of Task Environment
● Static: The environment is unchanged while an agent is
deliberating.
○ E.g., crossword puzzles → static, taxi driving → dynamic
● Dynamic: The agent is continuously asked what it wants to do
○ If it has not decided yet, that counts as deciding to do
nothing.
● Semi-dynamic: The environment itself does not change with the
passage of time but the agent’s performance score does
○ E.g., chess playing with a clock
23
Properties of Task Environment
● Discrete vs. continuous
○ The distinction applies to the state of the environment, to the way
time is handled, and to the agent’s percepts and actions
○ E.g., chess has a finite number of distinct states, percepts and
actions, while in taxi driving the vehicle’s speed and location
sweep through a range of continuous values smoothly over time.
● Known vs. unknown
○ Known environment: the outcomes (or outcome probabilities if the
environment is stochastic) for all actions are given.
○ Unknown environment: the agent needs to learn how it works to
make good decisions.
24
Structures of Agents
● Simple reflex agents
● Model-based reflex agents
● Goal-based agents
● Utility-based agents
25
Simple reflex agents
● Select actions based on the current percept, ignoring the rest of
the percept history
● The connection from percept to action is represented by
condition-action rules.
IF current percept THEN action
26
Simple reflex agents
27
Model-based reflex agents
● Partially observable → the agent must keep track of an internal
state
○ It depends on the percept history and reflects some of the
unobserved aspects
○ E.g., driving a car and changing lanes
● The agent program updates the internal state information as
time goes by, encoding two kinds of knowledge
○ How the world evolves independently of the agent
○ How the agent’s actions affect the world
28
Model-based reflex agents
29
Goal-based agents
● The current state of the environment alone is not always enough
● The agent further needs some sort of goal information that
describes desired situations.
○ E.g., at a road junction, the taxi can turn left, turn right, or
go straight on, depending on where the taxi is trying to get
to.
● Less efficient but more flexible
○ Knowledge that supports the decisions is represented
explicitly and can be modified
30
Goal-based agents
31
Utility-based agents
● Goals alone are inadequate to generate high-quality behavior
in most environments
○ Many action sequences can achieve the goal; some are
better, and some are worse
○ E.g., go home: Vinasun taxi or Grab car?
● An agent’s utility function is essentially an internalization of the
performance measure.
○ Goal → success, utility → degree of success (how
successful it is)
○ If state A is preferred over other states, then A has higher
utility
32
Utility-based agents
33
Learning Agents
34
Learning Agents
● Learning in intelligent agents is a process of modification of
each component of the agent to bring the components into
closer agreement with the available feedback information,
thereby improving the overall performance of the agent.
○ Learning element → Making improvement
○ Performance element → Selecting external actions
○ Critic → Tells the Learning element how well the agent is
doing with respect to fixed performance standard.
(Feedback from user or examples, good or not?)
○ Problem generator → Suggest actions that will lead to new
and informative experiences.
35
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
36
Introduction to
Artificial Intelligence
Lecture: Problem Solving Using Searching
Outline
● Problem-Solving Agents
● Example Problems
● Searching for Solutions
2
Holiday in Romania
3
Goal-based Agents
● Intelligent agents maximize their performance measure.
● Goals help organize behavior by limiting the objectives that the
agent is trying to achieve and the actions it considers.
4
Problem Formulation
● Consider a goal to be a set of world states in which the
objective is satisfied.
● Problem formulation is the process of deciding what actions
and states to consider, given a goal.
5
Properties of the Romania environment
● Observable
○ Each city has a sign indicating its presence for arriving
drivers.
○ The agent always knows the current state.
● Discrete
○ Each city is connected to a small number of other cities.
○ There are only finitely many actions to choose from any
given state.
● Known
○ The agent knows which states are reached by each action.
● Deterministic
○ Each action has exactly one outcome.
6
Solving problem by searching
● Search: the process of looking for a sequence of actions that
reaches the goal
● A search algorithm takes a problem as input and returns a
solution in the form of an action sequence.
● Execution phase: once a solution is found, the recommended
actions are carried out.
7
Solving problem by searching
function PROBLEM-SOLVING-AGENT(percept) returns an action
  persistent: seq, an action sequence, initially empty
              state, some description of the current world state
              goal, a goal, initially null
              problem, a problem formulation
  state ← UPDATE-STATE(state, percept)
  if seq is empty then
    goal ← FORMULATE-GOAL(state)
    problem ← FORMULATE-PROBLEM(state, goal)
    seq ← SEARCH(problem)
    if seq = failure then return a null action
  action ← FIRST(seq)
  seq ← REST(seq)
  return action
8
Well-defined problems and solutions
● A problem can be defined formally by five components.
○ Initial state: in which the agent starts
■ E.g., the agent in Romania has its initial state described as In(Arad)
○ Actions: the possible actions available to the agent
■ E.g., ACTIONS(Arad) = {Go(Sibiu), Go(Timisoara), Go(Zerind)}
○ Transition model: what each action does
■ E.g., RESULT(In(Arad), Go(Zerind)) = In(Zerind)
○ Successor: a state reachable from a given state by a single
action
9
Well-defined problems and solutions
○ Goal test: determine whether a given state is a goal state
■ The goal is specified by either an explicit set of possible
goal states or an abstract property.
■ E.g., In(Bucharest), checkmate
○ Path cost: a function that sets a numeric cost to each path
■ Non-negative, reflecting the agent’s performance
measure
■ E.g., c(In(Arad), Go(Zerind), In(Zerind)) = 75
● An optimal solution has the lowest path cost.
10
Formulating problems by abstraction
● Abstraction creates an approximate, simplified model of the
real world, which is otherwise too detailed for a computer.
● This is critical for automated problem solving.
● The choice of a good abstraction involves
○ Removing as much detail as possible while
○ Retaining validity and ensuring that the abstract actions are
easy to carry out.
11
The Vacuum-cleaner world
● States: determined by both the agent location and the dirt
locations
○ 2 × 2^2 = 8 possible world states (n × 2^n in general)
● Initial state: Any state can be designated as the initial state.
● Actions: Left, Right, and Suck
● Transition model: The actions have their expected effects.
● Goal test: whether all the squares are clean
● Path cost: each step costs 1
12
The Vacuum-cleaner world
13
The 8-puzzle
● States: the location of each of the eight tiles and the blank
● Initial state: any state can be designated as the initial state
● Actions: movements of the blank space
○ Left, Right, Up, or Down.
○ Different subsets of these are possible depending on where
the blank is
● Transition model: return a resulting state given a state and an
action
● Goal test: whether the state matches the goal configuration
● Path cost: each step costs 1
14
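A minimal Python sketch of this formulation, assuming a state is a 9-tuple read row by row with 0 for the blank (the start state below matches the homework figure later in this lecture); this is an illustration, not the course's reference code.

def actions(state):
    # Legal moves of the blank (0) on a 3x3 board stored as a 9-tuple.
    i = state.index(0)
    moves = []
    if i % 3 > 0: moves.append('Left')
    if i % 3 < 2: moves.append('Right')
    if i >= 3:    moves.append('Up')
    if i < 6:     moves.append('Down')
    return moves

def result(state, action):
    # Transition model: swap the blank with the neighboring tile.
    i = state.index(0)
    j = i + {'Left': -1, 'Right': 1, 'Up': -3, 'Down': 3}[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

start = (1, 8, 2, 0, 4, 3, 7, 6, 5)       # 182 / _43 / 765
print(actions(start))                      # ['Right', 'Up', 'Down']
print(result(start, 'Up'))                 # (0, 8, 2, 1, 4, 3, 7, 6, 5)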
The 8-puzzle
15
The 8-queens
● Incremental formulation: add a queen step-by-step to the empty
initial state
● Complete-state formulation: start with all 8 queens on the board
and move them around
● The path cost is trivial because only the final state counts
16
The 8-queens
● States: any arrangement of 0 to 8 queens on the board
● Initial state: no queens on the board
● Actions: add a queen to any empty square
● Transition model: returns the board with a queen added to the
specified square
● Goal test: 8 queens are on the board, none attacked
● 64 · 63 ⋯ 57 ≈ 1.8 × 10^14 possible sequences to investigate
17
Search Tree
● Search algorithms consider many possible action sequences to
find the solution sequence.
● Search tree: the possible action sequences starting at the initial
state (root)
○ Branches are actions and nodes are states in the problem’s
state space
● Frontier: the set of all leaf nodes available for expansion at any
given point
● Search algorithms all share this basic structure; they vary in
how they choose which state to expand next, the so-called
search strategy.
18
Search Tree
19
Search Tree
function TREE-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    expand the chosen node, adding the resulting nodes to the frontier
20
Redundant paths
● Redundant paths are unavoidable.
● Following redundant paths may cause a tractable problem to
become intractable.
● This is true even for algorithms that know how to avoid infinite
loops.
21
Graph Search
function GRAPH-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  initialize the explored set to be empty
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    add the node to the explored set
    expand the chosen node, adding the resulting nodes to the frontier
      only if they are not already in the frontier or the explored set
22
Graph Search
23
Infrastructure for search algorithms
● Each node n is structured by four components.
○ n.STATE: the state in the state space to which the node corresponds
○ n.PARENT: the node in the search tree that generated node n
○ n.ACTION: the action applied to the parent to generate n
○ n.PATH-COST: the cost, denoted by g(n), of the path from the initial state to the node, as indicated by the parent pointers
● The frontier can be implemented with a (priority) queue or stack.
● The explored set can be a hash table that allows for efficient
checking of repeated states.
24
Infrastructure for search algorithms
25
Problem-solving performance
● Completeness: does it always find a solution if one exists?
● Time complexity: how long does it take to find a solution?
● Space complexity: how much memory is needed to perform
the search?
● Optimality: does it always find a least-cost solution?
26
Homework
● Task 1:
○ Given the start state and the goal state of the 8-puzzle on
page 15.
○ Draw the search tree by expanding nodes.
○ The expansion stops when the goal state is expanded.
● Task 2:
○ Program Task 1 in Python using Google Colab
○ Use graph search instead of tree search
27
Homework
[Figure: hand-drawn search tree for the 8-puzzle homework, expanding the start state 182 / _43 / 765 with blank moves U, D, L.]
28
Homework
from graphviz import Digraph   # renders the search tree as a directed graph

dot = Digraph()
# Each node label shows a 3x3 board row by row; '_' marks the blank.
dot.node('0', '182\n_43\n765')   # start state
dot.node('1', '182\n4_3\n765')
dot.node('2', '_82\n143\n765')
dot.node('3', '182\n743\n_65')
# Edges are labeled with the move that produced each child state.
dot.edge('0', '1', 'L')
dot.edge('0', '2', 'D')
dot.edge('0', '3', 'U')
dot                              # display the rendered tree in the notebook
29
Homework
● Conduct homework in the given notebook
30
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
31
Introduction to
Artificial Intelligence
Lecture: Uninformed Search
Outline
● Uninformed Search Strategies
● Breadth-first Search
● Uniform-cost Search
● Depth-first Search
● Depth-limited Search
● Iterative Deepening Search
● Bidirectional Search
2
Uninformed Search Strategies
● No additional information about states beyond that provided in
the problem definition → Blind Search
○ Breadth-first Search
○ Uniform-cost Search
○ Depth-first Search
○ Depth-limited Search
○ Iterative Deepening Search
○ Bidirectional Search
3
Breadth-first Search
4
Breadth-first Search
● Implementation: frontier is a FIFO queue
● The goal test is applied to each node when it is generated
rather than when it is selected for expansion.
● Discard any new path to a state already in the frontier or in the
explored set.
5
Breadth-first Search
function BREADTH-FIRST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  frontier ← a FIFO queue with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        if problem.GOAL-TEST(child.STATE) then return SOLUTION(child)
        frontier ← INSERT(child, frontier)
6
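A compact Python sketch of the pseudocode (graph-search BFS with the goal test at generation time). It assumes states are hashable and that expand(state) yields (action, next_state) pairs; for the 8-puzzle sketch from the previous lecture, expand could be lambda s: [(a, result(s, a)) for a in actions(s)].

from collections import deque

def breadth_first_search(start, goal_test, expand):
    if goal_test(start):
        return []                          # empty action sequence
    frontier = deque([(start, [])])        # FIFO queue of (state, path)
    reached = {start}                      # explored set plus frontier states
    while frontier:
        state, path = frontier.popleft()
        for action, child in expand(state):
            if child not in reached:
                if goal_test(child):       # test when generated, as in the pseudocode
                    return path + [action]
                reached.add(child)
                frontier.append((child, path + [action]))
    return None                            # failure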
Breadth-first Search
● Draw the search tree S → G
● Write down the path
● Nodes at the same level are handled in alphabetical order
7
Breadth-first Search
8
Breadth-first Search
● Evaluation
○ Completeness: YES
○ Time complexity: O(b^d)
○ Space complexity: O(b^d)
○ Optimality: YES if step costs are uniform
● Terms:
○ b: maximum branching factor
○ d: depth of the shallowest solution
○ m: maximum depth of the search tree
9
Uniform-cost Search
● UCS expands the node 𝑛 with the lowest path cost 𝑔(𝑛)
● Implementation: frontier is a priority queue ordered by 𝑔
○ Equivalent to breadth-first search if step costs all equal
○ Equivalent to Dijkstra’s algorithm in general
● The goal test is applied to a node when it is selected for
expansion.
10
Uniform-cost Search
function UNIFORM-COST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  frontier ← a priority queue ordered by PATH-COST, with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)
    if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        frontier ← INSERT(child, frontier)
      else if child.STATE is in frontier with higher PATH-COST then
        replace that frontier node with child
11
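A Python sketch using heapq as the priority queue; expand(state) is assumed to yield (action, next_state, step_cost) triples. Since heapq has no decrease-key, the pseudocode's "replace that frontier node" is emulated by pushing a cheaper duplicate and skipping stale entries on pop.

import heapq, itertools

def uniform_cost_search(start, goal_test, expand):
    counter = itertools.count()            # tie-breaker so heapq never compares states
    frontier = [(0, next(counter), start, [])]
    best_g = {start: 0}
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float('inf')):
            continue                       # stale entry; a cheaper path was found
        if goal_test(state):               # goal test on expansion, as in the pseudocode
            return path, g
        for action, child, cost in expand(state):
            g2 = g + cost
            if g2 < best_g.get(child, float('inf')):
                best_g[child] = g2
                heapq.heappush(frontier, (g2, next(counter), child, path + [action]))
    return None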
Uniform-cost Search
● Draw the search tree S → G
● Write down the path
12
Uniform-cost Search
● Search path: S → d → e → r → f → G, cost = 10
13
Uniform-cost Search
● Evaluation:
○ Completeness: YES if the best solution has a finite cost and
the minimum arc cost is positive
○ Time complexity: O(b^(1+⌊C*/ε⌋))
■ C*: the cost of the best solution
■ ε: minimal action cost
○ Space complexity: O(b^(1+⌊C*/ε⌋))
○ Optimality: YES
14
Depth-first Search
● Implementation: frontier is a LIFO Stack
● Evaluation:
○ Completeness: YES if loops prevented
○ Time complexity: 𝑂(𝑏𝑚)
○ Space complexity: 𝑂(𝑏𝑚)
○ Optimality: NO
15
Depth-first Search
● Draw the search tree S → G
● Write down the path
● Nodes at the same level are handled in alphabetical order
16
Depth-limited Search
● Standard DFS with a predetermined depth limit 𝑙, i.e., nodes at
depth 𝑙 are treated as if they have no successors.
→ infinite problems solved
● Depth limits can be based on knowledge of the problem.
17
Depth-limited Search
function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
  return RECURSIVE-DLS(MAKE-NODE(problem.INITIAL-STATE), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  else if limit = 0 then return cutoff
  else
    cutoff_occurred? ← false
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      result ← RECURSIVE-DLS(child, problem, limit − 1)
      if result = cutoff then cutoff_occurred? ← true
      else if result ≠ failure then return result
    if cutoff_occurred? then return cutoff else return failure
18
Depth-limited Search
● Evaluation:
○ Completeness: NO if l < d
○ Time complexity: O(b^l)
○ Space complexity: O(bl)
○ Optimality: NO in general (even when l > d)
19
Iterative Deepening Search
● General strategy, often used in combination with depth-first tree
search to find the best depth limit.
● Gradually increase the limit until a goal is found.
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
  for depth = 0 to ∞ do
    result ← DEPTH-LIMITED-SEARCH(problem, depth)
    if result ≠ cutoff then return result
20
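A Python sketch of both procedures under the same assumptions as the earlier search sketches (expand(state) yields (action, next_state) pairs); 'cutoff' is returned as a sentinel string and None signals failure.

def depth_limited_search(state, goal_test, expand, limit):
    if goal_test(state):
        return []
    if limit == 0:
        return 'cutoff'
    cutoff_occurred = False
    for action, child in expand(state):
        result = depth_limited_search(child, goal_test, expand, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [action] + result       # prepend the action that led to the solution
    return 'cutoff' if cutoff_occurred else None

def iterative_deepening_search(state, goal_test, expand, max_depth=50):
    # The pseudocode loops to infinity; a cap keeps this sketch terminating.
    for depth in range(max_depth + 1):
        result = depth_limited_search(state, goal_test, expand, depth)
        if result != 'cutoff':
            return result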
Iterative Deepening Search
21
Iterative Deepening Search
22
Iterative Deepening Search
● Evaluation:
○ Completeness: YES when the branching factor is finite
○ Time complexity: O(b^d)
○ Space complexity: O(bd)
○ Optimality: YES if step costs are uniform
23
Bidirectional Search
● Two simultaneous searches: one forward from the initial state
and the other backward from the goal state
● Hoping that the two searches meet in the middle
24
Summary
● Comparison of uninformed algorithms (tree-search versions)
25
Homework
● Conduct homework in the given notebook.
26
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
27
Introduction to
Artificial Intelligence
Lecture: Informed Search
Outline
● Informed Search Strategies
● Best-first Search
● Greedy Best-first Search
● A* Search
● Heuristic Dominance
2
Informed search strategies
● Use problem-specific knowledge beyond the definition of the
problem itself
● Find solutions more efficiently
● Provide significant speed-up in practice
3
What are heuristics?
● Additional knowledge of the problem is imparted to the search
algorithm using heuristics.
● A heuristic is any practical approach to problem solving that is
sufficient for reaching an immediate goal when finding an
optimal solution is impossible or impractical.
○ Not guaranteed to be optimal, perfect, logical, or rational
○ Speeds up the process of finding a satisfactory solution
○ Eases the cognitive load of making a decision
4
What are heuristics?
5
Best-first search
● An instance of the general TREE-SEARCH or GRAPH-
SEARCH algorithm
● A node is selected for expansion based on an evaluation
function, 𝒇(𝒏).
● Node with the lowest 𝑓(𝑛) is expanded first
● The choice of 𝑓 determines the search strategy.
6
Heuristic function
● h(n): estimated cost of the cheapest path from the state at
node n to a goal
● Unlike g(n), h(n) depends only on the state at that node
● Assumptions on h(n)
○ Arbitrary, nonnegative, problem-specific functions
○ Constraint: if n is a goal node, then h(n) = 0
7
g(n) vs h(n)
[Diagram: path S → n → G. g(n) is the path cost from S to n; h(n) is the estimated cost from n to G. UCS evaluates f(n) = g(n); g(S) = 0 and h(G) = 0.]
8
Greedy best-first search
[Diagram: same path S → n → G; greedy best-first search evaluates f(n) = h(n).]
9
Straight-line distance heuristic
10
Greedy best-first search
● Draw the search tree
● Determine the path
● Compute the path-cost
11
Greedy best-first search: Evaluation
● Completeness:
○ NO – may get stuck forever
● Time complexity
○ O(b^m) → reduced substantially with a good heuristic
● Space complexity
○ O(b^m) – keeps all nodes in memory
● Optimality
○ NO
12
A* search
● The most widely known form of best-first search
● Evaluate nodes by f(n) = g(n) + h(n)
○ 𝑔(𝑛) is the cost to reach the node 𝑛
○ h(𝑛) is the cost to get from 𝑛 to the goal
○ 𝑓(𝑛) = estimated cost of the cheapest solution through 𝑛
13
A* search
[Diagram: same path S → n → G; A* evaluates f(n) = g(n) + h(n).]
14
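A Python sketch of A* graph search, mirroring the UCS sketch from the previous lecture but ordering the frontier by f(n) = g(n) + h(n); h is any heuristic function of a state, and expand(state) is again assumed to yield (action, next_state, step_cost) triples.

import heapq, itertools

def a_star_search(start, goal_test, expand, h):
    counter = itertools.count()            # tie-breaker for equal f-values
    frontier = [(h(start), next(counter), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float('inf')):
            continue                       # stale entry; a cheaper path was found
        if goal_test(state):
            return path, g
        for action, child, cost in expand(state):
            g2 = g + cost
            if g2 < best_g.get(child, float('inf')):
                best_g[child] = g2         # push improved entry ordered by f = g + h
                heapq.heappush(frontier,
                               (g2 + h(child), next(counter), g2, child, path + [action]))
    return None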
A* search
15
A* search
● Draw the search tree
● Determine the path
● Compute the path-cost
16
A* search: Evaluation
● Completeness
○ YES if all step costs exceed some finite 𝜖 and if 𝑏 is finite
○ (review the condition for completeness of UCS)
● Optimality
○ YES – with conditions on heuristic being used
● Time complexity
○ Exponential
● Space complexity
○ Exponential (keep all nodes in memory)
17
A* search is not always optimal
18
Heuristic: Admissibility
● h(n) must be an admissible heuristic
○ Never overestimates the cost to reach the goal → optimistic
○ E.g., Euclidean distance, Manhattan distance, straight-line
distance, etc.
19
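For instance, the Manhattan-distance heuristic for the 8-puzzle (using the 9-tuple representation with 0 for the blank, as sketched in the earlier problem-solving lecture) is admissible because every move changes one tile's distance by exactly one:

def manhattan(state, goal):
    # Sum over tiles 1..8 of |row difference| + |column difference|.
    dist = 0
    for tile in range(1, 9):               # the blank is ignored
        i, j = state.index(tile), goal.index(tile)
        dist += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return dist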
Heuristic: Admissibility
● If h(n) is admissible, A* using TREE-SEARCH is optimal
Suppose some suboptimal goal G2 has been generated and is in the frontier.
Let n be an unexpanded node in the frontier such that n is on a shortest path to an optimal goal G.
f(G2) = g(G2) since h(G2) = 0
g(G2) > g(G) since G2 is suboptimal
f(G) = g(G) since h(G) = 0
⇒ f(G2) > f(G) (1)
h(n) ≤ h*(n) since h is admissible
g(n) + h(n) ≤ g(n) + h*(n) = f(G) ⇒ f(n) ≤ f(G) (2)
From (1), (2): f(G2) > f(n) → A* will never select G2 for expansion
20
Heuristic: Consistency
● Admissibility is insufficient for graph search.
○ The optimal path to a repeated state could be discarded if it is
not the first one selected.
● h(n) is consistent if, for every node n and every successor n′ of n
generated by any action a,
h(n) ≤ c(n, a, n′) + h(n′)
● Every consistent heuristic is also admissible.
21
Heuristic: Consistency
● If h(n) is consistent, A* using GRAPH-SEARCH is optimal
If h(n) is consistent, the values of f(n) along any path are non-decreasing:
for a successor n′ of n, g(n′) = g(n) + c(n, a, n′), so
f(n′) = g(n′) + h(n′) = g(n) + c(n, a, n′) + h(n′) ≥ g(n) + h(n) = f(n)
Hence, whenever A* selects a node n for expansion, the optimal path to that node
has been found.
22
Contours of A* search
● A* expands nodes in order of increasing f-value
● A* will expand all nodes with cost f(n) < C*
23
Heuristic dominance
● Given two admissible heuristics h1 and h2
● If h2(n) ≥ h1(n) for all n, then h2 dominates h1
● A* using h2 will never expand more nodes than A* using h1
● It is better to use a heuristic function with higher values, provided
it is consistent and its computation time is not too long.
24
Homework
● Conduct homework in the given notebook.
25
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
26
Introduction to
Artificial Intelligence
Lecture: Local Search
Outline
● Optimization Problems
● Hill-climbing Search
● Simulated Annealing Search
● Local Beam Search
● Genetic Algorithm
2
Global search algorithms
● Global search: explore search spaces systematically
○ Keep one or more paths in memory and record which
alternatives have been explored at each point along the
path
● Many problems do not fit the “standard” search model
○ The final configuration matters, not the order in which it is
formed.
○ No “goal test” and no “path cost”
3
Optimization problems
● The optimization problems find the best state according to an
objective function
○ E.g., the Knight’s tour, TSP, scheduling, integrated-circuit
design, factory-floor layout, automatic programming, vehicle
routing, etc.
4
Local search algorithms
● Operate using a single current node and generally move only to
neighbors of that node
● Not systematic: the paths followed by the search are typically
not retained.
● Use very little memory: usually a constant amount
● Often able to find reasonable solutions in large or infinite
(continuous) state spaces.
5
Local search and Optimization
● “Pure optimization” problems
○ All states have an objective function
○ Goal is to find state with max (or min) objective value
● Local search can do quite well on these problems.
6
State-space landscape
● A landscape has both “location” and “elevation”: location
corresponds to a state, elevation to the heuristic cost or the
objective function
● A complete local search algorithm finds a goal if one exists.
● An optimal local search algorithm finds a global extremum
7
Hill-climbing Search
● A loop that continually moves in the direction of increasing
value and terminates when it reaches a “peak”.
● Not look ahead beyond the immediate neighbors of the current
state.
● No search tree maintained, only the state and the objective
function’s value for the current node recorded.
function HILL-CLIMBING(problem) returns a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor
8
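A Python sketch of steepest-ascent hill climbing; value(state) is the objective function and neighbors(state) returns the successor states (both assumed, problem-specific callables).

def hill_climbing(state, value, neighbors):
    while True:
        best = max(neighbors(state), key=value, default=None)
        if best is None or value(best) <= value(state):
            return state                   # no neighbor improves: a local maximum
        state = best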
Hill-climbing: 8-queens problem
● Complete-state formulation
○ All 8 queens on the board, one per column
● Successor function
○ Move a single queen to another square in the same column
○ 8×7=56 successors
● Heuristic cost function h(𝑛)
○ The number of pairs of queens that are ATTACKING each
other, either directly or indirectly
○ Global minimum has h(𝑛) = 0
9
Hill-climbing Search
● Hill climbing is often suboptimal, due to local maxima, ridges,
and plateaux
10
Hill-climbing Search Variants
● Stochastic hill climbing:
○ choose at random from among the uphill moves, with a
selection probability that varies with the move’s steepness
● First-choice hill climbing:
○ generate successors randomly until one is generated that is
better than the current state
● Random-restart hill climbing:
○ A series of hill-climbing searches from randomly generated
initial states until a goal is found
11
Simulated Annealing
● Combine hill climbing with a random walk in some way that
yields both efficiency and completeness
● Shake hard (i.e., at a high temperature) and then gradually
reduce the intensity of shaking (i.e., lower the temperature)
12
Simulated Annealing
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to “temperature”
  current ← MAKE-NODE(problem.INITIAL-STATE)
  for t = 1 to ∞ do
    T ← schedule(t)
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← next.VALUE − current.VALUE
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
13
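A Python sketch of the pseudocode; random_neighbor(state) draws a random successor and schedule(t) maps time to temperature. The cooling schedule in the comment is only an illustrative choice, not one prescribed by the lecture.

import math, random

def simulated_annealing(state, value, random_neighbor, schedule):
    t = 1
    while True:
        T = schedule(t)
        if T == 0:
            return state
        nxt = random_neighbor(state)
        dE = value(nxt) - value(state)
        # Always accept uphill moves; accept downhill moves with prob. e^(dE/T).
        if dE > 0 or random.random() < math.exp(dE / T):
            state = nxt
        t += 1

# e.g., geometric cooling that reaches zero after 10,000 steps:
# schedule = lambda t: 0 if t > 10_000 else 100 * 0.99 ** t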
Local Beam Search
● Keep track of 𝑘 states rather than just one
● Begin with 𝑘 randomly generated states
● At each step, all successors of all 𝑘 states are generated
● If the goal is not found, select the 𝑘 best successors from the
complete list and repeat
14
Genetic Algorithms
● A variant of stochastic beam search
● Successor states are generated by combining two parent
states rather than by modifying a single state
● Reproduction is sexual rather than asexual
15
Genetic Algorithms
● Population: a set of 𝑘 randomly generated states that GA
begins with
● Fitness function: an objective function that rates each state
○ Higher values for better states
○ E.g., 8-queens: the number of non-attacking pairs of
queens (min = 0, max = 8×7/2 = 28)
● Produce the next generation by “simulated evolution”
○ Random selection
○ Crossover
○ Random mutation
● Each state, or individual, is represented as a string over a finite
alphabet – most commonly, a string of 0s and 1s.
16
Genetic Algorithms
● Pairs are selected at random for reproduction.
● The probability of being chosen for reproducing is directly
proportional to the fitness score.
● One individual may be selected several times or not at all.
17
Genetic Algorithms
function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual
  repeat
    new_population ← empty set
    for i = 1 to SIZE(population) do
      x ← RANDOM-SELECTION(population, FITNESS-FN)
      y ← RANDOM-SELECTION(population, FITNESS-FN)
      child ← REPRODUCE(x, y)
      if (small random probability) then child ← MUTATE(child)
      add child to new_population
    population ← new_population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN
18
Genetic Algorithms
function REPRODUCE(x, y) returns an individual
  inputs: x, y, parent individuals
  n ← LENGTH(x)
  c ← random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c + 1, n))
19
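The two genetic operators in Python, assuming individuals are strings over a finite alphabet as on the earlier slide; REPRODUCE is single-point crossover, and the mutate helper is an illustrative addition, not part of the pseudocode above.

import random

def reproduce(x, y):
    c = random.randrange(1, len(x))        # crossover point, 1 .. n-1
    return x[:c] + y[c:]

def mutate(child, alphabet='01'):
    i = random.randrange(len(child))       # replace one random position
    return child[:i] + random.choice(alphabet) + child[i + 1:]

print(reproduce('11110000', '00001111'))   # e.g., '11111111' when c == 4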
Homework
● Conduct homework in the given notebook.
20
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
21
Introduction to
Artificial Intelligence
Lecture: Adversarial Search
Outline
● Games
● Optimal Decisions in Games
● α-β Pruning
● Imperfect, Real-time Decisions
● Stochastic Games
2
Multiagent environments
● Each agent needs to consider the actions of other agents and
how they affect its own welfare.
● The unpredictability of other agents introduces contingencies
into the agent’s problem-solving process
● Game theory views any multiagent environment as a game.
○ The impact of each agent on the others is “significant,”
regardless of whether the agents are cooperative or
competitive.
● Types of games:
○ Perfect information vs Imperfect information
○ Deterministic vs Chance
A sequential game has perfect information if each player, when making any decision, is perfectly informed of all the events
that have previously occurred, including the "initialization event" of the game.
3
Types of game
● Perfect information, deterministic: chess, checkers, Go, Othello
● Perfect information, chance: backgammon, Monopoly
● Imperfect information, chance: bridge, poker, Scrabble, nuclear war
4
Adversarial search
● Adversarial search (known as games) covers competitive
environments in which the agents’ goals are in conflict.
● Zero-sum games of perfect information
○ Deterministic, fully observable environments, turn-taking,
two-player
○ The utility values at the end are always equal and opposite.
5
Primary assumptions
● Two players only, called MAX and MIN.
○ MAX moves first, and then they take turns moving until the
game ends
○ Winner gets reward, loser gets penalty.
● Both players have complete knowledge of the game’s state
● No element of chance
● Zero-sum games
○ The total payoff to all players is the same for every game
instance.
● Rational players
○ Each player always tries to maximize his/her utility
6
Games as search
● S0 – Initial state: how the game is set up at the start
● PLAYER(s): which player has the move in a state, MAX or MIN?
● ACTIONS(s) – Successor function: a list of (move, state) pairs
specifying legal moves
● RESULT(s, a) – Transition model: the result of move a in state s
● TERMINAL-TEST(s): is the game finished?
● UTILITY(s, p) – Utility function: a numerical value of a terminal
state s for a player p
7
Games vs. Search problems
● Complexity: games are too hard to solve exactly
● Time limits: some decision must be made even when computing
the optimal decision is infeasible
● Efficiency: games penalize inefficiency severely
○ Several interesting ideas on how to make the best possible
use of time have been spawned.
8
Search Tree of Tic-Tac-Toe
9
Optimal decision in games
● Normal search problem
○ An optimal solution is a sequence of actions leading to a goal
state.
● Games
○ A search strategy that guarantees a win for a player
○ The optimal strategy can be determined from the minimax
value of each node
● MINIMAX(s) =
UTILITY(s) if TERMINAL-TEST(s)
max_{a∈ACTIONS(s)} MINIMAX(RESULT(s, a)) if PLAYER(s) = MAX
min_{a∈ACTIONS(s)} MINIMAX(RESULT(s, a)) if PLAYER(s) = MIN
10
Example
[Figure: two-ply game tree annotated with MAX’s best move, MIN’s best moves, and the utility values for MAX at the leaves.]
11
The minimax algorithm
● Compute the minimax decision from the current state
● Use a simple recursive computation of the minimax values of
each successor state
○ The recursion proceeds all the way down to the leaves of
the tree, and then the minimax values are backed up
through the tree as the recursion unwinds.
12
The minimax algorithm
function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
13
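A recursive Python sketch of the pseudocode; game is an assumed object bundling actions, result, terminal_test, and utility for a two-player zero-sum game (any object with those four methods would do).

def minimax_decision(state, game):
    # MAX picks the action whose resulting state has the highest backed-up value.
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))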
The minimax algorithm
● A complete depth-first exploration of the game tree
● Completeness
○ Yes (if tree is finite)
● Optimality
○ Yes (against an optimal opponent)
● Time complexity
○ O(b^m)
● Space complexity
○ O(bm) (depth-first exploration)
14
The minimax algorithm
15
Problem with minimax search
● The number of game states is exponential in the tree’s depth
→ Do not examine every node
● Alpha-beta pruning: Prune away branches that cannot possibly
influence the final decision
● Bounded lookahead
○ Limit depth for each search
○ This is what chess players do: look ahead for a few moves
and see what looks best
16
Alpha-beta pruning
17
Alpha-beta pruning
18
Alpha-beta pruning
function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
19
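The same minimax sketch with α-β bookkeeping added; the game interface is the same assumed one as in the minimax sketch.

import math

def alpha_beta_search(state, game):
    best_action, best_v = None, -math.inf
    for a in game.actions(state):
        v = min_value(game.result(state, a), game, -math.inf, math.inf)
        if v > best_v:
            best_action, best_v = a, v
    return best_action

def max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:
            return v                       # beta cutoff: MIN would never allow this
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:
            return v                       # alpha cutoff: MAX would never allow this
        beta = min(beta, v)
    return v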
Alpha-beta pruning
● Pruning does not affect the result
● Good move ordering improves effectiveness of pruning
● Killer move heuristic
● A transposition table avoids re-evaluating a state
20
Alpha-beta pruning
21
Heuristic minimax
● Both minimax and alpha-beta pruning search all the way to
terminal states.
○ This depth is usually impractical because moves must be
made in a reasonable amount of time (~ minutes).
● Cut off the search earlier with some depth limit
● Use an evaluation function
H-MINIMAX(s, d) =
EVAL(s) if CUTOFF-TEST(s, d)
max_{a∈ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if PLAYER(s) = MAX
min_{a∈ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1) if PLAYER(s) = MIN
22
Evaluation Functions
● The evaluation function should order the terminal states in the
same way as the true utility function does
○ States that are wins must evaluate better than draws, which
in turn must be better than losses.
● The computation must not take too long!
● For nonterminal states, their orders should be strongly
correlated with the actual chances of winning.
23
Cutting off search
● Minimax-Cutoff is identical to Minimax-Value except that
○ Terminal? is replaced by Cutoff?
○ Utility is replaced by Eval
if CUTOFF-TEST(state, depth) then return EVAL(state)
24
Stochastic Games
● Uncertain outcomes controlled by chance, not an adversary!
● Why wouldn’t we know what the result of an action will be?
○ Explicit randomness: rolling dice
○ Unpredictable opponents: the ghosts respond randomly
○ Actions can fail: when moving a robot, wheels might slip
25
Expectimax search
● Values reflect average-case (expectimax) outcomes, not worst-
case (minimax) outcomes
● Expectimax search: compute the average score under optimal
play
○ Max nodes as in minimax search
○ Chance nodes are like min nodes, but the outcome is
uncertain
○ Calculate expected utilities, i.e. take weighted average of
children
● The underlying uncertain-result problems can be formulated as
Markov Decision Processes
26
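A Python sketch of the expectimax recursion; is_max_node and outcomes (yielding (probability, next_state) pairs at chance nodes) are assumed extensions of the earlier game interface, not lecture-defined names.

def expectimax(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    if game.is_max_node(state):
        return max(expectimax(game.result(state, a), game)
                   for a in game.actions(state))
    # Chance node: probability-weighted average of the outcome values.
    return sum(p * expectimax(s, game) for p, s in game.outcomes(state))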
Expectimax search
27
Expectimax search
● It is possible to perform pruning in expectimax search.
● Common techniques for pruning in expectimax include:
○ Depth Limiting: Limiting the depth of the search tree can
effectively prune branches that are too deep to be practically
explored.
○ Evaluation Function: Using an evaluation function to estimate
the value of a state without fully exploring its subtree. If the
evaluation function indicates that further exploration is unlikely to
yield significant improvements, you can prune the subtree.
○ Probabilistic Pruning: In scenarios where probabilities are
involved, you might prune branches that have very low
probabilities of occurring, as they contribute little to the overall
expectation.
○ Iterative Deepening: Iterative deepening can be combined with
pruning techniques to explore deeper parts of the tree only when
necessary, based on the current state of the search.
28
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
29
Introduction to
Artificial Intelligence
Lecture: Constraint Satisfaction Problems
Outline
● Constraint Satisfaction Problems (CSPs)
● Constraint Propagation: Inference in CSPs
● Backtracking Search for CSPs
● Local Search for CSPs
2
Constraint Satisfaction Problems (CSPs)
● A constraint satisfaction problem (CSP) uses a factored
representation for each state.
○ State = a set of variables, each of which has a value
○ Solution = each variable has a value that satisfies all
constraints on that variable
● Take advantage of the structure of states
● General-purpose rather than problem-specific heuristics
○ Identify variable-value combinations that violate the
constraints → eliminate large portions of the search space
all at once
○ Solutions to complex problems
3
Constraint Satisfaction Problems (CSPs)
● A CSP consists of the following three components
○ X = {X1, ..., Xn}: a set of variables
○ D = {D1, ..., Dn}: a set of domains, one for each variable
○ C: a set of constraints that state allowable combinations of
values
● Each constraint Ci consists of a pair ⟨scope, rel⟩
○ scope: a tuple of variables that participate in the constraint
○ rel: a relation that defines the values the participating
variables can take
4
Constraints in CSPs
● Assume that both 𝑋1 and 𝑋2 have the domain {𝐴,𝐵}
● “Two variables must have different values”
● A relation can be an explicit list of all tuples of values that
satisfy the constraint.
○ <(𝑋1, 𝑋2), {(𝐴, 𝐵), (𝐵, 𝐴)}>
● It can be an abstract relation that supports two operations
○ Test whether a tuple is a member of the relation
○ Enumerate the members of the relation
○ <(𝑋1, 𝑋2), 𝑋1 ≠ 𝑋2>
5
Solutions for CSPs
● Each state is defined by an assignment of values to some or all
the variables.
● A solution to a CSP is a consistent – complete assignment.
○ A consistent assignment does not violate any constraints.
○ A complete assignment has every variable assigned, while
a partial assignment assigns values to only some
variables.
6
Example: Map Coloring
● Each state is assigned a color in {red, green, blue}.
● Adjacent states have different colors.
● Constraint graph:
○ Nodes ⇔ Variables
○ Arcs ⇔ Constraints
7
Example: Map Coloring
● Variables: 𝑋 = {𝑊𝐴, 𝑁𝑇, 𝑄, 𝑁𝑆𝑊, 𝑉, 𝑆𝐴, 𝑇}
● Domains: 𝐷i = {𝑟𝑒𝑑, 𝑔𝑟𝑒𝑒𝑛, 𝑏𝑙𝑢𝑒}
● Constraints: Adjacent regions must have different colors
{𝑆𝐴 ≠ 𝑊𝐴, 𝑆𝐴 ≠ 𝑁𝑇, 𝑆𝐴 ≠ 𝑄, 𝑆𝐴 ≠ 𝑁𝑆𝑊,
𝑆𝐴 ≠ 𝑉, 𝑊𝐴 ≠ 𝑁𝑇, 𝑁𝑇 ≠ 𝑄, 𝑄 ≠ 𝑁𝑆𝑊 , 𝑁𝑆𝑊 ≠ 𝑉}
○ 𝑆𝐴 ≠ 𝑊𝐴 is a shortcut of <(𝑆𝐴,𝑊𝐴), 𝑆𝐴 ≠ 𝑊𝐴>
○ 𝑆𝐴 ≠ 𝑊𝐴 can be fully enumerated as
{(red,green), (red,blue), (green,red),
(green,blue), (blue,red), (blue,green)}
8
Example: Map Coloring
● There are many possible solutions to this problem.
{𝑊𝐴 = 𝑟𝑒𝑑, 𝑁𝑇 = 𝑔𝑟𝑒𝑒𝑛, 𝑄 = 𝑟𝑒𝑑, 𝑁𝑆𝑊 = 𝑔𝑟𝑒𝑒𝑛,
𝑉 = 𝑟𝑒𝑑, 𝑆𝐴 = 𝑏𝑙𝑢𝑒, 𝑇 = 𝑟𝑒𝑑}
9
Benefits of CSPs
● Provide natural representation for a wide variety of problems
● Many problems intractable in regular state-space search can
be solved quickly with CSP formulation.
● Better insights to the problem and its solution.
10
Variations on the CSP formalism
● Discrete and finite variables
○ n variables, domain size d → O(d^n) complete assignments
○ E.g., map coloring, scheduling with time limits, 8-queens
etc.
● Discrete, infinite domains
○ Sets of Integers, strings, etc.
○ E.g., job scheduling without deadlines
○ Constraint language: understand constraints without
enumeration
● Continuous domains
○ Real-world problems often involve continuous domains and
even real-valued variables.
11
CSPs in practice
● Operations Research (scheduling, timetabling)
○ Scheduling the time of observations on the Hubble Space
Telescope
○ Airline schedules
● Linear programming
○ Constraints must be linear equalities or inequalities
→ solved in time polynomial in the number of variables.
● Bioinformatics (DNA sequencing)
● Electrical engineering (circuit layout)
● Cryptography
● Computer vision: image interpretation
12
Types of constraints
● Unary constraint: restrict the value of a single variable
○ 𝑆𝐴 ≠ 𝑔𝑟𝑒𝑒𝑛
● Binary constraint: relate two variables
○ 𝑆𝐴 ≠ 𝑊𝐴
● Higher-order constraints: involve three or more variables
○ E.g., Professors A, B, and C cannot be on a committee
together
○ Can always be represented by multiple binary
constraints
● Global constraints: involving an arbitrary number of variables
○ Alldiff: all variables involved must have different values
13
Preference constraints
● Which solutions are preferred → soft constraints
○ E.g., 𝑟𝑒𝑑 is better than 𝑔𝑟𝑒𝑒𝑛
→ this often can be represented by a cost for each variable
assignment
● Constraint optimization problem (COP): a combination of
optimization with CSPs, e.g., linear programming
14
Constraint propagation
● Constraints help to reduce the number of legal values for a
variable, which in turn reduces the legal values for other
variables.
● Intertwined with search, or done as a preprocessing step.
● Enforcing local consistency in each part of a graph causes
inconsistent values to be eliminated throughout the graph.
15
Node consistency
● A single variable is node-consistent if all the values in the
variable’s domain satisfy the variable’s unary constraints.
● Eliminate all the unary constraints in a CSP by running node
consistency.
● E.g., if the South Australians dislike green, running node
consistency reduces the domain of SA to {red, blue}
16
Arc consistency
● A variable in a CSP is arc-consistent if every value in its
domain satisfies the variable’s binary constraints.
● Arc consistency may have no effect in several cases.
● E.g., in the Australia map, no matter what value is chosen for SA
(or for WA), there is a valid value for the other variable.
17
Arc consistency
18
Arc consistency
● Run as a preprocessor before the search starts or after each
assignment
● Arc consistency must be enforced repeatedly until no
inconsistency remains
● Need a systematic method for arc checking
○ If X loses a value, the neighbors of X need to be rechecked
→ incoming arcs can become inconsistent again while
outgoing arcs remain unaffected
19
AC-3 algorithm
function AC-3(csp)
returns false if an inconsistency is found and true otherwise
inputs: csp, a binary CSP with components (X, D, C)
local variables: queue, a queue of arcs, initially all the arcs in csp
while queue is not empty do
(Xi , Xj) ← REMOVE-FIRST(queue)
if REVISE(csp, Xi , Xj) then
if size of Di = 0 then return false
for each Xk in Xi.NEIGHBORS - {Xj} do
add (Xk , Xi) to queue
return true
20
AC-3 algorithm
function REVISE(csp, Xi , Xj)
returns true iff we revise the domain of Xi
revised ← false
for each x in Di do
if no value y in Dj allows (x ,y) to satisfy the constraint between Xi
and Xj then
delete x from Di
revised ← true
return revised
● The worst-case complexity is O(cd³)
○ n: number of variables, d: domain size of each variable, c: number of binary constraints (arcs)
21
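A direct Python transcription of AC-3 and REVISE is sketched below, assuming the dict-of-sets CSP representation from the map-coloring sketch earlier and hard-coding the ≠ constraint; a general implementation would look up the constraint relation instead.

from collections import deque

def revise(domains, Xi, Xj):
    # Delete every value of Xi with no supporting value in Xj (constraint Xi != Xj).
    revised = False
    for x in set(domains[Xi]):
        if not any(x != y for y in domains[Xj]):
            domains[Xi].discard(x)
            revised = True
    return revised

def ac3(domains, neighbors):
    # neighbors[X] = set of variables sharing a binary constraint with X.
    queue = deque((Xi, Xj) for Xi in neighbors for Xj in neighbors[Xi])
    while queue:
        Xi, Xj = queue.popleft()
        if revise(domains, Xi, Xj):
            if not domains[Xi]:
                return False              # a domain was wiped out: inconsistency
            for Xk in neighbors[Xi] - {Xj}:
                queue.append((Xk, Xi))    # incoming arcs must be rechecked
    return True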
Backtracking Search
● States are defined by the values assigned so far
○ Initial state: empty assignment { }
○ Successor function: assign a value to an unassigned
variable that agrees with the current assignment
→ fail if no legal assignments
○ Goal test: the current assignment is complete
● Depth-first search: choose values for one variable at a time and
backtrack when a variable has no legal values left
22
Backtracking Search
function BACKTRACKING-SEARCH(csp) returns a solution, or failure
return BACKTRACK({ }, csp)
function BACKTRACK(assignment, csp) returns a solution, or failure
if assignment is complete then return assignment
var ← SELECT-UNASSIGNED-VARIABLE(csp)
for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
if value is consistent with assignment then
add {var = value} to assignment
inferences ← INFERENCE(csp, var, value)
if inferences != failure then
add inferences to assignment
result ← BACKTRACK(assignment, csp)
if result != failure then
return result
remove {var = value} and inferences from assignment
return failure
23
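A compact Python sketch of BACKTRACKING-SEARCH, using the same representation; the INFERENCE step is omitted and SELECT-UNASSIGNED-VARIABLE is the trivial first-unassigned choice rather than MRV.

def backtracking_search(variables, domains, neighbors):
    def consistent(var, value, assignment):
        # value must differ from every already-assigned neighbor
        return all(assignment.get(n) != value for n in neighbors[var])

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                 # goal test: assignment is complete
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]           # undo and try the next value
        return None                           # no legal value left: backtrack
    return backtrack({})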
Backtracking Search
24
Variable and value ordering
● Minimum-remaining-values (MRV) heuristic: choose the
variable with the fewest legal values
○ E.g., after [𝑊𝐴 = 𝑟𝑒𝑑, 𝑁𝑇 = 𝑔𝑟𝑒𝑒𝑛] only one possible value
for 𝑆𝐴
25
Variable and value ordering
● Degree heuristic (DH): choose the variable that is involved in the
largest number of constraints on other unassigned variables.
○ E.g., 𝑆𝐴 has the highest degree, 5
26
Variable and value ordering
● Least constraining value (LCV) heuristic: given a variable,
choose the value that leaves the maximum flexibility for
subsequent variable assignments
27
Inference: Forward checking
● Keep track of remaining legal values for unassigned variables
● Terminate search when any variable has no legal values
28
Local search for CSPs
● Complete-state formulation
○ The initial state assigns a value to every variable, possibly
violating some constraints
○ The search then changes the value of one variable at a time,
trying to eliminate constraint violations
● Min-conflicts heuristic: select the value that causes the minimum
number of conflicts with other variables
● Min-conflicts is surprisingly effective for many CSPs.
29
MIN-CONFLICTS algorithm
function MIN-CONFLICTS(csp, max steps) returns a solution or failure
inputs: csp, a constraint satisfaction problem
max steps, the number of steps allowed before giving up
current ← an initial complete assignment for csp
for i = 1 to max steps do
if current is a solution for csp then return current
var ← a randomly chosen conflicted variable from csp.VARIABLES
value ← the value v for var that minimizes CONFLICTS(var, v,
current, csp)
set var = value in current
return failure
30
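A Python sketch of MIN-CONFLICTS for the same representation; CONFLICTS simply counts violated ≠ constraints against neighbors, and the initial assignment and tie-breaking are random (the pseudocode above leaves both open).

import random

def min_conflicts(variables, domains, neighbors, max_steps=10_000):
    def conflicts(var, value, current):
        return sum(1 for n in neighbors[var] if current[n] == value)

    # initial complete assignment, chosen at random
    current = {v: random.choice(sorted(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current                    # current is a solution
        var = random.choice(conflicted)
        current[var] = min(domains[var],
                           key=lambda val: conflicts(var, val, current))
    return None                               # failure: give up after max_steps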
Local search for CSPs
● The landscape of a CSP under the min-conflicts heuristic
usually has a series of plateaux.
● Plateau search: allow sideways moves to another state with the
same score.
● Tabu search: keep a small list of recently visited states and
forbid the algorithm to return to those states
● Simulated annealing can also be used.
31
Constraint weighting
● Concentrate the search on the important constraints
● Each constraint is given a numeric weight Wi, with all weights initially 1.
● At each step, choose a variable/value pair to change that has
the lowest total weight of all violated constraints
● Increase the weight of each constraint that is violated by the
current assignment.
32
Homework
● Complete the homework exercises in the given notebook
33
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
34
Introduction to
Artificial Intelligence
Lecture: Logical Agents
Outline
● Knowledge-Based Agents
● Logic
● Propositional Logic
● Propositional Theorem Proving
● Effective Propositional Model Checking
2
Knowledge-based agents
● Supported by logic – a general class of representation
● Combine and recombine information to suit myriad purposes
● Knowledge base (KB): A set of sentences or facts
○ Each sentence represents some assertion about the world.
○ Axiom = a sentence that is not derived from other
sentences
● Inference: Derive (infer) new sentences from old ones
○ Add new sentences to the knowledge base and query what
is known
3
Knowledge-based agents
function KB-AGENT(percept) returns an action
persistent: KB, a knowledge base
t, a counter, initially 0, indicating time
TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
action ← ASK(KB, MAKE-ACTION-QUERY(t))
TELL(KB, MAKE-ACTION-SENTENCE(action, t))
t←t+1
return action
Inference mechanisms are hidden inside TELL and ASK
4
Logic
● Logics are formal languages for representing information such
that conclusions can be drawn
● Syntax defines the well-formed sentences in the language
● Semantics define the "meaning" of sentences
● Models are mathematical abstractions that fix the truth or
falsehood of every relevant sentence.
● 𝑚 satisfies (or is a model of) 𝛼 if 𝛼 is true in model 𝑚
● 𝑀(𝛼) = the set of all models of 𝛼
5
Entailment
● A sentence follows logically from another sentence: 𝜶 ⊨ 𝜷
● 𝜶 ⊨𝜷 if and only if, in every model in which 𝜶 is true, 𝜷 is also
true.
○ 𝑀(𝛼) ⊆ 𝑀(𝛽)
○ 𝑥 = 0 entails 𝑥𝑦 = 0
● Entailment is a relationship between sentences that is based
on semantics.
6
Logical inference
● 𝐾𝐵 ⊨𝑖 𝛼 means 𝛼 can be derived from 𝐾𝐵 by procedure 𝑖.
● Soundness: 𝑖 is sound if whenever 𝐾𝐵 ⊨𝑖 𝛼, it is also true that
𝐾𝐵 ⊨ 𝛼
● Completeness: 𝑖 is complete if whenever 𝐾𝐵 ⊨ 𝛼, it is also true
that 𝐾𝐵 ⊨𝑖 𝛼
7
World and representation
● The reasoning agent gets its knowledge about the facts of the
world as a sequence of logical sentences
● Conclusions must be drawn only from those sentences, without the
agent having independent access to the world
● Thus it is very important that the agent’s reasoning is sound.
8
Propositional Logic: Syntax
● Propositional logic: the simplest logic illustrating basic ideas
● Constants: TRUE or FALSE
● Symbols stand for propositions (sentences): 𝑃, 𝑄, 𝑊, etc.
● Logical connectives
NOT ¬ Negation
AND ∧ Conjunction
OR ∨ Disjunction
IMPLIES ⇒ Implication (if..then)
IFF ⟺ Equivalence, biconditional
● Literal: atomic sentence (P) or negated atomic sentence (¬P)
9
Propositional Logic: Syntax
10
Semantics
● Each model specifies true/false for each proposition symbol
11
A simple inference procedure
● Given: a set of sentences, 𝑲𝑩, and sentence 𝜶
● Goal: answer 𝑲𝑩 ⊨ 𝜶? = “Does 𝑲𝑩 semantically entail 𝜶?”
● Approaches
○ Model-checking approach (Inference by enumeration)
○ Inference rules
○ Conversion to an unsatisfiability check, the complement of SAT
(resolution refutation)
12
Model-checking approach
● Check if 𝛼 is true in every model in which 𝐾𝐵 is true
● Draw a truth table for checking
13
Inference by (depth-first) enumeration
function TT-ENTAILS?(KB, α) returns true or false
inputs: KB, the knowledge base, a sentence in propositional logic
α, the query, a sentence in propositional logic
symbols ← a list of the proposition symbols in KB and α
return TT-CHECK-ALL(KB, α, symbols, { })
function TT-CHECK-ALL(KB, α, symbols, model) returns true or false
if EMPTY?(symbols) then
if PL-TRUE?(KB, model) then return PL-TRUE?(α, model)
else return true // when KB is false, always return true
else do
P ← FIRST(symbols)
rest ← REST(symbols)
return (TT-CHECK-ALL(KB,α,rest,model ∪ {P = true})
and TT-CHECK-ALL(KB,α,rest,model ∪ {P = false}))
14
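A Python sketch of TT-ENTAILS?; as a simplifying assumption, sentences are encoded as Python functions that map a model (a dict from symbol to Boolean) to a truth value.

def tt_entails(kb, alpha, symbols):
    # kb and alpha are callables: model dict -> bool
    def check_all(symbols, model):
        if not symbols:
            # when KB is false in this model, it imposes no requirement
            return alpha(model) if kb(model) else True
        p, rest = symbols[0], symbols[1:]
        return (check_all(rest, {**model, p: True}) and
                check_all(rest, {**model, p: False}))
    return check_all(list(symbols), {})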
Model-checking approach
● Given a KB containing the following rules and facts
R1: IF hot AND smoky THEN fire
R2: IF alarm_beeps THEN smoky
R3: IF fire THEN sprinklers_on
F1: alarm_beeps
F2: hot
● Represent the KB in propositional logic with given symbols
○ H = hot, S = smoky, F = fire, A = alarm_beeps, R = sprinklers_on
● Answer the question “Sprinklers_on?” by using the model-checking
approach.
15
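Using the tt_entails sketch above, the alarm knowledge base R1–R3, F1, F2 can be encoded with the given symbols and queried directly:

def implies(p, q):
    # material implication: p => q
    return (not p) or q

def kb(m):
    return (implies(m['H'] and m['S'], m['F'])   # R1: hot AND smoky => fire
            and implies(m['A'], m['S'])          # R2: alarm_beeps => smoky
            and implies(m['F'], m['R'])          # R3: fire => sprinklers_on
            and m['A'] and m['H'])               # F1: alarm_beeps, F2: hot

query = lambda m: m['R']                         # "Sprinklers_on?"
print(tt_entails(kb, query, ['H', 'S', 'F', 'A', 'R']))  # True: KB entails R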
Inference rules approach
● Theorem proving: Apply rules of inference directly to the
sentences in KB to construct a proof of the desired sentence
without consulting models
16
Logical equivalence
● Two sentences 𝛼 and 𝛽 are logically equivalent if they are true
in the same set of models.
𝜶 ≡ 𝜷 𝒊𝒇𝒇 𝜶 ⊨ 𝜷 𝒂𝒏𝒅 𝜷 ⊨ 𝜶
17
Validity
● A sentence is valid if it is true in all models.
● Valid sentences are also known as tautologies.
● Validity is connected to inference via the Deduction Theorem
𝛼 ⊨ 𝛽 𝑖𝑓𝑓 𝛼 ⇒ 𝛽 𝑖𝑠 𝑣𝑎𝑙𝑖𝑑
18
Satisfiability
● A sentence is satisfiable if it is true in some model.
● A sentence is unsatisfiable if it is true in no models.
● Satisfiability is connected to inference via the following
𝛼 ⊨ 𝛽 𝑖𝑓𝑓 𝛼 ∧ ¬𝛽 𝑖𝑠 𝑢𝑛𝑠𝑎𝑡𝑖𝑠𝑓𝑖𝑎𝑏𝑙𝑒
→ Refutation or proof by contradiction
● The SAT problem determines the satisfiability of sentences in
propositional logic (NP-complete).
19
Validity and Satisfiability
1. 𝐴 ∨ 𝐵 ⇒ 𝐴 ∧ 𝐶
2. 𝐴 ∧ 𝐵 ⇒ 𝐴 ∨ 𝐶
3. (𝐴 ∨ 𝐵) ∧ (¬𝐵 ∨ 𝐶) ⇒ 𝐴 ∨ 𝐶
4. (𝐴 ∨ ¬𝐵) ⇒ 𝐴 ∧ 𝐵
20
Inference and Proofs
● Proof: a chain of conclusions that leads to the desired goal
21
Inference and Proofs
22
Monotonicity
● The set of entailed sentences only increases as information is
added to the knowledge base.
𝑖𝑓 𝐾𝐵 ⊨ 𝛼 𝑡h𝑒𝑛 𝐾𝐵 ∧ 𝛽 ⊨ 𝛼
● Additional conclusions can be drawn without invalidating any
conclusion 𝛼 already inferred.
23
Proof by Resolution
● Proof by Inference Rules: sound but not complete
● Resolution: sound and complete, a single inference rule
● Given that ℓi and m are complementary literals (one is the negation of the other)
○ Unit resolution inference rule: from ℓ1 ∨ … ∨ ℓk and m, infer
ℓ1 ∨ … ∨ ℓi−1 ∨ ℓi+1 ∨ … ∨ ℓk
○ Full resolution rule: from ℓ1 ∨ … ∨ ℓk and m1 ∨ … ∨ mn, where ℓi and mj
are complementary, infer the disjunction of all the remaining literals
24
Proof by Resolution
25
Proof by Resolution
● Factoring: the resulting clause should contain only one copy of
each literal.
● For any sentences 𝛼 and 𝛽 in propositional logic, a resolution-
based theorem prover can decide whether 𝛼 ⊨ 𝛽.
26
Conjunctive Normal Form (CNF)
● Resolution applies only to clauses, i.e. disjunctions of literals
○ Convert all sentences in KB into clauses (CNF form)
● For example, convert A ⇔ (B ∨ C) into CNF
(¬A ∨ B ∨ C) ∧ (¬B ∨ A) ∧ (¬C ∨ A )
27
Conversion to CNF
● Eliminate ⇔: 𝛼 ⇔ 𝛽 ≡ (𝛼 ⇒ 𝛽) ∧ (𝛽 ⇒ 𝛼)
● Eliminate ⇒: 𝛼 ⇒ 𝛽 ≡ ¬𝛼 ∨ 𝛽
● The operator ¬ appears only in literals: “move ¬ inwards”
¬¬𝛼 ≡ 𝛼 (double-negation elimination)
¬(𝛼 ∧ 𝛽) ≡ ¬𝛼 ∨ ¬𝛽 (De Morgan)
¬(𝛼 ∨ 𝛽) ≡ ¬𝛼 ∧ ¬𝛽 (De Morgan)
● Apply the distributivity law to distribute ∨ over ∧
(𝛼 ∧ 𝛽) ∨ 𝛾 ≡ (𝛼 ∨ 𝛾) ∧ (𝛽 ∨ 𝛾)
28
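These conversion steps can be checked mechanically; for example, SymPy's to_cnf performs the same sequence of rewrites (a library demonstration added here, not part of the lecture's own material):

from sympy.abc import A, B, C
from sympy.logic.boolalg import Equivalent, to_cnf

# Convert A <=> (B | C) into CNF, as in the example above.
print(to_cnf(Equivalent(A, B | C)))
# a conjunction of clauses equivalent to (~A | B | C) & (~B | A) & (~C | A)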
The resolution algorithm
● Proof by contradiction: To show that 𝐾𝐵 ⊨ 𝛼, prove 𝐾𝐵 ∧ ¬𝛼 is
unsatisfiable
function PL-RESOLUTION(KB, α) returns true or false
inputs: KB, the knowledge base, a sentence in propositional logic
α, the query, a sentence in propositional logic
clauses ← the set of clauses in the CNF representation of KB ∧ ¬α
new ← { }
loop do
for each pair of clauses Ci, Cj in clauses do
resolvents ← PL-RESOLVE(Ci , Cj)
if resolvents contains the empty clause then return true
new ← new ∪ resolvents
if new ⊆ clauses then return false
clauses ← clauses ∪ new
29
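A Python sketch of PL-RESOLUTION, representing each clause as a frozenset of string literals where a leading '~' marks negation (an encoding chosen here for brevity):

from itertools import combinations

def resolve(ci, cj):
    # All resolvents of two clauses (frozensets of literals such as 'P', '~P').
    resolvents = set()
    for lit in ci:
        neg = lit[1:] if lit.startswith('~') else '~' + lit
        if neg in cj:
            resolvents.add(frozenset((ci - {lit}) | (cj - {neg})))
    return resolvents

def pl_resolution(clauses):
    # clauses: the CNF of KB ∧ ¬α; returns True iff KB ⊨ α (refutation).
    clauses = set(clauses)
    new = set()
    while True:
        for ci, cj in combinations(clauses, 2):
            resolvents = resolve(ci, cj)
            if frozenset() in resolvents:
                return True               # derived the empty clause
            new |= resolvents
        if new <= clauses:
            return False                  # no new clauses: not entailed
        clauses |= new

# KB = {P ∨ Q, ¬Q}, query α = P: add ¬P and refute.
print(pl_resolution({frozenset({'P', 'Q'}), frozenset({'~Q'}),
                     frozenset({'~P'})}))  # True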
Horn clauses and Definite clauses
● Definite clause: a disjunction of literals of which exactly one is
positive.
● Horn clause: a disjunction of literals of which at most one is
positive.
● Goal clause: clauses with no positive literals
30
KB of definite clauses
● KB containing only definite clauses are interesting.
● Every definite clause can be written as an implication.
● Inference can be done with forward-chaining and backward-
chaining algorithms.
● Deciding entailment can be done in linear time.
31
Forward chaining
● Key idea: Fire any rule whose premises are satisfied in the KB,
add its conclusion to the KB, until the query is found.
32
Forward chaining
function PL-FC-ENTAILS?(KB, q) returns true or false
inputs: KB, the knowledge base, a set of propositional definite clauses
q, the query, a proposition symbol
count ← a table, where count[c] is the number of symbols in c’s premise
inferred ← a table, where inferred[s] is initially false for all symbols
agenda ← a queue of symbols, initially symbols known to be true in KB
while agenda is not empty do
p ← POP(agenda)
if p = q then return true
if inferred[p] = false then
inferred[p] ← true
for each clause c in KB where p is in c.PREMISE do
decrement count[c]
if count[c] = 0 then add c.CONCLUSION to agenda
return false
33
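A Python sketch of PL-FC-ENTAILS?, with each definite clause encoded as a (premise set, conclusion) pair and known facts listed separately; applied to the alarm KB it confirms the earlier answer.

def pl_fc_entails(rules, facts, q):
    # rules: list of (premises, conclusion) pairs; facts: symbols true in the KB.
    count = {i: len(premises) for i, (premises, _) in enumerate(rules)}
    inferred = set()
    agenda = list(facts)
    while agenda:
        p = agenda.pop()
        if p == q:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (premises, conclusion) in enumerate(rules):
                if p in premises:
                    count[i] -= 1
                    if count[i] == 0:    # all premises satisfied: fire the rule
                        agenda.append(conclusion)
    return False

# The alarm KB again: H ∧ S ⇒ F, A ⇒ S, F ⇒ R, with facts A and H.
rules = [({'H', 'S'}, 'F'), ({'A'}, 'S'), ({'F'}, 'R')]
print(pl_fc_entails(rules, {'A', 'H'}, 'R'))  # True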
Forward chaining
34
Forward chaining
35
Backward chaining
● Key idea: Work backwards from the query 𝒒
● Avoid loops: check whether a new subgoal is already on the goal stack
● Avoid repeated work: check whether a new subgoal has already been
proved true, or has already failed
36
Backward chaining
37
Backward chaining
38
Backward chaining
39
Backward chaining
40
Backward chaining
41
Forward vs. Backward chaining
● Forward chaining: data-driven, automatic, unconscious
processing
● Backward chaining: goal-driven, good for problem-solving
42
Efficient propositional inference
● The SAT problem (checking satisfiability)
○ Testing entailment, 𝛼 ⊨ 𝛽? = testing unsatisfiability of 𝛼 ∧
¬𝛽
● Two families of efficient algorithms for general propositional
inference based on model checking
○ Complete backtracking search algorithms
DPLL algorithm (Davis, Putnam, Logemann, Loveland)
○ Incomplete local search algorithms (hill-climbing)
WalkSAT algorithm
43
The DPLL algorithm
● Determine whether an input propositional logic sentence (in
CNF) is satisfiable
● Improvements over truth table enumeration
1. Early termination
2. Pure symbol heuristic
3. Unit clause heuristic
44
The DPLL algorithm
● Early termination: A clause is true if any literal is true and a
sentence is false if any clause is false.
● Pure symbol heuristic: a pure symbol appears with the same
"sign" in all clauses, so it can be assigned the value that makes all of its literals true.
● Unit clause heuristic: a unit clause has only one literal left unassigned
(the others are already false), so that literal must be assigned true
45
The DPLL algorithm
function DPLL-SATISFIABLE?(s)
returns true or false
inputs: s, a sentence in propositional logic
clauses ← the set of clauses in the CNF representation of s
symbols ← a list of the proposition symbols in s
return DPLL(clauses, symbols,{ })
46
The DPLL algorithm
function DPLL(clauses, symbols, model) returns true or false
if every clause in clauses is true in model then return true
if some clause in clauses is false in model then return false
P, value ← FIND-PURE-SYMBOL(symbols, clauses, model)
if P is non-null then return DPLL(clauses, symbols – P, model ∪ {P=value})
P, value ← FIND-UNIT-CLAUSE(clauses, model)
if P is non-null then return DPLL(clauses, symbols – P, model ∪ {P=value})
P ← FIRST(symbols)
rest ← REST(symbols)
return DPLL(clauses, rest, model ∪ {P=true})
or DPLL(clauses, rest, model ∪ {P=false})
47
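A minimal Python sketch of DPLL with early termination and the unit-clause heuristic; the pure-symbol heuristic is omitted for brevity. Clauses are lists of (symbol, sign) pairs, e.g., (P ∨ ¬Q) is [('P', True), ('Q', False)] (an encoding chosen here).

def dpll(clauses, symbols, model):
    def status(clause):
        # True / False / None (undetermined) under the partial model
        unknown = False
        for sym, sign in clause:
            if sym not in model:
                unknown = True
            elif model[sym] == sign:
                return True
        return None if unknown else False

    vals = [status(c) for c in clauses]
    if all(v is True for v in vals):
        return True                     # early termination: every clause true
    if any(v is False for v in vals):
        return False                    # early termination: some clause false
    for c, v in zip(clauses, vals):
        if v is None:
            free = [(s, sign) for s, sign in c if s not in model]
            if len(free) == 1:          # unit clause: the assignment is forced
                sym, sign = free[0]
                return dpll(clauses, [s for s in symbols if s != sym],
                            {**model, sym: sign})
    p, rest = symbols[0], symbols[1:]   # otherwise branch on the next symbol
    return (dpll(clauses, rest, {**model, p: True}) or
            dpll(clauses, rest, {**model, p: False}))

def dpll_satisfiable(clauses, symbols):
    return dpll(clauses, list(symbols), {})

# (P ∨ Q) ∧ (¬P ∨ Q) ∧ (¬Q ∨ R) is satisfiable, e.g., Q = R = true.
print(dpll_satisfiable([[('P', True), ('Q', True)],
                        [('P', False), ('Q', True)],
                        [('Q', False), ('R', True)]], ['P', 'Q', 'R']))  # True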
The WalkSAT algorithm
● Incomplete, local search algorithm
● Evaluation function: The min-conflict heuristic of minimizing the
number of unsatisfied clauses
● Balance between greediness and randomness
48
The WalkSAT algorithm
● When the algorithm returns a model → The input sentence is
indeed satisfiable
● When it returns false → The sentence is unsatisfiable OR we
need to give it more time
● WalkSAT cannot always detect unsatisfiability
● It is most useful when a solution is expected to exist
49
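A WalkSAT sketch in the same clause encoding as the DPLL example; p is the probability of a random-walk step and max_flips bounds the search (both standard parameters, with default values assumed here).

import random

def walksat(clauses, p=0.5, max_flips=10_000):
    symbols = {s for c in clauses for s, _ in c}
    model = {s: random.choice([True, False]) for s in symbols}

    def num_unsat(m):
        return sum(1 for c in clauses
                   if not any(m[s] == sign for s, sign in c))

    for _ in range(max_flips):
        unsat = [c for c in clauses
                 if not any(model[s] == sign for s, sign in c)]
        if not unsat:
            return model                          # a satisfying model
        clause = random.choice(unsat)             # pick a violated clause
        if random.random() < p:
            sym = random.choice([s for s, _ in clause])  # random walk step
        else:
            def flipped_cost(s):
                model[s] = not model[s]           # tentatively flip
                cost = num_unsat(model)
                model[s] = not model[s]           # flip back
                return cost
            sym = min({s for s, _ in clause}, key=flipped_cost)  # greedy step
        model[sym] = not model[sym]
    return None    # failure: unsatisfiable OR the search needs more flips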
PySAT - Glucose3
! pip install python-sat
from pysat.solvers import Glucose3

g = Glucose3()
g.add_clause([1])           # x1
g.add_clause([2])           # x2
g.add_clause([-1, 3])       # ¬x1 ∨ x3
g.add_clause([-2, -3, -4])  # ¬x2 ∨ ¬x3 ∨ ¬x4
g.add_clause([-4])          # ¬x4
print(g.solve())            # True: the formula is satisfiable
g.get_model()               # [1, 2, 3, -4], i.e., x1, x2, x3 true and x4 false
50
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
51
Introduction to
Artificial Intelligence
Lecture: First Order Logic
Outline
● First-Order Logic (FOL)
● Unification and Lifting
● Forward Chaining
● Backward Chaining
● Resolution
2
First-order Logic
● Objects are referred by nouns and noun phrases.
○ E.g., people, houses, numbers, colors, Bill Gates, games,
wars, etc.
● Relations can be unary relations (properties) or 𝒏-ary relations,
represented by verbs and verb phrases
○ Properties: red, round, prime, etc.
○ 𝑛-ary relations: brother of, bigger than, part of, comes
between, etc.
● Functions are relations in which there is only one “value” for a
given “input.”
○ E.g., father of, best friend, one more than, etc.
3
First-order Logic
● “One plus two equals three.”
○ Object: one, two, three, one plus two
○ Relation: equal
○ Function: plus
● “The intelligent AlphaGo beat the world champion in 2016.”
○ Object: AlphaGo, world champion, 2016
○ Property: intelligent
○ Relation: beat
4
Types of Logic
Language            | Ontological Commitment             | Epistemological Commitment
                    | (What exists in the world)         | (What an agent believes about facts)
Propositional logic | Facts                              | True/false/unknown
First-order logic   | Facts, objects, relations          | True/false/unknown
Temporal logic      | Facts, objects, relations, time    | True/false/unknown
Probability logic   | Facts                              | Degree of belief ∈ [0,1]
Fuzzy logic         | Facts with degree of truth ∈ [0,1] | Known interval value
5
Models for a logic language
● Formal structures that constitute the possible worlds under
consideration
● Each model links the vocabulary of sentences to elements of
the possible world → determine the truth of any sentence
● Models for propositional logic link proposition symbols to
predefined truth values.
● The domain of a model is the set of objects (or domain
elements) it contains.
● Nonempty: every possible world must contain at least one
object
6
Models for a logic language
7
Symbols
● Constant symbols represent objects.
○ E.g., Richard, John, etc.
● Predicate symbols stand for relations.
○ E.g., Brother , OnHead, Person, King, and Crown, etc.
● Function symbols stand for functions.
○ E.g., LeftLeg
● Each predicate or function symbol comes with an arity that
fixes the number of arguments.
○ E.g., Brother(x,y) → binary, LeftLeg(x) → unary, etc.
● These symbols begin with uppercase letters by convention.
8
Intended interpretation
● Interpretation specifies exactly which objects, relations and
functions are referred to by the symbol.
● Each model includes an (intended) interpretation.
9
FOL syntax
10
FOL syntax
11
Terms
● A term is a logical expression that refers to an object
○ Constant symbols: John
○ Function symbols: LeftLeg(John)
● A complex term is formed by a function symbol followed by a
parenthesized list of terms as arguments.
● Term = function(term1,...,termn) or constant or variable
12
Atomic sentence
● An atomic sentence states a fact by using a predicate symbol
followed by a parenthesized list of terms.
● Atomic sentence = predicate(term1,...,termn)
● For example,
○ Brother(Richard, John)
○ Married(Father(Richard), Mother(John))
● It is true if the relation referred to by the predicate symbol holds
among the objects referred to by the arguments
13
Complex sentences
● Complex sentences are made from atomic sentences using
connectives.
● For example,
○ ¬Brother(LeftLeg(Richard), John)
○ Brother(Richard, John) ∧ Brother(John, Richard)
○ King(Richard) ∨ King(John)
○ ¬King(Richard) ⇒ King(John)
● Syntax and semantics are the same as in propositional logic.
14
Quantifiers: Universal quantification
● ∀<variables> <sentence>
○ E.g., “Students of FIT are smart.”
○ ∀𝑥 𝑆𝑡𝑢𝑑𝑒𝑛𝑡(𝑥, 𝐹𝐼𝑇) ⇒ 𝑆𝑚𝑎𝑟𝑡(𝑥)
● ∀x P is true in a model m iff P is true with x being each possible
object in the model.
● It is equivalent to the conjunction of instantiations of 𝑃.
(Student(Lan, FIT) ⇒ Smart(Lan)) ∧
(Student(Tuan, FIT) ⇒ Smart(Tuan)) ∧
(Student(Long, FIT) ⇒ Smart(Long)) ∧ …
15
Variables
● A variable is a term all by itself and can serve as the
argument of a function
○ E.g., in predicates 𝐾𝑖𝑛𝑔(𝑥) or in function 𝐿𝑒𝑓𝑡𝐿𝑒𝑔(𝑥).
● Usually represented by lowercase letters
● A term with no variables is called a ground term.
16
A common mistake to avoid
● Typically, ⇒ is the main connective with ∀
○ The rule draws its conclusion only for those objects for which
the premise is true
○ It says nothing at all about individuals for whom the premise
is false.
● Using ∧ instead makes the statement too strong
○ ∀x Student(x, FIT) ∧ Smart(x)
○ It means “everyone is a student of FIT and everyone is
smart.”
17
Quantifiers: Existential quantification
● ∃<variables> <sentence>
○ E.g., “Some students of FIT are smart.”
○ ∃𝑥 𝑆𝑡𝑢𝑑𝑒𝑛𝑡(𝑥,𝐹𝐼𝑇) ∧ 𝑆𝑚𝑎𝑟𝑡(𝑥)
● ∃x P is true in a model m iff P is true with x being some
possible object in the model.
● It is equivalent to the disjunction of instantiations of 𝑃.
(Student(Lan, FIT) ∧ Smart(Lan)) ∨
(Student(Tuan, FIT) ∧ Smart(Tuan)) ∨
(Student(Long, FIT) ∧ Smart(Long)) ∨ ...
18
Common mistake to avoid
● Typically, ∧ is the main connective with ∃
● Using ⇒ instead makes the statement too weak
○ ∃x Student(x, FIT) ⇒ Smart(x)
○ It is true whenever there is anyone who is not a student of FIT,
since a false premise makes the implication true.
19
Nested quantifiers
● Multiple quantifiers enable more complex sentences.
● Simplest cases: Quantifiers are of the same type
○ ∀𝑥∀𝑦 𝐵𝑟𝑜𝑡h𝑒𝑟(𝑥, 𝑦) ⇒ 𝑆𝑖𝑏𝑙𝑖𝑛𝑔(𝑥, 𝑦)
○ ∀𝑥∀𝑦 𝑆𝑖𝑏𝑙𝑖𝑛𝑔(𝑥, 𝑦) ⇔ 𝑆𝑖𝑏𝑙𝑖𝑛𝑔(𝑦, 𝑥)
● Mixtures
○ ∀𝑥∃𝑦 𝐿𝑜𝑣𝑒𝑠(𝑥, 𝑦) → “Everybody loves somebody.”
○ ∃𝑥∀𝑦 𝐿𝑜𝑣𝑒𝑠(𝑦, 𝑥) → “There is someone loved by everyone.”
● The order of quantification is therefore very important.
20
Nested quantifiers
● Rule: The variable belongs to the innermost quantifier that
mentions it.
○ ∀x (Crown(x) ∨ (∃x Brother(Richard, x)))
● Workaround: use different variable names with nested
quantifiers, e.g., ∃z Brother(Richard, z)
21
Quantifier duality
● De Morgan’s rules for quantifiers:
¬∀x P ≡ ∃x ¬P    ¬∃x P ≡ ∀x ¬P
∀x P ≡ ¬∃x ¬P    ∃x P ≡ ¬∀x ¬P
● For example, ∀x Likes(x, IceCream) ≡ ¬∃x ¬Likes(x, IceCream)
22
Equality symbol =
● 𝑡𝑒𝑟𝑚1 = 𝑡𝑒𝑟𝑚2 is true under a given interpretation iff 𝑡𝑒𝑟𝑚1 and
𝑡𝑒𝑟𝑚2 refer to the same object
● The negation insists that two terms are not the same.
∃x, y Brother(x, Richard) ∧ Brother(y, Richard) ∧ ¬(x = y)
23
Practise
∀m, c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c)
∀w, h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w)
∀x Male(x) ⇔ ¬Female(x)
∀p, c Parent(p, c) ⇔ Child(c, p)
∀g, c Grandparent(g, c) ⇔ ∃p Parent(g, p) ∧ Parent(p, c)
∀x, y Sibling(x, y) ⇔ ¬(x = y) ∧ ∃p Parent(p, x) ∧ Parent(p, y)
24
Practise
● Diagnostic rule → infer cause from effect
○ ∀𝑠 𝐵𝑟𝑒𝑒𝑧𝑦(𝑠) ⇔ ∃𝑟 𝐴𝑑𝑗𝑎𝑐𝑒𝑛𝑡(𝑟, 𝑠) ∧ 𝑃𝑖𝑡(𝑟)
● Causal rule → infer effect from cause
○ ∀𝑟 𝑃𝑖𝑡(𝑟) ⇔ [∀𝑠 𝐴𝑑𝑗𝑎𝑐𝑒𝑛𝑡(𝑟, 𝑠) ⇒ 𝐵𝑟𝑒𝑒𝑧𝑦(𝑠)]
25
Universal Instantiation (UI)
● It is possible to infer any sentence obtained by substituting a
ground term for the variable.
● Let 𝑆𝑈𝐵𝑆𝑇(𝜃, 𝛼) be the result of applying the substitution 𝜃 to
the sentence 𝛼.
● Then the rule of Universal Instantiation is written: from ∀v α, infer
SUBST({v/g}, α) for any variable v and ground term g
26
Universal Instantiation (UI)
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))
substitutions: {x/John}, {x/Richard}, {x/Father(John)}
27
Existential Instantiation (EI)
● It is possible to replace the variable by a single new constant
symbol.
● The rule of Existential Instantiation is written: from ∃v α, infer
SUBST({v/k}, α)
for any sentence α, variable v, and constant symbol k that does not
appear elsewhere in KB.
○ E.g., from ∃x Crown(x) ∧ OnHead(x, John), infer Crown(C) ∧ OnHead(C, John)
○ C does not appear elsewhere in KB → a Skolem constant
28
Universal / Existential Instantiation
● The UI rule can be applied many times to produce different
consequences.
● The EI rule can be applied once, and then the existentially
quantified sentence is discarded.
○ The new 𝐾𝐵 is not logically equivalent to the old but shown
to be inferentially equivalent.
29
Generalized Modus Ponens (GMP)
● For atomic sentences pi, pi′ and q, where there exists θ such
that SUBST(θ, pi′) = SUBST(θ, pi) for all i:
from p1′, p2′, …, pn′ and (p1 ∧ p2 ∧ … ∧ pn ⇒ q), infer SUBST(θ, q)
● For example, p1′ is King(John), p1 is King(x)
p2′ is Greedy(y), p2 is Greedy(x)
θ is {x/John, y/John}, q is Evil(x)
SUBST(θ, q) is Evil(John)
● All variables assumed universally quantified
● A lifted version of Modus Ponens → a sound inference rule
30
Unification
● Find substitutions that make different logical expressions look
identical
𝑼𝑵𝑰𝑭𝒀(𝒑, 𝒒) = 𝜽 where 𝑆𝑈𝐵𝑆𝑇(𝜃, 𝑝) = 𝑆𝑈𝐵𝑆𝑇(𝜃, 𝑞)
● For example,
p              | q                   | θ
Knows(John, x) | Knows(John, Jane)   | {x/Jane}
Knows(John, x) | Knows(y, Steve)     | {x/Steve, y/John}
Knows(John, x) | Knows(y, Mother(y)) | {x/Mother(John), y/John}
Knows(John, x) | Knows(x, Steve)     | fail
● Standardizing apart renames variables to avoid clashes; the last pair
unifies after renaming the second sentence to Knows(z, Steve)
31
Most General Unifier (MGU)
● UNIFY(Knows(John, x), Knows(y, z)) = θ
1. θ = {y/John, x/z}
2. θ = {y/John, x/John, z/John}
● The first unifier is more general than the second
● There is a single Most General Unifier (MGU) that is unique up
to renaming of variables.
MGU = {y/John, x/z}
32
Unification algorithm
function UNIFY(x , y, θ) returns a substitution to make x and y identical
inputs: x , a variable, constant, list, or compound expression
y, a variable, constant, list, or compound expression
θ, the substitution built up so far (optional, defaults to empty)
if θ = failure then return failure
else if x = y then return θ
else if VARIABLE?(x) then return UNIFY-VAR(x , y, θ)
else if VARIABLE?(y) then return UNIFY-VAR(y, x , θ)
else if COMPOUND?(x) and COMPOUND?(y) then
return UNIFY(x.ARGS, y.ARGS, UNIFY(x.OP, y.OP, θ))
else if LIST?(x) and LIST?(y) then
return UNIFY(x.REST, y.REST, UNIFY(x.FIRST, y.FIRST, θ))
else return failure
33
Unification algorithm
function UNIFY-VAR(var, x , θ) returns a substitution
if {var /val} ∈ θ then
return UNIFY(val, x , θ)
else if {x/val} ∈ θ then
return UNIFY(var, val, θ)
else if OCCUR-CHECK?(var, x) then
return failure
else return add {var/x} to θ
34
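A Python sketch of UNIFY and UNIFY-VAR, under the conventions assumed here that variables are lowercase strings and compound expressions are tuples like ('Knows', 'John', 'x'):

def is_variable(t):
    return isinstance(t, str) and t[0].islower()

def unify(x, y, theta):
    if theta is None:
        return None                      # failure propagates
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):         # unify operator and arguments in turn
            theta = unify(xi, yi, theta)
        return theta
    return None

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    if is_variable(x) and x in theta:
        return unify(var, theta[x], theta)
    if occurs(var, x, theta):
        return None                      # occur check
    return {**theta, var: x}

def occurs(var, x, theta):
    if var == x:
        return True
    if is_variable(x) and x in theta:
        return occurs(var, theta[x], theta)
    if isinstance(x, tuple):
        return any(occurs(var, xi, theta) for xi in x)
    return False

print(unify(('Knows', 'John', 'x'), ('Knows', 'y', ('Mother', 'y')), {}))
# {'y': 'John', 'x': ('Mother', 'y')}; applying θ through the chain of
# bindings gives x/Mother(John), as in the table above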
Forward Chaining
● A definite clause is a disjunction of literals of which exactly
one is positive.
● First-order literals can include variables, which are assumed to
be universally quantified.
● Not every 𝐾𝐵 can be converted into a set of definite clauses
due to the single-positive-literal restriction.
35
Forward Chaining
● “The law says that it is a crime for an American to sell weapons
to hostile nations. The country Nono, an enemy of America,
has some missiles, and all of its missiles were sold to it by
Colonel West, who is American.”
● American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
● ∃x Owns(Nono, x) ∧ Missile(x)
○ Owns(Nono, M1)
○ Missile(M1)
● Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
● Missile(x) ⇒ Weapon(x)
● American(West)
● Enemy(x, America) ⇒ Hostile(x)
● Enemy(Nono, America)
36
Forward Chaining
37
Forward Chaining
38
Forward Chaining
● Soundness?
○ YES, every inference is just an application of GMP.
● Completeness?
○ YES for definite clause knowledge bases.
● It answers every query whose answers are entailed by any 𝐾𝐵
of definite clauses.
● Terminates for Datalog in a finite number of iterations
● Datalog = first-order definite clauses + no functions
● Entailment with definite clauses is semidecidable.
39
Backward chaining
40
Backward chaining
41
Backward chaining
● Depth-first recursive proof search
○ Space is linear in size of proof
● Incomplete due to infinite loops
○ Fix by checking current goal against every goal on stack
● Inefficient due to repeated subgoals (success and failure)
○ Fix using caching of previous results (extra space)
● Widely used for logic programming
42
CNF for First-order logic
● First-order resolution requires that sentences be in CNF
● Every sentence of first-order logic can be converted into an
inferentially equivalent CNF sentence.
● For example, the sentence
∀𝑥 𝐴𝑚𝑒𝑟𝑖𝑐𝑎𝑛(𝑥) ∧ 𝑊𝑒𝑎𝑝𝑜𝑛(𝑦) ∧ 𝑆𝑒𝑙𝑙𝑠(𝑥, 𝑦, 𝑧)
∧ 𝐻𝑜𝑠𝑡𝑖𝑙𝑒(𝑧) ⇒ 𝐶𝑟𝑖𝑚𝑖𝑛𝑎𝑙(𝑥)
becomes, in CNF,
¬𝐴𝑚𝑒𝑟𝑖𝑐𝑎𝑛(𝑥) ∨ ¬𝑊𝑒𝑎𝑝𝑜𝑛(𝑦) ∨ ¬𝑆𝑒𝑙𝑙𝑠(𝑥, 𝑦, 𝑧)
∨ ¬𝐻𝑜𝑠𝑡𝑖𝑙𝑒(𝑧) ∨ 𝐶𝑟𝑖𝑚𝑖𝑛𝑎𝑙(𝑥)
43
Conversion to CNF
● Eliminate implications
● Move ¬ inwards: ¬∀x p ≡ ∃x ¬p, ¬∃x p ≡ ∀x ¬p
● Standardize variables: each quantifier uses a different one
44
Conversion to CNF
● Skolemize: remove existential quantifiers by elimination
○ Simple case: translate ∃x P(x) into P(A), where A is a new
constant.
■ However, naively using constants inside a universal quantifier
changes the meaning: ∀x [Animal(A) ∧ ¬Loves(x, A)] ∨ Loves(B, x)
uses the same A and B for every x, an entirely different sentence.
○ Instead, use a Skolem function, e.g., F(x) and G(x); the arguments
of the Skolem function are all universally quantified variables in
whose scope the existential quantifier appears.
● Drop universal quantifiers
45
Conversion to CNF
● Distribute ∨ over ∧
46
Resolution
● For example,
47
Resolution
48
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
49
Introduction to
Artificial Intelligence
Lecture: Bayesian Networks
Outline
● Representing Knowledge in an Uncertain Domain
● Exact Inference in Bayesian Networks
● Constructing Bayesian Networks
2
Full joint probability distribution
● The full joint probability distribution (JPD) can answer any
question about the domain.
● (Conditional) independence relationships among variables can
greatly reduce the number of probabilities required.
● Bayesian networks can represent essentially any full JPD, and in
many cases do so very concisely.
○ Belief network, probabilistic network, causal network,
knowledge map
3
Bayesian networks
● A Bayesian network is a directed graph in which each node is
annotated with quantitative probability information.
● Each node represents a random discrete/continuous variable.
● A set of directed links or arrows connects pairs of nodes.
● Each node 𝑋i has a probability distribution 𝑷(𝑋i | 𝑃𝑎𝑟𝑒𝑛𝑡(𝑋i ))
that quantifies the effect of the parents on the node.
● The graph is a DAG; in the example figure, X and Y are the parents
of Z, and Y is the parent of P
4
Bayesian network topology
● The network topology defines the conditional independence
relationships that hold in the domain.
○ An arrow means that 𝑋 has a direct influence on 𝑌, which
suggests that causes should be parents of effects.
○ A domain expert decides what direct influences exist.
● Then, specify a conditional probability distribution for each
variable, given its parents.
5
BN Example
● Burglary and earthquakes directly affect the probability of the
alarm’s going off
● Whether John and Mary call depends only on the alarm.
6
Conditional probability table (CPT)
● The probabilities summarize a potentially infinite set of
circumstances in which an event does (or does not) happen.
● In this way, a small agent can cope with a very large world, at
least approximately.
7
Represent the full joint distribution
● An entry in the joint distribution is the probability of a complete
variable assignment, such as P(X1 = x1 ∧ ⋯ ∧ Xn = xn).
● Thus, it is the product of the appropriate elements of the CPTs
in the Bayesian network:
P(x1, …, xn) = ∏i P(xi | parents(Xi))
8
Represent the full joint distribution
9
Represent the full joint distribution
10
Notations
● 𝑋 denotes the query variable.
● 𝑬 denotes the set of evidence variables 𝐸1,..., 𝐸m, and 𝒆 is a
particular observed event.
● 𝑌 denotes the non-evidence, non-query variables 𝑌1,... , 𝑌n
(called the hidden variables).
● Thus, the complete set of variables is 𝑿 = {𝑋} ∪ 𝐸 ∪ 𝑌.
● A typical query asks for the posterior probability 𝑷(𝑋 | 𝒆)
11
Represent the full joint distribution
12
Inference by enumeration
● A query can be answered by computing sums of products of
conditional probabilities from the Bayesian network:
P(X | e) = α P(X, e) = α Σy P(X, e, y)
○ where α stands for the constant denominator term, which is
usually simplified during calculation.
13
Inference by enumeration
● Consider the following query
● The hidden variables are 𝐸𝑎𝑟th𝑞𝑢𝑎𝑘𝑒 and 𝐴𝑙𝑎𝑟𝑚.
● Using initial letters for the variables, we have
● For simplicity, we do this for 𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡𝑟𝑢𝑒.
● Complexity: 𝑶(𝒏𝟐𝒏) for a network with 𝑛 Boolean variables
14
Inference by enumeration
15
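A Python sketch of this enumeration query, using the standard textbook CPT numbers for the burglary network; these numbers are an assumption here, since the slides show the CPTs only as figures.

def P(var, val, evidence):
    # CPTs of the burglary network (standard textbook values, assumed here).
    if var == 'B':
        p = 0.001
    elif var == 'E':
        p = 0.002
    elif var == 'A':
        p = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}[
                 (evidence['B'], evidence['E'])]
    elif var == 'J':
        p = 0.90 if evidence['A'] else 0.05
    else:  # var == 'M'
        p = 0.70 if evidence['A'] else 0.01
    return p if val else 1 - p

order = ['B', 'E', 'A', 'J', 'M']        # parents listed before children

def enumerate_all(variables, evidence):
    if not variables:
        return 1.0
    y, rest = variables[0], variables[1:]
    if y in evidence:
        return P(y, evidence[y], evidence) * enumerate_all(rest, evidence)
    return sum(P(y, v, evidence) * enumerate_all(rest, {**evidence, y: v})
               for v in (True, False))   # sum out the hidden variable y

def query(X, evidence):
    dist = {v: enumerate_all(order, {**evidence, X: v}) for v in (True, False)}
    alpha = 1 / sum(dist.values())       # the normalization constant
    return {v: alpha * p for v, p in dist.items()}

print(query('B', {'J': True, 'M': True}))
# approximately {True: 0.284, False: 0.716}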
Construct a Bayesian network
● Scenario 1: Network structure known and all variables
observable
○ Compute only the CPT entries
● Scenario 2: Network structure known while some variables
hidden
○ Gradient descent (greedy hill-climbing) method, i.e., search
for a solution along the steepest descent of a criterion
function
● Scenario 3: Network structure unknown, all variables
observable
○ Search through the model space to reconstruct network
topology
● Scenario 4: Network structure unknown and all variables
hidden
○ No good algorithms known for this purpose
16
Construct a Bayesian network
● The Chain Rule holds for any set of random variables:
P(x1, …, xn) = ∏i P(xi | xi−1, …, x1)
● We generally assert that, for every variable Xi in the network,
P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi))   (*)
17
Construct a Bayesian network
● Each node must be conditionally independent of its other
predecessors in the node ordering, given its parents.
● Nodes: Identify the set of variables required to model the
domain and order them {X1,..., Xn}.
● Links: For 𝑖 = 1 to 𝑛 do
○ Choose, from {X1, …, Xi−1}, a minimal set of parents for Xi
such that Equation (*) is satisfied.
○ For each parent insert a link from the parent to 𝑋𝑖.
○ CPTs: Write down the conditional probability table,
P(Xi | Parents(Xi))
● Intuitively, the parents of node Xi should contain all those nodes
in {X1, …, Xi−1} that directly influence Xi.
18
Construct a Bayesian network
● The network is guaranteed to be acyclic.
○ Each node is connected only to earlier nodes.
● Bayesian networks contain no redundant probability values.
○ If there is no redundancy, then there is no chance for
inconsistency.
● It is impossible for the domain expert to create a Bayesian
network that violates the axioms of probability.
19
Example
● You want to diagnose whether there is a fire in a building
● You can receive reports (possibly noisy) about whether
everyone is leaving the building.
● If everyone is leaving, this may have been caused by a fire
alarm.
● If there is a fire alarm, it may have been caused by a fire or by
tampering.
● If there is a fire, there may be smoke.
20
Example
● Start by choosing the random Boolean variables for this domain
● 𝑇𝑎𝑚𝑝𝑒𝑟𝑖𝑛𝑔 (𝑇) : the alarm has been tampered with
● 𝐹𝑖𝑟𝑒 (𝐹): there is a fire
● 𝐴𝑙𝑎𝑟𝑚 (𝐴): there is an alarm
● 𝑆𝑚𝑜𝑘𝑒 (𝑆): there is smoke
● 𝐿𝑒𝑎𝑣𝑖𝑛𝑔 (𝐿): there are lots of people leaving the building
● Report (R): the sensor reports that everyone is leaving the
building.
21
Example
● Define a total ordering of variables
○ Choose an order that follows the causal sequence of events
○ Fire (F), Tampering (T), Alarm (A), Smoke (S), Leaving (L),
Report (R)
● Consider the chain rule over this ordering and use the given clues
to simplify it:
P(F, T, A, S, L, R) = P(F) P(T | F) P(A | F, T) P(S | F, T, A) P(L | F, T, A, S) P(R | F, T, A, S, L)
which the independence assumptions above reduce to
P(F) P(T) P(A | F, T) P(S | F) P(L | A) P(R | L)
22
Example
23
Example
24
Example
25
Example
26
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
27
Introduction to
Artificial Intelligence
Lecture: Basic Machine Learning
Outline
● Introduction to Machine Learning
● ID3 Decision Tree Learning
● Naïve Bayesian Learning
2
Machine Learning
● Machine learning involves adaptive mechanisms that enable
computers to learn from experience, learn by example and
learn by analogy.
3
Machine Learning
4
Machine Learning
5
Supervised learning
● Learn a function that maps an input to an output based on
example input-output pairs.
● Example,
○ Spam detection: Decide which emails are spam and which
are important
○ Object detection and recognition: Localize and identify
instances of semantic objects of a certain class (e.g.,
humans, buildings, or cars) in digital images and videos.
6
Supervised learning
7
Classification vs. Regression
● Classification
○ Train a model to predict a categorical dependent variable
○ Case studies: predicting disease, classifying images,
predicting customer churn, buy or won’t buy, etc.
○ Binary classification vs. Multiclass classification vs.
Multilabel classification
● Regression
○ Train a model to predict a continuous dependent variable
○ Case studies: predicting height of children, predicting sales,
forecasting stock prices, etc.
8
Unsupervised learning
● Infer a function to describe hidden structure from "unlabeled"
data.
● Example,
○ Social network analysis: cluster users of social networks by
interest (community detection)
9
Semi-supervised learning
● The model is initially trained with a small amount of labeled
data and a large amount of unlabeled data.
10
Reinforcement learning
● The agent learns from the environment by interacting with it
and receives rewards for performing actions.
11
Machine Learning
12
Machine Learning
13
ID3 Decision Tree Learning
● The decision is based on the following attributes
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
14
ID3 Decision Tree Learning
15
ID3 Decision Tree Learning
16
ID3 Decision Tree
● Divide and conquer: Split data into smaller and smaller subsets
● Splits usually on a single variable
17
ID3 Decision Tree
● The remaining examples are all positive (or all negative)
→ DONE, it is possible to answer Yes or No
● There are some positive and some negative examples
→ choose the “best” attribute to split them
● No examples left at a branch
→ return a default value
● No attributes left but both positive and negative examples
→ return the plurality classification of remaining ones.
18
ID3 Decision Tree
function DECISION-TREE-LEARNING(examples, attributes, parent_examples) returns a tree
if examples is empty
then return PLURALITY-VALUE(parent_examples)
else if all examples have the same classification
then return the classification
else if attributes is empty
then return PLURALITY-VALUE(examples)
else
A ← argmax over a ∈ attributes of IMPORTANCE(a, examples)
tree ← a new decision tree with root test A
for each value vk of A do
exs ← {e : e ∈ examples and e.A = vk}
subtree ← DECISION-TREE-LEARNING(exs, attributes − A, examples)
add a branch to tree with label (A = vk) and subtree subtree
return tree
19
A purity measure with entropy
● Entropy is a measure of the uncertainty of a random variable V
with values vk:
H(V) = −Σk P(vk) log2 P(vk)
● vk is a class in V (e.g., yes/no in binary classification)
● P(vk) is the proportion of the number of elements in class vk to
the number of elements in V.
20
A purity measure with entropy
● Entropy is maximal when all possibilities are equally likely.
● Entropy is zero in a pure “yes” (or pure “no”) node.
● Decision tree aims to decrease the entropy in each node.
21
Average Entropy
22
Information Gain
23
Information Gain
● IG(Alternate, S) = 0
● IG(Bar, S) = 0
● IG(Fri/Sat, S) = 0.021
● IG(Hungry, S) = 0.196
● IG(Raining, S) = 0
● IG(Reservation, S) = 0.021
● IG(Patrons, S) = 0.541
● IG(Price, S) = 0.23
● IG(Type, S) = 0
● IG(WaitEstimate, S) = 0.208
24
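A small Python sketch of entropy and information gain, assuming examples are dicts mapping attribute names to values; the numbers above can be reproduced this way from the restaurant dataset.

from math import log2

def entropy(labels):
    # H(V) = -sum_k P(v_k) * log2 P(v_k)
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(examples, attr, label):
    # IG(A, S) = H(S) - sum_v (|S_v| / |S|) * H(S_v)
    n = len(examples)
    remainder = sum(
        (len(subset) / n) * entropy([e[label] for e in subset])
        for subset in ([e for e in examples if e[attr] == v]
                       for v in {e[attr] for e in examples}))
    return entropy([e[label] for e in examples]) - remainder

print(entropy(['yes'] * 6 + ['no'] * 6))  # 1.0: maximal for a 50/50 split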
Decision Tree Learning
● The largest information gain (0.541), i.e., the smallest average
remaining entropy (0.459), is achieved by splitting on Patrons.
25
Homework
26
Homework
from sklearn.tree import DecisionTreeClassifier, plot_tree
import pandas as pd

# Toy training set: features W, U, S and class label L
df = pd.DataFrame(
    [{'W': 1, 'U': 0, 'S': 0, 'L': 0},
     {'W': 1, 'U': 1, 'S': 2, 'L': 0},
     {'W': 0, 'U': 1, 'S': 1, 'L': 0},
     {'W': 0, 'U': 0, 'S': 1, 'L': 1},
     {'W': 1, 'U': 0, 'S': 2, 'L': 1},
     {'W': 0, 'U': 0, 'S': 2, 'L': 1},
    ]
)
clf = DecisionTreeClassifier()         # uses Gini impurity by default
clf.fit(df[['W', 'U', 'S']], df['L'])
plot_tree(clf)                         # visualize the learned tree
27
Homework
28
Bayesian classification
● A statistical classifier performs probabilistic prediction, i.e.,
predicts class membership probabilities
○ Foundation: Based on Bayes’ Theorem
29
The Bayes’ Theorem
● Total Probability Theorem: P(B) = Σi P(B | Ai) P(Ai)
● Let X be a data sample (“evidence”) with unknown class label
and H be a hypothesis that X belongs to class C.
● Bayes’ Theorem: P(H | X) = P(X | H) P(H) / P(X)
● Classification is to determine P(H | X), i.e., the probability that
the hypothesis H holds given the observed data sample X.
30
Example data set
31
The Bayes’ Theorem
● 𝑷(𝑯) (prior probability): the initial probability
○ E.g., 𝐗 will buy computer, regardless of age, income, ...
● 𝑃(𝐗) : the probability that sample data is observed
○ E.g., 𝐗 is 31..40 and has a medium income, regardless of
the buying
● 𝑃(𝐗 | 𝐻) (likelihood): the probability of observing the sample 𝐗,
given that the hypothesis holds
○ E.g., Given that 𝐗 will buy computer, the probability that 𝐗 is
31..40 and has a medium income
● P(H | X) (posterior probability): e.g., given that X is 31..40 and has a
medium income, the probability that X will buy a computer
32
The Bayes’ Theorem
● Informally,
posterior = likelihood × prior / evidence
● 𝐗 belongs to 𝐶i iff the probability 𝑃(𝐶i | 𝐗) is the highest among
all the 𝑃(𝐶𝑘 | 𝐗) for all the 𝑘 classes.
33
Classification with Bayes’ Theorem
● Let 𝐷 be a training set of tuples and associated class labels
● Each tuple is represented by an n-attribute vector X = (x1, x2, …, xn)
● Suppose there are 𝑚 classes 𝐶1, 𝐶2,..., 𝐶m
● Classification is to derive the maximum posteriori 𝑃(𝐶i | 𝐗) from
Bayes’ theorem
● P(X) is constant for all classes, so only P(X | Ci) P(Ci) needs to be
maximized.
34
Naïve Bayesian classifier
● Class-conditional independence: the attributes are assumed to have
no dependence relationships among them, given the class.
● The naïve Bayesian classification formula is written as
P(X | Ci) = ∏k=1..n P(xk | Ci)
35
Naive Bayesian for the example dataset
36
Naive Bayesian for the example dataset
37
Avoiding the zero-probability problem
38
Avoiding the zero-probability problem
● Laplacian correction (or Laplacian estimator): add 1 to each count,
P(xk | Ci) = (|xk, Ci| + 1) / (|Ci| + r)
○ where m is the number of classes (used analogously to correct the
priors), |xk, Ci| denotes the number of tuples containing both
Ak = xk and Ci, and r is the number of values of attribute Ak
● The “corrected” probability estimates are close to their
“uncorrected” counterparts.
39
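A counting-based naive Bayes sketch with the Laplacian correction on the likelihoods; examples are dicts, r[a] gives the number of possible values of attribute a, and the prior could be smoothed analogously with (|Ci| + 1) / (|D| + m).

from collections import Counter, defaultdict

def train_naive_bayes(examples, attrs, label, r):
    classes = Counter(e[label] for e in examples)      # class counts |Ci|
    counts = defaultdict(Counter)                      # counts[(a, c)][value]
    for e in examples:
        for a in attrs:
            counts[(a, e[label])][e[a]] += 1

    def predict(x):
        def score(c):
            p = classes[c] / len(examples)             # prior P(Ci)
            for a in attrs:
                # Laplacian correction: (count + 1) / (|Ci| + r)
                p *= (counts[(a, c)][x[a]] + 1) / (classes[c] + r[a])
            return p
        return max(classes, key=score)                 # maximum posteriori class
    return predict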
Naive Bayesian for the example dataset
40
Naive Bayesian for the example dataset
41
Handling missing values
● If the values of some attributes are missing, these attributes are
omitted from the product of probabilities
● As a result, the estimation is less accurate
● For example,
42
Homework
43
References
● Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A
Modern Approach (3rd ed.). Prentice Hall Press, Upper Saddle
River, NJ, USA.
● Lê Hoài Bắc, Tô Hoài Việt. 2014. Giáo trình Cơ sở Trí tuệ nhân
tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa học Tự
nhiên, ĐHQG-HCM.
● Nguyễn Ngọc Thảo, Nguyễn Hải Minh. 2020. Bài giảng Cơ sở
Trí tuệ Nhân tạo. Khoa Công nghệ Thông tin. Trường ĐH Khoa
học Tự nhiên, ĐHQG-HCM.
44