KNOWLEDGE REPRESENTATION
• To solve the complex problems encountered in AI, we need both a large amount of knowledge and some mechanism for manipulating that knowledge to create solutions to new problems. A variety of ways of representing knowledge (facts) have been exploited in AI programs.
• In every variety of knowledge representation, we deal with two kinds of entities:
– A. Facts: truths in some relevant world. These are the things we want to represent.
– B. Representations of facts in some chosen formalism. These are the things we will actually be able to manipulate.

One way to think of structuring these entities is at two levels:
(a) the knowledge level, at which facts are described, and
(b) the symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.
The facts and representations are linked with two-way mappings, called representation mappings. The forward representation mapping maps from facts to representations; the backward representation mapping goes the other way, from representations to facts.
• One common representation is natural language (particularly English) sentences. Regardless of the representation for facts we use in a program, we may also need to be concerned with an English representation of those facts in order to facilitate getting information into and out of the system.
• We need mapping functions from English sentences to the representation we
actually use and from it back to sentences.
• The Knowledge Representation models/mechanisms are often based on:
– Logic
– Rules
– Frames
– Semantic Net
Knowledge is categorized into two major types:
1. Tacit knowledge corresponds to "informal" or "implicit" knowledge:
• Exists within a human being; it is embodied.
• Difficult to articulate formally.
• Difficult to communicate or share.
• Hard to steal or copy.
• Drawn from experience, action, and subjective insight.
2. Explicit knowledge is the formal type of knowledge:
• Exists outside a human being; it is embedded.
• Can be articulated formally.
• Can be shared, copied, processed, and stored.
• Easy to steal or copy.
• Drawn from an artifact of some type, such as a principle, procedure, process, or concept.
• A variety of ways of representing knowledge have been exploited in AI programs.
• There are two different kinds of entities we deal with:
– Facts: truths in some relevant world; the things we want to represent.
– Representations of facts in some chosen formalism; the things we will actually be able to manipulate.
• These entities are structured at two levels:
– The knowledge level, at which facts are described.
– The symbol level, at which representations of objects are defined in terms of symbols that can be manipulated by programs.
Framework of Knowledge Representation
• The computer requires a well-defined problem description in order to process it and provide a well-defined, acceptable solution.
• To collect fragments of knowledge, we first need to formulate a description in our spoken language and then represent it in a formal language so that the computer can understand it.
• The computer can then use an algorithm to compute an answer.
• This process is illustrated in the figure.
• The steps are:
– The informal formulation of the problem takes place first.
– It is then represented formally, and the computer produces an output.
– This output can then be represented as an informally described solution that the user understands or checks for consistency.
• Problem solving requires:
– formal knowledge representation, and
– conversion of informal knowledge to formal knowledge, that is, conversion of implicit knowledge to explicit knowledge.
Mapping between Facts and Representation
• Knowledge is a collection of facts from some domain.
• We need a representation of "facts" that can be manipulated by a program.
• Normal English is insufficient; it is currently too hard for a computer program to draw inferences directly from natural language.
• Thus some symbolic representation is necessary.
A good knowledge representation enables fast and accurate access to knowledge and understanding of the content. A knowledge representation system should have the following properties:
– Representational Adequacy: the ability to represent all kinds of knowledge that are needed in that domain.
– Inferential Adequacy: the ability to manipulate the representational structures to derive new structures corresponding to new knowledge inferred from old.
– Inferential Efficiency: the ability to incorporate additional information into the knowledge structure that can be used to focus the attention of the inference mechanisms in the most promising direction.
– Acquisitional Efficiency: the ability to acquire new knowledge using automatic methods wherever possible, rather than relying on human intervention.
Knowledge Representation Schemes

Relational Knowledge
– The simplest way to represent declarative facts is as a set of relations of the same sort used in database systems.
– It provides a framework to compare two objects based on equivalent attributes. Any instance in which two different objects are compared is a relational type of knowledge.
– A table of this kind is a simple way to store facts:
• The facts about a set of objects are put systematically in columns.
• This representation provides little opportunity for inference.
– Given the facts alone, it is not possible to answer a simple question such as: "Who is the heaviest player?"
– But if a procedure for finding the heaviest player is provided, then these facts will enable that procedure to compute an answer, as sketched below.
– We can also ask things like who "bats left" and "throws right".
Inheritable Knowledge

• Here the knowledge elements inherit attributes from their parents.
• This knowledge is embodied in the design hierarchies found in the functional, physical, and process domains.
• Within the hierarchy, elements inherit attributes from their parents, but in many cases not all attributes of the parent elements are prescribed to the child elements.
• Inheritance is a powerful form of inference, but it is not adequate on its own.
• The basic KR (Knowledge Representation) scheme needs to be augmented with an inference mechanism.
• Property inheritance: objects or elements of specific classes inherit attributes and values from more general classes.
• The classes are organized in a generalization hierarchy.
• Boxed nodes represent objects and values of attributes of objects.
• Arrows point from an object to its value.
• This structure is known as a slot-and-filler structure, a semantic network, or a collection of frames.
• The steps to retrieve a value for an attribute of an instance object (a sketch of this lookup follows below):
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of the instance attribute; if there is none, fail.
4. Go to the node it points to and find a value for the attribute; if one is found, report it.
5. Otherwise, keep searching upward through isa links until a value is found for the attribute.
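A small Python sketch of this retrieval procedure, assuming a toy slot-and-filler network (the node names and attribute values below are illustrative): each node stores local attribute values plus instance/isa links that are climbed when a value is missing locally.

```python
# Toy slot-and-filler knowledge base: nodes with local attributes plus
# 'instance' and 'isa' links pointing to more general classes.
kb = {
    "Person":          {"attrs": {"legs": 2}},
    "Adult-Male":      {"attrs": {"height": 178}, "isa": "Person"},
    "Baseball-Player": {"attrs": {"bats": "right"}, "isa": "Adult-Male"},
    "PlayerA":         {"attrs": {"bats": "left"}, "instance": "Baseball-Player"},  # illustrative
}

def get_value(obj, attr):
    """Retrieve attr for obj: local value first, then climb instance/isa links."""
    node = kb.get(obj)
    while node is not None:
        if attr in node["attrs"]:
            return node["attrs"][attr]
        parent = node.get("instance") or node.get("isa")
        node = kb.get(parent)
    return None  # fail: no value found anywhere in the hierarchy

print(get_value("PlayerA", "bats"))    # 'left'  (local value overrides the class default)
print(get_value("PlayerA", "height"))  # 178     (inherited via Baseball-Player -> Adult-Male)
print(get_value("PlayerA", "legs"))    # 2       (inherited from Person)
```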
Inferential Knowledge
• This knowledge generates new information from the given information.
• The new information does not require further data gathering from the source, but it does require analysis of the given information to generate new knowledge.
• Example: given a set of relations and values, one may infer other values or relations. Predicate logic (mathematical deduction) is used to infer from a set of attributes. Inference through predicate logic uses a set of logical operations to relate individual data.
• Represent knowledge as formal logic:
All dogs have tails: ∀x: dog(x) → hastail(x)
• Advantages:
– A set of strict rules.
– Can be used to derive more facts.
– Truths of new statements can be verified.
– Guaranteed correctness.
– Many inference procedures are available to implement standard rules of logic, which are popular in AI systems, e.g., automated theorem proving.
Procedural Knowledge
• A representation in which the control information needed to use the knowledge is embedded in the knowledge itself, for example computer programs, directions, and recipes; these indicate a specific use or implementation.
• Knowledge is encoded in procedures, small programs that know how to do specific things and how to proceed.
• Advantages:
– Heuristic or domain-specific knowledge can be represented.
– Extended logical inferences, such as default reasoning, are facilitated.
– Side effects of actions may be modeled. Some rules may become false over time; keeping track of this in large systems may be tricky.
• Disadvantages:
– Completeness: not all cases may be represented.
– Consistency: not all deductions may be correct (see the sketch after this list). E.g., if we know that Fred is a bird we might deduce that Fred can fly; later we might discover that Fred is an emu.
– Modularity is sacrificed. Changes in the knowledge base might have far-reaching effects.
– Cumbersome control information.
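As a brief illustration of the consistency point above, here is a hedged Python sketch using the bird/emu case: the control information ("check the exception before applying the default") is baked directly into the procedure, which is exactly what makes such knowledge hard to keep consistent and modular.

```python
# Procedural knowledge: how to reach a conclusion is coded directly
# into the procedure, including the default and its exceptions.
FLIGHTLESS = {"emu", "penguin", "ostrich"}  # illustrative exception list

def can_fly(bird_kind):
    """Default reasoning baked into code: birds fly unless a known exception applies."""
    if bird_kind in FLIGHTLESS:
        return False      # exception handled before the default
    return True           # default conclusion for birds

print(can_fly("sparrow"))  # True
print(can_fly("emu"))      # False: later knowledge ('Fred is an emu') changes the answer
```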
Predicate Logic: Representation of Simple Facts in Logic

• Propositional logic is useful because it is simple to deal with and a decision procedure for it exists.
• In order to draw conclusions, facts are represented in a more convenient way, for example:
1. Marcus is a man. man(Marcus)
2. Marcus is mortal. mortal(Marcus)
3. All men are mortal. mortal(men)
• But propositional logic fails to capture the relationship between an individual being a man and that individual being mortal.
• How can these sentences be represented so that we can infer the third sentence from the first two?
• Propositional logic commits only to the existence of facts that may or may not be the case in the world being represented.
• It has a simple syntax and simple semantics, and it suffices to illustrate the process of inference.
• However, propositional logic quickly becomes impractical, even for very small worlds.
Predicate Logic

• First-order Predicate Logic (FOPL) models the world in terms of:
• Objects, which are things with individual
identities
• Properties of objects that distinguish them from
other objects
• Relations that hold among sets of objects
• Functions, which are a subset of relations where
there is only one “value” for any given “input”
First-order Predicate logic (FOPL)
• First-order Predicate logic (FOPL) provides
– Constants: a, b, dog33. Name a specific object.
– Variables: X, Y. Refer to an object without naming it.
– Functions: Mapping from objects to objects.
– Terms: Refer to objects
– Atomic Sentences: in(dad-of(X), food6) Can be true or false, Correspond to
propositional symbols P, Q.
• A well-formed formula (wff) is a sentence containing no “free” variables.
• So, That is, all variables are “bound” by universal or existential quantifiers.
• (∀x) router (x, y) has x bound as a universally quantified variable, but y is free.
• Constant symbols, which represent individuals in the world
– Mary
– 3
– Green
• Function symbols, which map individuals to individuals
– father-of(Mary) = John
– color-of(Sky) = Blue
• Predicate symbols, which map individuals to truth values
– greater(5,3)
– green(Grass)
– color(Grass, Green)
Quantifiers
Universal quantification
• (∀x) P(x) means that P holds for all values of x in the domain associated with that
variable
• E.g., (∀x) dolphin(x) → mammal(x)

Existential quantification
• (∃ x)P(x) means that P holds for some value of x in the domain associated with that
variable
• E.g., (∃ x) mammal(x) ∧ lays-eggs(x)
Consider the following example that shows the use of predicate logic as a way of representing knowledge.
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.

Truth table for implication (P → Q):
P | Q | P → Q
T | F | F
T | T | T
F | T | T
F | F | T
Sentences are built from terms and atoms

• A term (denoting a real-world individual) is a constant symbol, a variable symbol, or an n-place function of n terms.
x and f(x1, ..., xn) are terms, where each xi is a term. A term with no variables is a ground term.
• An atomic sentence (which has value true or false) is an n-place predicate of n terms.
• A complex sentence is formed from atomic sentences connected by the logical connectives: ¬P, P ∨ Q, P ∧ Q, P → Q, P ↔ Q, where P and Q are sentences.
• A quantified sentence adds the quantifiers ∀ and ∃.
• A well-formed formula (wff) is a sentence containing no "free" variables. That is, all variables are "bound" by universal or existential quantifiers.
(∀x) P(x, y) has x bound as a universally quantified variable, but y is free.
Quantifiers
• Universal quantifiers are often used with "implies" to form "rules":
(∀x) student(x) → smart(x) means "All students are smart."
• Universal quantification is rarely used to make blanket statements about every individual in the world:
(∀x) student(x) ∧ smart(x) means "Everyone in the world is a student and is smart."
• Existential quantifiers are usually used with "and" to specify a list of properties about an individual:
(∃x) student(x) ∧ smart(x) means "There is a student who is smart."
• A common mistake is to represent this English sentence as the FOL sentence:
(∃x) student(x) → smart(x)
– But what happens when there is a person who is not a student?
Quantifier Scope
• Switching the order of universal quantifiers does not change the meaning:
– (∀x)(∀y) P(x,y) ↔ (∀y)(∀x) P(x,y)
• Similarly, you can switch the order of existential quantifiers:
– (∃x)(∃y) P(x,y) ↔ (∃y)(∃x) P(x,y)
• Switching the order of universals and existentials does change the meaning:
– Everyone likes someone: (∀x)(∃y) likes(x, y)
– Someone is liked by everyone: (∃y)(∀x) likes(x, y)
Connections between All and Exists

We can relate sentences involving ∀ and ∃ using De Morgan's laws:
(∀x) ¬P(x) ↔ ¬(∃x) P(x)
¬(∀x) P(x) ↔ (∃x) ¬P(x)
(∀x) P(x) ↔ ¬(∃x) ¬P(x)
(∃x) P(x) ↔ ¬(∀x) ¬P(x)
Quantified inference rules
• Universal instantiation
– (∀x) P(x) ⟹ P(A)
• Universal generalization
– P(A) ∧ P(B) ∧ … ⟹ (∀x) P(x)
• Existential instantiation
– (∃x) P(x) ⟹ P(S1), where S1 is a skolem constant
• Existential generalization
– P(A) ⟹ (∃x) P(x)
Universal instantiation
(a.k.a. universal elimination)
• If (∀x) P(x) is true, then P(C) is true, where C is any constant in the domain of x.
• Example:
(∀x) eats(Ziggy, x) ⟹ eats(Ziggy, IceCream)
• The variable symbol can be replaced by any ground term, i.e., any constant symbol or function symbol applied to ground terms only.
Existential instantiation
(a.k.a. existential elimination)
• From (∃x) P(x) infer P(c).
• Example:
– (∃x) eats(Ziggy, x) ⟹ eats(Ziggy, Stuff)
• Note that the variable is replaced by a brand-new constant not occurring in this or any other sentence in the KB.
• Also known as skolemization; the constant is a skolem constant.
• In other words, we don't want to accidentally draw other inferences about it by introducing the constant.
• It is convenient to use this to reason about the unknown object, rather than constantly manipulating the existential quantifier.
Existential generalization
(a.k.a. existential introduction)
• If P(c) is true, then (∃x) P(x) is inferred.
• Example:
eats(Ziggy, IceCream) ⟹ (∃x) eats(Ziggy, x)
• All instances of the given constant symbol are replaced by the new variable symbol.
• Note that the variable symbol cannot already exist anywhere in the expression.
• Model: an interpretation of a set of sentences
such that every sentence is True
• A sentence is
– satisfiable if it is true under some interpretation
– valid if it is true under all possible interpretations
– inconsistent if there does not exist any interpretation under which
the sentence is true
• Logical consequence: S |= X if all models of S are
also models of X
Translating English to FOL
Every gardener likes the sun.
(∀x) gardener(x) → likes(x, Sun)
You can fool some of the people all of the time.
(∃x)(∀t) person(x) ∧ time(t) → can-fool(x, t)
You can fool all of the people some of the time. (Two equivalent forms:)
(∀x)(∃t) person(x) → (time(t) ∧ can-fool(x, t))
(∀x) person(x) → (∃t) (time(t) ∧ can-fool(x, t))
All purple mushrooms are poisonous.
(∀x) (mushroom(x) ∧ purple(x)) → poisonous(x)
No purple mushroom is poisonous. (Two equivalent forms:)
¬(∃x) purple(x) ∧ mushroom(x) ∧ poisonous(x)
(∀x) (mushroom(x) ∧ purple(x)) → ¬poisonous(x)
There are exactly two purple mushrooms.
(∃x)(∃y) mushroom(x) ∧ purple(x) ∧ mushroom(y) ∧ purple(y) ∧ ¬(x = y) ∧ (∀z) (mushroom(z) ∧ purple(z)) → ((x = z) ∨ (y = z))
Clinton is not tall.
¬tall(Clinton)
X is above Y iff X is directly on top of Y or there is a pile of one or more other objects directly on top of one another starting with X and ending with Y.
(∀x)(∀y) above(x, y) ↔ (on(x, y) ∨ (∃z) (on(x, z) ∧ above(z, y)))
The facts described by these sentences can be represented as a
set of well-formed formulas (wffs) as follows

1. Marcus was a man. man(Marcus)


2. Marcus was a Pompeian. Pompeian(Marcus)
3. All Pompeians were Romans. ∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler. ruler(Caesar)
5. All Romans were either loyal to Caesar or hated him.
1. inclusive-or
2. ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
3. exclusive-or
4. ∀x: Roman(x) → (loyalto(x, Caesar) ∧¬ hate(x, Caesar)) ∨ (¬loyalto(x,
Caesar) ∧ hate(x, Caesar))
6. Everyone is loyal to someone. ∀x: ∃y: loyalto(x, y)
7. People only try to assassinate rulers they are not loyal to.
∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) →¬loyalto(x, y)
8. Marcus tried to assassinate Caesar. tryassassinate(Marcus, Caesar)
9. ∀x : man(x) → person(x)
• Now suppose we want to use these statements to answer the question: Was Marcus loyal to Caesar?
• Let's try to produce a formal proof, reasoning backward from the desired goal:
• ¬loyalto(Marcus, Caesar)
• In order to prove the goal, we need to use the rules of inference to transform it into another goal (or possibly a set of goals) that can, in turn, be transformed, and so on, until there are no unsatisfied goals remaining.
[Figure: backward-chaining proof tree. The goal ¬loyalto(Marcus, Caesar) is matched against rule 7 with the substitution Marcus/x, Caesar/y; the remaining subgoals eventually reduce to man(Marcus), which requires statement 9.]
• The problem is that, although we know that Marcus was a man, we do not have any way to conclude from that alone that Marcus was a person.
• We need to add the representation of another fact to our system, namely: ∀x: man(x) → person(x)
• Now we can satisfy the last goal and produce a proof that Marcus was not loyal to Caesar.
• From this simple example, we see that three important issues must be addressed in the process of converting English sentences into logical statements and then using those statements to deduce new ones:
1. Many English sentences are ambiguous (for example, 5, 6, and 7 above). Choosing the correct interpretation may be difficult.
2. There is often a choice of how to represent the knowledge. Simple representations are desirable, but they may exclude certain kinds of reasoning.
3. Even in very simple situations, a set of sentences is unlikely to contain all the information necessary to reason about the topic at hand. In order to use a set of statements effectively, it is usually necessary to have access to another set of statements that represent facts people consider too obvious to mention.
Computable Functions and Predicates
• To express simple facts, such as the following greater-than and less-than relationships:
gt(1, 0), lt(0, 1), gt(2, 1), lt(1, 2), gt(3, 2), lt(2, 3)
• It is often also useful to have computable functions as well as computable predicates.
• Thus we might want to be able to evaluate the truth of gt(2 + 3, 1).
• To do so requires that we first compute the value of the plus function given the arguments 2 and 3, and then send the arguments 5 and 1 to gt. A minimal sketch follows below.
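A minimal Python sketch of the same idea: plus is a computable function and gt, lt are computable predicates, so evaluating gt(2 + 3, 1) means computing plus(2, 3) first and then passing 5 and 1 to gt.

```python
# Computable function and computable predicates: their values are obtained
# by computation rather than by lookup in the knowledge base.
def plus(a, b):
    return a + b

def gt(a, b):
    return a > b

def lt(a, b):
    return a < b

# Evaluating gt(2 + 3, 1): compute plus(2, 3) = 5, then test gt(5, 1).
print(gt(plus(2, 3), 1))  # True
print(lt(0, 1))           # True
```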
• Consider the following set of facts, again involving
Marcus:
1. Marcus was a man. man(Marcus)
2. Marcus was a Pompeian. Pompeian(Marcus)
3. Marcus was born in 40 A.D. born(Marcus, 40)
4. All men are mortal. ∀x: man(x) → mortal(x)
5. All Pompeians died when the volcano erupted in 79
A.D. erupted(volcano, 79) ∧ ∀ x : [Pompeian(x) →
died(x, 79)]
6. No mortal lives longer than 150 years.
∀ x: ∀ t1: ∀ t2: mortal(x) ∧ born(x, t1) ∧ gt(t2 – t1,150) →
died(x, t2)
7. It is now 1991. now = 1991
The above example shows how these ideas of computable functions and predicates can be useful.
• It also makes use of the notion of equality and allows equal objects to be substituted for each other whenever it appears helpful to do so during a proof.
• Now suppose we want to answer the question "Is Marcus alive?"
• Given the statements suggested here, there may be two ways of deducing an answer.
• Either we can show that Marcus is dead because he was killed by the volcano, or we can show that he must be dead because he would otherwise be more than 150 years old, which we know is not possible.
• As soon as we attempt to follow either of those paths rigorously, however, we discover, just as we did in the last example, that we need some additional knowledge.
• For example, our statements talk about dying, but they say nothing that relates to being alive, which is what the question is asking.
So we add the following facts:
8) Alive means not dead.
∀x: ∀t: [alive(x, t) → ¬dead(x, t)] ∧ [¬dead(x, t) → alive(x, t)]
9) If someone dies, then he is dead at all later times.
∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)
Now let's attempt to answer the question "Is Marcus alive?" by proving: ¬alive(Marcus, now)
Resolution
Propositional Resolution
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in step 1.
3. Repeat until either a contradiction is found or no progress can be made:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resulting clause, called the resolvent, will be the disjunction of all of the literals of both parent clauses, with the following exception: if there are any pairs of literals L and ¬L such that one of the parent clauses contains L and the other contains ¬L, then select one such pair and eliminate both L and ¬L from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
A sketch of a single resolution step appears below.
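Here is a short, hedged sketch of one propositional resolution step in Python, representing clauses as sets of string literals with '~' marking negation (a simplification of the full procedure above): the resolvent is the union of the two parents minus one complementary pair.

```python
# One resolution step on propositional clauses represented as frozensets
# of literals; '~P' denotes the negation of P.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return the list of resolvents of two clauses (empty frozenset = contradiction)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

# Example: from (P v Q) and (~P v R) we derive (Q v R);
# from (P) and (~P) we derive the empty clause, i.e. a contradiction.
print(resolve(frozenset({"P", "Q"}), frozenset({"~P", "R"})))  # [frozenset({'Q', 'R'})]
print(resolve(frozenset({"P"}), frozenset({"~P"})))            # [frozenset()]
```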
Algorithm: Convert to Clause Form
1. Eliminate →, using the fact that a → b is equivalent to ¬a V b. Consider the wff
∀x: [Roman(x) ^ know(x, Marcus)] → [hate(x, Caesar) V (∀y: (Ǝz: hate(y, z)) → thinkcrazy(x, y))]
("All Romans who know Marcus either hate Caesar or think that anyone who hates anyone is crazy.")
Performing this transformation on it yields
∀x: ¬[Roman(x) ^ know(x, Marcus)] V [hate(x, Caesar) V (∀y: ¬(Ǝz: hate(y, z)) V thinkcrazy(x, y))]
2. Reduce the scope of each ¬ to a single term, using the fact that ¬(¬p) = p, de Morgan's laws [which say that ¬(a ^ b) = ¬a V ¬b and ¬(a V b) = ¬a ^ ¬b], and the standard correspondences between quantifiers [¬∀x: P(x) = Ǝx: ¬P(x) and ¬Ǝx: P(x) = ∀x: ¬P(x)].
Performing this transformation on the wff from step 1 yields
∀x: [¬Roman(x) V ¬know(x, Marcus)] V [hate(x, Caesar) V (∀y: ∀z: ¬hate(y, z) V thinkcrazy(x, y))]
3. Standardize variables so that each quantifier binds a unique variable. Since
variables are just dummy names, this process cannot affect the truth value of the wff.
For example, the formula ∀x: P(x) V ∀x: Q(x) would be converted to ∀x: P(x) V ∀y:
Q(y)
4. Move all quantifiers to the left of the formula without changing
their relative order. This is possible since there is no conflict among
variable names. Performing this operation on the formula of step 2, we
get
∀x: ∀y: ∀z: [¬ Roman(x) V ¬ know(x, Marcus)] V [hate(x, Caesar) V (¬
hate(y, z) V thinkcrazy(x, y))]
At this point, the formula is in what is known as prenex normal form. It
consists of a prefix of quantifiers followed by a matrix, which is
quantifier-free.
5.Eliminate existential quantifiers
A formula that contains an existentially quantified variable asserts that
there is a value that can be substituted for the variable that makes the
formula true. We can eliminate the quantifier by substituting for the
variable a reference to a function that produces the desired value. Since
we do not necessarily know how to produce the value, we must create
a new function name for every such replacement. We make no
assertions about these functions except that they must exist.
So, for example, the formula
Ǝy: President(y) can be transformed into the formula
President(S1)
Where, S1 is a function with no arguments that somehow
produces a value that satisfies President. If existential quantifiers
occur within the scope of universal quantifiers, then the value that
satisfies the predicate may depend on the values of the universally
quantified variables. For example, in the formula
∀x: Ǝy: father-of(y, x)
The value of y that satisfies father-of depends on the particular value of x. Thus we must generate functions with the same number of arguments as the number of universal quantifiers in whose scope the expression occurs. So this example would be transformed into ∀x: father-of(S2(x), x). These generated functions are called Skolem functions. Sometimes ones with no arguments are called Skolem constants.
6. Drop the prefix. At this point, all remaining variables are universally quantified, so the prefix can simply be dropped, and any proof procedure we use can assume that any variable it sees is universally quantified.
Now the formula produced in step 4 appears as
[¬Roman(x) V ¬know(x, Marcus)] V [hate(x, Caesar) V (¬hate(y, z) V thinkcrazy(x, y))]
7. Convert the matrix into a conjunction of disjuncts.
In the case of our example, since there are no and's, it is only necessary to exploit the associative property of or [i.e., (a V (b V c)) = ((a V b) V c)] and simply remove the parentheses, giving
¬Roman(x) V ¬know(x, Marcus) V hate(x, Caesar) V ¬hate(y, z) V thinkcrazy(x, y)
8. Create a separate clause corresponding to each conjunct.
In order for a wff to be true, all the clauses that are generated from it must be true.
9. Standardize apart the variables in the set of clauses generated in step 8.
By this we mean rename the variables so that no two clauses make reference to the same variable. In making this transformation, we rely on the fact that
(∀x: P(x) ^ Q(x)) = (∀x: P(x)) ^ (∀x: Q(x))
Thus, since each clause is a separate conjunct and since all the variables are universally quantified, there need be no relationship between the variables of two clauses, even if they were generated from the same wff.

Performing this final step of standardization is important because during the resolution procedure it is sometimes necessary to instantiate a universally quantified variable (i.e., substitute a particular value for it).
But, in general, we want to keep clauses in their most general form as long as possible. So when a variable is instantiated, we want to know the minimum number of substitutions that must be made to preserve the truth value of the system.
The Unification Algorithm
• In propositional logic, it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L.
• In predicate logic, this matching process is more complicated, since the arguments of the predicates must be considered.
• For example, man(John) and ¬man(John) is a contradiction, while man(John) and ¬man(Spot) is not.
• Thus, in order to determine contradictions, we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical.
• There is a straightforward recursive procedure, called the unification algorithm, that does just this.
Algorithm: Unify(L1, L2)
1. If L1 or L2 are both variables or constants, then:
1. If L1 and L2 are identical, then return NIL.
2. Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return
(L2/L1).
3. Also, Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else return
(L1/L2). Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If LI and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions
used to unify L1 and L2.)
5. For I ← 1 to the number of arguments in L1 :
1. Call Unify with the ith argument of L1 and the ith argument of L2, putting
the result in S.
2. If S contains FAIL then return {FAIL}.
3. If S is not equal to NIL then:
1. Apply S to the remainder of both L1 and L2.
2. SUBST: = APPEND(S, SUBST).
6. Return SUBST.
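A compact Python sketch of this unification algorithm, under the following simplifying assumptions: constants are plain strings, variables are strings starting with '?', compound terms are tuples of the form (predicate, arg1, ..., argn), substitutions are returned as dictionaries, None plays the role of FAIL, and the occurs check is omitted for brevity.

```python
# Unification of two literals/terms. Variables are strings beginning with '?',
# constants are plain strings, compound terms are tuples: ('man', 'Marcus').
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def substitute(t, subst):
    """Apply a substitution to a term, following chains of variable bindings."""
    if is_var(t) and t in subst:
        return substitute(subst[t], subst)
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return t

def unify(l1, l2, subst=None):
    """Return a substitution dict unifying l1 and l2, or None for FAIL."""
    if subst is None:
        subst = {}
    l1, l2 = substitute(l1, subst), substitute(l2, subst)
    if l1 == l2:
        return subst
    if is_var(l1):
        return {**subst, l1: l2}          # occurs check omitted for brevity
    if is_var(l2):
        return {**subst, l2: l1}
    if isinstance(l1, tuple) and isinstance(l2, tuple) and len(l1) == len(l2) and l1[0] == l2[0]:
        for a, b in zip(l1[1:], l2[1:]):
            subst = unify(a, b, subst)    # unify argument by argument
            if subst is None:
                return None
        return subst
    return None  # predicate symbols or arities differ

print(unify(("hate", "?x", "Caesar"), ("hate", "Marcus", "?y")))  # {'?x': 'Marcus', '?y': 'Caesar'}
print(unify(("man", "John"), ("man", "Spot")))                    # None (FAIL)
```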
Resolution in Predicate Logic
We can now state the resolution algorithm for predicate logic as follows, assuming a set of given statements F and a statement to be proved P:
Algorithm: Resolution
1. Convert all the statements of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in step 1.
3. Repeat until a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resolvent will be the disjunction of all the literals of both parent clauses with appropriate substitutions performed and with the following exception: if there is one pair of literals T1 and ¬T2 such that one of the parent clauses contains T1 and the other contains ¬T2, and if T1 and T2 are unifiable, then neither T1 nor T2 should appear in the resolvent. We call T1 and T2 complementary literals. Use the substitution produced by the unification to create the resolvent. If there is more than one pair of complementary literals, only one pair should be omitted from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
Introduction
• A rule-based system in AI bases choices or inferences on established rules.
• These rules are frequently expressed in human-friendly language, such as "if X is true, then Y is true," to make them easier to comprehend.
• Expert systems and decision support systems are just two examples of the many applications in which rule-based systems have been employed.
What is a Rule-based System?
• A system that relies on a collection of predetermined rules to decide what to do next is known as a rule-based system in AI.
• These rules are based on specific conditions and actions. For instance, if a patient has a fever, the doctor may recommend antibiotics because the patient may have an infection.

• Expert systems, decision support systems, and chatbots are examples of applications that use rule-based systems.
Characteristics of Rule-based Systems in AI

The following are some of the primary traits of the rule-based system
in AI:
• The rules are written simply for humans to comprehend, making
rule-based systems simple to troubleshoot and maintain.
• Given a set of inputs, rule-based systems will always create the
same output, making them predictable and dependable. This
property is known as determinism.
• A rule-based system in AI is transparent because the standards are
clear and open to human inspection, which makes it simpler to
comprehend how the system operates.
• A rule-based system in AI is scalable. When scaled up, large
quantities of data can be handled by rule-based systems.
• Rule-based systems can be modified or updated more easily because the rules can be divided into smaller components.
How does a Rule-based System Work?
• A rule-based system in AI generates an output by using a collection of inputs and a set of rules.
• The system first determines which rules apply to the inputs.
• If a rule is applicable, the system executes the corresponding actions to generate the output.
• If no rule is applicable, the system might generate a default output or ask the user for more details.
How does a Rule-based System Work?
• This is a real success story of AI – tens of thousands of working systems have been deployed into many aspects of life.
• Terms: a knowledge-based system (KBS) is really anything that stores and uses explicit knowledge (at first only RBSs, later other kinds of system);
• a rule-based system (or production system) is a KBS in which the knowledge is stored as rules;
• an expert system is an RBS in which the rules come from human experts in a particular domain.
• Can be used for problem-solving, or for control (e.g., of animated motion).
• In an RBS, the knowledge is separated from the AI reasoning processes, which means that new RBSs are easy to create.
• An RBS can be fast, flexible, and easy to expand.
Main Components of a Rule-based System
• Working memory – the store of the current facts/assertions that the system is reasoning about, against which rule antecedents are matched
• Rulebase – the rules themselves, possibly stored specially to
facilitate efficient access to the antecedent
• Interpreter – The processing engine which carries out
reasoning on the rules and derives an answer
• An expert system will likely also have an Explanation Facility -
keeps track of the chain of reasoning leading to the current
answer. This can be queried for an explanation of how the
answer was deduced.
• Thus unlike many other problem-solving AI modules, an
expert system does not have to be a “black box” i.e., it can be
transparent about how it arrives at answers
Rule Based Systems – Forward Chaining
• To reason by forward chaining, the interpreter follows a recognize-
act cycle which begins from a set of initial assertions (input states
set in the working memory) and repeatedly applies rules until it
arrives at a solution (data driven processing)

Match – identify all rules whose antecedent clauses match the initial
assertions
Conflict resolution – if more than one rule matches, choose one
Execution – the consequent clause of the chosen rule is applied, usually updating the state of working memory and finally generating output. A minimal sketch of this cycle follows below.
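A minimal Python sketch of this recognize-act cycle, under stated assumptions: rules are (antecedent-set, consequent) pairs, conflict resolution is simply "first come, first served", and the facts and rule contents are illustrative.

```python
# Forward chaining: repeatedly match rule antecedents against working memory,
# pick one applicable rule (conflict resolution), and add its consequent.
rules = [
    ({"has_fever", "has_cough"}, "possible_infection"),   # illustrative rules
    ({"possible_infection"}, "recommend_antibiotics"),
]

def forward_chain(working_memory, rules):
    fired = True
    while fired:
        fired = False
        # Match: rules whose antecedents hold and whose consequent adds something new
        applicable = [(ante, cons) for ante, cons in rules
                      if ante <= working_memory and cons not in working_memory]
        if applicable:
            ante, cons = applicable[0]   # Conflict resolution: first come, first served
            working_memory.add(cons)     # Execution: update working memory
            fired = True
    return working_memory

print(forward_chain({"has_fever", "has_cough"}, rules))
# {'has_fever', 'has_cough', 'possible_infection', 'recommend_antibiotics'}
```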
Rule Based Systems - Backward
Chaining
• In backward chaining, the interpreter begins from a known goal state and
tries to find a logically supported path back to one of the initial assertions
(goal-directed inference)

Match – identify all rules whose consequent clauses match the working memory state
(initially the hypothesized goal state)
Conflict resolution – if more than one rule matches, choose one
Execution – the state of working memory is updated to reflect the antecedent clause of
the matched rule
Rule Based Systems – Conflict Resolution
• The interpreter must choose one path to examine at a time, and so needs a method of conflict resolution in case multiple rules match.
• Methods include:
– first come, first served: choose the first rule that applies
– rules are ordered into priorities (by an expert): choose the highest rank
– most specific rule is chosen: e.g., apply the rule with the most elements in its antecedent
– choose a rule at random
– keep a historical trace, and allow this to choose a different rule next time
• If the first chosen rule leads to a dead end, it should be possible to try another, provided a trace is kept.
• Note that this is another example of search through a graph of possibilities.
Rule Based Systems - Development
• The process of encoding human
knowledge into an expert
system is called knowledge
elicitation
• It can be quite difficult to elicit
expert knowledge from experts
– they might not know how they
know
– they are very busy people
– can’t get their attention for long
– some knowledge might be
difficult to codify as rules
– different experts might disagree
on facts and methods
– facts and methods may change
=> need to maintain the system
• Programmer may customize an
expert system shell
How to Create a Rule-based System?
• The following actions are required to develop a rule-based system:
• Determine the issue:
Decide what problem needs to be solved by a rule-based system.
• Establish the rules:
Establish a collection of rules that can be used to address the problem. The rules should be based on expert knowledge or data analysis.
• Implement the rules:
Implement the rules in a rule-based structure. Software tools that support the development and administration of rule-based systems can be used for this.
• Test and evaluate:
Verify that the rule-based system operates as intended. Measure how it performs and make any required modifications.
Examples of Rule-based Systems
• Healthcare, finance, and engineering are just a few examples of the sectors
and applications that use rule-based systems. Following are some instances of
a rule-based system in AI:
• Medical Diagnosis:
Based on a patient's symptoms, medical history, and test findings, a rule-
based system in AI can make a diagnosis. The system can make a diagnosis by
adhering to a series of guidelines developed by medical professionals.
• Fraud Detection:
Based on particular criteria, such as the transaction's value, location, and time of day, a rule-based system in AI can be used to spot fraudulent transactions. The system can then flag the transaction for additional examination.
• Quality Control:
A rule-based system in AI can ensure that products satisfy particular quality
standards. Based on a set of guidelines developed by quality experts, the
system can check for flaws.
• Decision support systems:
These are created to aid decision-making, such as choosing which assets or products to buy.
Advantages of Rule-based Systems in
AI
• Transparency and Explainability:
Because the rules are openly established, rule-based systems are
transparent and simple to comprehend. This makes it simpler for
programmers to comprehend and adjust the system and for users to
comprehend the rationale behind particular actions.
• Efficiency:
Rule-based systems work quickly and effectively since they don't need a
lot of data or intricate algorithms to function. Instead, they merely
conclude by applying rules to a specific scenario.
• Accuracy:
Because they rely on a set of clear rules and logical inferences, rule-
based systems have the potential to be very accurate. The system will
produce the right outcomes if the rules are written correctly.
• Flexibility:
Rule-based systems are updated and modified by adding or modifying
the rules. Because of this, they can easily adjust to new situations or
knowledge.
Distributional Semantics - Introduction
Distributional Hypothesis
Distributional Semantics : a linguistic perspective

referential meaning is about the direct connection between language and the
world, while differential meaning is about the distinctions and relationships
between words within the language system

Referential vs differential
Distribution Semantics: a cognitive perspective
Distributional Semantic Models (DSMs)
Distributional Semantics: The general intuition
Vector Space
Word Space
Constructing Word Spaces
Constructing Word Spaces: distributional semantics
Comparing Similarities
Vector space model without distributional
similarity
Building dsm step-by-step
Many design choices
Document as context: word X document
word as context: word X word
Word as context
Context weighting: documents as context
Context weighting: word as context
Pointwise mutual information: PMI
PMI: Issues and Variations
Distributional vectors example
Application to Query Expansion: Addressing
term Mismatch
Query Expansion using Unstructured
DSMs
Similarity Measures for Binary Vectors
Similarity Measures for Vector Spaces
Similarity Measures for Probability Distribution
Attributional Similarity vs Relational Similarity
Relational Similarity: Pair-Pattern matrix
Structured DSMs
2-way Matrix
Structured DSMs for Selectional Preferences
Word Vectors
Word Vectors: One-hot encodings
Limitations of one-hot encoding
Word2vec : a distributional
representation
Distributional representation
Distributional representation: illustration
Reasoning with word Vector
Reasoning with word Vectors: vector offset for gender relation
vector offset for singular-plural
relationship
Encoding other Dimensions of
Similarity
Analogy Testing
Word Analogy
Learning word vectors
Two Variants
CBOW
CBOW : Input to Hidden
Hidden to Output Vector
Skip Gram Model
Gradient-descent update for each parameter:
θ_j = θ_j − α ∂J(θ)/∂θ_j
Final word vector = v + v′
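As a small numeric illustration of this update rule (plain gradient descent on an illustrative quadratic cost, not the full word2vec training loop), each parameter moves against the gradient of J(θ), scaled by the learning rate α:

```python
import numpy as np

# Generic gradient-descent step: theta_j = theta_j - alpha * dJ/dtheta_j.
# Illustrative cost: J(theta) = ||theta - target||^2, so grad = 2 * (theta - target).
target = np.array([0.5, -0.2, 0.8])
theta = np.zeros(3)
alpha = 0.1

for _ in range(100):
    grad = 2 * (theta - target)
    theta = theta - alpha * grad   # move against the gradient

print(np.round(theta, 3))  # approaches [ 0.5 -0.2  0.8]
```

The "Final = v + v′" line is commonly read as taking the sum of a word's input vector and output (context) vector as its final representation.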
GloVe
• Global Vectors for Word Representation, or GloVe for short, is an
unsupervised learning algorithm that generates vector representations, or
embeddings, of words.
• By using the statistical co-occurrence data of words in a given corpus, GloVe
is intended to capture the semantic relationships between words.
• The fundamental concept underlying GloVe is the representation of words as
vectors in a continuous vector space, where the angle and direction of the
vectors correspond to the semantic connections between the appropriate
words.
• To do this, GloVe builds a co-occurrence matrix using word pairs and then
optimizes the word vectors to minimize the difference between the pointwise
mutual information (PMI) of the corresponding words and the dot product of
vectors.
• Word embedding: word embedding is the technique to convert each word
into an equivalent float vector. Various techniques exist depending on the use
case of the model and dataset. Some of the techniques are One Hot
Encoding, TF-IDF, Word2Vec
GloVe
'the': [-0.123, 0.353, 0.652, -0.232]
'the' is a very frequently used word in texts of any kind; an equivalent 4-dimensional dense vector for it is shown above.

• GloVe provides pre-trained dense vectors for a large vocabulary of English words (the widely used '6B' release covers about 400,000 words and was trained on roughly 6 billion tokens of text), along with many other general-use characters like commas, braces, and semicolons.
• The algorithm’s developers frequently make the pre-trained GloVe
embeddings available.
• It is not necessary to train the model from scratch when using these pre-
trained embeddings, which can be downloaded and used immediately in a
variety of natural language processing (NLP) applications.
• Users can select a pre-trained GloVe embedding in a dimension (e.g., 50-d,
100-d, 200-d, or 300-d vectors) that best fits their needs in terms of
computational resources and task specificity.
GloVe
• The creation of a word co-occurrence matrix is the
fundamental component of GloVe.
• This matrix provides a quantitative measure of the semantic
affinity between words by capturing the frequency with which
they appear together in a given context.
• Next, by minimizing the difference between the dot product of vectors and the pointwise mutual information of the corresponding words, GloVe optimizes the word vectors.
• GloVe is thus able to produce dense vector representations that capture both syntactic and semantic relationships.
GloVe Embeddings Example
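The slide's worked example is not reproduced here; instead, below is a hedged sketch of how pre-trained GloVe vectors are typically loaded and compared, assuming a downloaded file such as glove.6B.100d.txt (plain text, one word per line followed by its vector components).

```python
import numpy as np

def load_glove(path, vocab_limit=100000):
    """Parse a GloVe text file: each line is 'word v1 v2 ... vd'."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= vocab_limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage (the path is an assumption; download the file from the GloVe project page first):
# glove = load_glove("glove.6B.100d.txt")
# print(cosine(glove["king"], glove["queen"]))   # high similarity expected
# print(cosine(glove["king"], glove["banana"]))  # much lower
```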
Feature Extraction Techniques
• NLP is the ability of computers to understand human language.
• Why feature extraction is needed: machine learning algorithms learn from a pre-defined set of features from the training data to produce output for the test data.
• But the main problem in language processing is that machine learning algorithms cannot work on raw text directly.
• So we need feature extraction techniques to convert text into a matrix (or vector) of features. Some of the most popular methods of feature extraction are:
– Bag-of-Words
– TF-IDF
Bag of Words

The bag of words model is used for text representation and
feature extraction
• It represents a text document as a multiset of its words,
disregarding grammar and word order, but keeping the
frequency of words.
• This representation is useful for tasks such as text
classification, document similarity, and text clustering.
• Bag-of-Words is one of the most fundamental methods to
transform tokens into a set of features.
• The BoW model is used in document classification, where
each word is used as a feature for training the classifier.
Bag of Words
• For example, in a task of review-based sentiment analysis, the presence of words like 'fabulous' or 'excellent' indicates a positive review, while words like 'annoying' or 'poor' point to a negative review.
• There are 3 steps while creating a BoW model :
• The first step is text-preprocessing which involves:
– converting the entire text into lower case characters.
– removing all punctuations and unnecessary symbols.
• The second step is to create a vocabulary of all unique
words from the corpus. Let’s suppose, we have a hotel
review text.
• Let’s consider 3 of these reviews,
which are as follows :
– good movie
– not a good movie
– did not like
• Now, we consider all the unique words
from the above set of reviews to
create a vocabulary, which is going to
be as follows :
{good, movie, not, a, did, like}
• In the third step, we create a matrix of
features by assigning a separate
column for each word, while each row
corresponds to a review. This process
is known as Text Vectorization.
• Each entry in the matrix signifies the
presence(or absence) of the word in
the review. We put 1 if the word is
present in the review, and 0 if it is not
present.
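A small sketch of these three steps on the same three reviews, in plain Python (a library such as scikit-learn's CountVectorizer would produce an equivalent matrix; this version just makes the construction explicit):

```python
reviews = ["good movie", "not a good movie", "did not like"]

# Step 1: preprocessing (lowercase, strip punctuation) -- already clean here.
tokenized = [r.lower().split() for r in reviews]

# Step 2: build the vocabulary of unique words.
vocab = sorted({w for tokens in tokenized for w in tokens})
print(vocab)  # ['a', 'did', 'good', 'like', 'movie', 'not']

# Step 3: text vectorization -- one row per review, one column per word.
# Counts are stored here; use 1/0 instead for a pure presence/absence matrix.
matrix = [[tokens.count(w) for w in vocab] for tokens in tokenized]
for review, row in zip(reviews, matrix):
    print(f"{review:<18} {row}")
# good movie         [0, 0, 1, 0, 1, 0]
# not a good movie   [1, 0, 1, 0, 1, 1]
# did not like       [0, 1, 0, 1, 0, 1]
```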

• A major drawback of this model is that the order of occurrence of words is lost, as we create a vector of tokens without regard to order. However, we can solve this problem by considering N-grams (mostly bigrams) instead of individual words (i.e., unigrams). This can preserve the local ordering of words. If we consider all possible bigrams from the given reviews, the columns of the table would instead be bigrams such as 'good movie', 'not a', 'a good', 'did not', and 'not like'.
• However, this table will come out to be very large, as there can be a lot of possible bigrams when considering all possible consecutive word pairs.
• Using N-grams can result in a huge sparse matrix (one with a lot of 0's) if the size of the vocabulary is large, making the computation really complex.
• We have to remove a few N-grams based on their frequency.
• For example, we can always remove high-frequency N-grams, because they appear in almost all documents. These high-frequency N-grams are generally articles, determiners, etc., most commonly called stop words.
• Similarly, we can also remove low-frequency N-grams, because these are really rare (i.e., they generally appear in 1 or 2 reviews). These types of N-grams are generally typos (or typing mistakes).
• Generally, medium-frequency N-grams are considered the most ideal.
• However, there are some N-grams which are really rare in our corpus but can highlight a specific issue.
• Let's suppose there is a review that says "Wi-Fi breaks often".
• Here, the N-gram 'Wi-Fi breaks' can't be too frequent, but it highlights a major problem that needs to be looked at.
• Our BoW model would not capture such N-grams, since their frequency is really low. To solve this type of problem, we need another model, i.e., TF-IDF.
Issues of Bag of Words
• High dimensionality: The resulting feature space can be very high
dimensional; lead to issues with overfitting and computational efficiency.
• Lack of context information: The bag of words model only considers the
frequency of words, disregarding grammar, word order, and context.
• Insensitivity to word associations: The bag of words model doesn’t consider
the associations between words, and the semantic relationships between
words in a document.
• Lack of semantic information: As the bag of words model only considers
individual words, it does not capture semantic relationships or the meaning of
words in context.
• Importance of stop words: Stop words, such as “the”, “and”, “a”, etc., can have
a large impact on the bag of words representation of a document, even though
they may not carry much meaning.
• Sparsity: For many applications, the bag of words representation of a
document can be very sparse, meaning that most entries in the resulting
feature vector will be zero. This can lead to issues with computational
efficiency and difficulty in interpretability.
TF-IDF
• TF-IDF (Term Frequency-Inverse Document Frequency) is a
statistical measure used for information retrieval and natural
language processing tasks.
• It reflects the importance of a word in a document relative to
an entire corpus.
• The basic idea is that a word that occurs frequently in a
document but rarely in the entire corpus is more informative
than a word that occurs frequently in both the document and
the corpus.
• TF-IDF is used for:
1. Text retrieval and information retrieval systems
2. Document classification and text categorization
3. Text summarization
4. Feature extraction for text data in machine learning
algorithms.
TF-IDF
• It highlights a specific issue which might not be too frequent in our corpus but holds great importance. The TF-IDF value increases proportionally with the number of times a word appears in the document and decreases with the number of documents in the corpus that contain the word. It is composed of 2 sub-parts, which are:
• Term Frequency (TF)
• Inverse Document Frequency (IDF)
Term Frequency (TF): term frequency specifies how frequently a term appears in the entire document. It can be thought of as the probability of finding a word within the document. It calculates the number of times a word occurs in a review with respect to the total number of words in the review. It is formulated as:
TF(t, d) = (number of times term t appears in document d) / (total number of terms in d)
Inverse Document Frequency (IDF)
• The inverse document frequency is a measure of whether a term is rare or frequent across the documents in the entire corpus.
• It highlights those words which occur in very few documents across the corpus; in simple language, words that are rare have a high IDF score.
• IDF is a log-normalized value, obtained by dividing the total number of documents in the corpus by the number of documents containing the term, and taking the logarithm of the overall term:
IDF(t) = log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing t
TF-IDF score: tf-idf(t, d) = TF(t, d) × IDF(t). A small sketch follows below.
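A compact sketch of these formulas under common conventions (raw-count TF divided by document length, IDF as the natural log of N/df; libraries such as scikit-learn apply slightly different smoothing), reusing the earlier toy reviews:

```python
import math

docs = [["good", "movie"], ["not", "a", "good", "movie"], ["did", "not", "like"]]

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term, docs):
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)        # rare terms get a higher score

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(round(tf_idf("good", docs[0], docs), 3))  # 0.203: 'good' appears in 2 of 3 docs
print(round(tf_idf("like", docs[2], docs), 3))  # 0.366: 'like' is rarer, so it scores higher
```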
Issues of TF-IDF
• High dimensionality: The resulting feature space can be very
high dimensional, which may lead to issues with overfitting
and computational efficiency.
• Lack of context information: TF-IDF only considers the
frequency of words in a document, disregarding the context
and meaning of words.
• Domain dependence: The results of TF-IDF can be domain-
specific, as the frequency and importance of words can vary
greatly depending on the domain of the text.
• Insensitivity to word associations: TF-IDF doesn’t consider the
associations between words, and the semantic relationships
between words in a document.
• Lack of semantic information: As TF-IDF only considers
individual words, it does not capture semantic relationships or
the meaning of words in context.
Ontology
Ontology in CSE/AI
Ontology in CSE/AI
Ontology vs Knowledgebase
Ontology vs Knowledgebase
Types of Ontologies
Lexical Analysis
Relations in Lexicons
Issues
Zeugma Test
Antonyms
Hyponymy & Hypernymy
Entailment and Transitivity
Meronymy
WordNet
Synsets in WordNet
Lemma vs. Synsets
All Relations in WordNet
Noun and Verb Relations
WordNet Hierarchies
Word Similarity
Two Classes of Algorithms
Thesaurus based word Similarity
Path based similarity
Similarity
Leacock–Chodorow (LC) Similarity
Concept Probability
Information Contents
Resnik Similarity
JC Similarity
Stemming
Stemming vs. Lemmatization
• For grammatical reasons, documents are going to use different forms of a word, such
as organize, organizes, and organizing.
• Additionally, there are families of derivationally related words with similar meanings,
such as democracy, democratic, and democratization.
• In many situations, it seems as if it would be useful for a search for one of these
words to return documents that contain another word in the set.
• The goal of both stemming and lemmatization is to reduce inflectional forms and
sometimes derivationally related forms of a word to a common base form. For
instance:
• am, are, is => be
• car, cars, car's, cars' => car
The result of this mapping of text will be something like:
• the boy's cars are different colors => the boy car be differ color
Stemming vs. Lemmatization
• However, the two words differ in their flavor.
• Stemming usually refers to a crude heuristic process that chops off the ends of words in
the hope of achieving this goal correctly most of the time, and often includes the removal
of derivational affixes.
• Lemmatization usually refers to doing things properly with the use of a vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only and
to return the base or dictionary form of a word, which is known as the lemma .
• If confronted with the token saw, stemming might return just s, whereas lemmatization
would attempt to return either see or saw depending on whether the use of the token
was as a verb or a noun.
• The two may also differ in that stemming most commonly collapses derivationally related
words, whereas lemmatization commonly only collapses the different inflectional forms of
a lemma.
• Linguistic processing for stemming or lemmatization is often done by an additional plug-in
component to the indexing process, and a number of such components exist, both
commercial and open-source.
Stemming vs. Lemmatization
• The most common algorithm for stemming English, and one that has repeatedly been shown to be empirically very effective, is Porter's algorithm.
• The entire algorithm is too long and intricate to present here, but we will indicate its general nature. (Working through it is left as an assignment.)
• Porter's algorithm consists of 5 phases of word reductions, applied sequentially.
• Within each phase there are various conventions to select rules, such as selecting the rule from each rule group that applies to the longest suffix. A short illustration follows below.
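A short illustration using NLTK, which provides an implementation of the Porter stemmer and a WordNet-based lemmatizer (an assumption: the nltk package and its wordnet data must be installed; exact outputs can vary slightly between versions):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# import nltk; nltk.download("wordnet")  # needed once for the lemmatizer's dictionary

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["organize", "organizes", "organizing", "operational", "saw"]:
    print(word,
          stemmer.stem(word),                   # crude suffix chopping
          lemmatizer.lemmatize(word, pos="v"))  # dictionary-based, here treated as a verb
# e.g. 'saw' stems to 'saw', but lemmatized as a verb it becomes 'see'
```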
Stemming vs. Lemmatization
• Stemmers use language-specific rules, but they require less
knowledge than a lemmatizer, which needs a complete
vocabulary and morphological analysis to correctly lemmatize
words.
• Particular domains may also require special stemming rules.
• However, the exact stemmed form does not matter, only the
equivalence classes it forms.
• Rather than using a stemmer, you can use a lemmatizer , a tool
from Natural Language Processing which does full morphological
analysis to accurately identify the lemma for each word.
• Doing full morphological analysis produces at most very modest
benefits for retrieval. It is hard to say more, because either form
of normalization tends not to improve English information
retrieval performance in aggregate - at least not by very much.
Stemming vs. Lemmatization
• While it helps a lot for some queries, it equally hurts performance
a lot for others. Stemming increases recall while harming
precision. As an example of what can go wrong, note that the
Porter stemmer stems all of the following words:
• operate, operating, operates, operation, operative, operatives, and operational, all to oper.
• However, since operate in its various forms is a common verb, we
would expect to lose considerable precision on queries such as
the following with Porter stemming:
operational and research
operating and system
operative and dentistry
Stemming vs. Lemmatization
• For a case like this, moving to using a lemmatizer would not
completely fix the problem because particular inflectional forms
are used in particular collocations: a sentence with the
words operate and system is not a good match for the query
operating and system.
• Getting better value from term normalization depends more on
pragmatic issues of word use than on formal issues of linguistic
morphology.
Autoencoders
Language Modeling with RNN
Image Representation using CNN
[Slide figures: a single 5x5x3 convolution filter (75 parameters) captures local information through one neuron; applying six such filters produces a new 28x28x6 activation map. Activation maps for VGG-16 Conv_1, Conv_2, and Conv_3 are shown, with the Conv_1 activation map stretched out. Input-to-output spatial dimension of a convolution: (N-F)/stride + 1; for example, N = 32, F = 5, stride = 1 gives (32-5)/1 + 1 = 28. Credits: Stanford University.]
