Knowledge Representation in AI
KNOWLEDGE REPRESENTATION
• To solve the complex problems encountered in AI, we need both a large amount of knowledge and some mechanism for manipulating that knowledge to create solutions to new problems. A variety of ways of representing knowledge (facts) have been exploited in AI programs.
• In all these varieties of knowledge representation, we deal with two kinds of entities:
– A. Facts: truths in some relevant world. These are the things we want to represent.
– B. Representations of facts in some chosen formalism. These are the things we will actually be able to manipulate.
Relational Knowledge
– The simplest way to represent declarative facts is as a set of relations of the same sort used in database systems.
– This provides a framework to compare two objects based on equivalent attributes. Any instance in which two different objects are compared is a relational type of knowledge.
– The table below shows a simple way to store facts.
• The facts about a set of objects are put systematically in columns.
• This representation provides little opportunity for inference.
– Given the facts alone, it is not possible to answer a simple question such as: “Who is the heaviest player?”
– But if a procedure for finding the heaviest player is provided, then these facts will enable that procedure to compute an answer.
– We can also ask things like who “bats left” and “throws right”.
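As a minimal sketch of how such a procedure could operate over relational facts (the player records below are invented stand-ins for the baseball-player table the slide refers to):

    # Relational facts stored like rows in a database table (hypothetical data).
    players = [
        {"name": "Player A", "weight": 180, "bats": "left",  "throws": "right"},
        {"name": "Player B", "weight": 205, "bats": "right", "throws": "right"},
        {"name": "Player C", "weight": 195, "bats": "left",  "throws": "left"},
    ]

    # The facts alone cannot answer "Who is the heaviest player?",
    # but a simple procedure over them can:
    heaviest = max(players, key=lambda p: p["weight"])
    print(heaviest["name"])                                       # Player B

    # Attribute queries, e.g. who bats left and throws right:
    print([p["name"] for p in players
           if p["bats"] == "left" and p["throws"] == "right"])    # ['Player A']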
Inheritable Knowledge
Existential quantification
• (∃ x)P(x) means that P holds for some value of x in the domain associated with that
variable
• E.g., (∃ x) mammal(x) ∧ lays-eggs(x)
Consider the following example that shows the use of predicate logic as a way of representing knowledge.
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Pompeians were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
Truth table for implication (P → Q):
P | Q | P → Q
T | F | F
T | T | T
F | T | T
F | F | T
Sentences are built from terms and atoms.
Example unification substitution: {Marcus/x, Caesar/y} (Marcus substituted for x, Caesar for y).
BACKWARD CHAINING
Man(Marcus) (9)
• The problem is that, although we know that Marcus was a man, we do not have any way to conclude from that that Marcus was a person.
• We need to add the representation of another fact to our system, namely: ∀x: man(x) → person(x)
• Now we can satisfy the last goal and produce a proof that Marcus was not loyal to Caesar.
• From this simple example, we see that three important issues must be addressed in the process of converting English sentences into logical statements and then using those statements to deduce new ones:
1. Many English sentences are ambiguous (for example, 5, 6, and 7 above). Choosing the correct interpretation may be difficult.
2. There is often a choice of how to represent the knowledge. Simple representations are desirable, but they may exclude certain kinds of reasoning.
3. Even in very simple situations, a set of sentences is unlikely to contain all the information necessary to reason about the topic at hand. In order to be able to use a set of statements effectively, it is usually necessary to have access to another set of statements that represent facts that people consider too obvious to mention.
Computable Functions and Predicates
• To express simple facts, such as the following greater-than and less-than relationships:
gt(1, 0)   lt(0, 1)   gt(2, 1)   lt(1, 2)   gt(3, 2)   lt(2, 3)
• It is often useful to have computable functions as well as computable predicates.
• Thus we might want to be able to evaluate the truth of gt(2 + 3, 1).
• To do so requires that we first compute the value of the plus function given the arguments 2 and 3, and then send the arguments 5 and 1 to gt.
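A minimal Python sketch of computable functions and predicates (the names plus, gt, and lt simply mirror the notation above; this is illustrative, not part of any particular theorem prover):

    def plus(a, b):               # computable function
        return a + b

    def gt(a, b):                 # computable predicate: greater-than
        return a > b

    def lt(a, b):                 # computable predicate: less-than
        return a < b

    # Evaluating gt(2 + 3, 1): first compute plus(2, 3) = 5, then test gt(5, 1).
    print(gt(plus(2, 3), 1))      # True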
• Consider the following set of facts, again involving Marcus:
1. Marcus was a man.  man(Marcus)
2. Marcus was a Pompeian.  Pompeian(Marcus)
3. Marcus was born in 40 A.D.  born(Marcus, 40)
4. All men are mortal.  ∀x: man(x) → mortal(x)
5. All Pompeians died when the volcano erupted in 79 A.D.
   erupted(volcano, 79) ∧ ∀x: [Pompeian(x) → died(x, 79)]
6. No mortal lives longer than 150 years.
   ∀x: ∀t1: ∀t2: mortal(x) ∧ born(x, t1) ∧ gt(t2 − t1, 150) → dead(x, t2)
7. It is now 1991.  now = 1991
The example above shows how these ideas of computable functions and predicates can be useful.
• It also makes use of the notion of equality and allows equal objects to be substituted for each other whenever it appears helpful to do so during a proof.
• Now suppose we want to answer the question “Is Marcus alive?”
• With the statements given here, there may be two ways of deducing an answer.
• Either we can show that Marcus is dead because he was killed by the volcano, or we can show that he must be dead because he would otherwise be more than 150 years old, which we know is not possible.
• As soon as we attempt to follow either of those paths rigorously, however, we discover, just as we did in the last example, that we need some additional knowledge.
• For example, our statements talk about dying, but they say nothing that relates to being alive, which is what the question is asking.
So we add the following facts:
8) Alive means not dead.
∀x: ∀t: [alive(x, t) → ¬dead(x, t)] ∧ [¬dead(x, t) → alive(x, t)]
9) If someone dies, then he is dead at all later times.
∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)
Now let’s attempt to answer the question “Is Marcus alive?” by proving:
¬alive(Marcus, now)
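One possible backward-chaining proof path (the “killed by the volcano” route), sketched informally from the numbered facts above; each step shows the current goal and the fact that reduces it:

¬alive(Marcus, now)
← dead(Marcus, now)                  (fact 8: alive means not dead)
← died(Marcus, t1) ∧ gt(now, t1)     (fact 9: dead at all times after dying)
← Pompeian(Marcus) ∧ gt(now, 79)     (fact 5, binding t1 = 79)
← gt(1991, 79)                       (facts 2 and 7: Pompeian(Marcus), now = 1991)
← true                               (gt evaluated as a computable predicate)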
Resolution
Propositional Resolution
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to
the set of clauses obtained in step 1.
3. Repeat until either a contradiction is found or no progress
can be made:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resulting clause, called the
resolvent, will be the disjunction of all of the literals of both of
the parent clauses with the following exception: If there are
any pairs of literals L and ¬ L such that one of the parent
clauses contains L and the other contains ¬L, then select one
such pair and eliminate both L and ¬ L from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
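A minimal Python sketch of this procedure for propositional clauses; each clause is a frozenset of string literals, with a leading "~" marking negation (an illustration of the loop above, not an optimized prover):

    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    def resolve(c1, c2):
        """All resolvents of two parent clauses (frozensets of literals)."""
        resolvents = []
        for lit in c1:
            if negate(lit) in c2:                        # complementary pair L and ~L
                resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
        return resolvents

    def refutes(clauses):
        """True if the empty clause (a contradiction) can be derived."""
        clauses = set(map(frozenset, clauses))
        while True:
            new = set()
            for a in clauses:
                for b in clauses:
                    if a != b:
                        for r in resolve(a, b):
                            if not r:                    # empty resolvent: contradiction found
                                return True
                            new.add(frozenset(r))
            if new <= clauses:                           # no progress can be made
                return False
            clauses |= new

    # Prove Q from F = {P, P -> Q}: clause forms {P}, {~P, Q}; add the negated goal {~Q}.
    print(refutes([{"P"}, {"~P", "Q"}, {"~Q"}]))         # True, so Q follows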
Algorithm: Convert to Clause Form
The wff being converted (from the Marcus example, “All Romans who know Marcus either hate Caesar or think that anyone who hates anyone is crazy”) is:
∀x: [Roman(x) ∧ know(x, Marcus)] → [hate(x, Caesar) ∨ (∀y: (∃z: hate(y, z)) → thinkcrazy(x, y))]
1. Eliminate →, using the fact that a → b is equivalent to ¬a ∨ b. Performing this transformation on the wff given above yields
∀x: ¬[Roman(x) ∧ know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ¬(∃z: hate(y, z)) ∨ thinkcrazy(x, y))]
2. Reduce the scope of each ¬ to a single term, using the fact that ¬(¬p) = p, de Morgan’s laws [which say that ¬(a ∧ b) = ¬a ∨ ¬b and ¬(a ∨ b) = ¬a ∧ ¬b], and the standard correspondences between quantifiers [¬∀x: P(x) = ∃x: ¬P(x) and ¬∃x: P(x) = ∀x: ¬P(x)].
Performing this transformation on the wff from step 1 yields
∀x: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ∀z: ¬hate(y, z) ∨ thinkcrazy(x, y))]
3. Standardize variables so that each quantifier binds a unique variable. Since
variables are just dummy names, this process cannot affect the truth value of the wff.
For example, the formula ∀x: P(x) ∨ ∀x: Q(x) would be converted to ∀x: P(x) ∨ ∀y: Q(y).
4. Move all quantifiers to the left of the formula without changing
their relative order. This is possible since there is no conflict among
variable names. Performing this operation on the formula of step 2, we
get
∀x: ∀y: ∀z: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]
At this point, the formula is in what is known as prenex normal form. It
consists of a prefix of quantifiers followed by a matrix, which is
quantifier-free.
5.Eliminate existential quantifiers
A formula that contains an existentially quantified variable asserts that
there is a value that can be substituted for the variable that makes the
formula true. We can eliminate the quantifier by substituting for the
variable a reference to a function that produces the desired value. Since
we do not necessarily know how to produce the value, we must create
a new function name for every such replacement. We make no
assertions about these functions except that they must exist.
So, for example, the formula
∃y: President(y)
can be transformed into the formula
President(S1)
where S1 is a function with no arguments that somehow produces a value that satisfies President.
If existential quantifiers occur within the scope of universal quantifiers, then the value that satisfies the predicate may depend on the values of the universally quantified variables. For example, in the formula
∀x: ∃y: father-of(y, x)
the value of y that satisfies father-of depends on the particular value of x, so we replace y with a function of x, say S2(x), and write father-of(S2(x), x). Such generated functions are called Skolem functions.
Rule-based Systems in AI
• Expert systems, decision support systems, and chatbots are examples of applications that use rule-based systems.
Characteristics of Rule-based Systems in AI
The following are some of the primary traits of the rule-based system
in AI:
• The rules are written simply for humans to comprehend, making
rule-based systems simple to troubleshoot and maintain.
• Given a set of inputs, rule-based systems will always create the
same output, making them predictable and dependable. This
property is known as determinism.
• A rule-based system in AI is transparent because the standards are
clear and open to human inspection, which makes it simpler to
comprehend how the system operates.
• A rule-based system in AI is scalable: when scaled up, rule-based systems can handle large quantities of data.
• Rule-based systems can be modified or updated more easily because the rules can be divided into smaller components.
How does a Rule-based System Work?
• A rule-based system in AI generates an output by using a collection of inputs and a set of rules.
• The system first determines which rules apply to the inputs.
• If a rule is applicable, the system executes the corresponding steps to generate the output.
• If no rule is applicable, the system might generate a default output or ask the user for more details.
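A minimal sketch of this idea in Python; the thermostat-style rules and inputs are invented for illustration (each rule pairs a condition with an action over a dictionary of inputs):

    # Hypothetical rules: (condition, action) pairs.
    control_rules = [
        (lambda d: d["temperature"] > 30, lambda d: "turn on cooling"),
        (lambda d: d["temperature"] < 15, lambda d: "turn on heating"),
    ]

    def run_rules(inputs, default="no action"):
        for condition, action in control_rules:
            if condition(inputs):             # does this rule apply to the inputs?
                return action(inputs)         # execute the corresponding steps
        return default                        # no rule applicable: default output

    print(run_rules({"temperature": 34}))     # turn on cooling
    print(run_rules({"temperature": 20}))     # no action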
Rule-based Systems: Terminology and Background
• This is a real success story of AI: tens of thousands of working systems have been deployed into many aspects of life.
• Terms: a knowledge-based system (KBS) is anything that stores and uses explicit knowledge (at first only rule-based systems, later other kinds of system);
• a rule-based system (RBS, or production system) is a KBS in which the knowledge is stored as rules;
• an expert system is an RBS in which the rules come from human experts in a particular domain.
• Can be used for problem-solving, or for control (e.g., of animated motion).
• In an RBS, the knowledge is separated from the AI reasoning processes, which means that new RBSs are easy to create.
• An RBS can be fast, flexible, and easy to expand.
Main Components of a Rule-based System
• Working memory – a small allocation of memory holding the current facts/assertions being reasoned about; rules match against it and write their conclusions back into it
• Rulebase – the rules themselves, possibly stored specially to
facilitate efficient access to the antecedent
• Interpreter – The processing engine which carries out
reasoning on the rules and derives an answer
• An expert system will likely also have an Explanation Facility -
keeps track of the chain of reasoning leading to the current
answer. This can be queried for an explanation of how the
answer was deduced.
• Thus unlike many other problem-solving AI modules, an
expert system does not have to be a “black box” i.e., it can be
transparent about how it arrives at answers
Rule Based Systems – Forward Chaining
• To reason by forward chaining, the interpreter follows a recognize-
act cycle which begins from a set of initial assertions (input states
set in the working memory) and repeatedly applies rules until it
arrives at a solution (data driven processing)
Match – identify all rules whose antecedent clauses match the initial
assertions
Conflict resolution – if more than one rule matches, choose one
Execution – the consequent clause of the chosen rule is applied, usually updating the
state of working memory, finally generating output
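A minimal Python sketch of this recognize-act cycle; the two animal-identification rules are invented for illustration (each rule is a set of antecedent facts plus one consequent fact), and conflict resolution here is simply "first match wins":

    # Hypothetical rulebase: (antecedent facts, consequent fact).
    rules = [
        ({"has fur", "says woof"}, "is a dog"),
        ({"is a dog"},             "is a mammal"),
    ]

    def forward_chain(initial_assertions):
        facts = set(initial_assertions)                  # working memory
        while True:
            # Match: rules whose antecedents hold and whose consequent is not yet asserted.
            matches = [r for r in rules if r[0] <= facts and r[1] not in facts]
            if not matches:
                return facts                             # no rule fires: stop
            antecedent, consequent = matches[0]          # conflict resolution: first match
            facts.add(consequent)                        # execution: update working memory

    print(forward_chain({"has fur", "says woof"}))
    # -> {'has fur', 'says woof', 'is a dog', 'is a mammal'} (set order may vary)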
Rule Based Systems - Backward
Chaining
• In backward chaining, the interpreter begins from a known goal state and
tries to find a logically supported path back to one of the initial assertions
(goal-directed inference)
Match – identify all rules whose consequent clauses match the working memory state
(initially the hypothesized goal state)
Conflict resolution – if more than one rule matches, choose one
Execution – the state of working memory is updated to reflect the antecedent clause of
the matched rule
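A complementary sketch of backward chaining, reusing the hypothetical rules list from the forward-chaining sketch above: to establish a goal, find a rule whose consequent matches it and recursively establish its antecedents, bottoming out in the initial assertions:

    def backward_chain(goal, initial_assertions):
        if goal in initial_assertions:                   # goal is an initial assertion
            return True
        for antecedent, consequent in rules:             # match rules on their consequent clause
            if consequent == goal and all(
                    backward_chain(g, initial_assertions) for g in antecedent):
                return True                              # all sub-goals established
        return False

    print(backward_chain("is a mammal", {"has fur", "says woof"}))   # True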
Rule Based Systems – Conflict
Resolution
• The interpreter must choose one path to examine at a time, and so
needs a method of conflict resolution in case of multiple rules
• Methods include:
– first come, first served: choose the first rule that applies
– rules are ordered into priorities (by an expert): choose the highest-ranked rule
– most specific rule is chosen: e.g., apply the rule with the most elements in its antecedent
– choose a rule at random, and use that
– keep a historical trace, and allow this to choose a different rule next time
• If the first chosen rule leads to a dead end, it should be possible to
try another, provided a trace is kept
• Note that this is another example of search through a graph of
possibilities
Rule Based Systems - Development
• The process of encoding human
knowledge into an expert
system is called knowledge
elicitation
• It can be quite difficult to elicit
expert knowledge from experts
– they might not know how they
know
– they are very busy people
– can’t get their attention for long
– some knowledge might be
difficult to codify as rules
– different experts might disagree
on facts and methods
– facts and methods may change
=> need to maintain the system
• Programmer may customize an
expert system shell
How to Create a Rule-based System?
• The following actions are required to develop a rule-based system:
• Determine the issue:
Decide what issue needs to be resolved by a rule-based system.
• Establish the rules:
Establish a collection of rules that can be used to address the issue. The rules ought to be founded on professional expertise or data analysis.
• Implement the rules:
In a rule-based structure, implement the rules. Software tools that
enable the development and administration of rule-based systems can
be used for this.
• Test and evaluate:
Verify that the rule-based system in AI operates as intended. Take stock
of how it's performing and make any required modifications.
Examples of Rule-based Systems
• Healthcare, finance, and engineering are just a few examples of the sectors
and applications that use rule-based systems. Following are some instances of
a rule-based system in AI:
• Medical Diagnosis:
Based on a patient's symptoms, medical history, and test findings, a rule-
based system in AI can make a diagnosis. The system can make a diagnosis by
adhering to a series of guidelines developed by medical professionals.
• Fraud Detection:
Based on particular criteria, such as the transaction's value, location, and time of day, a rule-based system in AI can be used to spot fraudulent transactions. The system can then flag the transaction for additional examination.
• Quality Control:
A rule-based system in AI can ensure that products satisfy particular quality
standards. Based on a set of guidelines developed by quality experts, the
system can check for flaws.
• Decision support systems:
They are created to aid decision-making, such as choosing which assets to buy or what action to take.
Advantages of Rule-based Systems in
AI
• Transparency and Explainability:
Because the rules are openly established, rule-based systems are
transparent and simple to comprehend. This makes it simpler for
programmers to comprehend and adjust the system and for users to
comprehend the rationale behind particular actions.
• Efficiency:
Rule-based systems work quickly and effectively since they don't need a lot of data or intricate algorithms to function. Instead, they simply reach conclusions by applying rules to a specific scenario.
• Accuracy:
Because they rely on a set of clear rules and logical inferences, rule-
based systems have the potential to be very accurate. The system will
produce the right outcomes if the rules are written correctly.
• Flexibility:
Rule-based systems can be updated and modified by adding or changing rules. Because of this, they can easily adjust to new situations or knowledge.
Distributional Semantics - Introduction
Distributional Hypothesis
Distributional Semantics: a linguistic perspective
Referential meaning is about the direct connection between language and the world, while differential meaning is about the distinctions and relationships between words within the language system.
Referential vs differential
Distributional Semantics: a cognitive perspective
Distributional Semantic Models (DSMs)
Distributional Semantics: The general intuition
Vector Space
Word Space
Constructing Word Spaces
Constructing Word Spaces: distributional
semantics
Comparing Similarities
Vector space model without distributional
similarity
Building DSMs step-by-step
Many design choices
Document as context: word × document
Word as context: word × word
Word as context
Context weighting: documents as context
Context weighting: word as context
Pointwise mutual information: PMI
PMI: Issues and Variations
Distributional vectors example
Application to Query Expansion: Addressing
term Mismatch
Query Expansion using Unstructured
DSMs
Similarity Measures for Binary Vectors
Similarity Measures for Vector Spaces
Similarity Measures for Probability Distribution
Attributional Similarity vs Relational Similarity
Relational Similarity: Pair-Pattern matrix
Structured DSMs
2-way Matrix
Structured DSMs for Selectional
Preferences
Structured DSMs for Selectional
Preferences
Word Vectors
Word Vectors: One-hot encodings
Limitations of one-hot encoding
Word2vec: a distributional representation
Distributional representation
Distributional representation: illustration
Reasoning with word Vectors
Reasoning with word Vectors: vector offset for gender relation
Vector offset for singular-plural relationship
Encoding other Dimensions of Similarity
Analogy Testing
Word Analogy
Learning word vectors
Two Variants
CBOW
CBOW : Input to Hidden
Hidden to Output Vector
Skip Gram Model
Gradient descent update for the objective J(θ):
θ_j ← θ_j − α ∂J(θ)/∂θ_j
Final word vector: v_final = v + v′ (combining a word's input vector v and output vector v′)
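A tiny numpy sketch of that update rule (the names theta, grad, and alpha are generic placeholders, not taken from any particular word2vec implementation):

    import numpy as np

    alpha = 0.025                      # learning rate
    theta = np.random.randn(100)       # a parameter vector, e.g. one word's embedding
    grad = np.random.randn(100)        # stand-in for the gradient dJ/dtheta from backprop

    theta = theta - alpha * grad       # theta_j <- theta_j - alpha * dJ(theta)/dtheta_j

    # After training, the input vector v and output vector v' for a word are often
    # combined, e.g. final = v + v' as on the slide.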
GloVe
• Global Vectors for Word Representation, or GloVe for short, is an
unsupervised learning algorithm that generates vector representations, or
embeddings, of words.
• By using the statistical co-occurrence data of words in a given corpus, GloVe
is intended to capture the semantic relationships between words.
• The fundamental concept underlying GloVe is the representation of words as vectors in a continuous vector space, where the angle and direction of the vectors correspond to the semantic relationships between the corresponding words.
• To do this, GloVe builds a co-occurrence matrix using word pairs and then
optimizes the word vectors to minimize the difference between the pointwise
mutual information (PMI) of the corresponding words and the dot product of
vectors.
• Word embedding: word embedding is the technique to convert each word
into an equivalent float vector. Various techniques exist depending on the use
case of the model and dataset. Some of the techniques are One Hot
Encoding, TF-IDF, Word2Vec
GloVe
• Example: 'the' is a very frequently used word in texts of any kind; an equivalent 4-dimensional dense vector for it might look like
'the': [-0.123, 0.353, 0.652, -0.232]
• GloVe provides pre-trained dense vectors learned from a corpus of about 6 billion tokens of English text, covering a large vocabulary along with many general-use characters like commas, braces, and semicolons.
• The algorithm’s developers frequently make the pre-trained GloVe
embeddings available.
• It is not necessary to train the model from scratch when using these pre-
trained embeddings, which can be downloaded and used immediately in a
variety of natural language processing (NLP) applications.
• Users can select a pre-trained GloVe embedding in a dimension (e.g., 50-d,
100-d, 200-d, or 300-d vectors) that best fits their needs in terms of
computational resources and task specificity.
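As a sketch, pre-trained GloVe vectors can be loaded through gensim's downloader (this assumes gensim is installed and that the roughly 100 MB "glove-wiki-gigaword-100" archive can be fetched; the words queried are arbitrary examples):

    import gensim.downloader as api

    # 100-dimensional GloVe vectors trained on Wikipedia + Gigaword (about 6B tokens).
    glove = api.load("glove-wiki-gigaword-100")

    print(glove["the"][:4])                      # first few components of the vector for 'the'
    print(glove.most_similar("king", topn=3))    # nearest neighbours in the embedding space
    print(glove.similarity("cat", "dog"))        # cosine similarity between two words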
GloVe
• The creation of a word co-occurrence matrix is the
fundamental component of GloVe.
• This matrix provides a quantitative measure of the semantic
affinity between words by capturing the frequency with which
they appear together in a given context.
• Next, by minimizing the difference between the dot product of vectors and the pointwise mutual information of corresponding words, GloVe optimizes word vectors.
• GloVe is able to produce dense vector representations that
capture syntactic and semantic relationships thanks to its
innovative methodology.
GloVe Embeddings Example
Feature Extraction techniques
• NLP is the ability of computers to understand human
language.
• Need for feature extraction techniques: Machine Learning algorithms learn from a pre-defined set of features from the training data to produce output for the test data.
• But the main problem in working with language processing is that machine learning algorithms cannot work on the raw text directly.
• So, we need some feature extraction techniques to convert text into a matrix (or vector) of features. Some of the most popular methods of feature extraction are:
– Bag-of-Words
– TF-IDF
Bag of Words
• The bag of words model is used for text representation and feature extraction.
• It represents a text document as a multiset of its words,
disregarding grammar and word order, but keeping the
frequency of words.
• This representation is useful for tasks such as text
classification, document similarity, and text clustering.
• Bag-of-Words is one of the most fundamental methods to
transform tokens into a set of features.
• The BoW model is used in document classification, where
each word is used as a feature for training the classifier.
Bag of Words
• For example, in a task of review based sentiment
analysis, the presence of words like ‘fabulous’,
‘excellent’ indicates a positive review, while words
like ‘annoying’, ‘poor’ point to a negative review .
• There are 3 steps while creating a BoW model :
• The first step is text-preprocessing which involves:
– converting the entire text into lower case characters.
– removing all punctuations and unnecessary symbols.
• The second step is to create a vocabulary of all unique
words from the corpus. Let’s suppose, we have a hotel
review text.
• Let’s consider 3 of these reviews,
which are as follows :
– good movie
– not a good movie
– did not like
• Now, we consider all the unique words
from the above set of reviews to
create a vocabulary, which is going to
be as follows :
{good, movie, not, a, did, like}
• In the third step, we create a matrix of
features by assigning a separate
column for each word, while each row
corresponds to a review. This process
is known as Text Vectorization.
• Each entry in the matrix signifies the
presence(or absence) of the word in
the review. We put 1 if the word is
present in the review, and 0 if it is not
present.
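A minimal scikit-learn sketch of these three steps on the three reviews above (assuming a recent scikit-learn; the token pattern is loosened so the one-letter word "a" is kept, and binary=True would give the 1/0 presence values described):

    from sklearn.feature_extraction.text import CountVectorizer

    reviews = ["good movie", "not a good movie", "did not like"]

    # Lower-cases, tokenizes, builds the vocabulary, and vectorizes the reviews.
    vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
    X = vectorizer.fit_transform(reviews)

    print(vectorizer.get_feature_names_out())   # the vocabulary {good, movie, not, a, did, like}
    print(X.toarray())                          # one row per review, one column per word

    # For bigrams instead of single words, pass ngram_range=(2, 2) to CountVectorizer.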
• A major drawback of using this model is that the order of occurrence of words is lost, as we create a vector of tokens in a randomized order. However, we can solve this problem by considering N-grams (mostly bigrams) instead of individual words (i.e., unigrams). This can preserve the local ordering of words. If we consider all possible bigrams from the given reviews, the feature table above would be built from bigrams instead of single words.
• However, this table will come out to be very large, as there can be a lot
of possible bigrams by considering all possible consecutive word pairs.
• Using N-grams can result in a huge sparse (has a lot of 0's) matrix if the size of the vocabulary is large, making the computation really complex!
• We have to remove a few N-grams based on their frequency.
• For example, we can always remove high-frequency N-grams, because they appear in almost all documents.
• These high-frequency N-grams are generally articles, determiners, etc., most commonly called StopWords.
• Similarly, we can also remove low-frequency N-grams because these are really rare (i.e., they generally appear in only 1 or 2 reviews)! These types of N-grams are generally typos (or typing mistakes).
• Generally, medium-frequency N-grams are considered the most ideal.
• However, there are some N-grams which are really rare in our corpus but can highlight a specific issue.
• Let's suppose there is a review that says – "Wi-Fi breaks often".
• Here, the N-gram 'Wi-Fi breaks' can't be too frequent, but it highlights a major problem that needs to be looked at.
• Our BoW model would not capture such N-grams, since their frequency is really low. To solve this type of problem, we need another model, i.e., TF-IDF.
Issues of Bag of Words
• High dimensionality: The resulting feature space can be very high dimensional, which can lead to issues with overfitting and computational efficiency.
• Lack of context information: The bag of words model only considers the
frequency of words, disregarding grammar, word order, and context.
• Insensitivity to word associations: The bag of words model doesn’t consider
the associations between words, and the semantic relationships between
words in a document.
• Lack of semantic information: As the bag of words model only considers
individual words, it does not capture semantic relationships or the meaning of
words in context.
• Importance of stop words: Stop words, such as “the”, “and”, “a”, etc., can have
a large impact on the bag of words representation of a document, even though
they may not carry much meaning.
• Sparsity: For many applications, the bag of words representation of a
document can be very sparse, meaning that most entries in the resulting
feature vector will be zero. This can lead to issues with computational
efficiency and difficulty in interpretability.
TF-IDF
• TF-IDF (Term Frequency-Inverse Document Frequency) is a
statistical measure used for information retrieval and natural
language processing tasks.
• It reflects the importance of a word in a document relative to
an entire corpus.
• The basic idea is that a word that occurs frequently in a
document but rarely in the entire corpus is more informative
than a word that occurs frequently in both the document and
the corpus.
• TF-IDF is used for:
1. Text retrieval and information retrieval systems
2. Document classification and text categorization
3. Text summarization
4. Feature extraction for text data in machine learning
algorithms.
TF-IDF
• It highlights a specific issue which might not be too frequent in our corpus but holds great importance. The TF-IDF value increases proportionally to the number of times a word appears in the document and decreases with the number of documents in the corpus that contain the word. It is composed of 2 sub-parts, which are:
• Term Frequency (TF)
• Inverse Document Frequency (IDF)
Term Frequency (TF): Term frequency specifies how frequently a term appears in the entire document. It can be thought of as the probability of finding a word within the document. It calculates the number of times a word occurs in a review, with respect to the total number of words in the review. It is formulated as:
TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)
Inverse Document Frequency (IDF)
• The inverse document frequency is a measure of whether a term is rare or frequent across the documents in the entire corpus.
• It highlights those words which occur in very few documents across the corpus; in simple language, the words that are rare have a high IDF score.
• IDF is a log-normalized value, obtained by dividing the total number of documents in the corpus by the number of documents containing the term, and taking the logarithm of the overall term:
IDF(t) = log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing term t.
TF-IDF combines the two: TF-IDF(t, d) = TF(t, d) × IDF(t)
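A minimal scikit-learn sketch of TF-IDF on the same three example reviews (note that scikit-learn uses a smoothed IDF, so the numbers differ slightly from the plain formulas above):

    from sklearn.feature_extraction.text import TfidfVectorizer

    reviews = ["good movie", "not a good movie", "did not like"]

    vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
    X = vectorizer.fit_transform(reviews)

    print(vectorizer.get_feature_names_out())   # vocabulary
    print(X.toarray().round(2))                 # TF-IDF weight of each word in each review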
Issues of TF-IDF
• High dimensionality: The resulting feature space can be very
high dimensional, which may lead to issues with overfitting
and computational efficiency.
• Lack of context information: TF-IDF only considers the
frequency of words in a document, disregarding the context
and meaning of words.
• Domain dependence: The results of TF-IDF can be domain-
specific, as the frequency and importance of words can vary
greatly depending on the domain of the text.
• Insensitivity to word associations: TF-IDF doesn’t consider the
associations between words, and the semantic relationships
between words in a document.
• Lack of semantic information: As TF-IDF only considers
individual words, it does not capture semantic relationships or
the meaning of words in context.
Ontology
Ontology in CSE/AI
Ontology in CSE/AI
Ontology vs Knowledgebase
Ontology vs Knowledgebase
Types of Ontologies
Lexical Analysis
Relations in Lexicons
Issues
Zeugma Test
Antonyms
Hyponymy & Hypernymy
Entailment and Transitivity
Meronymy
WordNet
Synsets in WordNet
Lemma vs. Synsets
All Relations in WordNet
Noun and Verb Relations
WordNet Hierarchies
Word Similarity
Two Classes of Algorithms
Thesaurus based word Similarity
Path based similarity
Similarity
Leacock-Chodorow (LC) Similarity
Concept Probability
Information Contents
Resnik Similarity
Jiang-Conrath (JC) Similarity
Stemming
Stemming vs. Lemmatization
• For grammatical reasons, documents are going to use different forms of a word, such
as organize, organizes, and organizing.
• Additionally, there are families of derivationally related words with similar meanings,
such as democracy, democratic, and democratization.
• In many situations, it seems as if it would be useful for a search for one of these
words to return documents that contain another word in the set.
• The goal of both stemming and lemmatization is to reduce inflectional forms and
sometimes derivationally related forms of a word to a common base form. For
instance:
• am, are, is => be
• car, cars, car's, cars' => car
The result of this mapping of text will be something like:
• the boy's cars are different colors => the boy car be differ color
Stemming vs. Lemmatization
• However, the two words differ in their flavor.
• Stemming usually refers to a crude heuristic process that chops off the ends of words in
the hope of achieving this goal correctly most of the time, and often includes the removal
of derivational affixes.
• Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
• If confronted with the token saw, stemming might return just s, whereas lemmatization
would attempt to return either see or saw depending on whether the use of the token
was as a verb or a noun.
• The two may also differ in that stemming most commonly collapses derivationally related
words, whereas lemmatization commonly only collapses the different inflectional forms of
a lemma.
• Linguistic processing for stemming or lemmatization is often done by an additional plug-in
component to the indexing process, and a number of such components exist, both
commercial and open-source.
Stemming vs. Lemmatization
• The most common algorithm for stemming English, and one that has repeatedly
been shown to be empirically very effective, is Porter's algorithm.
• The entire algorithm is too long and intricate to present here, but we will indicate its general nature. (Working through it is left as an assignment.)
• Porter's algorithm consists of 5 phases of word reductions, applied sequentially.
• Within each phase there are various conventions to select rules, such as
selecting the rule from each rule group that applies to the longest suffix.
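A small sketch comparing NLTK's Porter stemmer with its WordNet lemmatizer (this assumes nltk is installed and the 'wordnet' corpus has been downloaded once via nltk.download("wordnet"); the word list is arbitrary):

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["organize", "organizes", "organizing", "operational", "saw"]:
        print(word,
              "-> stem:", stemmer.stem(word),
              "| lemma (as a verb):", lemmatizer.lemmatize(word, pos="v"))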
Stemming vs. Lemmatization
• Stemmers use language-specific rules, but they require less
knowledge than a lemmatizer, which needs a complete
vocabulary and morphological analysis to correctly lemmatize
words.
• Particular domains may also require special stemming rules.
• However, the exact stemmed form does not matter, only the
equivalence classes it forms.
• Rather than using a stemmer, you can use a lemmatizer , a tool
from Natural Language Processing which does full morphological
analysis to accurately identify the lemma for each word.
• Doing full morphological analysis produces at most very modest
benefits for retrieval. It is hard to say more, because either form
of normalization tends not to improve English information
retrieval performance in aggregate - at least not by very much.
Stemming vs. Lemmatization
• While it helps a lot for some queries, it equally hurts performance
a lot for others. Stemming increases recall while harming
precision. As an example of what can go wrong, note that the
Porter stemmer stems all of the following words:
• operate, operating, operates, operation, operative, operatives, operational: all to oper.
• However, since operate in its various forms is a common verb, we
would expect to lose considerable precision on queries such as
the following with Porter stemming:
operational and research
operating and system
operative and dentistry
Stemming vs. Lemmatization
• For a case like this, moving to using a lemmatizer would not
completely fix the problem because particular inflectional forms
are used in particular collocations: a sentence with the
words operate and system is not a good match for the query
operating and system.
• Getting better value from term normalization depends more on
pragmatic issues of word use than on formal issues of linguistic
morphology.
Autoencoders
Language Modeling with RNN
Image Representation using CNN
[CNN slide figures, summarized:]
• Each output neuron captures only local information through a single filter; the example filter has 75 parameters (e.g., a 5×5×3 filter).
• Convolving the input with a bank of six such filters produces a new image (activation volume) of size 28×28×6.
• Activation maps of VGG-16 layers Conv_1, Conv_2, and Conv_3 were shown, with each activation map stretched into a vector. (Credits: Stanford University)
• Input-to-output spatial dimensions of a convolution: (N − F)/stride + 1 along each spatial axis.
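A quick numeric check of that formula in Python (the 32×32 input with a 5×5 filter at stride 1 is an assumed example, consistent with the 28×28×6 activation volume mentioned above):

    def conv_output_size(n, f, stride):
        # Output spatial size along one axis: (N - F) / stride + 1
        return (n - f) // stride + 1

    print(conv_output_size(32, 5, 1))   # 28 -> six 5x5 filters over a 32x32x3 input give 28x28x6
    print(conv_output_size(28, 5, 1))   # 24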