
AI and Expert System

Department of Information
Technology
Ambo University
About the course
 Course Title: AI & Expert Systems
 Course Code: MIT6124
 Credit Hours: 3
 Contact Hour: 2 lecture + 1 lab
Course Objective
 This course aims to develop an appreciation of the ideas underpinning different theories of knowledge in AI: expert systems, robotics fundamentals, neural networks, pattern recognition, image processing, genetic algorithms, and machine learning.
Contents
 Unit 1: Recap of basic Artificial Intelligence concepts
 Unit 2: Expert Systems
 Unit 3: Introduction to Neural Network
 Unit 4: Fuzzy system
 Unit 5: Introduction to Machine Learning & Genetic Algorithm
 Unit 6: Introduction to Pattern Recognition & Image Processing
What is AI?
 There is no officially agreed definition of AI.
 It may be better to define it in terms of two characteristics:
 Autonomy - The ability to perform
tasks in complex environments
without constant guidance by a user.
 Adaptivity - The ability to improve
performance by learning from
experience.
Example
 Self-driving cars:
 search and planning to find the most
convenient route from A to B
 computer vision to identify obstacles, and
 decision making under uncertainty to cope
with the complex and dynamic environment.
 The same technologies are also used in other
autonomous systems such as:
 delivery robots, flying drones, and autonomous
ships.
Exercises
 Which of the following are AI and
which are not?
 1. Spreadsheet that calculates sums and
other pre-defined functions on given
data
 The outcome is determined by the user-specified formula; no AI is needed.
 2. Predicting the stock market by fitting a curve to past data about stock prices
 Fitting a simple curve is not really AI, but there are so many different curves to choose from, even if there's a lot of data to constrain them, that one needs machine learning/AI to get useful results.
Related Fields
 In addition to AI, there are several other closely
related topics.
 These include
 machine learning,
 data science, and
 deep learning
 Machine learning
 Systems that improve their performance in a
given task with more and more experience or
data
Related Fields (contd)
 Deep learning
 is a subfield of machine learning, which itself is a
subfield of AI, which itself is a subfield of computer
science.
 The depth of deep learning refers to
 the complexity of a mathematical model, and
 that the increased computing power of modern
computers has allowed researchers to increase this
complexity
 to reach levels that appear not only quantitatively
but also qualitatively different from before
Related Fields (contd)
 Data Science
 is a recent umbrella term (term that
covers several subdisciplines) that
includes
 machine learning and statistics
 certain aspects of computer science
including
 algorithms,
 data storage, and
 web application development
Related Fields (contd)
 Robotics means
 building and programming robots so
that
 they can operate in complex, real-world
scenarios.
 In a way, robotics is the ultimate
challenge of AI
 since it requires a combination of
virtually all areas of AI.
 For example:
 Computer vision and speech recognition for sensing the environment
Related Fields (contd)
 Natural language processing,
information retrieval, and reasoning
under uncertainty
 for processing instructions and predicting
consequences of potential actions
 Cognitive modeling and affective
computing
 (systems that respond to expressions of
human feelings or that mimic feelings)
 for interacting and working together with
humans
Philosophy of AI
 The very nature of the term artificial intelligence brings up philosophical questions:
 whether intelligent behavior implies or requires the existence of a mind, and
 to what extent is consciousness replicable as computation.
The Turing Test
 Alan Turing (1912-1954) was an
English mathematician and
logician.
 He is rightfully considered to be the
father of computer science.
 Turing was fascinated by
 intelligence and thinking, and
 the possibility of simulating them by
machines
 Turings most prominent contribution
to AI is
 his imitation game, which later became
The Turing Test (contd)
 In the test, a human interrogator
interacts with two players, A and B, by
exchanging written messages (in a
chat).
 If the interrogator cannot determine
which player, A or B, is a computer and
which is a human,
 the computer is said to pass the test.
 The argument is that if a computer is
indistinguishable from a human in a
general natural language conversation,
 then it must have reached human-level intelligence.
Probability
 One of the reasons why modern AI methods actually work in real-world problems
 is their ability to deal with uncertainty
 Probability has turned out to be
 the best approach for reasoning
under uncertainty, and
 almost all current AI applications are
based, to at least some degree, on
probabilities.
Why probability matters
 Probability can be used
 to quantify and compare risks in
everyday life:
 what are the chances of crashing your car if you exceed the speed limit,
 what are the chances that the interest
rates on your mortgage will go up by five
percentage points within the next five
years, or
 what are the chances that AI will
automate particular tasks
– such as detecting fractured bones in X-ray
images or waiting tables in a restaurant.
The key lesson about probability
 It is the ability to think of
uncertainty as
 a thing that can be quantified at least
in principle.
 This means that we can talk about
uncertainty as if it were a number:
 numbers can be compared (is this
thing more probable than that thing),
and
 they can often be measured.
The key lesson about probability
(contd)
 Granted, measuring probabilities is
hard:
 we usually need many observations
about a phenomenon to draw
conclusions.
 However, by systematically collecting
data,
 we can critically evaluate probabilistic
statements, and
 our numbers can sometimes be found to
be right or wrong.
The key lesson about probability
(contd)
 In other words, the key lesson is
that
 uncertainty is not beyond the scope
of rational thinking and discussion,
and
 probability provides a systematic way
of doing just that.
Why quantifying uncertainty matters
 If we think of uncertainty as something that can't be quantified or measured,
 the uncertainty aspect may become an obstacle for rational discussion.
 We may for example argue that
 since we don't know exactly whether a vaccine may cause a harmful side-effect, it is too dangerous to use.
Why quantifying uncertainty matters
 However, this may lead us
 to ignore a life-threatening disease that the vaccine will eradicate.
 In most cases, the benefits and
risks are known
 to sufficient precision to clearly see
that one is more significant than the
other.
Odds
 Probably the easiest way
 to represent uncertainty is through
odds.
 They make it particularly easy
 to update beliefs when more
information becomes available
 By odds, we mean
 for example 3:1 (three to one), which means that
 we expect that for every three cases of an outcome, there is one case of the other outcome.
Odds (contd)
 The other way to express the same
would be to say that
 the chances of winning are 3/4 (three
in four).
 These are called natural frequencies
since they involve only whole numbers.
 With whole numbers, it is easy to
imagine,
 for example, four people out of whom,
three have brown eyes.
 Or four days out of which it rains on three (if you're in Woliso).
Why we use odds and not
percentages
 Three out of four is of course the
same as 75%
 (mathematicians prefer to use
fractions like 0.75 instead of
percentages).
 It has been found that people get confused and make mistakes more easily
 when dealing with fractions and percentages than with natural frequencies or odds.
Example
 The odds 1:5 mean that
 you'd have to play the game six times
to get one win on the average.
 The probability 20% means that
 you'd have to play five times to get
one win on the average.
Example (contd)
 For odds that are greater than one,
such as 5:1,
 it is easy to remember that
 we are not dealing with probabilities
because no probability can be greater
than 1 (or greater than 100%),
 but for odds that are less than one
such as 1:5, the danger of confusion
lurks around the corner.
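
To keep the two notions apart, here is a minimal Python sketch (an illustration added at this point, not from the slides) converting odds of the form wins:losses into a probability:

    def odds_to_probability(wins, losses):
        # Odds wins:losses correspond to wins successes per (wins + losses) trials.
        return wins / (wins + losses)

    print(odds_to_probability(3, 1))  # odds 3:1 -> 0.75 (75%)
    print(odds_to_probability(1, 5))  # odds 1:5 -> 0.1666..., one win in six games

Note that odds of 1:5 correspond to a probability of 1/6 (about 16.7%), not 20%, which is exactly the confusion warned about above.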
The Bayes rule
 The Bayes rule can be expressed in
many forms.
 The simplest one is in terms of
odds.
 The idea is
 to take the odds for something happening (against it not happening), which we'll write as the prior odds.
 The word prior refers to
 our assessment of the odds before obtaining some new information that may be relevant.
The Bayes rule (contd)
 The purpose of the formula is
 to update the prior odds
 when new information becomes
available,
– to obtain the posterior odds, or
– the odds after obtaining the information
How odds change
 In order to weigh the new
information, and decide how the
odds change when it becomes
available,
 we need to consider how likely we
would be to encounter this
information in alternative situations.
 Let's take as an example,
 the odds that it will rain later today.
 Imagine getting up in the morning in
Woliso.
How odds change (contd)
 The chances of rain are 206 in 365
 The number of days without rain is
therefore 159.
 This converts to prior odds of
206:159 for rain,
 so the cards are stacked against you
already before you open your eyes.
How odds change (contd)
 However, after opening your eyes and
taking a look outside,
 you notice it's cloudy.
 Suppose the chances of having a cloudy
morning on a rainy day are 9 out of 10
 that means that only one out of 10 rainy days starts out with blue skies.
 But sometimes there are also clouds
without rain:
 the chances of having clouds on a rainless
day are 1 in 10.
How odds change (contd)
 Now how much higher are the
chances of clouds on a rainy day
compared to a rainless day?
 Think about this carefully as it will
be important
 to be able to comprehend the
question and obtain the answer in
what follows.
 The answer is that
 the chances of clouds are nine times higher on a rainy day than on a rainless day.
Likelihood ratio
 The above ratio
 (nine times higher chance of clouds
on a rainy day than on a rainless day)
is called
 the likelihood ratio.
 More generally, the likelihood ratio is
 the probability of the observation in case of the event of interest (in the above, rain), divided by the probability of the observation in case of no event (no rain).
Likelihood ratio (contd)
 So we concluded that on a cloudy
morning, we have:
 likelihood ratio = (9/10) / (1/10) = 9
 The mighty Bayes rule
 for converting prior odds into posterior odds is as follows:
 posterior odds = likelihood ratio × prior odds
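
As a quick check of the formula, here is a small Python sketch (illustrative only, using the Woliso rain numbers from the earlier slides):

    # Prior odds for rain: 206 rainy days against 159 rainless days.
    prior_odds = 206 / 159

    # Likelihood ratio for observing clouds in the morning: (9/10) / (1/10) = 9.
    likelihood_ratio = (9 / 10) / (1 / 10)

    # Bayes rule in odds form: posterior odds = likelihood ratio x prior odds.
    posterior_odds = likelihood_ratio * prior_odds  # about 11.7, i.e. 1854:159

    # Converting odds back to a probability: odds / (1 + odds).
    print(posterior_odds / (1 + posterior_odds))    # about 0.92

So, on a cloudy Woliso morning, the odds of rain rise from 206:159 to 1854:159, or roughly a 92% probability.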
The Bayes rule in practice: breast
cancer screening
 This example illustrates
 a common bias in dealing with uncertain information called the base-rate fallacy.
 Consider mammographic screening
for breast cancer.
 Using made-up percentages for the sake of simplifying the numbers,
 let's assume that 5 in 100 women
have breast cancer.
...The Bayes rule in practice
 Suppose that if a person has
breast cancer,
 then the mammography test will find it 80 times out of 100.
 When the test comes out
suggesting that breast cancer is
present,
 we say that the result is positive.
...The Bayes rule in practice
 The test may also fail in the other
direction,
 namely to indicate breast cancer when none
exists.
 This is called a false positive finding.
 Suppose that if the person being tested
actually doesn't have breast cancer,
 the chances that the test nevertheless
comes out positive are 10 in 100.
 Based on the above probabilities, work out the following.
Assignment - 1
1. Calculate prior probability
2. Calculate the likelihood ratio.
Naive Bayes classification
 One of the most useful applications
of the Bayes rule is
 the so-called naive Bayes classifier.
 The Bayes classifier is a machine
learning technique that can be
used
 to classify objects such as text
documents into two or more classes.
 The classifier is trained by
 analyzing a set of training data, for
which the correct classes are given.
...Naive Bayes classification
 The naive Bayes classifier can be
used
 to determine
 the probabilities of the classes given
a number of different observations.
 The assumption in the model is that
 the feature variables are conditionally independent given the class.
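
In formula form, this conditional independence assumption means that the probability of observing a whole set of features factorizes into a product, one term per feature:

    P(word1, word2, ..., wordn | class) = P(word1 | class) × P(word2 | class) × ... × P(wordn | class)

This is what lets us multiply one likelihood ratio per word in the spam filter example that follows.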
Real world application: spam filters
 We will use a spam email filter as a running example
 for illustrating the idea of the naive Bayes classifier.
 Thus, the class variable indicates
 whether a message is spam (or junk
email) or
 whether it is a legitimate message
(also called ham).
...Real world application: spam
filters
 The words in the message
correspond to the feature
variables,
 so that the number of feature
variables in the model is
determined by
 the length of the message.
Why we call it naive
 Using spam filters as an example,
 the idea is to think of the words as being
produced by choosing one word after the
other
 so that the choice of the word depends only on
whether the message is spam or ham.
 This is a crude simplification of the
process because
 it means that there is no dependency
between adjacent words, and
 the order of the words has no significance.
 This is in fact why the method is called naive.
Estimating parameters
 To get started,
 we need to specify the prior odds for spam
(against ham).
 For simplicity assume this to be 1:1
 which means that on the average half of the
incoming messages are spam
 (in reality, the amount of spam is probably much
higher).
 To get our likelihood ratios,
 we need two different probabilities for any
word occurring:
 one in spam messages and another one in
ham messages.
...Estimating parameters
 The word distributions for the two
classes are
 best estimated from actual training
data
 that contains some spam messages as
well as legitimate messages.
 The simplest way is
 to count how many times each word appears in the data and divide the number by the total word count.
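
A minimal Python sketch of this counting step (the word list below is toy data, not from the slides):

    from collections import Counter

    # In practice this would be all words extracted from your saved spam messages.
    spam_words = ["million", "dollars", "million", "adclick", "million"]

    counts = Counter(spam_words)   # occurrences of each word
    total = len(spam_words)        # total word count in the class

    for word, count in counts.items():
        # Estimated probability of the word within the spam class.
        print(word, count / total)

The same counting is repeated separately for the ham messages.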
...Estimating parameters
 To illustrate the idea,
 let's assume that we have at our disposal some spam and some ham.
 You can easily obtain such data by saving a batch of your emails in two files.
 Assume that we have calculated
 the number of occurrences of the following words (along with all other words) in the two classes of messages:
...Estimating parameters
word          spam     ham
million        156      98
dollars         29     119
adclick         51       0
conferences      0      12
total        95791  306438
...Estimating parameters
 We can now estimate that
 the probability that a word in a spam
message is million,
 for example, is about 156 out of 95791,
which is roughly the same as 1 in 614.
 Likewise,
 we get the estimate that 98 out of 306438 words in a ham message are million, which is about the same as 1 in 3127.
...Estimating parameters
 Both of these probability estimates
are small, less than 1 in 500,
 but more importantly,
 the former is higher than the latter: 1
in 614 is higher than 1 in 3127.
 This gives the likelihood ratio:
 (1/614) / (1/3127) = 3127/614 = 5.1 (rounded to one decimal digit).
Zero means trouble
 One problem with estimating the
probabilities directly from the counts is
that
 zero counts lead to zero estimates.
 This can be quite harmful for the
performance of the classifier
 it easily leads to situations where the
posterior odds are 0/0, which is nonsense.
 The simplest solution is to use a small lower bound for all probability estimates.
 The value 1/100000, for instance, does the job.
...Zero means trouble
 Using the above logic,
 we can determine the likelihood
ratio for all possible words without
having to use zero, giving us the
following likelihood ratios:
word          likelihood ratio
million                    5.1
dollars                    0.8
adclick                   53.2
conferences                0.3
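
The table can be reproduced with a short Python sketch (a sketch under the stated assumptions, applying the 1/100000 lower bound to both classes):

    # Counts from the table above: word -> (spam count, ham count).
    counts = {
        "million":     (156, 98),
        "dollars":     (29, 119),
        "adclick":     (51, 0),
        "conferences": (0, 12),
    }
    spam_total, ham_total = 95791, 306438
    FLOOR = 1 / 100000   # lower bound that avoids zero estimates

    for word, (spam, ham) in counts.items():
        p_spam = max(spam / spam_total, FLOOR)
        p_ham = max(ham / ham_total, FLOOR)
        print(word, round(p_spam / p_ham, 1))   # likelihood ratio, one decimal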
...Zero means trouble
 We are now ready to apply the
method to classify new messages.
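
As a sketch of this classification step (illustrative only; the demo word below is deliberately not one of the assignment messages):

    likelihood_ratios = {"million": 5.1, "dollars": 0.8,
                         "adclick": 53.2, "conferences": 0.3}
    prior_odds = 1.0   # the 1:1 prior assumed earlier

    def posterior_odds(message):
        odds = prior_odds
        for word in message.split():
            odds *= likelihood_ratios[word]   # one likelihood ratio per word
        return odds

    print(posterior_odds("dollars"))   # 0.8 < 1, so this message leans towards ham

Posterior odds above 1 point to spam, below 1 to ham.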
Assignment - 2
1. One word spam filter
 Let's start with a message that only has one word in it: million
 a) Calculate the posterior odds for spam given this word using the table above.
 b) Is that spam or ham?
Assignment -2 (contd)

2. Full spam filter


a) Now use the naive Bayes method to
calculate the posterior odds for
spam given the message: million
dollars adclick conferences.
b) Is spam or ham?