
ML CH 1 Notes

Machine learning (ML) is a subset of artificial intelligence focused on developing algorithms that improve through experience by identifying patterns in data and making predictions. It encompasses various techniques, including supervised learning (classification and regression), unsupervised learning, and reinforcement learning, each with distinct applications such as spam detection, customer segmentation, and self-driving cars. The document also discusses the importance of data mining and the need for machine learning systems to adapt to changing environments.

Uploaded by iamadesigr
Copyright © All Rights Reserved

Machine learning

Chapter 1
What is machine learning?
Each of us is not only a generator but also a consumer of data. We want to have products and services
specialized for us. We want our needs to be understood and interests to be predicted.
Think, for example, of a supermarket chain that sells thousands of goods to millions of customers, either
at hundreds of brick-and-mortar stores all over a country or through a virtual store on the web. The
details of each transaction are stored: date, customer ID, goods bought and their amounts, total money spent,
and so forth. This typically amounts to a lot of data every day. What the supermarket chain wants is to be
able to predict which customer is likely to buy which product, to maximize sales and profit. Similarly, each
customer wants to find the set of products best matching his/her needs.
However, customer behavior changes over time and by geographic location. Still, we know that it is not
completely random: there are certain patterns in the data.
To solve a problem on a computer, we need an algorithm: a sequence of instructions that transforms the
input into the output. For example, one can devise an algorithm for sorting. The input is a set of numbers
and the output is their ordered list. For the same task, there may be various algorithms, and we may be
interested in finding the most efficient one, requiring the fewest instructions, the least memory, or both.
For some tasks, however, we have no such algorithm, for example, telling spam emails from legitimate ones.
In such cases, we would like the computer (machine) to extract the algorithm for the task automatically
from example data. There is no need to learn to sort numbers, since we already have algorithms for that,
but there are many applications for which we do not have an algorithm but do have lots of data.
We believe that though identifying the complete process may not be possible, we can still detect certain
patterns or regularities. This is the niche of machine learning. Such patterns may help us understand the
process, or we can use those patterns to make predictions.
Application of machine learning methods to large databases is called data mining. The analogy is that a
large volume of earth and raw material is extracted from a mine, which when processed leads to a small
amount of very precious material; similarly, in data mining, a large volume of data is processed to construct
a simple model with valuable use, for example, having high predictive accuracy.
But machine learning is not just a database problem; it is also a part of artificial intelligence. To be
intelligent, a system that is in a changing environment should have the ability to learn. If the system can
learn and adapt to such changes, the system designer need not foresee and provide solutions for all
possible situations. Machine learning also helps us find solutions to many problems in vision, speech
recognition, and robotics.
Summary notes for this section:
Machine Learning (ML) is a field of artificial intelligence focused on designing algorithms that improve
automatically through experience.
The primary goal of ML is to extract patterns from data and make predictions or decisions without being
explicitly programmed for the task.
ML methods can be categorized into:

• Supervised Learning: The system is trained on labeled data.
• Unsupervised Learning: The system finds hidden patterns in unlabeled data.
• Reinforcement Learning: The system learns through trial and error based on rewards and
punishments.

Examples of Machine Learning Applications

1.2.1 Learning Associations

Definition:

• Learning associations refers to identifying relationships between different variables in a dataset.
• It is widely used in market basket analysis, where relationships between purchased items are
discovered.

Example: Market Basket Analysis

• If a customer buys bread, they are likely to buy butter as well.
• By analyzing large transaction datasets, retailers can offer personalized promotions and optimize
product placement.
• This principle is applied in Amazon’s recommendation system, which suggests “Customers who
bought this also bought…”.

Techniques Used:

• Association Rule Learning (e.g., Apriori Algorithm, FP-Growth).
• Probability-based models that estimate P(Y|X), the probability of buying Y given that X is already
bought (also called the confidence of the rule X → Y).
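The rule quantities above can be computed directly by counting. The sketch below, assuming a small hypothetical list of baskets (not real sales data), estimates the support of an itemset and the confidence P(Y|X) of a rule X → Y. It is not Apriori itself: Apriori and FP-Growth are efficient ways of finding all itemsets whose support exceeds a threshold, after which confidences are computed as here. The function names `support` and `confidence` are illustrative.

```python
# Hypothetical toy baskets (illustrative only, not real sales data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Estimate P(Y | X) = support(X and Y) / support(X) for the rule X -> Y."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(set(x) | set(y), transactions) / sx

print(support({"bread", "butter"}, transactions))      # 0.5
print(confidence({"bread"}, {"butter"}, transactions))  # about 0.667 (2 of the 3 bread baskets)
```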

Other Applications:

• Recommender systems (Netflix movie recommendations).
• Medical diagnosis (associating symptoms with diseases).
• Fraud detection (identifying unusual transactions).

1.2.2 Classification

Definition:

• Classification is a supervised learning technique where the goal is to assign input data to a
predefined category.
• It is used when the output is discrete (e.g., spam vs. non-spam, malignant vs. benign tumor).

Example: Email Spam Detection

• Given an email, classify it as spam or not spam based on words, sender, and email structure.
• Emails containing words like “free money” or “win a prize” might be flagged as spam.

Techniques Used:

• Decision Trees
• Support Vector Machines (SVM)
• Neural Networks
• Naïve Bayes Classifier
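As a rough illustration of the Naïve Bayes entry above, here is a minimal multinomial naive Bayes spam classifier with Laplace (add-one) smoothing. The four training emails, the whitespace tokenizer, and the function names are hypothetical choices for this sketch; a real filter would use far more data and richer features such as the sender and email structure.

```python
import math
from collections import Counter

# Hypothetical toy training set of (text, label) pairs, illustrative only.
train = [
    ("win a prize now", "spam"),
    ("free money win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def tokenize(text):
    return text.lower().split()

def fit(examples):
    """Count how often each label occurs and how often each word occurs per label."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for w in tokenize(text):
            counts[w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(text, label_counts, word_counts, vocab):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)  # log prior P(label)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict("win free money", *model))          # spam
print(predict("agenda for the meeting", *model))  # ham
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.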

Other Applications:

• Credit Scoring: Predict whether a loan applicant is low-risk or high-risk.
• Facial Recognition: Classify an image as a particular person or as unknown.
• Medical Diagnosis: Detect diseases from symptoms and medical reports.

1.2.3 Regression

Definition:

• Regression is a supervised learning technique used to predict continuous numerical values.
• Unlike classification, which outputs categories, regression predicts values on a continuous scale.

Example: House Price Prediction

• Predicting house prices based on variables like:
o Area (sq. ft.)
o Number of bedrooms
o Location
o Age of the house

For a house of 2,000 sq. ft. with 3 bedrooms in a good locality, the ML model predicts the price using the
relationship it learned from past sales data.

Techniques Used:

• Linear Regression
• Polynomial Regression
• Support Vector Regression (SVR)
• Neural Networks
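A minimal sketch of linear regression, assuming a single input (area) and hypothetical, perfectly linear toy prices so the result is easy to check by hand. It fits y = w1·x + w0 by ordinary least squares using the closed-form slope and intercept; using more inputs such as bedrooms, location, and age would need the multivariate form.

```python
# Hypothetical toy data (illustrative only): area in sq. ft. and sale price.
areas = [1000, 1500, 2000, 2500, 3000]
prices = [150_000, 200_000, 250_000, 300_000, 350_000]

def fit_line(xs, ys):
    """Ordinary least squares for y = w1 * x + w0 with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

w0, w1 = fit_line(areas, prices)
print(w0 + w1 * 2200)  # 270000.0, the predicted price for a 2,200 sq. ft. house
```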

Other Applications:

• Stock Market Prediction: Forecasting stock prices based on historical data.
• Weather Forecasting: Predicting temperature, rainfall, or humidity.
• Salary Prediction: Estimating salary based on experience, qualifications, and industry.

1.2.4 Unsupervised Learning

Definition:

• Unlike supervised learning, unsupervised learning does not have labeled data.
• The model finds hidden patterns or clusters in the data.

Example: Customer Segmentation

• A bank wants to group its customers based on their spending behavior.
• The ML model groups similar customers without predefined categories.
• The resulting groups might then be interpreted as:
o High Spenders (frequent luxury purchases).
o Moderate Spenders (balanced spending).
o Low Spenders (spend only on essentials).
• This information helps in targeted marketing campaigns.

Techniques Used:

• Clustering (K-Means, DBSCAN, Hierarchical Clustering)
• Dimensionality Reduction (PCA, t-SNE)
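To make the customer-segmentation example concrete, here is a small sketch of K-Means (Lloyd's algorithm) on one-dimensional monthly spending. The spending figures and the starting centers are hypothetical; standard K-Means usually starts from random or k-means++ centers, which are replaced here by fixed ones so the run is reproducible. The algorithm only returns groups; naming them low, moderate, and high spenders is done afterwards by a person looking at the centers.

```python
# Hypothetical monthly spending of ten customers (illustrative only).
spending = [200, 220, 250, 1000, 1100, 1050, 5000, 5200, 4800, 240]

def kmeans_1d(values, centers, iterations=20):
    """Lloyd's K-Means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: assignments no longer change the centers
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans_1d(spending, centers=[0, 1000, 6000])
print(centers)  # [227.5, 1050.0, 5000.0]
```

With these inputs the centers settle after two passes, and the three groups read naturally as low, moderate, and high spenders.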

Other Applications:

• Anomaly Detection: Detecting credit card fraud by finding unusual spending patterns.
• Genomics: Identifying gene expression patterns in biology.
• Document Clustering: Automatically categorizing news articles based on content.

1.2.5 Reinforcement Learning

Definition:

• Reinforcement learning (RL) is used in sequential decision-making problems.
• Instead of learning from labeled data, RL agents interact with the environment and learn through
rewards and punishments.

Example: Self-Driving Cars

• A self-driving car learns by interacting with the environment.
• The agent (car) receives rewards for correct decisions (staying in lane, stopping at red lights).
• The agent receives penalties for mistakes (collisions, changing lanes without signaling).
• The goal is to maximize the total reward (rewards minus penalties) collected over time.

Techniques Used:

• Q-Learning
• Deep Q Networks (DQN)
• Policy Gradient Methods
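A minimal tabular Q-learning sketch on a hypothetical five-state corridor (a toy task, not a driving simulator): the agent starts at the left end, can move left or right, and receives a reward of +1 only on reaching the right end. The learning rate, discount, exploration rate, and episode count are illustrative choices. Each update moves Q(s, a) toward r + γ·max over a′ of Q(s′, a′).

```python
import random

N_STATES = 5         # states 0..4 in a one-dimensional corridor (hypothetical toy task)
GOAL = N_STATES - 1  # reaching the rightmost state ends the episode with reward +1
ACTIONS = (-1, +1)   # move left or move right

def step(state, action):
    """Deterministic toy environment: walls at both ends, reward 1 only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            # Q-learning update toward r + gamma * max_a' Q(s', a'); no bootstrap at the goal.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the greedy policy should choose +1 (move right) in every non-goal state, because the discount γ makes the shortest route to the reward the most valuable one.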

Other Applications:

• Game Playing: AlphaGo defeated top human Go players, using RL together with deep neural networks
and tree search.
• Robotics: RL is used to train humanoid robots to walk and interact.
• Healthcare: RL is being explored for personalized treatment recommendations for patients.

Summary Table of Machine Learning Applications

| Application Type | Examples | Techniques Used |
|---|---|---|
| Learning Associations | Market basket analysis, recommendation systems | Apriori, FP-Growth |
| Classification | Spam detection, medical diagnosis, credit risk analysis | Decision Trees, SVM, Neural Networks |
| Regression | House price prediction, stock forecasting, salary estimation | Linear Regression, SVR, Neural Networks |
| Unsupervised Learning | Customer segmentation, anomaly detection, genomics | K-Means, PCA, Hierarchical Clustering |
| Reinforcement Learning | Self-driving cars, robotics, game playing (AlphaGo) | Q-Learning, Deep Q Networks |

Exercises
1. Imagine you have two possibilities: You can fax a document, that is, send the image, or you can
use an optical character reader (OCR) and send the text file. Discuss the advantages and
disadvantages of the two approaches in a comparative manner. When would one be preferable
over the other?
The text file is typically shorter than the image file, but a faxed document can also contain diagrams,
pictures, etc. After using OCR, we lose properties such as font and size (unless we also recognize
and transmit such information), or the personal touch if the text is handwritten. OCR may not be
perfect, and for ambiguous cases, OCR should identify those image blocks and transmit them as they
are. A fax machine is cheaper and easier to find than a computer with a scanner and OCR software.
OCR is good if we have a high volume of good-quality documents; for documents of a few pages with a
small amount of text, it is better to transmit the image.

Let us say we are building an OCR and for each character, we store the bitmap of that character as
a template that we match with the read character pixel by pixel. Explain when such a system
would fail. Why are barcode readers still used?
Such a system allows only one template per character and cannot recognize characters in multiple
fonts, for example. Because the match is made pixel by pixel, it would also fail when a character is
shifted, scaled, rotated, or noisy. There are standardized fonts such as OCR-A and OCR-B, the fonts you
typically see on vouchers and banking slips, which are used with OCR software, and you may have
already noticed how the characters in these fonts have been slightly changed to minimize the
similarities between them. Barcode readers are still used because reading barcodes is still a better
(cheaper, more reliable, more available) technology than reading characters.
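The template-matching system described in the question can be sketched in a few lines to see why it is brittle. The 5×5 bitmaps and helper names below are hypothetical; the score is simply the fraction of pixels that agree with a template. Shifting an “L” one pixel to the right makes it agree better with the “I” template than with its own.

```python
# Hypothetical 5x5 templates ('#' = ink, '.' = background), one per character.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": [".#...",
          ".#...",
          ".#...",
          ".#...",
          ".###."],
}

def match_score(a, b):
    """Fraction of pixel positions where two equal-sized bitmaps agree."""
    agree = sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))
    return agree / (len(a) * len(a[0]))

def classify(bitmap):
    """Return the character whose single template matches best, pixel by pixel."""
    return max(TEMPLATES, key=lambda ch: match_score(bitmap, TEMPLATES[ch]))

def shift_right(bitmap):
    """Move every pixel one column to the right, padding the left edge with background."""
    return ["." + row[:-1] for row in bitmap]

print(classify(TEMPLATES["L"]))               # L
print(classify(shift_right(TEMPLATES["L"])))  # I  (a one-pixel shift causes a misread)
```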

2. Assume we are given the task to build a system that can distinguish junk e-mail. What is in a junk
e-mail that lets us know that it is junk? How can the computer detect junk through a syntactic
analysis? What would you like the computer to do if it detects a junk e-mail—delete it
automatically, move it to a different file, or just highlight it on the screen?
Typically, spam filters check for the presence or absence of words and symbols. Words such as
“opportunity”, “viagra”, and “dollars”, as well as characters such as ‘$’ and ‘!’, increase the probability
that the email is spam. These probabilities are learned from a training set of past emails, some of which
the user has previously marked as spam (one very frequently used method for spam filtering is the naive
Bayes’ classifier, which we discuss in Section 5.7). Spam filters are not 100 percent reliable and
frequently make classification errors. If a junk mail is not filtered and is shown to the user, this is not
good, but it is not as bad as filtering a good mail as spam. Therefore, messages that the system
considers spam should not be deleted automatically but kept aside so that the user can see them if
he/she wants to, especially in the early stages of using the spam filter, when the system has not yet been
trained sufficiently. Note that spam filtering will probably never be solved completely, as spammers keep
finding novel ways to outdo the filters: they use the digit ‘0’ instead of the letter ‘O’ and the digit ‘1’
instead of the letter ‘l’ to pass the word tests, add pieces of text from regular messages so that the mail
is considered not spam, or send the message as an image rather than as text (and lately distort the
image by small random amounts so that it is not always the same image). Still, spam filtering is
probably one of the best application areas of machine learning, where learning systems can adapt to
changes in the ways spam messages are generated.
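The look-alike trick mentioned above (‘0’ for ‘O’, ‘1’ for ‘l’) defeats a naive word test, and one simple counter-measure is to normalize the text before matching. The keyword list and substitution table in this sketch are hypothetical, and a real filter would feed the normalized words into a learned model rather than a fixed list.

```python
# Hypothetical keyword list and look-alike table, for illustration only.
SPAM_WORDS = {"opportunity", "dollars", "free"}
LOOKALIKES = str.maketrans({"0": "o", "1": "l"})  # digit -> letter it imitates

def spam_words_found(text, normalize=True):
    """Return the spam keywords that appear in `text` as whole words."""
    words = text.lower().split()
    if normalize:
        # Undo digit-for-letter substitutions before the word test.
        words = [w.translate(LOOKALIKES) for w in words]
    # Strip surrounding punctuation so "dollars!" still matches "dollars".
    return {w.strip("!.,?") for w in words} & SPAM_WORDS

msg = "A g0lden 0pp0rtunity t0 earn d0llars!"
print(sorted(spam_words_found(msg, normalize=False)))  # []
print(sorted(spam_words_found(msg)))                   # ['dollars', 'opportunity']
```

The trade-off is that normalization also rewrites legitimate tokens (for example, “100” becomes “loo”), which is one reason such hand-written rules are usually combined with a learned model.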

3. Let us say you are given the task of building an automated taxi. Define the constraints. What are
the inputs? What is the output? How can you communicate with the passenger? Do you need to
communicate with the other automated taxis, that is, do you need a “language”?
An automated taxi should be able to pick up a passenger and drive him/her to a destination. It should
have a positioning system (GPS/GIS) and other sensors (cameras) to sense cars, pedestrians, obstacles,
etc. on the road. The output should be the sequence of actions that reaches the destination in the
shortest time with the minimum inconvenience to the passenger. The automated taxi needs to
communicate with the passenger to receive commands and may also need to interact with other
automated taxis to exchange information about road traffic, scheduling, load balancing, etc.

7. If a face image is a 100×100 image written in row-major order, it is a 10,000-dimensional vector. If
we shift the image one pixel to the right, it becomes a very different vector in the 10,000-dimensional
space. How can we build face recognizers that are robust to such distortions?
Face recognition systems typically have a preprocessing stage for normalization, where the input is
centered and possibly resized before recognition. This is generally done by first finding the eyes and
then translating the image accordingly. There are also recognizers that do not use the face image as
pixels but rather extract structural features from the image, for example, the ratio of the distance
between the two eyes to the size of the whole face. Such features are invariant to translation and
size changes.
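The effect described in the question is easy to reproduce. The sketch below uses a hypothetical 10×10 bright square in a 100×100 image as a stand-in for a face, flattens it in row-major order, and shows that a one-pixel shift produces a non-zero Euclidean distance. As a simplified stand-in for eye-based alignment, it then centers each image on its intensity centroid, after which the two vectors coincide.

```python
import math

SIZE = 100  # the image is SIZE x SIZE, so the flattened vector has 10,000 entries

def blank():
    return [[0.0] * SIZE for _ in range(SIZE)]

def make_image(top, left, side=10):
    """Hypothetical stand-in for a face: a bright side x side square."""
    img = blank()
    for r in range(top, top + side):
        for c in range(left, left + side):
            img[r][c] = 1.0
    return img

def flatten(img):
    """Row-major order: all of row 0, then row 1, and so on."""
    return [p for row in img for p in row]

def distance(a, b):
    """Euclidean distance between the two flattened 10,000-dimensional vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(flatten(a), flatten(b))))

def translate(img, dr, dc):
    """Move the content dr rows down and dc columns right; pixels shifted out are dropped."""
    out = blank()
    for r in range(SIZE):
        for c in range(SIZE):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < SIZE and 0 <= c2 < SIZE:
                out[r2][c2] = img[r][c]
    return out

def center(img):
    """Translate the image so its intensity centroid sits at the image center (rounded)."""
    total = sum(flatten(img))
    cr = sum(r * p for r, row in enumerate(img) for p in row) / total
    cc = sum(c * p for row in img for c, p in enumerate(row)) / total
    half = SIZE / 2
    return translate(img, math.floor(half - cr + 0.5), math.floor(half - cc + 0.5))

face = make_image(40, 40)
shifted = translate(face, 0, 1)  # the same "face" moved one pixel to the right
print(distance(face, shifted))                  # about 4.47, i.e. sqrt(20)
print(distance(center(face), center(shifted)))  # 0.0
```

For the shifted square, 20 pixels change value, so the distance is √20 before centering and 0 after. Real faces are not perfectly rigid squares, which is why practical systems align on landmarks such as the eyes instead.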
