ML CH 1 Notes

Machine learning (ML) is a subset of artificial intelligence focused on developing algorithms that improve through experience by identifying patterns in data and making predictions. It encompasses various techniques, including supervised learning (classification and regression), unsupervised learning, and reinforcement learning, each with distinct applications such as spam detection, customer segmentation, and self-driving cars. The document also discusses the importance of data mining and the need for machine learning systems to adapt to changing environments.

Machine learning

Chapter 1
What is machine learning?
Each of us is not only a generator but also a consumer of data. We want to have products and services
specialized for us. We want our needs to be understood and interests to be predicted.
Think, for example, of a supermarket chain that is selling thousands of goods to millions of customers either
at hundreds of brick-and-mortar stores all over a country or through a virtual store over the web. The
details of each transaction are stored: date, customer id, goods bought and their amount, total money spent,
and so forth. This typically amounts to a lot of data every day. What the supermarket chain wants is to be
able to predict which customer is likely to buy which product, to maximize sales and profit. Similarly each
customer wants to find the set of products best matching his/her needs.
Customer behavior changes over time and by geographic location, but we know that it is not
completely random. There are certain patterns in the data.
To solve a problem on a computer, we need an algorithm. For example, one can devise an algorithm for
sorting. The input is a set of numbers and the output is their ordered list. For the same task, there may be
various algorithms and we may be interested in finding the most efficient one, requiring the least number of
instructions or memory or both. For some tasks, however, we do not have an algorithm, and we would like the
computer (machine) to extract the algorithm automatically from data. There is no need to learn to sort
numbers since we already have algorithms for that, but there are many applications for which we do not have
an algorithm but have lots of data.
We believe that though identifying the complete process may not be possible, we can still detect certain
patterns or regularities. This is the niche of machine learning. Such patterns may help us understand the
process, or we can use those patterns to make predictions.
Application of machine learning methods to large databases is called data mining. The analogy is that a
large volume of earth and raw material is extracted from a mine, which when processed leads to a small
amount of very precious material; similarly, in data mining, a large volume of data is processed to construct
a simple model with valuable use, for example, having high predictive accuracy.
But machine learning is not just a database problem; it is also a part of artificial intelligence. To be
intelligent, a system that is in a changing environment should have the ability to learn. If the system can
learn and adapt to such changes, the system designer need not foresee and provide solutions for all
possible situations. Machine learning also helps us find solutions to many problems in vision, speech
recognition, and robotics.
Small notes for this section:
Machine Learning (ML) is a field of artificial intelligence focused on designing algorithms that improve
automatically through experience.
The primary goal of ML is to extract patterns from data and make predictions or decisions without explicit
programming.
ML models can be categorized into:

• Supervised Learning: The system is trained on labeled data.

• Unsupervised Learning: The system finds hidden patterns in unlabeled data.
• Reinforcement Learning: The system learns through trial and error based on rewards and
punishments.

Examples of Machine Learning Applications

1.2.1 Learning Associations

Definition:

• Learning associations refers to identifying relationships between different variables in a dataset.

• It is widely used in market basket analysis, where relationships between purchased items are
discovered.

Example: Market Basket Analysis

• If a customer buys bread, they are likely to buy butter as well.

• By analyzing large transaction datasets, retailers can offer personalized promotions and optimize
product placement.
• This principle is applied in Amazon’s recommendation system, which suggests “Customers who
bought this also bought…”.

Techniques Used:

• Association Rule Learning (e.g., Apriori Algorithm, FP-Growth).

• Probability-based models that compute P(Y|X) (the probability of buying Y given that X is already
bought).
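
As a minimal sketch of the probability-based view above, P(Y|X) can be estimated by counting how often Y appears among the transactions that contain X. The transactions and item names below are made up for illustration, and the helper names (`count`, `support`, `confidence`) are assumptions of this sketch rather than part of any particular library.

```python
# Hypothetical toy transactions; the items are illustrative only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate P(Y | X): among baskets with X, the share that also hold Y."""
    n_x = count({x}, transactions)
    if n_x == 0:
        raise ValueError(f"item {x!r} never appears in the data")
    return count({x, y}, transactions) / n_x

print(confidence("bread", "butter", transactions))  # 0.75 (3 of the 4 bread baskets)
```

Algorithms such as Apriori build on the same counts but prune the search so that only frequent itemsets are examined.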

Other Applications:

• Recommender systems (Netflix movie recommendations).

• Medical diagnosis (associating symptoms with diseases).
• Fraud detection (identifying unusual transactions).

1.2.2 Classification

Definition:

• Classification is a supervised learning technique where the goal is to assign input data to a
predefined category.
• It is used when the output is discrete (e.g., spam vs. non-spam, malignant vs. benign tumor).

Example: Email Spam Detection

• Given an email, classify it as spam or not spam based on words, sender, and email structure.
• Emails containing words like “free money” or “win a prize” might be flagged as spam.

Techniques Used:

• Decision Trees
• Support Vector Machines (SVM)
• Neural Networks
• Naïve Bayes Classifier
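
To make the spam example concrete, here is a small sketch of a naive Bayes classifier over word counts, one of the techniques listed above. The four training emails and the `spam` / `not_spam` label names are hypothetical, and the smoothing choice (add-one) is an assumption of this sketch; a real filter would be trained on many more messages.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-made training set (text, label).
training_emails = [
    ("win a prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "not_spam"),
    ("lunch on monday", "not_spam"),
]

def fit(examples):
    """Count how often each word occurs in each class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(text, model):
    """Return the class with the highest log posterior score."""
    word_counts, class_counts, vocab = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_emails)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(training_emails)
print(predict("free prize", model))      # spam
print(predict("monday meeting", model))  # not_spam
```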

Other Applications:

• Credit Scoring: Predict whether a loan applicant is low-risk or high-risk.

• Facial Recognition: Classifying an image as a particular person or unknown.
• Medical Diagnosis: Detecting diseases from symptoms and medical reports.

1.2.3 Regression

Definition:

• Regression is a supervised learning technique used to predict continuous numerical values.

• Unlike classification, which outputs categories, regression predicts values on a continuous scale.

Example: House Price Prediction

• Predicting house prices based on variables like:

o Area (sq. ft.)
o Number of bedrooms
o Location
o Age of the house

For example, given a house with 2000 sq. ft. of area and 3 bedrooms in a good locality, the ML model
predicts its price based on past sales data.
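
The idea can be sketched with the simplest case, a linear regression on a single feature (area), fitted by ordinary least squares. The sales figures below are invented and chosen to lie exactly on a line so the result is easy to check; real data would be noisy and would use more features.

```python
# Hypothetical past sales: (area in sq. ft., price). Values are illustrative.
areas = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]

def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(areas, prices)

def predict_price(area):
    """Predict a continuous value (price) for an unseen area."""
    return slope * area + intercept

print(slope, intercept)     # 100.0 50000.0
print(predict_price(1800))  # 230000.0
```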

Techniques Used:

• Linear Regression
• Polynomial Regression
• Support Vector Regression (SVR)
• Neural Networks

Other Applications:

• Stock Market Prediction: Forecasting stock prices based on historical data.

• Weather Forecasting: Predicting temperature, rainfall, or humidity.
• Salary Prediction: Estimating salary based on experience, qualifications, and industry.

1.2.4 Unsupervised Learning

Definition:

• Unlike supervised learning, unsupervised learning does not have labeled data.
• The model finds hidden patterns or clusters in the data.

Example: Customer Segmentation

• A bank wants to group its customers based on their spending behavior.

• The ML model groups similar customers without predefined categories.
• Customers can be segmented into:
o High Spenders (frequent luxury purchases).
o Moderate Spenders (balanced spending).
o Low Spenders (spend only on essentials).
• This information helps in targeted marketing campaigns.
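
A rough sketch of how such groups can emerge without labels is K-Means on a single feature. The monthly spending figures are hypothetical, and the deterministic initialisation (spreading the starting centers across the sorted values, which requires k of at least 2) is a simplification chosen for this example; library implementations usually initialise differently.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups by alternating assignment and averaging."""
    ordered = sorted(values)
    # Simplified deterministic start: evenly spaced picks from the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters

# Hypothetical monthly spending per customer.
spend = [120, 150, 130, 900, 950, 1000, 4000, 4200, 3900]
centers, clusters = kmeans_1d(spend, 3)
print(clusters)  # [[120, 150, 130], [900, 950, 1000], [4000, 4200, 3900]]
```

The algorithm only returns groups; naming them "low", "moderate", and "high" spenders is an interpretation added afterwards.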

Techniques Used:

• Clustering (K-Means, DBSCAN, Hierarchical Clustering)

• Dimensionality Reduction (PCA, t-SNE)

Other Applications:

• Anomaly Detection: Detecting credit card fraud by finding unusual spending patterns.
• Genomics: Identifying gene expression patterns in biology.
• Document Clustering: Automatically categorizing news articles based on content.

1.2.5 Reinforcement Learning

Definition:

• Reinforcement learning (RL) is used in sequential decision-making problems.

• Instead of learning from labeled data, RL agents interact with the environment and learn through
rewards and punishments.

Example: Self-Driving Cars

• A self-driving car learns by interacting with the environment.

• The agent (car) receives rewards for correct decisions (staying in lane, stopping at red lights).
• The agent receives penalties for mistakes (collisions, crossing lanes without signals).
• The goal is to maximize rewards and minimize penalties over time.
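
The reward-driven loop described above can be sketched with tabular Q-learning on a toy problem. A real driving task is far too large for this, so the environment here is a hypothetical five-cell corridor in which the agent starts at cell 0 and is rewarded for reaching cell 4. The hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

N_STATES = 5                 # cells 0..4 on a line; cell 4 is the goal
ACTIONS = ["left", "right"]

def step(state, action):
    """Hypothetical environment: move one cell, reward 1.0 on reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Exploit, breaking ties at random.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best known action in each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

q_table = train()
print(greedy_policy(q_table))  # ['right', 'right', 'right', 'right']
```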

Techniques Used:

• Q-Learning
• Deep Q Networks (DQN)
• Policy Gradient Methods

Other Applications:

• Game Playing: AlphaGo, which combined RL with deep neural networks and tree search, defeated human champions in Go.

• Robotics: RL is used to train humanoid robots to walk and interact.
• Healthcare: RL is applied to personalized treatment recommendations for patients.

Summary Table of Machine Learning Applications

Application Type       | Examples                                                     | Techniques Used
Learning Associations  | Market basket analysis, recommendation systems               | Apriori, FP-Growth
Classification         | Spam detection, medical diagnosis, credit risk analysis      | Decision Trees, SVM, Neural Networks
Regression             | House price prediction, stock forecasting, salary estimation | Linear Regression, SVR, Neural Networks
Unsupervised Learning  | Customer segmentation, anomaly detection, genomics           | K-Means, PCA, Hierarchical Clustering
Reinforcement Learning | Self-driving cars, robotics, game playing (AlphaGo)          | Q-Learning, Deep Q Networks

Exercises
1. Imagine you have two possibilities: You can fax a document, that is, send the image, or you can
use an optical character reader (OCR) and send the text file. Discuss the advantages and
disadvantages of the two approaches in a comparative manner. When would one be preferable
over the other?
The text file is typically shorter than the image file, but a faxed document can also contain diagrams,
pictures, etc. After using an OCR, we lose properties such as font, size, etc. (unless we also recognize
and transmit such information) or the personal touch if it is handwritten text. OCR may not be
perfect, and for ambiguous cases, OCR should identify those image blocks and transmit them as they
are. A fax machine is cheaper and easier to find than a computer with scanner and OCR software.
OCR is good if we have high-volume, good-quality documents; for documents of a few pages with a small
amount of text, it is better to transmit the image.

Let us say we are building an OCR and for each character, we store the bitmap of that character as
a template that we match with the read character pixel by pixel. Explain when such a system
would fail. Why are barcode readers still used?
Such a system allows only one template per character and cannot distinguish characters from
multiple fonts, for example. There are standardized fonts such as OCR-A and OCR-B, the fonts you
typically see in vouchers and banking slips, which are used with OCR software, and you may have
already noticed how the characters in these fonts have been slightly changed to minimize the
similarities between them. Barcode readers are still used because reading barcodes is still a better
(cheaper, more reliable, more available) technology than reading characters.
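
The failure mode can be illustrated with a tiny sketch of pixel-by-pixel template matching. The 3×3 bitmaps for "I" and "L" are hypothetical, and the match is scored by counting differing pixels (Hamming distance), which is one plausible reading of "match pixel by pixel".

```python
# One stored template per character (hypothetical 3x3 bitmaps).
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
}

def hamming(a, b):
    """Number of pixel positions where two bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

def classify(bitmap):
    """Pick the character whose template differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(bitmap, TEMPLATES[ch]))

clean_i = ["010", "010", "010"]    # exactly the stored template
shifted_i = ["100", "100", "100"]  # the same stroke, one pixel to the left

print(classify(clean_i))    # I
print(classify(shifted_i))  # L  (the shifted "I" is closer to the "L" template)
```

A one-pixel shift is enough to flip the answer, and a change of font or stroke width would cause similar errors.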

2. Assume we are given the task to build a system that can distinguish junk e-mail. What is in a junk
e-mail that lets us know that it is junk? How can the computer detect junk through a syntactic
analysis? What would you like the computer to do if it detects a junk e-mail—delete it
automatically, move it to a different file, or just highlight it on the screen?
Typically, spam filters check for the existence/absence of words and symbols. Words such as
“opportunity”, “viagra”, “dollars” as well as characters such as ‘$’, ‘!’ increase the probability that
the email is spam. These probabilities are learned from a training set of example past emails that the
user has previously marked as spam. (One very frequently used method for spam filtering is the naive
Bayes’ classifier, which we discuss in Section 5.7.) Spam filters do not work with 100 percent
reliability and frequently make errors in classification. If a junk mail is not filtered and is shown to the
user, this is not good, but it is not as bad as filtering a good mail as spam. Therefore, mail messages
that the system considers as spam should not be automatically deleted but kept aside so that the
user can see them if they want to, especially in the early stages of using the spam filter when the
system has not yet been trained sufficiently. Note that filtering spam will probably never be solved
completely as the spammers keep finding novel ways to outdo the filters: they use the digit ‘0’ instead
of the letter ‘O’ and the digit ‘1’ instead of the letter ‘l’ to pass the word tests, add pieces of text from regular
messages so that the mail is considered not spam, or send it as an image rather than as text (and lately distort
the image by small random amounts so that it is not always the same image). Still, spam filtering is
probably one of the best application areas of machine learning, where learning systems can adapt to
changes in the ways spam messages are generated.

3. Let us say you are given the task of building an automated taxi. Define the constraints. What are
the inputs? What is the output? How can you communicate with the passenger? Do you need to
communicate with the other automated taxis, that is, do you need a “language”?
An automated taxi should be able to pick up a passenger and drive them to a destination. It should
have some positioning system (GPS/GIS) and should have other sensors (cameras) to be able to
sense cars, pedestrians, obstacles, etc. on the road. The output should be the sequence of actions to
reach the destination in the shortest time with the minimum inconvenience to the passenger. The
automated taxi needs to communicate with the passenger to receive commands and may also need
to interact with other automated taxis to exchange information about road traffic or scheduling,
load balancing, etc.

7. If a face image is a 100×100 image, written in row-major, this is a 10,000 dimensional vector. If
we shift the image one pixel to the right, this will be a very different vector in the 10,000-
dimensional space. How can we build face recognizers robust to such distortions?
Face recognition systems typically have a preprocessing stage for normalization where the input is
centered and possibly resized before recognition. This is generally done by first finding the eyes and
then translating the image accordingly. There are also recognizers that do not use the face image as
pixels but rather extract structural features from the image, for example, the ratio of the distance
between the two eyes to the size of the whole face. Such features would be invariant to translations
and size changes.
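
The effect of such a normalization stage can be sketched on a toy binary image. Locating the eyes is beyond a short example, so this sketch centers the image on the centroid of its "on" pixels instead; that substitution, the 5×7 image, and the helper names are all assumptions made for illustration.

```python
import math

def center(image):
    """Translate the 'on' pixels so their centroid sits at the grid center."""
    rows, cols = len(image), len(image[0])
    on = [(r, c) for r in range(rows) for c in range(cols) if image[r][c]]
    if not on:
        return [row[:] for row in image]
    # floor(x + 0.5) rounds halves the same way everywhere, so a one-pixel
    # shift of the input moves the rounded centroid by exactly one pixel.
    cr = math.floor(sum(r for r, _ in on) / len(on) + 0.5)
    cc = math.floor(sum(c for _, c in on) / len(on) + 0.5)
    dr, dc = rows // 2 - cr, cols // 2 - cc
    out = [[0] * cols for _ in range(rows)]
    for r, c in on:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            out[nr][nc] = 1
    return out

def shift_right(image):
    """Shift every row one pixel to the right, padding with 0 on the left."""
    return [[0] + row[:-1] for row in image]

def distance(a, b):
    """Euclidean distance between two images viewed as flat vectors."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

# Hypothetical 5x7 "face": two eyes and a mouth.
face = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
shifted = shift_right(face)

print(distance(face, shifted) > 0)                 # True: raw vectors differ
print(distance(center(face), center(shifted)))     # 0.0 after centering
```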
