
Study Material for Reference



Artificial Intelligence-An Overview
❑ Artificial intelligence (AI) is the theory and development of
computer systems capable of performing tasks that historically
required human intelligence, such as recognizing speech, making
decisions, and identifying patterns.
❑ AI is an umbrella term that encompasses a wide variety of
technologies, including machine learning, deep learning, and
natural language processing (NLP).

❑ ChatGPT: Uses large language models (LLMs) to generate text in response to questions or comments posed to it.
❑ Google Translate: Uses deep learning algorithms to translate text from
one language to another.
❑ Netflix: Uses machine learning algorithms to create personalized
recommendation engines for users based on their previous viewing
history.
❑ Tesla: Uses computer vision to power self-driving features on their cars.
Strong AI vs Weak AI



How does AI Work?



What is Machine Learning?
• A subset of artificial intelligence (AI) and computer science, machine learning
(ML) deals with the study and use of data and algorithms that mimic how
humans learn. This helps machines gradually improve their accuracy.
• ML allows software applications to improve their prediction accuracy without
being specifically programmed to do so. It estimates new output values by using
historical data as input.

Machine learning makes it possible to discover patterns in supply chain data by relying on algorithms that quickly pinpoint the factors most influential to a supply network's success, while constantly learning in the process.



Relationship between AI, ML and DL






Machine Learning Models



Supervised Learning






Supervised Learning-Classification
Classification :
1. Binary Classification Problem
In binary classification, the task involves classifying instances into one of two classes
or categories.
Examples
✓ Spam email detection (classifying emails as spam or not spam).
✓ Medical diagnosis (predicting whether a patient has a particular disease or not).
✓ Customer churn prediction (determining whether a customer will churn or not).
2. Multi-class Classification Problem
In multiclass classification, the task involves classifying instances into one of three
or more classes or categories.
Examples
✓ Species classification in biology (identifying different species of plants).
✓ Sentiment analysis with multiple classes (e.g., positive, negative, neutral).
Examples of Classification Problems
Customer Segmentation:
Classes: Different segments or groups of customers (e.g., high-value customers, occasional
buyers, price-sensitive customers).
Features: Demographic information (age, gender, location), behavioral data (purchase
frequency, average order value), psychographic data (lifestyle, interests), and transaction
history.
Churn Prediction:
Classes: Churners (customers who leave) vs. Non-churners (customers who stay).
Features: Customer demographics, usage patterns (frequency of interaction with the
product/service), tenure (length of time as a customer), customer service interactions, and
recent activity (e.g., decreased usage).
Credit Scoring:
Classes: Creditworthy vs. Non-creditworthy applicants.
Features: Credit history (credit score, payment history), financial information (income, debt-to-
income ratio), employment status, length of credit history, and other relevant factors such as
outstanding debts and loan history.
Examples of Classification Problems
Fraud Detection:
Classes: Genuine transactions vs. Fraudulent transactions.
Features: Transaction amount, location, time, frequency, deviations from typical behavior, IP
address, device information, and other contextual data.

Sentiment Analysis:
Classes: Positive sentiment, Negative sentiment, Neutral sentiment.
Features: Text data (customer reviews, social media posts), linguistic features (word frequency,
sentiment words, emoticons), metadata (time of posting, user demographics), and context
(product or service being reviewed).
Product Recommendation:
Classes: Recommended products or services for each customer.
Features: Customer behavior (purchase history, browsing history, items added to cart), product
attributes (price, category, brand), similarity measures between products (collaborative
filtering, content-based filtering), and contextual information (seasonality, trends).



Examples of Classification Problems
Fault Diagnosis:
Classes: Types of faults or malfunctions (e.g., Mechanical fault, Electrical fault, Software error).
Features: Sensor data (temperature, pressure, vibration), performance metrics (speed, efficiency),
maintenance logs, environmental conditions, and historical failure data.

Market Segmentation:
Classes: Different market segments (e.g., Urban, Suburban, Rural).
Features: Demographic data (population density, income levels), geographic information
(location, climate), economic indicators (GDP per capita, unemployment rates), consumer
behavior (buying habits, brand preferences), and market size.



Classification Models
1) Decision Trees

2) Random Forests

3) Support Vector Machines

4) Logistic Regression

5) Neural Networks

6) K-Nearest Neighbors (KNN)

7) Naïve Bayes
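
As a quick illustration of how any of these models is trained in practice, the following is a minimal scikit-learn sketch that fits a logistic regression classifier; the synthetic dataset is generated purely for illustration and does not come from the slides.

```python
# Minimal sketch: fitting one of the listed classifiers (logistic regression)
# on synthetic data standing in for a churn-style binary problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```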



Supervised Learning- Regression
Regression:
▪ Regression problems involve predicting a continuous numerical value
rather than class labels.
▪ These problems are often focused on estimating or forecasting quantities
that are essential for decision-making.
▪ It investigates the relationship between one or more independent variables
and a dependent variable. Its primary goal is to estimate the strength and
direction of the relationship between these variables.
▪ Regression analysis, in essence, helps us understand how changes in one
variable can affect another.



Examples of Regression Problems
Sales Forecasting:
Target: Future sales volume (units or revenue).
Features: Historical sales data, Advertising spend, Seasonal effects, Pricing strategies,
Economic indicators (e.g., GDP growth, inflation).
Customer Lifetime Value (CLV) Prediction:
Target: Predicted customer lifetime value (monetary value).
Features: Customer demographics (age, gender, location), Purchase history
(frequency, recency, monetary value), Customer engagement metrics (website visits,
email opens), Marketing campaign responses, Product/service preferences.
Inventory Level Estimation:
Target Variable: Number of units in inventory.
Features: Sales history, lead times, order frequency, shelf life.



Examples of Regression Problems
Price Optimization:
Target: Optimal price point.
Features: Historical pricing and sales data, Competitor pricing, Market demand,
Production and distribution costs, Brand positioning.
Credit Scoring:
Target: Credit score or probability of default.
Features: Financial history (credit card usage, loan repayments), Income and
employment status, Credit utilization ratio, Length of credit history, Number of
recent credit inquiries, Demographic information.
Warehouse Throughput Prediction:
Target Variable: Throughput in units per hour.
Features: Warehouse layout, historical throughput data, order volume.



Regression Models
1) Linear Regression: Simple and interpretable, suitable for problems with linear
relationships.
2) Decision Trees and Random Forests: Effective for capturing non-linear relationships and
handling complex interactions.
3) Support Vector Regression (SVR): Extends SVM to regression problems, suitable for
cases with non-linear patterns.
4) Neural Networks (Deep Learning): Can capture complex relationships in data but may
require a larger dataset.
5) Time Series Models: Especially relevant for problems involving time-dependent data,
such as Long Short-Term Memory (LSTM) networks.
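
The sketch below shows a minimal linear regression fit in Python; the advertising-spend and sales numbers are invented purely to illustrate the independent/dependent-variable setup described above.

```python
# Minimal sketch: linear regression on toy "sales forecasting" style data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ad_spend = rng.uniform(1, 10, size=(50, 1))                  # independent variable
sales = 3.0 * ad_spend.ravel() + 5 + rng.normal(0, 1, 50)    # dependent variable with noise

model = LinearRegression().fit(ad_spend, sales)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("Predicted sales at spend=7:", model.predict([[7.0]])[0])
```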



K-Nearest Neighbor(KNN) Algorithm
1) The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category most similar to the existing categories.
2) K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category using the K-NN algorithm.
3) K-NN can be used for regression as well as classification, but it is mostly used for classification problems.
4) K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
5) It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at classification time.
6) At the training phase, KNN just stores the dataset; when it receives new data, it classifies that data into the category most similar to it.



Distance Metrics Used in KNN Algorithm
Euclidean Distance

Manhattan Distance

Minkowski Distance
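
For reference, Euclidean distance is the square root of the summed squared differences, Manhattan distance is the sum of absolute differences, and Minkowski distance generalizes both through a parameter p (p = 2 gives Euclidean, p = 1 gives Manhattan). A minimal NumPy sketch, with example vectors chosen only for illustration:

```python
# Minimal sketch of the three distance metrics between two feature vectors.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))            # sqrt of summed squared differences
manhattan = np.sum(np.abs(x - y))                    # sum of absolute differences
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)    # p=2 -> Euclidean, p=1 -> Manhattan

print(euclidean, manhattan, minkowski)
```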



Advantages and Disadvantages
➢ Advantages of the KNN Algorithm
Easy to implement, as the complexity of the algorithm is not high.
Adapts easily – Because KNN stores all of the data in memory, whenever a new example or data point is added, the algorithm adjusts to it, and the new point contributes to future predictions as well.
Few hyperparameters – The only parameters required when training a KNN model are the value of k and the choice of distance metric.

➢ Disadvantages of the KNN Algorithm
Does not scale – KNN is considered a lazy algorithm, which means it requires a lot of computing power and data storage. This makes the algorithm both time-consuming and resource-intensive.
Curse of dimensionality – Owing to the peaking phenomenon, KNN has a hard time classifying data points correctly when the dimensionality is too high.
Prone to overfitting – Because the algorithm is affected by the curse of dimensionality, it is also prone to overfitting. Feature selection and dimensionality reduction techniques are therefore generally applied to deal with this problem.



How does KNN Work?
➢ Step-1: Select the number K of the neighbors
➢ Step-2: Calculate the Euclidean distance from the new data point to the existing data points.
➢ Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
➢ Step-4: Among these K neighbors, count the number of data points in each category.
➢ Step-5: Assign the new data point to the category for which the count of neighbors is maximum.
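
A minimal sketch of these steps using scikit-learn's KNeighborsClassifier; the toy data points and the choice of K = 3 are illustrative assumptions, not from the slides:

```python
# Minimal sketch: KNN classification of a new point by majority vote of its neighbors.
from sklearn.neighbors import KNeighborsClassifier

# Small 2-D training set with two categories (0 and 1).
X_train = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 7], [6, 6]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")  # Step 1: choose K
knn.fit(X_train, y_train)                                      # lazy learner: just stores the data

# Steps 2-5: compute distances, take the 3 nearest neighbors, and vote.
print(knn.predict([[2, 2]]))  # expected: [0]
print(knn.predict([[6, 7]]))  # expected: [1]
```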



Support Vector Machine
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so
that we can easily put the new data point in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

• SVMs can handle both linearly separable and non-linearly separable data. They do this by using different types of kernel functions, such as the linear kernel, polynomial kernel or radial basis function (RBF) kernel. These kernels enable SVMs to effectively capture complex relationships and patterns in the data.
• The kernel function plays a critical role in SVMs, as it makes it possible
to map the data from the original feature space to the kernel space. The
choice of kernel function can have a significant effect on the
performance of the SVM algorithm, and choosing the best kernel
function for a particular problem depends on the characteristics of the
data.
❑ Linear Kernel
❑ Polynomial Kernel
❑ RBF Kernel
❑ Sigmoid Kernel
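
A minimal sketch comparing two of the listed kernels with scikit-learn's SVC on synthetic, non-linearly separable data; the dataset and parameter values are illustrative assumptions:

```python
# Minimal sketch: SVM with a linear vs. an RBF kernel on non-linearly separable data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0)   # the kernel maps the data into the kernel space
    clf.fit(X_train, y_train)
    print(kernel, "test accuracy:", clf.score(X_test, y_test))
```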
Support Vector Machine



Decision Tree Analysis



Customer Analysis in Retail using Decision Tree
While the food industry and the global brands continuously extend their offer, category managers in retail struggle to fit the available assortment into their stores. Indeed, the space allocated to a particular category is limited, and it can quickly become congested if no category management is done to avoid product proliferation.

Through category management, the retailer needs to answer the following main questions:

1) What is the optimal space allocated for the category?
2) What is the optimal assortment (range or variety) for the category?
3) How should the category be segmented in the aisle, according to the customer flow?
4) What is the optimal space allocated per product (facing), and where should the products be displayed?
Customer Analysis in Retail using Decision Tree
A retailer wants to understand how customers purchase within each product category. They start by interviewing their customers, and the decision tree can take many different structures:

1. Brand → Shape → Weight → Type of wheat
2. Weight → Shape → Brand → Type of wheat
3. Type of wheat → Weight → Shape → Brand

In the first option, customers are very loyal to the brand, as it is the first entry key for the category. The demand transfer between brands is in this case very low, so the retailer should focus on top brands and group all the products by brand.

Well, but what if each customer has a different decision tree?



Customer Analysis in Retail using Decision Tree
It is quite obvious that customer behavior can be extracted and identified through the purchase history of each customer. That's where analytics can help!

Retailers have a secret weapon for this: the loyalty card. They are able to track each customer's purchase history and generate a global pattern for each category.

• The dendrogram is generated based on a similarity coefficient and is then analyzed by the category manager to identify patterns in the generated groups.
• The category manager may see that products are grouped by brand, or by any other attribute. He can then extract a decision tree for the category and use it during the category management process.
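
A minimal sketch of how such a dendrogram could be built with SciPy's hierarchical clustering; the product names and attribute values are invented for illustration:

```python
# Minimal sketch: dendrogram of products from a toy product-by-attribute matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Rows = products, columns = illustrative attributes (brand id, weight in kg, wheat type id).
products = ["BrandA-500g", "BrandA-1kg", "BrandB-500g", "BrandB-1kg"]
features = np.array([
    [0, 0.5, 1],
    [0, 1.0, 1],
    [1, 0.5, 2],
    [1, 1.0, 2],
])

# Ward linkage groups products whose attribute/purchase profiles are most similar.
Z = linkage(features, method="ward")
dendrogram(Z, labels=products)
plt.show()
```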



Working with Decision Tree



Attribute Selection Measures
Information Gain:

1. Entropy is referred to as the randomness or the impurity in a system.


2. Information gain is the decrease in entropy. Information gain computes the
difference between entropy before the split and average entropy after the split of
the dataset based on given attribute values.
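
In formula terms, entropy is H(S) = -Σ p_i log2(p_i) over the class proportions p_i, and information gain is H(parent) minus the size-weighted average entropy of the child nodes. A minimal sketch with illustrative class counts:

```python
# Minimal sketch: entropy and information gain for a toy split (illustrative values).
import numpy as np

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array(["yes"] * 9 + ["no"] * 5)   # entropy before the split
left   = np.array(["yes"] * 6 + ["no"] * 1)   # one branch after the split
right  = np.array(["yes"] * 3 + ["no"] * 4)   # the other branch

weighted_child_entropy = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
information_gain = entropy(parent) - weighted_child_entropy
print(round(information_gain, 3))
```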



Attribute Selection Measures
Gini Index:

The Entropy and Information Gain method focuses on purity and impurity in a node. The Gini Index (or Gini Impurity) measures the probability of a randomly chosen instance being misclassified. The lower the Gini Index, the lower the likelihood of misclassification.

The Gini index has a maximum impurity of 0.5 and a maximum purity of 0, whereas entropy has a maximum impurity of 1 and a maximum purity of 0.
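
A minimal sketch of the Gini impurity, G = 1 - Σ p_i², showing the 0.5 (maximum impurity for two classes) and 0 (pure node) extremes mentioned above; the labels are illustrative:

```python
# Minimal sketch: Gini impurity of a node from its class labels.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(["yes"] * 5 + ["no"] * 5))   # 0.5 -> maximum impurity for two classes
print(gini(["yes"] * 10))               # 0.0 -> pure node
```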



Random Forest



Example of Random Forest
Objective: Predict whether a customer will churn based on various features.
Features: Customer Tenure, Monthly Charges, Contract Type (Month-to-Month, One
Year, Two Years), Internet Service (DSL, Fiber optic, None), Tech Support (Yes/No),
Other possible features like demographics, usage patterns, etc.

1. Tree 1: Predicts Churn: Yes
2. Tree 2: Predicts Churn: No
3. Tree 3: Predicts Churn: Yes
4. ...
5. Tree 100: Predicts Churn: No

Final Prediction: The majority of trees predict "Yes," so the final prediction is that the customer will churn.

Benefits: reduced overfitting, stability and robustness, accuracy.
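
A minimal sketch of this majority-vote behavior with scikit-learn's RandomForestClassifier; the synthetic dataset merely stands in for the churn features listed above:

```python
# Minimal sketch: a random forest voting over many trees, as in the churn example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

forest = RandomForestClassifier(n_estimators=100, random_state=1)  # 100 trees, majority vote
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("Votes of the first 5 trees for one customer:",
      [int(tree.predict(X_test[:1])[0]) for tree in forest.estimators_[:5]], "...")
```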

