
1. Define supervised learning with an example.
Supervised learning is a type of machine learning where a model learns from a labeled dataset, in which each input is paired with a correct output. The goal is for the model to generalize from the training data to make accurate predictions on unseen data.
Example:
Consider predicting house prices. You are given a dataset where each house has features like square footage, number of rooms, and location, along with the actual selling price. The algorithm learns a mapping from features to price and can then predict prices for new houses.
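A minimal sketch of this workflow in Python, assuming scikit-learn is installed; the feature values and prices below are made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [square footage, number of rooms]; labels are selling prices (made up).
X_train = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y_train = np.array([245000, 312000, 279000, 308000, 405000])

model = LinearRegression()
model.fit(X_train, y_train)              # learn the mapping from features to price

print(model.predict(np.array([[2000, 4]])))   # predict the price of an unseen house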
2. What is the Bayes Theorem? Write its formula.
Bayes Theorem is a mathematical formula used to determine the probability of a hypothesis given some observed evidence. It updates the probability of a hypothesis as more evidence or information becomes available.
Formula:
P(A|B) = [P(B|A) · P(A)] / P(B)
Where:
• P(A|B): Posterior probability (probability of A given B)
• P(B|A): Likelihood (probability of B given A)
• P(A): Prior probability of A
• P(B): Probability of B
This theorem is fundamental in probabilistic reasoning, including the Naïve Bayes algorithm.
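A short numeric sketch of the update in Python; the prior and likelihood figures below (20%, 70%, 10%) are assumed values for a spam example, not statistics from the notes:

# Assumed numbers: P(spam), P(word | spam), P(word | not spam).
p_spam = 0.20                    # prior P(A)
p_word_given_spam = 0.70         # likelihood P(B|A)
p_word_given_ham = 0.10          # P(B | not A)

# Evidence P(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # 0.636: the prior of 0.20 is revised upward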
3. Differentiate between classification and regression.
Feature | Classification | Regression
Output Type | Discrete categories (e.g., spam/not spam) | Continuous values (e.g., temperature, price)
Goal | Predict class labels | Predict numeric values
Evaluation | Accuracy, precision, recall, F1-score | MSE, RMSE, R²
Algorithms | Decision Trees, SVM, Naïve Bayes | Linear Regression, SVR, Ridge Regression
Example | Classifying if an email is spam or not | Predicting the price of a car based on features
4. What does the term "hyperplane" mean in SVM?
In Support Vector Machines (SVMs), a hyperplane is a decision boundary that separates different classes of data.
• In 2D space, it's a line.
• In 3D space, it's a plane.
• In higher dimensions, it's a hyperplane.
SVM selects the hyperplane that best separates the classes by maximizing the margin, i.e., the distance between the hyperplane and the nearest data points from each class (the support vectors).
5. Compare decision trees and Bayesian networks in terms of interpretability.
Aspect | Decision Trees | Bayesian Networks
Interpretability | Very high: decisions can be traced as rules | Moderate: requires understanding of probabilities
Visualization | Easy to visualize as a tree structure | Complex: visualized as a probabilistic graph
User Friendly | Suitable for non-experts | Requires knowledge of probability theory
Example Use Case | Customer churn prediction | Medical diagnosis under uncertainty
Decision trees are more interpretable because you can follow a path from root to leaf to see how a decision was made. Bayesian networks model conditional dependencies, which can be less intuitive.
6. How does logistic regression work for binary classification?
Logistic Regression models the probability that a data point belongs to a particular class. It uses the sigmoid (logistic) function to squash the output of a linear equation into the range 0 to 1.
Formula:
P(y=1|x) = 1 / (1 + e^(-(wᵀx + b)))
If P(y=1|x) > 0.5, classify as class 1; otherwise class 0.
• It's trained by optimizing the log-loss (cross-entropy) function.
• It assumes linearity in the log-odds space.
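A minimal sketch of the sigmoid and the decision rule in NumPy; the weight vector and bias below are made-up values rather than a trained model:

import numpy as np

def sigmoid(z):
    # Squash a real-valued score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Made-up parameters; in practice w and b are learned by minimizing log-loss.
w = np.array([0.8, -1.2])
b = 0.1

x = np.array([2.0, 1.5])             # a single input with two features
p = sigmoid(np.dot(w, x) + b)        # P(y=1 | x)
label = 1 if p > 0.5 else 0          # threshold the probability at 0.5
print(p, label)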
7. Describe the steps to design a learning system for a spam detector.
1. Problem Definition: Classify emails as spam or not.
2. Data Collection: Collect a labeled dataset of emails with "spam" or "not spam" labels.
3. Preprocessing: Clean the text (remove HTML, punctuation), tokenize words, remove stop words.
4. Feature Engineering: Extract features using Bag of Words, TF-IDF, or embeddings.
5. Model Selection: Choose an algorithm like Naïve Bayes or Logistic Regression.
6. Training: Fit the model to the training data.
7. Evaluation: Use metrics like accuracy, precision, recall, and F1-score on test data.
8. Deployment: Integrate the model into an email system.
9. Monitoring & Updates: Track performance and retrain periodically with new data.
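A compact sketch of steps 2 to 7, assuming scikit-learn is available; the four example emails are invented, and a real detector would be trained and evaluated on a much larger labeled corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting agenda for monday",
          "cheap loans click here", "lunch with the project team"]
labels = [1, 0, 1, 0]                  # 1 = spam, 0 = not spam

# TF-IDF covers tokenization and feature extraction; Naïve Bayes is the classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["claim your free prize"]))   # classify a new email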
8. What are the properties of SVM? Discuss max-margin intuition.
Properties of SVM:
• Effective in high-dimensional spaces.
• Can handle linear and non-linear data using kernel functions.
• Robust to overfitting, especially in high dimensions.
• Relies only on the support vectors (the data points closest to the hyperplane).
Max-Margin Intuition:
SVM aims to find the hyperplane that maximizes the margin, i.e., the distance between itself and the nearest data points from each class. A larger margin implies better generalization to unseen data.
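A small illustration of the max-margin hyperplane, assuming scikit-learn; the four 2D points are made up, and the fitted coefficients describe the boundary w·x + b = 0:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [4, 4], [5, 5]])   # linearly separable toy data
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0)                # linear kernel: w·x + b = 0
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane:", w, b)
print("margin width:", 2 / np.linalg.norm(w))    # distance between the two margins
print("support vectors:", clf.support_vectors_)  # the points that fix the margin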
9. Define Reinforcement Learning with an example.
Reinforcement Learning (RL) is a learning paradigm where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties for its actions and learns to maximize cumulative reward.
Example: A robot learning to navigate a maze.
• The robot (agent) tries different paths (actions).
• Each move leads to a new position (state).
• It gets a reward for reaching the exit and penalties for hitting walls.
Over time, it learns the optimal path by trial and error.
12. Define the three components: Task (T), Performance Measure (P), Experience (E). Illustrate with an example.
In machine learning, the learning problem is described using:
• Task (T): What the system is trying to do.
• Performance Measure (P): How success is measured.
• Experience (E): The data the system learns from.
Example: Building a spam detection model:
• T: Classify emails into spam and not spam.
• P: Accuracy, F1-score.
• E: A labeled dataset of emails with known classifications.
13. Compare Naïve Bayes and Logistic Regression for binary classification tasks.
Feature | Naïve Bayes | Logistic Regression
Assumptions | Features are conditionally independent | Linear relationship between features and log-odds
Type | Probabilistic classifier | Discriminative classifier
Training Speed | Very fast | Slower due to iterative optimization
Performance | Works well with small datasets and text data | Better with larger, more complex datasets
Robustness | Struggles if features are correlated | Handles correlated features better
Interpretability | Easy to interpret probabilities | Also interpretable through coefficients
Summary:
• Use Naïve Bayes when speed is crucial and feature independence holds.
• Use Logistic Regression when features are correlated or more complex relationships need modeling.
a. Define Entropy in Decision Trees.
Entropy is a metric used in decision trees (especially ID3) to measure the amount of uncertainty or impurity in a dataset. It helps determine the best attribute to split the data at each node.
• If all examples belong to the same class, entropy is 0 (pure).
• If the examples are evenly mixed between two classes, entropy is 1 (maximum impurity).
Mathematically:
Entropy(S) = - Σ pᵢ log₂(pᵢ)
where pᵢ is the probability of class i in dataset S.
Entropy is used in conjunction with Information Gain to choose the best attribute for splitting the data.
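A small sketch of the formula in Python; the class-label lists are made-up examples:

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S.
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

print(entropy(["yes", "yes", "yes", "yes"]))   # 0.0  (pure node)
print(entropy(["yes", "yes", "no", "no"]))     # 1.0  (evenly mixed)
print(entropy(["yes", "yes", "yes", "no"]))    # about 0.81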
b. What is Perceptron? Explain.
A Perceptron is a type of artificial neuron and a fundamental building block of neural networks. It is used for binary classification tasks and is inspired by the human brain.
Components:
• Inputs (x₁, x₂, ..., xₙ)
• Weights (w₁, w₂, ..., wₙ)
• Bias (b)
• Activation function (typically a step or sign function)
Function:
y = activation(Σ wᵢxᵢ + b)
If the output is greater than 0, it predicts class 1; otherwise class 0.
Limitation: It can only solve linearly separable problems. It was later extended to Multilayer Perceptrons to handle complex problems.
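A minimal sketch of the forward pass and the classic perceptron learning rule in NumPy; the AND-gate data, learning rate, and epoch count are assumed values for illustration:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # toy AND-gate inputs
y = np.array([0, 0, 0, 1])                       # AND-gate labels

w, b, lr = np.zeros(2), 0.0, 0.1                 # weights, bias, learning rate

def predict(x):
    # Step activation: class 1 if the weighted sum exceeds 0, else class 0.
    return 1 if np.dot(w, x) + b > 0 else 0

# Perceptron learning rule: nudge weights toward misclassified examples.
for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi)
        w = w + lr * error * xi
        b = b + lr * error

print([predict(xi) for xi in X])                 # [0, 0, 0, 1] once AND is learned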
c. Define Markov Decision Process (MDP).
An MDP provides a mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision-maker.
Components:
1. States (S) – All possible situations the agent can be in.
2. Actions (A) – All actions available to the agent.
3. Transition Model (P) – The probability P(s′|s, a) of moving to a new state s′ given state s and action a.
4. Reward Function (R) – Immediate reward received after transitioning.
5. Discount Factor (γ) – Degree of importance of future rewards (0 ≤ γ ≤ 1).
MDPs are the basis for Reinforcement Learning.
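One way to make these components concrete is to write them down as plain data. A minimal sketch of a made-up two-state MDP in Python; the state names, probabilities, and rewards are assumptions for illustration only:

import random

states = ["cool", "hot"]                  # S
actions = ["work", "rest"]                # A
gamma = 0.9                               # discount factor (assumed; used by planning/learning, not by sampling)

# Transition model P(s'|s, a) and reward R(s, a), keyed by (state, action).
transitions = {
    ("cool", "work"): {"cool": 0.7, "hot": 0.3},
    ("cool", "rest"): {"cool": 1.0},
    ("hot", "work"):  {"hot": 0.8, "cool": 0.2},
    ("hot", "rest"):  {"cool": 1.0},
}
rewards = {("cool", "work"): 2, ("cool", "rest"): 1,
           ("hot", "work"): -1, ("hot", "rest"): 0}

def step(state, action):
    # Sample the next state from P(s'|s, a) and return it with the reward.
    probs = transitions[(state, action)]
    next_state = random.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, rewards[(state, action)]

print(step("cool", "work"))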
d. What is Genetic Mutation in GA?
In Genetic Algorithms (GA), mutation is a genetic operator used to maintain diversity in the population and prevent premature convergence.
Mutation:
• Randomly alters one or more genes (bits) in a chromosome.
• Helps explore new solutions that may not be reachable through crossover alone.
Example:
Chromosome: 101101
Mutation (flip bit 3): 100101
Mutation ensures the GA does not get stuck in local optima and promotes exploration.
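A minimal sketch of bit-flip mutation for a binary chromosome; the 10% per-bit mutation rate is an assumed parameter:

import random

def mutate(chromosome, rate=0.1):
    # Flip each bit independently with probability `rate`.
    return "".join(bit if random.random() > rate else str(1 - int(bit))
                   for bit in chromosome)

print(mutate("101101"))            # usually unchanged, sometimes one or more flips
print(mutate("101101", rate=1.0))  # flips every bit: 010010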
e. Explain ID3 Algorithm with Example.
ID3 (Iterative Dichotomiser 3) is a decision tree algorithm developed by Ross Quinlan that uses Information Gain to split data.
Steps:
1. Calculate the entropy of the dataset.
2. For each attribute, calculate the Information Gain:
IG(S, A) = Entropy(S) - Σ (|Sᵥ| / |S|) · Entropy(Sᵥ), summed over v ∈ Values(A),
and choose the attribute with the highest gain.
3. Recursively apply the same procedure to each subset.
Example:
Dataset: Decide "Play Tennis" based on Outlook, Temperature, etc.
Outlook = {Sunny, Overcast, Rain}
Entropy is computed, and the attribute with the highest information gain (e.g., Outlook) is chosen for the root.
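A short sketch of the information-gain computation; the "Play Tennis" labels and the Outlook split below are made up, not the classic 14-row dataset:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    # IG(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v)).
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

parent = ["yes"] * 5 + ["no"] * 5                # made-up labels
by_outlook = [["no", "no", "yes"],               # Sunny
              ["yes", "yes", "yes"],             # Overcast
              ["yes", "no", "no", "no"]]         # Rain

print(round(information_gain(parent, by_outlook), 3))   # about 0.4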
f. What is the Role of Activation Function in CNN?
In a Convolutional Neural Network (CNN), the activation function introduces non-linearity into the network, enabling it to model complex patterns like edges, textures, and shapes.
Common Activation Functions:
• ReLU (Rectified Linear Unit): f(x) = max(0, x)
• Sigmoid: Used for binary classification outputs
• Tanh: Squashes values into the range -1 to 1
Without activation functions, the stacked layers of a CNN would collapse into a single linear transformation and could not learn complex features.
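A quick NumPy sketch of the three activations applied to the same example pre-activation values:

import numpy as np

def relu(x):
    return np.maximum(0, x)        # f(x) = max(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))    # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)              # squashes into (-1, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), sigmoid(z), tanh(z), sep="\n")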
g. Describe Q-Learning with Pseudo Code.
Q-Learning is a reinforcement learning algorithm that learns the value (Q-value) of taking a certain action in a given state.
Update Rule:
Q(s, a) ← Q(s, a) + α [r + γ · max_a′ Q(s′, a′) - Q(s, a)]
Where:
• α: learning rate
• γ: discount factor
• r: reward
• s, a: current state and action
• s′: next state
Pseudo Code:
Initialize Q(s, a) arbitrarily
For each episode:
    Initialize state s
    Repeat:
        Choose a from s using ε-greedy
        Take action a, observe r and s′
        Q(s, a) ← Q(s, a) + α [r + γ · max_a′ Q(s′, a′) - Q(s, a)]
        s ← s′
    until s is terminal
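A runnable version of the same update on a made-up five-cell corridor environment; the grid size, rewards, hyperparameters, and episode count are all assumptions for illustration:

import random

# Corridor states 0..4; actions 0 = left, 1 = right. Reaching state 4 ends the
# episode with reward +1; every other step costs -0.01 (all values assumed).
N_STATES, ACTIONS = 5, [0, 1]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def env_step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s_next == N_STATES - 1 else -0.01
    return s_next, reward

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # ε-greedy action selection.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r = env_step(s, a)
        # Q-learning update rule.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After learning, the greedy action in every non-terminal state should be "right".
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])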
h. What are the Components of Genetic Algorithms?
1. Population – Set of potential solutions (chromosomes).
2. Selection – Selects the fittest individuals to be parents.
3. Crossover (Recombination) – Combines two parents to produce offspring.
4. Mutation – Introduces random variation.
5. Fitness Function – Evaluates how good a solution is.
6. Termination Condition – When to stop (e.g., after N generations).
These components mimic natural evolution to find optimal or near-optimal solutions.
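A compact sketch wiring these components together on a toy problem (maximize the number of 1-bits in a 10-bit chromosome); the population size, rates, and generation count are assumed values:

import random

GENES, POP_SIZE, GENERATIONS = 10, 20, 30

def fitness(chrom):                          # Fitness Function: count the 1-bits
    return sum(chrom)

def select(population):                      # Selection: tournament of two
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                       # Crossover: single point
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(chrom, rate=0.05):                # Mutation: per-bit flip
    return [1 - g if random.random() < rate else g for g in chrom]

# Population: random binary chromosomes.
population = [[random.randint(0, 1) for _ in range(GENES)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):                 # Termination: fixed generation count
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

print(max(population, key=fitness))          # best chromosome found, ideally all 1s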
i. What is Inductive Inference in Decision Trees?
Inductive Inference refers to the process of learning general rules from specific examples. In decision trees, it involves generalizing from a training dataset to produce a tree that can classify new, unseen data accurately.
This is the core principle behind machine learning: learning patterns that generalize well.
j. Explain Structure and Working of Multilayer Perceptron.
A Multilayer Perceptron (MLP) is a type of feedforward neural network that consists of:
• Input layer – Accepts the features.
• Hidden layers – One or more layers where learning happens.
• Output layer – Produces the final prediction.
Each layer uses weights, biases, and activation functions (ReLU, Sigmoid, etc.).
Working:
1. Inputs are fed forward layer by layer.
2. Each neuron computes a weighted sum of its inputs and applies an activation function.
3. The error is calculated at the output.
4. Backpropagation updates the weights by propagating the error backward.
MLPs can solve non-linear problems and are the foundation of deep learning.
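A minimal forward-pass sketch of a one-hidden-layer MLP in NumPy; the layer sizes and random weights are assumptions, and training by backpropagation is omitted:

import numpy as np

rng = np.random.default_rng(0)

# Assumed architecture: 3 input features -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # input  -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # hidden -> output

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x):
    hidden = relu(x @ W1 + b1)         # weighted sum + activation, layer by layer
    return sigmoid(hidden @ W2 + b2)   # output probability for a binary prediction

x = np.array([0.5, -1.0, 2.0])         # one example with 3 features
print(forward(x))                      # untrained prediction in (0, 1)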
k. Explain Reinforcement Learning with Key Components and Example.
Reinforcement Learning (RL) is a feedback-based learning approach where an agent learns to make decisions by interacting with an environment.
Key Components:
• Agent: Learner and decision-maker.
• Environment: The world the agent interacts with.
• States (S): Configurations of the environment.
• Actions (A): Choices available to the agent.
• Rewards (R): Feedback from the environment.
• Policy (π): Mapping from states to actions.
• Value Function (V): Expected return from a state.
Example: A robot learning to walk.
• State: Its current balance and position.
• Action: Move left, right, or forward.
• Reward: +1 for moving correctly, -1 for falling.
Over time, it learns to walk by maximizing cumulative reward.
l. Explain Deep Learning Architectures.
Deep Learning Architectures are composed of multiple layers that
transform data through learnable parameters.
Types:
1. Feedforward Networks (MLP): Basic architecture, good for
structured data.
2. Convolutional Neural Networks (CNN): For image
processing, uses filters to detect features.
3. Recurrent Neural Networks (RNN): For sequential data like
time series or language.
4. Long Short-Term Memory (LSTM): Variant of RNN for
long-term dependencies.
5. Autoencoders: For feature learning and dimensionality
reduction.
6. GANs (Generative Adversarial Networks): Generate new
data via a generator and discriminator.
7. Transformers: Used in NLP, based on self-attention
mechanisms (e.g., BERT, GPT).
Each architecture is suited to different kinds of data and tasks, and many
can be combined in hybrid systems.
