
ML IMP QUES

2M

UNIT 1

1.DEFINE MACHINE LEARNING AND GIVE 3 APPLICATIONS

Machine learning is an application of AI that gives systems the ability to automatically learn and improve from experience without being explicitly programmed.

Applications are

➢ Computational finance
➢ Image processing and computer vision
➢ Tumor detection, drug discovery, and DNA sequencing
➢ Energy production, for price and load forecasting
➢ Natural language processing

2.GIVE OCCAM’S RAZOR PRINCIPLE

➢ Occam’s Razor states that "Among competing hypotheses, the one with the fewest
assumptions should be preferred." This principle emphasizes simplicity in reasoning,
suggesting that unnecessary complexities should be avoided when explaining a
phenomenon.
➢ It is widely used in science, philosophy, and problem-solving to choose the most logical and
straightforward explanation.

3.DIFFERENTIATE BIAS AND VARIANCE

➢ Bias is the error caused by overly simple assumptions in the model; a high-bias model underfits and misses the true pattern in the data.
➢ Variance is the error caused by the model’s sensitivity to small fluctuations in the training data; a high-variance model overfits and does not generalize to new data.
➢ The two trade off against each other: increasing model complexity usually lowers bias but raises variance.

4.WHAT IS NOISE IN MACHINE LEARNING DATA?

In machine learning, noise refers to irrelevant or random variations in data that do not represent the
actual patterns or relationships. It can be caused by errors in data collection, human mistakes, or
inherent randomness in real-world data.

Types of Noise:

➢ Label Noise – Incorrectly labelled data points.


➢ Feature Noise – Errors in input features (e.g., sensor inaccuracies).

Effects of Noise:

➢ Reduces model accuracy.


➢ Makes predictions less reliable.
5.WHAT IS LEARNING CLASS BY EXAMPLE

Learning Class by Example is a machine learning approach where a model learns to classify data
based on labelled examples. Instead of relying on predefined rules, the system generalizes patterns
from given training examples to classify new, unseen data.

Example:

A spam filter learns to classify emails as spam or not spam by analysing a dataset of previously
labelled emails.

This method is widely used in classification tasks like image recognition, fraud detection, and
medical diagnosis.

12M

1.EXPLAIN THE 4 TYPES OF MACHINE LEARNING

➢ Supervised Learning – the model learns from labelled examples to predict outputs (classification, regression).
➢ Unsupervised Learning – the model finds structure in unlabelled data (clustering, association).
➢ Semi-Supervised Learning – combines a small amount of labelled data with a large amount of unlabelled data.
➢ Reinforcement Learning – an agent learns by trial and error from rewards and penalties received from its environment.


2.EXPLAIN LINEAR REGRESSION WITH AN EXAMPLE PROBLEM

Linear Regression is a statistical and machine learning technique used to model the relationship
between a dependent variable (Y) and one or more independent variables (X) using a straight-line
equation.

Equation of Linear Regression:

For simple linear regression (one independent variable), the equation is:

Y=mX+c

Where:

Y = Predicted output (dependent variable)

X = Input feature (independent variable)

m = Slope (coefficient)

c = Intercept (constant)

For multiple linear regression (multiple independent variables), the equation extends to:

Y=b0+b1X1+b2X2+...+bnXn

Working of Linear Regression:

Finds the Best Fit Line – Minimizes the difference between predicted and actual values.

Uses Least Squares Method – Reduces error by minimizing the sum of squared differences between
actual and predicted values.

Predicts Continuous Values – Used for tasks like price prediction, sales forecasting, and trend
analysis.
Conclusion

From the given sales data (2005–2009), we derived the least squares regression equation:

y = 8.4x − 16830.4

Using this equation, we predicted the company's sales in 2012 to be 70.4 million dollars.
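
As a quick sanity check, the 2012 prediction can be reproduced in a few lines of Python using the slope and intercept derived above (the original year/sales table is not repeated here, so only the final equation is used):

# Coefficients of the least squares line derived above: y = 8.4x - 16830.4
m, c = 8.4, -16830.4

# Predict sales (in million dollars) for the year 2012
year = 2012
print(m * year + c)   # 70.4
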
3.EXPLAIN THE LEARNING PROCESS OF MACHINE LEARNING

Machine Learning (ML) is a process where computers learn patterns from data to make predictions
or decisions without being explicitly programmed. The learning process involves multiple stages,
from data collection to model evaluation, ensuring accurate and meaningful insights.

This process typically consists of the following steps:

➢ Data Collection: Gathering raw data from real-world sources.


➢ Pre-processing: Cleaning and transforming data for better quality.
➢ Dimensionality Reduction: Selecting the most relevant features to improve efficiency.
➢ Model Learning: Training machine learning models using algorithms.
➢ Model Testing: Evaluating performance using validation techniques.
➢ Analysis Results: Using insights for decision-making.

1. Measuring Devices (Data Collection)

➢ The process starts with gathering real-world data using:


Sensors (e.g., temperature sensors, IoT devices).
Cameras (e.g., image and video data collection).
Databases (e.g., structured and unstructured data storage).
➢ This step provides raw data for processing.

2. Pre-processing

➢ Why is it needed? Raw data may contain noise, missing values, or irrelevant information.

Key tasks:

➢ Noise filtering: Removing outliers and incorrect data points.


➢ Feature extraction: Selecting important features from raw data.
➢ Normalization: Scaling data to a consistent range for better model performance.
3. Dimensionality Reduction

➢ Why is it needed? High-dimensional data can make models complex and slow.

Key techniques:

➢ Feature selection: Choosing only the most important features.


➢ Feature projection: Transforming data into a lower-dimensional space (e.g., PCA – Principal
Component Analysis).

4. Model Learning (Training)

➢ This step involves selecting and training a machine learning model.

Common learning techniques include:

➢ Supervised Learning (Classification, Regression).


➢ Unsupervised Learning (Clustering, Anomaly Detection).
➢ Reinforcement Learning (Decision-making and AI agents).
➢ The model adjusts its parameters based on the training data.

5. Model Testing (Evaluation)

➢ Purpose: To check the model’s accuracy and generalization.

Key techniques:

➢ Cross-validation: Splitting data into training and testing sets to evaluate model performance.
➢ Bootstrap: Random sampling to estimate model accuracy.
➢ The model’s performance is assessed using:
➢ Classification Metrics (e.g., accuracy, precision, recall).
➢ Regression Metrics (e.g., RMSE, MAE).
➢ Clustering Metrics (e.g., silhouette score).

6. Analysis Results

➢ The final step involves interpreting model results and making data-driven decisions.
➢ The trained model can now be deployed for real-world applications.
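
The stages above can be sketched end to end with scikit-learn. This is a minimal illustration only; a bundled dataset stands in for real measurements, and the choice of scaler, reducer, and model is purely illustrative:

from sklearn.datasets import load_breast_cancer      # stand-in for real collected data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: a bundled dataset stands in for real-world measurements
X, y = load_breast_cancer(return_X_y=True)

# 2. Pre-processing: scale features to a consistent range
X = StandardScaler().fit_transform(X)

# 3. Dimensionality reduction: project onto the top principal components
X = PCA(n_components=10).fit_transform(X)

# 4./5. Model learning and testing: hold out data, train, then evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
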
UNIT 2

1.STATE BAYES THEOREM AND BAYESIAN NETWORK

Bayes' Theorem

➢ Bayes' Theorem is a fundamental concept in probability theory and machine learning. It describes how to update the probability of an event based on new evidence.
➢ For events A and B it states: P(A | B) = P(B | A) · P(A) / P(B), where P(A) is the prior, P(B | A) is the likelihood, P(B) is the evidence, and P(A | B) is the updated (posterior) probability.

Example:
In spam detection, Bayes' Theorem helps determine the probability that an email is spam given that
it contains certain words.
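
A small sketch of this spam example with hypothetical probabilities (the numbers below are assumed purely for illustration):

# Hypothetical numbers for illustration only
p_spam = 0.2                  # prior P(spam)
p_word_given_spam = 0.6       # likelihood P(word | spam)
p_word_given_ham = 0.05       # likelihood P(word | not spam)

# Total probability of seeing the word
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: posterior P(spam | word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75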

Bayesian Network (Bayes Net)

➢ A Bayesian Network is a graphical model that represents probabilistic relationships among a set of variables using Directed Acyclic Graphs (DAGs).

Key Features:

➢ Nodes represent random variables (e.g., symptoms, diseases).


➢ Edges represent conditional dependencies between variables.
➢ Each node has a Conditional Probability Table (CPT) defining how it depends on its parent
nodes.

Example:

In a medical diagnosis system, a Bayesian Network can represent the relationships between diseases
and symptoms.

Application Areas:

➢ Medical Diagnosis
➢ Spam Filtering
➢ Fraud Detection
➢ Speech Recognition
2.STATE 0/1 LOSS FUNCTION

The 0/1 loss function is a simple loss function used in classification problems to measure the
accuracy of predictions. It assigns a loss of 0 for correct predictions and 1 for incorrect predictions.

Explanation:

➢ If the predicted value matches the actual label, the loss is 0 (no penalty).
➢ If the predicted value does not match the actual label, the loss is 1 (full penalty).
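
A minimal sketch in plain Python, with hypothetical labels, showing the 0/1 loss averaged over a batch of predictions:

def zero_one_loss(y_true, y_pred):
    # 0 for each correct prediction, 1 for each mistake, averaged over the batch
    return sum(int(t != p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels for illustration
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(zero_one_loss(y_true, y_pred))  # 0.4 -> two of five predictions are wrong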

3.HOW TO MEASURE SUPPORT COUNT AND CONFIDENCE

➢ Support count σ(A) of an itemset A is the number of transactions in the database that contain A. Support(A → B) = σ(A ∪ B) / N, where N is the total number of transactions.
➢ Confidence measures how often the rule holds among transactions containing the antecedent: Confidence(A → B) = σ(A ∪ B) / σ(A).
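
A short sketch with a hypothetical transaction database illustrating both measures:

# Hypothetical transaction database for illustration
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support_count(itemset):
    return sum(itemset <= t for t in transactions)

# Rule: {milk, bread} -> {butter}
antecedent, consequent = {"milk", "bread"}, {"butter"}
support = support_count(antecedent | consequent) / len(transactions)
confidence = support_count(antecedent | consequent) / support_count(antecedent)
print(support, confidence)  # 0.25 and 0.5
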
4.GIVE FEW DIMENSION REDUCTION TECHNIQUES

Dimensionality reduction helps in reducing the number of features while preserving essential
information. It improves model performance, reduces overfitting, and speeds up computations.

1. Principal Component Analysis (PCA)

➢ Introduction: PCA transforms high-dimensional data into a lower-dimensional space by selecting the most important features (principal components).
➢ Applications: Used in image compression, finance, and genetics.
➢ Example: Reducing a 1000-feature dataset to 10 principal components while retaining 90%
variance.

2. Linear Discriminant Analysis (LDA)

➢ Introduction: LDA reduces dimensionality while maintaining class separability by maximizing the difference between multiple classes.
➢ Applications: Used in face recognition, text classification, and medical diagnosis.
➢ Example: LDA can reduce a 10-feature dataset into 2 dimensions while keeping different
class labels distinct.

3. t-Distributed Stochastic Neighbour Embedding (t-SNE)

➢ Introduction: A non-linear technique that visualizes high-dimensional data by mapping it to 2D or 3D space.
➢ Applications: Used in data visualization for clustering and anomaly detection.
➢ Example: t-SNE is widely used in gene expression analysis to group similar genes together.

4. Autoencoders (Deep Learning-Based)

➢ Introduction: A type of neural network that learns a compressed representation of input data.
➢ Applications: Used in image denoising, anomaly detection, and speech recognition.
➢ Example: Compressing high-resolution images into low-dimensional representations for
faster processing.

5. Feature Selection (Filter, Wrapper, Embedded Methods)

➢ Introduction: Selecting only the most relevant features based on statistical tests, model-
based selection, or domain knowledge.
➢ Applications: Used in healthcare, fraud detection, and finance.
➢ Example: In a medical dataset, removing irrelevant patient attributes (e.g., name, address)
to keep only significant factors like blood pressure and sugar level.
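
A brief scikit-learn sketch showing how a few of these techniques are invoked in practice (the digits dataset is used here only as a convenient 64-dimensional example):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 64-dimensional example data

X_pca = PCA(n_components=10).fit_transform(X)                             # unsupervised projection
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)    # supervised, uses labels
X_tsne = TSNE(n_components=2).fit_transform(X)                            # non-linear 2D embedding

print(X_pca.shape, X_lda.shape, X_tsne.shape)  # (1797, 10) (1797, 2) (1797, 2)
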
12M

1.EXPLAIN NAIVE BAYES WITH A PROBLEM

Consider the dataset. Predict for the given X value whether the customer can purchase a computer
or not using Bayes Classification.

Given input:
X = (Age = Youth, Income = Medium, Student = Yes, Credit_Rating = Fair)

Given:

A dataset from AllElectronics Customer Database with 14 records (tuples).

Contains 5 attributes:

➢ Age (youth, middle-aged, senior)


➢ Income (high, medium, low)
➢ Student (yes, no)
➢ Credit Rating (fair, excellent)
➢ Class Label: Buys Computer (yes, no)

The dataset is used for classification, where the goal is to predict if a customer will buy a computer
based on their attributes.

SOLN:
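
The worked table and hand calculation are not reproduced here; the sketch below assumes the conditional probabilities of the standard 14-tuple AllElectronics example (Han & Kamber) and shows how the two class scores are compared:

# Probabilities estimated from the standard 14-tuple AllElectronics table
# (assumed here; the table itself is not reproduced)
p_yes, p_no = 9 / 14, 5 / 14

# P(attribute value | class) for X = (youth, medium, student=yes, credit=fair)
likelihood_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)
likelihood_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)

# Naive Bayes: pick the class with the larger posterior (the denominator P(X) cancels)
score_yes = likelihood_yes * p_yes   # ~0.028
score_no = likelihood_no * p_no      # ~0.007
print("buys_computer =", "yes" if score_yes > score_no else "no")
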
2.EXPLAIN APRIORI ALGORITHM WITH A PROBLEM

APRIORI PRUNING PRINCIPLE

If there is any itemset which is infrequent, its superset should not be generated/tested.

ALGORITHM

➢ Initially, scan DB once to get frequent 1-itemset


➢ Generate length (k+1) candidate itemsets from length k frequent itemsets
➢ Test the candidates against DB
➢ Terminate when no frequent or candidate set can be generated

PSEUDO-CODE

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != NULL; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
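
A compact, runnable Python sketch of the same loop using only the standard library. The transaction list below is assumed (the original table is not reproduced here) and is chosen so that its support counts match the worked example that follows:

from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) with their support counts."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count candidates against the database
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: build (k+1)-candidates from frequent k-itemsets,
        # pruning any candidate that has an infrequent k-subset
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                current.add(union)
        k += 1
    return frequent

# Assumed transaction database (chosen to match the support counts used below)
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
print(apriori(db, min_support=2))
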
For the worked example, the minimum support count is 2 and the minimum confidence is 60%.

STEP 1 K=1

(I) Create a table containing support count of each item present in dataset – Called C1(candidate set)

(II) Compare candidate set item’s support count with minimum support count (if support_count of is
less than min_support then remove those items). This gives us itemset L1.

STEP 2 K=2

(I)Generate candidate set C2 using L1 (this is called join step). Condition of joining Lk-1 and Lk-1 is
that it should have (K-2) elements in common.

➢ Check all subsets of an itemset are frequent or not and if not frequent remove that itemset.
(Example subset of {I1, I2} are {I1}, {I2} they are frequent. Check for each itemset)
➢ Now find support count of these itemsets by searching in dataset.
(II) compare candidate (C2) support count with minimum support count (here min_support=2 if
support_count of candidate set item is less than min_support then remove those items) this gives us
itemset L2.

STEP 3

(I)Generate candidate set C3 using L2 (join step). Condition of joining Lk-1 and Lk-1 is that it should
have (K-2) elements in common.

➢ Check if all subsets of these itemsets are frequent or not and if not, then remove that
itemset
➢ find support count of these remaining itemset by searching in dataset.

(II) Compare candidate (C3) support count with minimum support count (here min_support=2 if
support_count of candidate set item is less than min_support then remove those items) this gives us
itemset L3.

STEP 4

Generate candidate set C4 using L3 (join step). Condition of joining Lk-1 and Lk-1 (K=4) is that, they
should have (K-2) elements in common.

Check all subsets of these itemsets are frequent or not (Here itemset formed by joining L3 is {I1, I2,
I3, I5} so its subset contains {I1, I3, I5}, which is not frequent). So, no itemset in C4

We stop here because no frequent itemsets are found further.


ASSOCIATION RULE GENERATION

A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.

Confidence(A → B) = Support_count(A ∪ B) / Support_count(A)

Itemset {I1, I2, I3}

So, rules can be

[I1^I2] =>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100=50%


[I1^I3] =>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100=50%
[I2^I3] =>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100=50%
[I1] =>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100=33%
[I2] =>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100=28%
[I3] =>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100=33%

Itemset {I1, I2, I5}

So, rules can be

[I1^I2] =>[I5] //confidence = sup(I1^I2^I5)/sup(I1^I2) = 2/4*100=50%


[I1^I5] =>[I2] //confidence = sup(I1^I2^I5)/sup(I1^I5) = 2/2*100=100%
[I2^I5] =>[I1] //confidence = sup(I1^I2^I5)/sup(I2^I5) = 2/2*100=100%
[I1] =>[I2^I5] //confidence = sup(I1^I2^I5)/sup(I1) = 2/6*100=33%
[I2] =>[I1^I5] //confidence = sup(I1^I2^I5)/sup(I2) = 2/7*100=28%
[I5] =>[I1^I2] //confidence = sup(I1^I2^I5)/sup(I5) =2/2*100=100%
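
These confidence values can be verified with a few lines of Python, plugging in the support counts quoted in the rules above:

# Support counts taken from the worked example above
support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I3"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def confidence(antecedent, consequent):
    whole = frozenset(antecedent) | frozenset(consequent)
    return support[whole] / support[frozenset(antecedent)] * 100

print(confidence({"I1", "I5"}, {"I2"}))   # 100.0
print(confidence({"I1"}, {"I2", "I5"}))   # 33.3...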

Itemset {I1, I2, I3}:

➢ The highest confidence observed is 50%, which is below the 60% minimum confidence threshold.
➢ No strong association rules can be extracted from this itemset.

Itemset {I1, I2, I5}:

➢ Three rules have 100% confidence, making them strong association rules:
➢ [I1^I5] → [I2] (100%)
➢ [I2^I5] → [I1] (100%)
➢ [I5] → [I1^I2] (100%) These rules indicate a strong relationship among the items.

PROPERTIES

➢ Monotonicity: If an itemset is frequent, all its subsets are frequent.


➢ Anti-Monotonicity: If an itemset is infrequent, all its supersets are infrequent.

ADVANTAGES

➢ Easy to understand and implement.


➢ Can handle large datasets.

DISADVANTAGES

➢ Can be expensive for large datasets.


➢ Generates many unnecessary candidates.
3.DESCRIBE DIMENSIONALITY REDUCTION

Dimensionality Reduction

Dimensionality reduction is a technique used in machine learning and data science to reduce the
number of features (variables) in a dataset while preserving important information. High-
dimensional data can lead to increased computation time, overfitting, and difficulties in
visualization, making dimensionality reduction essential.

Why is Dimensionality Reduction Important?

➢ Improves Model Performance – Reduces overfitting by eliminating redundant and irrelevant features.
➢ Enhances Computational Efficiency – Less data means faster training and prediction times.
➢ Better Data Visualization – High-dimensional data is challenging to visualize; reducing
dimensions allows for graphical representation in 2D or 3D.
➢ Removes Noise – Helps eliminate less useful or noisy features, improving model
generalization.

Types of Dimensionality Reduction Techniques

1.Principal Component Analysis (PCA)

In pattern recognition, Dimension Reduction is defined as-

➢ It is a process of converting a data set having vast dimensions into a data set with lesser
dimensions.
➢ It ensures that the converted data set conveys similar information concisely.

Example-

Consider the following example

➢ The following graph shows two dimensions x1 and x2.


➢ x1 represents the measurement of several objects in cm.
➢ x2 represents the measurement of several objects in inches.

Using dimension reduction techniques-

➢ We convert the dimensions of data from 2 dimensions (x1 and x2) to 1 dimension (z1).
➢ It makes the data relatively easier to explain.
Dimension reduction offers several benefits such as-

➢ It compresses the data and thus reduces the storage space requirements.
➢ It reduces the time required for computation since less dimensions require less
computation.
➢ It eliminates the redundant features.
➢ It improves the model performance.

ALGORITHM

➢ Step-01: Get data.


➢ Step-02: Compute the mean vector (µ).
➢ Step-03: Subtract mean from the given data.
➢ Step-04: Calculate the covariance matrix.
➢ Step-05: Calculate the Eigen vectors and Eigen values of the covariance matrix.
➢ Step-06: Choosing components and forming a feature vector.
➢ Step-07: Deriving the new data set.
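
A short NumPy sketch of these steps (the data below is random and purely illustrative; the top k components are kept):

import numpy as np

def pca(X, k):
    # Step 2-3: compute the mean vector and subtract it from the data
    mu = X.mean(axis=0)
    Xc = X - mu
    # Step 4: covariance matrix of the centred data
    cov = np.cov(Xc, rowvar=False)
    # Step 5: eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 6: choose the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    # Step 7: derive the new, lower-dimensional data set
    return Xc @ components

# Illustrative random data: 100 samples, 5 features, reduced to 2 dimensions
X = np.random.rand(100, 5)
print(pca(X, k=2).shape)  # (100, 2)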

2.Linear Discriminant Analysis (LDA)

Logistic regression is a classification algorithm traditionally limited to only two-class classification problems.

➢ If you have more than two classes then Linear Discriminant Analysis is the preferred linear
classification technique.

Limitations of Logistic Regression

➢ Logistic regression is a simple and powerful linear classification algorithm.


➢ Two-Class Problems. Logistic regression is intended for two-class or binary classification
problems. It can be extended for multi-class classification, but is rarely used for this purpose.
➢ Unstable with Well Separated Classes. Logistic regression can become unstable when the
classes are well separated.
➢ Unstable with Few Examples. Logistic regression can become unstable when there are few
examples from which to estimate the parameters.

Example:

➢ Suppose we have two sets of data points belonging to two different classes that we want to
classify.
➢ As shown in the given 2D graph, when the data points are plotted on the 2D plane, there’s
no straight line that can separate the two classes of the data points completely.
➢ Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D graph
into a 1D graph in order to maximize the separability between the two classes.
ALGORITHM

➢ Calculate Class Means: Compute the average vectors for each class in your data.
➢ Compute Scatter Matrices: Determine the between-class and within-class scatter matrices.
➢ Eigenvalue/Eigenvector Calculation: Find the eigenvectors and corresponding eigenvalues
for the scatter matrices.
➢ Sort and Select Eigenvectors: Sort eigenvectors by decreasing eigenvalues and pick the top k
eigenvectors.
➢ Transform Data: Use the selected k eigenvectors to form a matrix, and transform your data
into the new lower-dimensional space using matrix multiplication.
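
A compact NumPy sketch following these steps, on illustrative two-class data projected onto a single discriminant axis:

import numpy as np

def lda(X, y, k=1):
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))   # within-class scatter
    Sb = np.zeros((n_features, n_features))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                       # step 1: class means
        Sw += (Xc - mean_c).T @ (Xc - mean_c)          # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Step 3: eigenvalues/eigenvectors of inv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    # Step 4: sort eigenvectors by decreasing eigenvalue and keep the top k
    order = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, order].real
    # Step 5: transform the data into the lower-dimensional space
    return X @ W

# Illustrative data: two roughly separated classes in 2D, projected to 1D
X = np.vstack([np.random.randn(50, 2) + [0, 0], np.random.randn(50, 2) + [3, 3]])
y = np.array([0] * 50 + [1] * 50)
print(lda(X, y, k=1).shape)  # (100, 1)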

Applications of Dimensionality Reduction

➢ Machine Learning & Deep Learning


➢ Computer Vision
➢ NLP
➢ Bioinformatics & Genetics

Advantages of Dimensionality Reduction:

➢ Improves Model Performance


➢ Reduces Overfitting
➢ Storage Efficiency

Disadvantages of Dimensionality Reduction:

➢ Loss of Information
➢ Computationally Expensive
➢ Risk of Over-Simplification
