
ML IMP QUES

2M

UNIT 1

1.DEFINE MACHINE LEARNING AND GIVE 3 APPLICATIONS

Machine learning is an application of AI that gives systems the ability to automatically learn and improve from experience without being explicitly programmed.

Applications are

➢ Computational finance
➢ Image processing and computer vision
➢ Tumor detection, drug discovery, and DNA sequencing
➢ Energy production, for price and load forecasting
➢ Natural language processing

2.GIVE OCCAM’S RAZOR PRINCIPLE

➢ Occam’s Razor states that "Among competing hypotheses, the one with the fewest
assumptions should be preferred." This principle emphasizes simplicity in reasoning,
suggesting that unnecessary complexities should be avoided when explaining a
phenomenon.
➢ It is widely used in science, philosophy, and problem-solving to choose the most logical and
straightforward explanation.

3.DIFFERENTIATE BIAS AND VARIANCE

➢ Bias is the error caused by overly simple assumptions in the model; a high-bias model underfits and misses the true pattern in the data.
➢ Variance is the error caused by the model’s sensitivity to small fluctuations in the training data; a high-variance model overfits and does not generalize to new data.
➢ The two trade off against each other: increasing model complexity usually lowers bias but raises variance.

4.WHAT IS NOISE IN MACHINE LEARNING DATA?

In machine learning, noise refers to irrelevant or random variations in data that do not represent the
actual patterns or relationships. It can be caused by errors in data collection, human mistakes, or
inherent randomness in real-world data.

Types of Noise:

➢ Label Noise – Incorrectly labelled data points.


➢ Feature Noise – Errors in input features (e.g., sensor inaccuracies).

Effects of Noise:

➢ Reduces model accuracy.


➢ Makes predictions less reliable.
5.WHAT IS LEARNING CLASS BY EXAMPLE

Learning Class by Example is a machine learning approach where a model learns to classify data
based on labelled examples. Instead of relying on predefined rules, the system generalizes patterns
from given training examples to classify new, unseen data.

Example:

A spam filter learns to classify emails as spam or not spam by analysing a dataset of previously
labelled emails.

This method is widely used in classification tasks like image recognition, fraud detection, and
medical diagnosis.

12M

1.EXPLAIN THE 4 TYPES OF MACHINE LEARNING

➢ Supervised Learning – the model learns from labelled examples to predict outputs (classification, regression).
➢ Unsupervised Learning – the model finds structure in unlabelled data (clustering, association).
➢ Semi-Supervised Learning – combines a small amount of labelled data with a large amount of unlabelled data.
➢ Reinforcement Learning – an agent learns by trial and error from rewards and penalties received from its environment.


2.EXPLAIN LINEAR REGRESSION WITH AN EXAMPLE PROBLEM

Linear Regression is a statistical and machine learning technique used to model the relationship
between a dependent variable (Y) and one or more independent variables (X) using a straight-line
equation.

Equation of Linear Regression:

For simple linear regression (one independent variable), the equation is:

Y=mX+c

Where:

Y = Predicted output (dependent variable)

X = Input feature (independent variable)

m = Slope (coefficient)

c = Intercept (constant)

For multiple linear regression (multiple independent variables), the equation extends to:

Y=b0+b1X1+b2X2+...+bnXn

Working of Linear Regression:

Finds the Best Fit Line – Minimizes the difference between predicted and actual values.

Uses Least Squares Method – Reduces error by minimizing the sum of squared differences between
actual and predicted values.

Predicts Continuous Values – Used for tasks like price prediction, sales forecasting, and trend
analysis.
Conclusion

From the given sales data (2005–2009), we derived the least squares regression equation:

y = 8.4x − 16830.4

Using this equation, we predicted the company's sales in 2012 to be 70.4 million dollars.
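
As a quick sanity check, the 2012 prediction can be reproduced in a few lines of Python using the slope and intercept derived above (the original year/sales table is not repeated here, so only the final equation is used):

# Coefficients of the least squares line derived above: y = 8.4x - 16830.4
m, c = 8.4, -16830.4

# Predict sales (in million dollars) for the year 2012
year = 2012
print(m * year + c)   # 70.4
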
3.EXPLAIN THE LEARNING PROCESS OF MACHINE LEARNING

Machine Learning (ML) is a process where computers learn patterns from data to make predictions
or decisions without being explicitly programmed. The learning process involves multiple stages,
from data collection to model evaluation, ensuring accurate and meaningful insights.

This process typically consists of the following steps:

➢ Data Collection: Gathering raw data from real-world sources.


➢ Pre-processing: Cleaning and transforming data for better quality.
➢ Dimensionality Reduction: Selecting the most relevant features to improve efficiency.
➢ Model Learning: Training machine learning models using algorithms.
➢ Model Testing: Evaluating performance using validation techniques.
➢ Analysis Results: Using insights for decision-making.

1. Measuring Devices (Data Collection)

➢ The process starts with gathering real-world data using:


Sensors (e.g., temperature sensors, IoT devices).
Cameras (e.g., image and video data collection).
Databases (e.g., structured and unstructured data storage).
➢ This step provides raw data for processing.

2. Pre-processing

➢ Why is it needed? Raw data may contain noise, missing values, or irrelevant information.

Key tasks:

➢ Noise filtering: Removing outliers and incorrect data points.


➢ Feature extraction: Selecting important features from raw data.
➢ Normalization: Scaling data to a consistent range for better model performance.
3. Dimensionality Reduction

➢ Why is it needed? High-dimensional data can make models complex and slow.

Key techniques:

➢ Feature selection: Choosing only the most important features.


➢ Feature projection: Transforming data into a lower-dimensional space (e.g., PCA – Principal
Component Analysis).

4. Model Learning (Training)

➢ This step involves selecting and training a machine learning model.

Common learning techniques include:

➢ Supervised Learning (Classification, Regression).


➢ Unsupervised Learning (Clustering, Anomaly Detection).
➢ Reinforcement Learning (Decision-making and AI agents).
➢ The model adjusts its parameters based on the training data.

5. Model Testing (Evaluation)

➢ Purpose: To check the model’s accuracy and generalization.

Key techniques:

➢ Cross-validation: Splitting data into training and testing sets to evaluate model performance.
➢ Bootstrap: Random sampling to estimate model accuracy.
➢ The model’s performance is assessed using:
➢ Classification Metrics (e.g., accuracy, precision, recall).
➢ Regression Metrics (e.g., RMSE, MAE).
➢ Clustering Metrics (e.g., silhouette score).

6. Analysis Results

➢ The final step involves interpreting model results and making data-driven decisions.
➢ The trained model can now be deployed for real-world applications.
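
The stages above can be sketched end to end with scikit-learn. This is a minimal illustration only; a bundled dataset stands in for real measurements, and the choice of scaler, reducer, and model is purely illustrative:

from sklearn.datasets import load_breast_cancer      # stand-in for real collected data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: a bundled dataset stands in for real-world measurements
X, y = load_breast_cancer(return_X_y=True)

# 2. Pre-processing: scale features to a consistent range
X = StandardScaler().fit_transform(X)

# 3. Dimensionality reduction: project onto the top principal components
X = PCA(n_components=10).fit_transform(X)

# 4./5. Model learning and testing: hold out data, train, then evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
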
UNIT 2

1.STATE BAYES THEOREM AND BAYESIAN NETWORK

Bayes' Theorem

➢ Bayes' Theorem is a fundamental concept in probability theory and machine learning. It describes how to update the probability of an event based on new evidence.
➢ For events A and B it states: P(A | B) = P(B | A) · P(A) / P(B), where P(A) is the prior, P(B | A) is the likelihood, P(B) is the evidence, and P(A | B) is the updated (posterior) probability.

Example:
In spam detection, Bayes' Theorem helps determine the probability that an email is spam given that
it contains certain words.
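
A small sketch of this spam example with hypothetical probabilities (the numbers below are assumed purely for illustration):

# Hypothetical numbers for illustration only
p_spam = 0.2                  # prior P(spam)
p_word_given_spam = 0.6       # likelihood P(word | spam)
p_word_given_ham = 0.05       # likelihood P(word | not spam)

# Total probability of seeing the word
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: posterior P(spam | word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75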

Bayesian Network (Bayes Net)

➢ A Bayesian Network is a graphical model that represents probabilistic relationships among a set of variables using Directed Acyclic Graphs (DAGs).

Key Features:

➢ Nodes represent random variables (e.g., symptoms, diseases).


➢ Edges represent conditional dependencies between variables.
➢ Each node has a Conditional Probability Table (CPT) defining how it depends on its parent
nodes.

Example:

In a medical diagnosis system, a Bayesian Network can represent the relationships between diseases
and symptoms.

Application Areas:

➢ Medical Diagnosis
➢ Spam Filtering
➢ Fraud Detection
➢ Speech Recognition
2.STATE 0/1 LOSS FUNCTION

The 0/1 loss function is a simple loss function used in classification problems to measure the
accuracy of predictions. It assigns a loss of 0 for correct predictions and 1 for incorrect predictions.

Explanation:

➢ If the predicted value matches the actual label, the loss is 0 (no penalty).
➢ If the predicted value does not match the actual label, the loss is 1 (full penalty).
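
A minimal sketch in plain Python, with hypothetical labels, showing the 0/1 loss averaged over a batch of predictions:

def zero_one_loss(y_true, y_pred):
    # 0 for each correct prediction, 1 for each mistake, averaged over the batch
    return sum(int(t != p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels for illustration
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(zero_one_loss(y_true, y_pred))  # 0.4 -> two of five predictions are wrong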

3.HOW TO MEASURE SUPPORT COUNT AND CONFIDENCE

➢ Support count σ(A) of an itemset A is the number of transactions in the database that contain A. Support(A → B) = σ(A ∪ B) / N, where N is the total number of transactions.
➢ Confidence measures how often the rule holds among transactions containing the antecedent: Confidence(A → B) = σ(A ∪ B) / σ(A).
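
A short sketch with a hypothetical transaction database illustrating both measures:

# Hypothetical transaction database for illustration
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support_count(itemset):
    return sum(itemset <= t for t in transactions)

# Rule: {milk, bread} -> {butter}
antecedent, consequent = {"milk", "bread"}, {"butter"}
support = support_count(antecedent | consequent) / len(transactions)
confidence = support_count(antecedent | consequent) / support_count(antecedent)
print(support, confidence)  # 0.25 and 0.5
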
4.GIVE FEW DIMENSION REDUCTION TECHNIQUES

Dimensionality reduction helps in reducing the number of features while preserving essential
information. It improves model performance, reduces overfitting, and speeds up computations.

1. Principal Component Analysis (PCA)

➢ Introduction: PCA transforms high-dimensional data into a lower-dimensional space by selecting the most important features (principal components).
➢ Applications: Used in image compression, finance, and genetics.
➢ Example: Reducing a 1000-feature dataset to 10 principal components while retaining 90%
variance.

2. Linear Discriminant Analysis (LDA)

➢ Introduction: LDA reduces dimensionality while maintaining class separability by maximizing the difference between multiple classes.
➢ Applications: Used in face recognition, text classification, and medical diagnosis.
➢ Example: LDA can reduce a 10-feature dataset into 2 dimensions while keeping different
class labels distinct.

3. t-Distributed Stochastic Neighbour Embedding (t-SNE)

➢ Introduction: A non-linear technique that visualizes high-dimensional data by mapping it to 2D or 3D space.
➢ Applications: Used in data visualization for clustering and anomaly detection.
➢ Example: t-SNE is widely used in gene expression analysis to group similar genes together.

4. Autoencoders (Deep Learning-Based)

➢ Introduction: A type of neural network that learns a compressed representation of input data.
➢ Applications: Used in image denoising, anomaly detection, and speech recognition.
➢ Example: Compressing high-resolution images into low-dimensional representations for
faster processing.

5. Feature Selection (Filter, Wrapper, Embedded Methods)

➢ Introduction: Selecting only the most relevant features based on statistical tests, model-
based selection, or domain knowledge.
➢ Applications: Used in healthcare, fraud detection, and finance.
➢ Example: In a medical dataset, removing irrelevant patient attributes (e.g., name, address)
to keep only significant factors like blood pressure and sugar level.
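
A brief scikit-learn sketch showing how a few of these techniques are invoked in practice (the digits dataset is used here only as a convenient 64-dimensional example):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 64-dimensional example data

X_pca = PCA(n_components=10).fit_transform(X)                             # unsupervised projection
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)    # supervised, uses labels
X_tsne = TSNE(n_components=2).fit_transform(X)                            # non-linear 2D embedding

print(X_pca.shape, X_lda.shape, X_tsne.shape)  # (1797, 10) (1797, 2) (1797, 2)
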
12M

1.EXPLAIN NAIVE BAYES WITH A PROBLEM

Consider the dataset. Predict for the given X value whether the customer can purchase a computer
or not using Bayes Classification.

Given input:
X = (Age = Youth, Income = Medium, Student = Yes, Credit_Rating = Fair)

Given:

A dataset from AllElectronics Customer Database with 14 records (tuples).

Contains 5 attributes:

➢ Age (youth, middle-aged, senior)


➢ Income (high, medium, low)
➢ Student (yes, no)
➢ Credit Rating (fair, excellent)
➢ Class Label: Buys Computer (yes, no)

The dataset is used for classification, where the goal is to predict if a customer will buy a computer
based on their attributes.

SOLN:
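
The worked table and hand calculation are not reproduced here; the sketch below assumes the conditional probabilities of the standard 14-tuple AllElectronics example (Han & Kamber) and shows how the two class scores are compared:

# Probabilities estimated from the standard 14-tuple AllElectronics table
# (assumed here; the table itself is not reproduced)
p_yes, p_no = 9 / 14, 5 / 14

# P(attribute value | class) for X = (youth, medium, student=yes, credit=fair)
likelihood_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)
likelihood_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)

# Naive Bayes: pick the class with the larger posterior (the denominator P(X) cancels)
score_yes = likelihood_yes * p_yes   # ~0.028
score_no = likelihood_no * p_no      # ~0.007
print("buys_computer =", "yes" if score_yes > score_no else "no")
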
2.EXPLAIN APRIORI ALGORITHM WITH A PROBLEM

APRIORI PRUNING PRINCIPLE

If there is any itemset which is infrequent, its superset should not be generated/tested.

ALGORITHM

➢ Initially, scan DB once to get frequent 1-itemset


➢ Generate length (k+1) candidate itemsets from length k frequent itemsets
➢ Test the candidates against DB
➢ Terminate when no frequent or candidate set can be generated

PSEUDO-CODE

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != NULL; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
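
A compact, runnable Python sketch of the same loop using only the standard library. The transaction list below is assumed (the original table is not reproduced here) and is chosen so that its support counts match the worked example that follows:

from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) with their support counts."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count candidates against the database
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: build (k+1)-candidates from frequent k-itemsets,
        # pruning any candidate that has an infrequent k-subset
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                current.add(union)
        k += 1
    return frequent

# Assumed transaction database (chosen to match the support counts used below)
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
print(apriori(db, min_support=2))
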
For the worked example, the minimum support count is 2 and the minimum confidence is 60%.

STEP 1 K=1

(I) Create a table containing support count of each item present in dataset – Called C1(candidate set)

(II) Compare candidate set item’s support count with minimum support count (if support_count of is
less than min_support then remove those items). This gives us itemset L1.

STEP 2 K=2

(I)Generate candidate set C2 using L1 (this is called join step). Condition of joining Lk-1 and Lk-1 is
that it should have (K-2) elements in common.

➢ Check all subsets of an itemset are frequent or not and if not frequent remove that itemset.
(Example subset of {I1, I2} are {I1}, {I2} they are frequent. Check for each itemset)
➢ Now find support count of these itemsets by searching in dataset.
(II) compare candidate (C2) support count with minimum support count (here min_support=2 if
support_count of candidate set item is less than min_support then remove those items) this gives us
itemset L2.

STEP 3

(I)Generate candidate set C3 using L2 (join step). Condition of joining Lk-1 and Lk-1 is that it should
have (K-2) elements in common.

➢ Check if all subsets of these itemsets are frequent or not and if not, then remove that
itemset
➢ find support count of these remaining itemset by searching in dataset.

(II) Compare candidate (C3) support count with minimum support count (here min_support=2 if
support_count of candidate set item is less than min_support then remove those items) this gives us
itemset L3.

STEP 4

Generate candidate set C4 using L3 (join step). Condition of joining Lk-1 and Lk-1 (K=4) is that, they
should have (K-2) elements in common.

Check all subsets of these itemsets are frequent or not (Here itemset formed by joining L3 is {I1, I2,
I3, I5} so its subset contains {I1, I3, I5}, which is not frequent). So, no itemset in C4

We stop here because no frequent itemsets are found further.


ASSOCIATION RULE GENERATION

A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.

Confidence(A → B) = Support_count(A ∪ B) / Support_count(A)

Itemset {I1, I2, I3}

So, rules can be

[I1^I2] =>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100=50%


[I1^I3] =>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100=50%
[I2^I3] =>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100=50%
[I1] =>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100=33%
[I2] =>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100=28%
[I3] =>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100=33%

Itemset {I1, I2, I5}

So, rules can be

[I1^I2] =>[I5] //confidence = sup(I1^I2^I5)/sup(I1^I2) = 2/4*100=50%


[I1^I5] =>[I2] //confidence = sup(I1^I2^I5)/sup(I1^I5) = 2/2*100=100%
[I2^I5] =>[I1] //confidence = sup(I1^I2^I5)/sup(I2^I5) = 2/2*100=100%
[I1] =>[I2^I5] //confidence = sup(I1^I2^I5)/sup(I1) = 2/6*100=33%
[I2] =>[I1^I5] //confidence = sup(I1^I2^I5)/sup(I2) = 2/7*100=28%
[I5] =>[I1^I2] //confidence = sup(I1^I2^I5)/sup(I5) =2/2*100=100%
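
These confidence values can be verified with a few lines of Python, plugging in the support counts quoted in the rules above:

# Support counts taken from the worked example above
support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I3"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def confidence(antecedent, consequent):
    whole = frozenset(antecedent) | frozenset(consequent)
    return support[whole] / support[frozenset(antecedent)] * 100

print(confidence({"I1", "I5"}, {"I2"}))   # 100.0
print(confidence({"I1"}, {"I2", "I5"}))   # 33.3...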

Itemset {I1, I2, I3}:

➢ The highest confidence observed is 50%, which is below the 60% minimum confidence threshold.
➢ No strong association rules can be extracted from this itemset.

Itemset {I1, I2, I5}:

➢ Three rules have 100% confidence, making them strong association rules:
➢ [I1^I5] → [I2] (100%)
➢ [I2^I5] → [I1] (100%)
➢ [I5] → [I1^I2] (100%) These rules indicate a strong relationship among the items.

PROPERTIES

➢ Monotonicity: If an itemset is frequent, all its subsets are frequent.


➢ Anti-Monotonicity: If an itemset is infrequent, all its supersets are infrequent.

ADVANTAGES

➢ Easy to understand and implement.


➢ Can handle large datasets.

DISADVANTAGES

➢ Can be expensive for large datasets.


➢ Generates many unnecessary candidates.
3.DESCRIBE DIMENSIONALITY REDUCTION

Dimensionality Reduction

Dimensionality reduction is a technique used in machine learning and data science to reduce the
number of features (variables) in a dataset while preserving important information. High-
dimensional data can lead to increased computation time, overfitting, and difficulties in
visualization, making dimensionality reduction essential.

Why is Dimensionality Reduction Important?

➢ Improves Model Performance – Reduces overfitting by eliminating redundant and irrelevant features.
➢ Enhances Computational Efficiency – Less data means faster training and prediction times.
➢ Better Data Visualization – High-dimensional data is challenging to visualize; reducing
dimensions allows for graphical representation in 2D or 3D.
➢ Removes Noise – Helps eliminate less useful or noisy features, improving model
generalization.

Types of Dimensionality Reduction Techniques

1.Principal Component Analysis (PCA)

In pattern recognition, Dimension Reduction is defined as-

➢ It is a process of converting a data set having vast dimensions into a data set with lesser
dimensions.
➢ It ensures that the converted data set conveys similar information concisely.

Example-

Consider the following example

➢ The following graph shows two dimensions x1 and x2.


➢ x1 represents the measurement of several objects in cm.
➢ x2 represents the measurement of several objects in inches.

Using dimension reduction techniques-

➢ We convert the dimensions of data from 2 dimensions (x1 and x2) to 1 dimension (z1).
➢ It makes the data relatively easier to explain.
Dimension reduction offers several benefits such as-

➢ It compresses the data and thus reduces the storage space requirements.
➢ It reduces the time required for computation since less dimensions require less
computation.
➢ It eliminates the redundant features.
➢ It improves the model performance.

ALGORITHM

➢ Step-01: Get data.


➢ Step-02: Compute the mean vector (µ).
➢ Step-03: Subtract mean from the given data.
➢ Step-04: Calculate the covariance matrix.
➢ Step-05: Calculate the Eigen vectors and Eigen values of the covariance matrix.
➢ Step-06: Choosing components and forming a feature vector.
➢ Step-07: Deriving the new data set.
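
A short NumPy sketch of these steps (the data below is random and purely illustrative; the top k components are kept):

import numpy as np

def pca(X, k):
    # Step 2-3: compute the mean vector and subtract it from the data
    mu = X.mean(axis=0)
    Xc = X - mu
    # Step 4: covariance matrix of the centred data
    cov = np.cov(Xc, rowvar=False)
    # Step 5: eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 6: choose the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    # Step 7: derive the new, lower-dimensional data set
    return Xc @ components

# Illustrative random data: 100 samples, 5 features, reduced to 2 dimensions
X = np.random.rand(100, 5)
print(pca(X, k=2).shape)  # (100, 2)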

2.Linear Discriminant Analysis (LDA)

Logistic regression is a classification algorithm traditionally limited to only two-class classification problems.

➢ If you have more than two classes then Linear Discriminant Analysis is the preferred linear
classification technique.

Limitations of Logistic Regression

➢ Logistic regression is a simple and powerful linear classification algorithm.


➢ Two-Class Problems. Logistic regression is intended for two-class or binary classification
problems. It can be extended for multi-class classification, but is rarely used for this purpose.
➢ Unstable with Well Separated Classes. Logistic regression can become unstable when the
classes are well separated.
➢ Unstable with Few Examples. Logistic regression can become unstable when there are few
examples from which to estimate the parameters.

Example:

➢ Suppose we have two sets of data points belonging to two different classes that we want to
classify.
➢ As shown in the given 2D graph, when the data points are plotted on the 2D plane, there’s
no straight line that can separate the two classes of the data points completely.
➢ Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D graph
into a 1D graph in order to maximize the separability between the two classes.
ALGORITHM

➢ Calculate Class Means: Compute the average vectors for each class in your data.
➢ Compute Scatter Matrices: Determine the between-class and within-class scatter matrices.
➢ Eigenvalue/Eigenvector Calculation: Find the eigenvectors and corresponding eigenvalues
for the scatter matrices.
➢ Sort and Select Eigenvectors: Sort eigenvectors by decreasing eigenvalues and pick the top k
eigenvectors.
➢ Transform Data: Use the selected k eigenvectors to form a matrix, and transform your data
into the new lower-dimensional space using matrix multiplication.
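
A compact NumPy sketch following these steps, on illustrative two-class data projected onto a single discriminant axis:

import numpy as np

def lda(X, y, k=1):
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))   # within-class scatter
    Sb = np.zeros((n_features, n_features))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                       # step 1: class means
        Sw += (Xc - mean_c).T @ (Xc - mean_c)          # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Step 3: eigenvalues/eigenvectors of inv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    # Step 4: sort eigenvectors by decreasing eigenvalue and keep the top k
    order = np.argsort(eigvals.real)[::-1][:k]
    W = eigvecs[:, order].real
    # Step 5: transform the data into the lower-dimensional space
    return X @ W

# Illustrative data: two roughly separated classes in 2D, projected to 1D
X = np.vstack([np.random.randn(50, 2) + [0, 0], np.random.randn(50, 2) + [3, 3]])
y = np.array([0] * 50 + [1] * 50)
print(lda(X, y, k=1).shape)  # (100, 1)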

Applications of Dimensionality Reduction

➢ Machine Learning & Deep Learning


➢ Computer Vision
➢ NLP
➢ Bioinformatics & Genetics

Advantages of Dimensionality Reduction:

➢ Improves Model Performance


➢ Reduces Overfitting
➢ Storage Efficiency

Disadvantages of Dimensionality Reduction:

➢ Loss of Information
➢ Computationally Expensive
➢ Risk of Over-Simplification
