
IDML Presentation

The document presents an overview of the concepts of training, validation, and test datasets in machine learning, emphasizing their distinct roles in model development. The training set is used to teach the model, the validation set helps in tuning hyperparameters, and the test set evaluates the model's performance on unseen data. It also discusses the importance of data splitting, typically following the 80:20 rule for effective model training and testing.

Uploaded by

vishwasuded265

PRESENTATION ON

NOTION OF TRAINING-VALIDATION AND TESTING

BY:
• Pn Sameera-22BCAR261
• K. Pooja-22BCAR262
• Priyadharshini-22BCAR263
• Vishwas-22BCAR265
INTRODUCTION
• Understanding the concepts of training data, validation data, and test
data is important in machine learning, a field in which data reigns
supreme.

• The trio of training, validation, and test data sets plays an important
role in shaping your machine learning model.

• Machine learning (ML) is a branch of artificial intelligence (AI) that uses
data and algorithms to mimic real-world situations. Machine learning
helps you forecast, analyze, and study human behaviors and events.
• Machine learning helps you understand customer behaviors and spot
process-related patterns and operational gaps.
• Machine learning also helps you predict trends and developments.
• Constructing a machine learning model depends on how its data is
collected and organized. In this process, the data is categorized into
three types:
1. Training data.
2. Validation data.
3. Test data.
THREE TYPES OF DATASET SPLITS IN MACHINE LEARNING
TRAINING SET
• The training set is the data used to train the model, so that it learns the
hidden features/patterns in the data.
• The training set includes the features as well as the labels in the case of
supervised learning. In the case of unsupervised learning, it can simply
be the feature sets.
• These labels are used in the training phase to get the training accuracy
score. The training set is often taken as about 70% of the original dataset
(80% is also common), but this can be changed per the use case or
available data.
• The training set should be representative of all the kinds of input the
model is expected to process.
• For example, if your model must classify pictures of cats and dogs, the training set
must include pictures of both cats and dogs.
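As a minimal sketch of how labels yield a training accuracy score, the toy example below fits a one-feature threshold classifier and scores it on the same data it was trained on. The data, the helper names (`fit_threshold`, `predict`, `accuracy`), and the choice of classifier are illustrative assumptions, not part of the original slides.

```python
from statistics import mean


def fit_threshold(features, labels):
    """Learn a decision threshold: the midpoint between the two class means.

    Assumes class 1 tends to have larger feature values than class 0.
    """
    mean0 = mean(x for x, y in zip(features, labels) if y == 0)
    mean1 = mean(x for x, y in zip(features, labels) if y == 1)
    return (mean0 + mean1) / 2


def predict(threshold, features):
    """Label each feature value 1 if it lies above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in features]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy supervised training set: one feature per example plus a 0/1 label.
train_x = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = fit_threshold(train_x, train_y)
train_acc = accuracy(predict(threshold, train_x), train_y)
print(f"threshold={threshold:.2f}, training accuracy={train_acc:.2f}")
```

A high training accuracy only shows that the model fits the examples it has already seen; it says nothing yet about unseen data.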
VALIDATION SET
• The validation set is used to provide an unbiased evaluation of the model fit
during hyperparameter tuning of the model.
• It is the set of examples used to adjust the parameters of the learning
process (the hyperparameters).
• Candidate hyperparameter values are evaluated by training the model on
the training set and scoring it on the validation set.
• In machine learning or deep learning, we generally need to try multiple
models with different hyperparameters and check which model gives the
best result. This process is carried out with the help of a validation set.
• Applications of the validation set: validation sets are used for
hyperparameter tuning of AI models in domains such as healthcare,
analytics, and cyber security.
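The tuning loop described above can be sketched as follows: each candidate hyperparameter value is used with the same training set, scored on the validation set, and the best-scoring value is kept. The k-nearest-neighbours classifier, the toy data, and the candidate values of `k` are assumptions chosen only to keep the example small.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of examples in `data` that k-NN (built on `train`) gets right."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


# Toy (feature, label) pairs; the validation examples are kept out of training.
train = [(0.5, 0), (1.0, 0), (1.2, 0), (2.0, 0),
         (2.2, 1), (3.0, 1), (3.5, 1), (4.0, 1)]
validation = [(1.1, 0), (2.1, 0), (2.9, 1), (3.8, 1)]

# Hyperparameter candidates (odd values avoid ties between two classes).
candidates = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(scores, key=scores.get)
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

Only the validation score drives the choice of `k` here; the test set is left untouched during this step.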
TEST SET

• Test data is used to perform a realistic check on an algorithm.

• Test data, also known as a testing set or test set, checks whether the
machine learning model is accurate on data it has not seen.

• Once the machine learning model is confirmed as accurate, it can be
used for predictive analytics.

• Test data is similar to validation data in that both are held out from
training. Unlike validation data, the test set is used only once, on the
final model.
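One way to make the "used only once" rule hard to break by accident is to wrap the test set so that a second evaluation fails. The sketch below is purely illustrative: `OneShotTestSet` is a hypothetical helper, and `final_model` stands in for whatever model was frozen after training and validation.

```python
class OneShotTestSet:
    """Hypothetical wrapper that lets a test set be scored only once."""

    def __init__(self, examples):
        self._examples = list(examples)
        self._used = False

    def evaluate(self, predict):
        """Return the accuracy of `predict` on the held-out examples, once."""
        if self._used:
            raise RuntimeError("test set already used; do not tune on it")
        self._used = True
        correct = sum(predict(x) == y for x, y in self._examples)
        return correct / len(self._examples)


def final_model(x):
    """Stand-in for the final model; the 4.25 threshold is an assumed value."""
    return 1 if x > 4.25 else 0


# Held-out (feature, label) pairs never seen during training or tuning.
test_set = OneShotTestSet([(1.8, 0), (3.9, 0), (5.0, 1), (7.2, 1)])
test_accuracy = test_set.evaluate(final_model)
print(f"test accuracy: {test_accuracy:.2f}")
```

Calling `test_set.evaluate` a second time raises `RuntimeError`, which mirrors the idea that the test score should not feed back into further tuning.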
DATA SPLITTING FOR TRAINING AND TESTING YOUR MACHINE LEARNING MODEL

• Teaching a machine learning model means undertaking data splitting.
• You will need to denote which type of data you are working with: training data, validation data, or test
data. At a minimum, teaching your machine learning model requires splitting the data into two primary
datasets: training data and test data.
• Data splitting ensures that the model can help analysts find the features or aspects that influence an
outcome or result. A common data splitting approach borrows its ratio from the Pareto principle, also
known as the 80:20 rule.
• The Pareto principle states that roughly 80% of effects come from 20% of causes. The 80:20 ratio can be
applied to your data splitting as a widely used rule of thumb. Your data splitting approach should:
1. Use 80% of your data as training data.
2. Use the remaining 20% of your data as testing data.
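The two steps above can be sketched in a few lines of standard-library Python. The function name `split_80_20`, the fixed seed, and the integer toy dataset are assumptions for illustration; shuffling before the cut is also an assumption, made so that the split does not depend on the original ordering of the data.

```python
import random


def split_80_20(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))  # stand-in for 100 labelled examples
train, test = split_80_20(data)
print(len(train), len(test))  # 80 20
```

Every example lands in exactly one of the two lists, so no test example leaks into training. If a validation set is also needed, the same function could be applied again to the training portion.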
• In summary, training, testing, and validation sets serve distinct purposes
in machine learning. The training set is used to train the model; the test
set evaluates its performance on unseen data; and the validation set aids
in model selection and hyperparameter tuning.
10/21/2024
