PRESENTATION ON
NOTION OF TRAINING-VALIDATION AND TESTING
                        BY:
          • Pn Sameera-22BCAR261
          • K.pooja-22BCAR262
          • Priyadharshini-22BCAR263
          • Vishwas-22BCAR265
                    INTRODUCTION
• Understanding the concepts of training data, validation data, and test
  data is important in machine learning. In the world of machine learning,
  data reigns supreme.
• . The trio of training data sets, validation data sets, and test data sets,
  play an important role in shaping your machine learning model.
• Machine learning (ML) is a branch of artificial intelligence (AI) that uses
  data and algorithms to mimic real-world situations. Machine learning
  helps you forecast, analyze, and study human behaviors and events.
• Machine learning helps you understand customer behaviors, spot
  process-related patterns, and operational gaps.
• Machine learning also helps you predict trends and developments.
•    Constructing a machine learning algorithm depends on how it will
    collect data. In this process, information is categorized into three
    types of data:
                 1.Training data.
                2.Validation data.
                3.Test data.
THREE TYPE OF SPLIT DATASET IN MACHINE LEARNING
                                TRAINING SET
• It is the set of data that is used to train and make the model learn the hidden
  features/patterns in the data.
• The training set includes the features and well as labels in the case of
  supervised learning. In the case of unsupervised learning, it can simply
  be the feature sets.
• These labels are used in the training phase to get the training accuracy
  score. The training set is usually taken as 70% of the original dataset
  but can be changed per the use case or available data.
• The training set must include all the possible inputs the model can process.
• For example, if your model must classify pictures of cats and dogs, the training set
  must include both cats and dogs.
                        VALIDATION SET
• The validation set is used to provide an unbiased evaluation of the model fit
  during hyperparameter tuning of the model.
• It is the set of examples that are used to change learning process
  parameters.
• Optimal values of hyperparameters are tested against the model trained
  using the training set.
• In Machine Learning or Deep Learning, we generally need to test multiple
  models with different hyperparameters and check which model gives the
  best result. This process is carried out with the help of a validation set.
• Applications of Validation Set
• Validations sets are used for Hyperparameter tuning of AI models. Domains
  include Healthcare, Analytics, Cyber Security, etc.
                         TEST SET
Test data is used to perform a realistic check on an algorithm.
Test data, also known as a testing set, or test set, confirms if the
machine learning model is accurate.
Once the machine learning model is confirmed as accurate, it can be
used for predictive analytics. Test data is similar to validation data.
Unlike validation data, test sets are only used once on the final model.
            Data splitting for training and testing your
            machine learning model
• Teaching a machine learning model will mean undertaking data splitting.
• You will need to denote which type of data you are working with: training data, validation data, or test
  data. Teaching your machine learning model requires data splitting into two primary datasets: training
  data and test data.
• Data splitting ensures that an algorithm model can help analysts find features or aspects that include an
  outcome or result. The standard data splitting approach uses the Pareto principle. The Pareto principle is
  also known as the 80:20 rule.
• The Pareto principle states that 80% of effects come from 20% of causes. The 80:20 rule can be applied
  to your data splitting as it is a reliable way to assess data. Your data splitting approach should:
                   1.Use 80% of your data as training data.
                   2.Use the remaining 20% of your data as testing data.
• In summary, training, testing,
  and validation sets serve
  distinct purposes in machine
  learning. The training set is
  used to train the model; the
  test set evaluates its
  performance on unseen data;
  and the validation set aids in
  model selection and
  hyperparameter tuning.
10/21/2024   12