JSPM’S
Bhivarabai Sawant Institute of Technology & Research
                     Pune-412207
          Department of Computer Engineering
              Academic Year 2019-20
               Mini Project Report
                         On
                 “Classification of Final
                  Settlements in Labor
                Negotiations in Canadian
                        Industry”
                    Submitted by:
                 Sonali Patil(BEA_05)
                 Reshma Jadhav(BEA_06)
                 Austin Wilson(BEA_22)
                Under the guidance of
                   Prof. Nilufar Zaman
           Subject : Laboratory Practice II
                            DEPARTMENT OF COMPUTER ENGINEERING
           BHIVARABAI SAWANT INSTITUTE OF TECHNOLOGY & RESEARCH
                                       WAGHOLI, PUNE – 412 207
                                            CERTIFICATE
       This is to certify that Sonali Patil (BEA_05), Reshma Jadhav (BEA_06), and Austin Wilson (BEA_22)
have submitted their project report on “Classification of Final Settlements in Labor Negotiations in Canadian
Industry” under my guidance and supervision. The work has been done to my satisfaction during the academic
year 2019–2020 under Savitribai Phule Pune University guidelines.
         Date:
         Place: BSIOTR, PUNE.
          Prof. Nilufar Zaman                                                    Dr. Gayatri Bhandari
             Project Guide                                                           H.O.D.
                     ACKNOWLEDGEMENT
   It is a great pleasure and an immense satisfaction to express our deepest sense of gratitude and
  thanks to everyone who has directly or indirectly helped us in completing our project work
                                          successfully.
           We express our gratitude towards our guide, Prof. Nilufar Zaman, and Dr. G.M. Bhandari, Head of the
Department of Computer Engineering, Bhivarabai Sawant Institute of Technology and Research,
Wagholi, Pune, who guided and encouraged us in completing the project work in the scheduled time. We
    would also like to thank our Principal for allowing us to pursue our project in this institute.
                                                       Sonali Patil(BEA_05)
                                                       Reshma Jadhav(BEA_06)
                                                       Austin Wilson(BEA_22)
                   INDEX
Sr. No.    Chapter
           CERTIFICATE
           ACKNOWLEDGEMENT
           ABSTRACT
           INDEX
  1.       INTRODUCTION
  2.       OBJECTIVES AND SCOPE
  3.       PROPOSED SYSTEM / METHODOLOGY
  4.       RESULTS AND DISCUSSIONS
  5.       ADVANTAGES AND DISADVANTAGES
  6.       CONCLUSION
  7.       REFERENCES
                                    ABSTRACT
           In this era of data science, where R and Python are ruling the roost, let us take a look at another
data science tool called Weka. Weka has been around for quite a while and was developed at the
University of Waikato for research purposes. What makes Weka worth trying is its easy learning curve.
Weka is free software released under the GNU General Public License. Weka supports several standard data
mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization,
and feature selection. A number of organizations monitor and analyse current Canadian economic
conditions, including the Bank of Canada, the economic research units of major Canadian banks, and think
tanks such as the Conference Board of Canada. Large unions (e.g., CUPE or Unifor) and trade/industry
associations may also have economists on staff; however, their analysis may only be available to their own
membership. Weka also provides access to deep learning. It is not capable of multi-relational data
mining, but there is separate software for converting a collection of linked database tables into a single table
that is suitable for processing with Weka. Another important area currently not covered by the
algorithms included in the Weka distribution is sequence modeling.
                                            CHAPTER 1
                                      INTRODUCTION
           Weka is a collection of machine learning algorithms for data mining tasks. Weka supports
           several standard functions, such as:
           - data preprocessing
           - clustering
           - classification
           - regression
           - visualization
In law, a settlement is a resolution between disputing parties about a legal case, reached either before or after
court action begins. The term "settlement" also has other meanings in the context of law. Structured settlements provide
for future periodic payments, instead of a one-time cash payment. A settlement, as well as dealing with the dispute
between the parties, is a contract between those parties, and is one possible (and common) result when parties sue (or
contemplate so doing) each other in civil proceedings. The plaintiffs and defendants identified in the lawsuit can end the
dispute between themselves without a trial.
The contract is based upon the bargain that a party forgoes its ability to sue (if it has not sued already), or to continue
with the claim (if the plaintiff has sued), in return for the certainty written into the settlement. The courts will enforce the
settlement. If it is breached, the party in default could be sued for breach of that contract. In some jurisdictions, the party
in default could also face the original action being restored.
The settlement of the lawsuit defines legal requirements of the parties and is often put in force by an order of the court
after a joint stipulation by the parties. In other situations (as where the claims have been satisfied by the payment of a
certain sum of money), the plaintiff and defendant can simply file a notice that the case has been dismissed.
The majority of cases are decided by a settlement. Both sides (regardless of relative monetary resources) often have a
strong incentive to settle to avoid the costs (such as legal fees, finding expert witnesses, etc.), the time and the stress
associated with a trial, particularly where a trial by jury is available. Generally, one side or the other will make
a settlement offer early in litigation. The parties may hold (and indeed, the court may require) a settlement conference, at
which they attempt to reach such a settlement.
                                   CHAPTER 2
              OBJECTIVES AND SCOPE
The main objectives of this project, using WEKA, are to:
1. Make machine learning (ML) techniques generally available.
2. Apply them to practical problems such as labor negotiation.
3. Analyze the dataset well and display the results graphically.
4. Generate a clearer view of labor negotiations.
AREA OR SCOPE OF INVESTIGATION:
This project requires investigation in the following areas:
1. Canadian industries.
2. The best-fitting classifier for accurate analysis.
   CHAPTER 3
PROPOSED SYSTEM
 METHODOLOGY
    WEKA INTERFACE:
Data Mining Classification
1. Decision Tree (D-Tree)
The decision tree is a classification method that yields output as a flowchart-like tree structure. The result from
a D-Tree is highly interpretable, but the outcome must be represented as categorical data. In this work, the D-Tree
algorithm called “J48” is applied to classify the data.
2. Naive Bayes
Naïve Bayes is a simple probabilistic classifier based on Bayes' theorem, with a naïve assumption of
independence between every pair of features.
3. LMT (Logistic Model Trees)
Logistic model trees are based on the earlier idea of a model tree: a decision tree that has linear
regression models at its leaves to provide a piecewise linear regression model (where ordinary decision
trees with constants at their leaves would produce a piecewise constant model). In the logistic variant,
this algorithm is used to produce a logistic regression (LR) model at every node in the tree; the node is
then split using the C4.5 criterion.
                  Classification Steps:
                  1. Load the dataset in ARFF format.
                  2. Preprocess the dataset.
                  3. Select the dataset.
                  4. Choose a classifier.
                  5. Train the model.
                  6. Evaluate the model on the test dataset.
                  7. Find the performance criteria.
Cross-validation:
Cross-validation is a technique used to assess how the results of a statistical analysis generalize to an
independent data set. It is also known as rotation estimation. Cross-validation is an extension of the
train/test data split: a single split leaves less data for training and testing, which results in a loss of
testing and modeling capability. The purpose of k-fold cross-validation is to test how well a model trained
on given data performs on unseen data, while making sure that each and every data point is used for testing
at least once.
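The k-fold splitting described above can be sketched in plain Python (an illustrative sketch, not Weka's implementation):

```python
# Minimal k-fold cross-validation index generator: each index lands in
# exactly one test fold, and the remaining indices form the training set.
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold CV."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# For 10 instances and 3 folds, every instance is tested exactly once:
folds = list(k_fold_indices(10, 3))
```

In Weka's Explorer the same effect is obtained by selecting the "Cross-validation" test option with the desired number of folds.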
Discretize:
Data discretization converts a large number of data values into a smaller number of bins, so that data
evaluation and data management become much easier. One reason to discretize continuous features is to
improve the signal-to-noise ratio: fitting a model to bins reduces the impact that small fluctuations in the
data have on the model, and such small fluctuations are often just noise. Each bin "smooths" out the
fluctuations and noise in a section of the data.
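The binning idea can be sketched as equal-width discretization in plain Python (illustrative, similar in spirit to Weka's unsupervised Discretize filter; the example values are hypothetical):

```python
# Equal-width discretization: map each continuous value to one of n bins.
def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # avoid division by zero if constant
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

hours = [27, 33, 35, 38, 40, 40]        # e.g. a working-hours attribute
bins = equal_width_bins(hours, 3)       # -> [0, 1, 1, 2, 2, 2]
```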
Normalize:
In statistics and applications of statistics, normalization can have a range of meanings. In the simplest
cases, normalization of ratings means adjusting values measured on different scales to a notionally
common scale, often prior to averaging.
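The simplest such adjustment is min-max rescaling to [0, 1], which is what Weka's Normalize filter does by default; a plain-Python sketch (with hypothetical example values):

```python
# Min-max normalization: rescale values linearly onto the [0, 1] range.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0             # avoid division by zero if constant
    return [(v - lo) / span for v in values]

wages = [2.0, 4.5, 7.0]                 # e.g. first-year wage increases
scaled = min_max_normalize(wages)       # -> [0.0, 0.5, 1.0]
```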
J48:
C4.5 (J48) is an algorithm used to generate a decision tree, developed by Ross Quinlan.
C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used
for classification, and for this reason C4.5 is often referred to as a statistical classifier.
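The attribute-selection idea at the heart of ID3/C4.5-style tree induction can be sketched as an information-gain computation in plain Python (an illustrative sketch, not Weka's J48 code; C4.5 proper refines this into the gain ratio):

```python
# Entropy and information gain for choosing a split attribute.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Gain from splitting `rows` on the attribute at `attr_index`."""
    total = entropy(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return total - remainder

# A perfectly predictive attribute yields gain equal to the full entropy:
rows = [('none',), ('none',), ('full',), ('full',)]
labels = ['bad', 'bad', 'good', 'good']
gain = information_gain(rows, 0, labels)   # == entropy(labels) == 1.0
```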
Naïve Bayes:
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a
single algorithm but a family of algorithms that all share a common principle: every pair of features
being classified is independent of each other.
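That independence assumption can be sketched as a minimal categorical Naive Bayes in plain Python (illustrative only; Weka's NaiveBayes also handles numeric attributes, and the tiny dataset here is hypothetical):

```python
# Categorical Naive Bayes: class-conditional value counts with Laplace
# smoothing, multiplied under the feature-independence assumption.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)   # (attr_index, class) -> value counts
    for row, lab in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, lab)][v] += 1
    return class_counts, feat_counts

def predict_nb(model, row):
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for lab, cc in class_counts.items():
        p = cc / total                   # class prior P(class)
        for i, v in enumerate(row):
            counts = feat_counts[(i, lab)]
            # Laplace smoothing so unseen values never zero out the product.
            p *= (counts[v] + 1) / (cc + len(counts) + 1)
        if p > best_p:
            best, best_p = lab, p
    return best

rows = [('none', 'no'), ('none', 'no'), ('full', 'yes'), ('full', 'yes')]
labels = ['bad', 'bad', 'good', 'good']
model = train_nb(rows, labels)
```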
LMT:
As described in the methodology above, logistic model trees place logistic regression models at the nodes
of a decision tree, yielding a piecewise logistic model rather than the piecewise constant model of an
ordinary decision tree.
      Dataset Used:
@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'wage-increase-first-year' real
@attribute 'wage-increase-second-year' real
@attribute 'wage-increase-third-year' real
@attribute 'cost-of-living-adjustment' {'none','tcf','tc'}
@attribute 'working-hours' real
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'standby-pay' real
@attribute 'shift-differential' real
@attribute 'education-allowance' {'yes','no'}
@attribute 'statutory-holidays' real
@attribute 'vacation' {'below_average','average','generous'}
@attribute 'longterm-disability-assistance' {'yes','no'}
@attribute 'contribution-to-dental-plan' {'none','half','full'}
@attribute 'bereavement-assistance' {'yes','no'}
@attribute 'contribution-to-health-plan' {'none','half','full'}
@attribute 'class' {'bad','good'}
@data
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
?,?,?,?,?,38,'empl_contr',?,5,?,11,'generous','yes','half','yes','half','good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
3,4.5,4.5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
2,2,2.5,?,?,35,?,?,6,'yes',12,'average',?,?,?,?,'good'
3,4,5,5,'tc',?,'empl_contr',?,?,?,12,'generous','yes','none','yes','half','good'
3,6.9,4.8,2.3,?,40,?,?,3,?,12,'below_average',?,?,?,?,'good'
2,3,7,?,?,38,?,12,25,'yes',11,'below_average','yes','half','yes',?,'good'
1,5.7,?,?,'none',40,'empl_contr',?,4,?,11,'generous','yes','full',?,?,'good'
3,3.5,4,4.6,'none',36,?,?,3,?,13,'generous',?,?,'yes','full','good'
2,6.4,6.4,?,?,38,?,?,4,?,15,?,?,'full',?,?,'good'
2,3.5,4,?,'none',40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
3,3.5,4,5.1,'tcf',37,?,?,4,?,13,'generous',?,'full','yes','full','good'
1,3,?,?,'none',36,?,?,10,'no',11,'generous',?,?,?,?,'good'
2,4.5,4,?,'none',37,'empl_contr',?,?,?,11,'average',?,'full','yes',?,'good'
1,2.8,?,?,?,35,?,?,2,?,12,'below_average',?,?,?,?,'good'
1,2.1,?,?,'tc',40,'ret_allw',2,3,'no',9,'below_average','yes','half',?,'none','bad'
1,2,?,?,'none',38,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,4,5,?,'tcf',35,?,13,5,?,15,'generous',?,?,?,?,'good'
2,4.3,4.4,?,?,38,?,?,4,?,12,'generous',?,'full',?,'full','good'
2,2.5,3,?,?,40,'none',?,?,?,11,'below_average',?,?,?,?,'bad'
3,3.5,4,4.6,'tcf',27,?,?,?,?,?,?,?,?,?,?,'good'
2,4.5,4,?,?,40,?,?,4,?,10,'generous',?,'half',?,'full','good'
1,6,?,?,?,38,?,8,3,?,9,'generous',?,?,?,?,'good'
3,2,2,2,'none',40,'none',?,?,?,10,'below_average',?,'half','yes','full','bad'
2,4.5,4.5,?,'tcf',?,?,?,?,'yes',10,'below_average','yes','none',?,'half','good'
2,3,3,?,'none',33,?,?,?,'yes',12,'generous',?,?,'yes','full','good'
2,5,4,?,'none',37,?,?,5,'no',11,'below_average','yes','full','yes','full','good'
3,2,2.5,?,?,35,'none',?,?,?,10,'average',?,?,'yes','full','bad'
3,4.5,4.5,5,'none',40,?,?,?,'no',11,'average',?,'half',?,?,'good'
3,3,2,2.5,'tc',40,'none',?,5,'no',10,'below_average','yes','half','yes','full','bad'
2,2.5,2.5,?,?,38,'empl_contr',?,?,?,10,'average',?,?,?,?,'bad'
2,4,5,?,'none',40,'none',?,3,'no',10,'below_average','no','none',?,'none','bad'
3,2,2.5,2.1,'tc',40,'none',2,1,'no',10,'below_average','no','half','yes','full','bad'
2,2,2,?,'none',40,'none',?,?,'no',11,'average','yes','none','yes','full','bad'
1,2,?,?,'tc',40,'ret_allw',4,0,'no',11,'generous','no','none','no','none','bad'
1,2.8,?,?,'none',38,'empl_contr',2,3,'no',9,'below_average','yes','half',?,'none','bad'
3,2,2.5,2,?,37,'empl_contr',?,?,?,10,'average',?,?,'yes','none','bad'
2,4.5,4,?,'none',40,?,?,4,?,12,'average','yes','full','yes','half','good'
1,4,?,?,'none',?,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,2,3,?,'none',38,'empl_contr',?,?,'yes',12,'generous','yes','none','yes','full','bad'
2,2.5,2.5,?,'tc',39,'empl_contr',?,?,?,12,'average',?,?,'yes',?,'bad'
2,2.5,3,?,'tcf',40,'none',?,?,?,11,'below_average',?,?,'yes',?,'bad'
2,4,4,?,'none',40,'none',?,3,?,10,'below_average','no','none',?,'none','bad'
2,4.5,4,?,?,40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
2,4.5,4,?,'none',40,?,?,5,?,11,'average',?,'full','yes','full','good'
2,4.6,4.6,?,'tcf',38,?,?,?,?,?,?,'yes','half',?,'half','good'
2,5,4.5,?,'none',38,?,14,5,?,11,'below_average','yes',?,?,'full','good'
2,5.7,4.5,?,'none',40,'ret_allw',?,?,?,11,'average','yes','full','yes','full','good'
2,7,5.3,?,?,?,?,?,?,?,11,?,'yes','full',?,?,'good'
3,2,3,?,'tcf',?,'empl_contr',?,?,'yes',?,?,'yes','half','yes',?,'good'
3,3.5,4,4.5,'tcf',35,?,?,?,?,13,'generous',?,?,'yes','full','good'
3,4,3.5,?,'none',40,'empl_contr',?,6,?,11,'average','yes','full',?,'full','good'
3,5,4.4,?,'none',38,'empl_contr',10,6,?,11,'generous','yes',?,?,'full','good'
3,5,5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
3,6,6,4,?,35,?,?,14,?,9,'generous','yes','full','yes','full','good'
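The ARFF format above is simple enough to read with a few lines of plain Python (an illustrative sketch; Weka itself, and packages such as liac-arff, provide complete parsers):

```python
# Minimal ARFF reader: collects attribute names and data rows,
# treating '?' as a missing value per the ARFF convention.
def parse_arff(text):
    attributes, data, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):   # skip blanks and comments
            continue
        low = line.lower()
        if low.startswith('@attribute'):
            # Attribute name is the second token, possibly quoted.
            attributes.append(line.split()[1].strip("'"))
        elif low.startswith('@data'):
            in_data = True
        elif in_data:
            data.append([None if tok == '?' else tok.strip("'")
                         for tok in line.split(',')])
    return attributes, data

sample = """@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'class' {'bad','good'}
@data
1,'good'
?,'bad'
"""
attrs, rows = parse_arff(sample)
```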
                                          CHAPTER 4
                                RESULTS AND DISCUSSIONS
                                 Labor Negotiation Dataset using WEKA
                                         Screenshots Captured:
Pre-processing done using the normalize filter:
Pre-processing done using the discretize filter:
Pre-processing done using the replace-missing-values filter:
Classifier Decision Stump used for classification with 84% accuracy:
Classifier Naïve Bayes used for classification with 98% accuracy:
Classifier Logistic used for classification with 100% accuracy:
Cross-validation performed on Naïve Bayes:
Cross-validation performed on Logistic:
Cross-validation performed on Decision Stump:
Overall Analysis of Classification Done:
Sr. No.   Classifier used                        Instances correctly   Instances incorrectly   Overall
                                                 classified            classified              accuracy
1.        Decision Stump                         48                    9                       84%
2.        Naïve Bayes                            56                    1                       98%
3.        Logistic                               57                    0                       100%
4.        Cross-validation on Decision Stump     46                    11                      80.70%
5.        Cross-validation on Naïve Bayes        51                    6                       89%
6.        Cross-validation on Logistic           53                    4                       92%
We therefore conclude that the function-based Logistic classifier works best for our labor negotiation
dataset, giving an accuracy of 100%, and is hence considered suitable for analyzing the given dataset.
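The accuracy column can be reproduced from the instance counts (57 instances in total); the reported figures appear to be truncated to whole percent, except the 80.70% entry which keeps two decimals. A quick check in Python:

```python
# Overall accuracy = correctly classified instances / total instances,
# truncated to whole percent as the report does.
results = {
    'Decision Stump':    48,
    'Naive Bayes':       56,
    'Logistic':          57,
    'CV Decision Stump': 46,   # 46/57 = 80.70% before truncation
    'CV Naive Bayes':    51,
    'CV Logistic':       53,
}
total = 57
accuracy = {name: int(correct / total * 100)
            for name, correct in results.items()}
```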
              CHAPTER 5
     ADVANTAGES AND DISADVANTAGES
ADVANTAGES:
1. Freely available under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language.
3. Runs on almost any modern computing platform.
4. Ease of use due to its graphical user interface.
DISADVANTAGES:
1. It struggles with very large datasets, since the data is loaded into memory.
2. Using it via the command line is a pain without the readline capability of the shell.
                                         CHAPTER 6
                                   CONCLUSION
Finally, after all the analysis, we obtained the results for the corresponding dataset. We find that function
Logistic is the best classification algorithm analyzed; it is followed by Naive Bayes and Decision Stump,
whose accuracies come close to that of function Logistic, and at some points Naïve Bayes and Decision
Stump show the same level of accuracy.
We therefore conclude that the function Logistic algorithm works best for our labor negotiation analysis,
giving an accuracy of 100%, and is hence considered suitable for analyzing the given dataset.
                                       REFERENCES
1. https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/labor.arff
2. https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf
3. https://courses.soe.ucsc.edu/courses/tim245/Spring12/01/pages/attached-files/attachments/11549