GAYATHRI VIDYA PARISHAD COLLEGE OF ENGINEERING FOR WOMEN (JG)
MADHURAWADA, VISAKHAPATNAM-48
                      III B.TECH- II SEMESTER (R22) External LAB EXAMINATION
                Name of the Lab: MACHINE LEARNING LAB
                Branch: 3CSE-3                    Max Marks: 50
                Name of the Faculty: G.SankaraRao       Timings:
1. Create an ARFF (Attribute-Relation File Format) file and read it in WEKA. Explore the purpose of each button
under the Preprocess panel after loading the ARFF file. Also repeat the exercise with a different ARFF file,
weather.arff, provided with WEKA, and interpret what you see.
2. Perform data preprocessing in WEKA. Study unsupervised attribute filters such as ReplaceMissingValues, to
replace missing values in the given dataset; Add, to add the new attribute Average; and Discretize, to discretize the
attributes into bins. Explore the Normalize and Standardize options on a dataset with numerical attributes.
3. Perform classification using the WEKA toolkit. Demonstrate the classification process using the ID3 algorithm on a
categorical dataset (weather). Demonstrate the classification process using the naïve Bayes algorithm on a categorical
dataset (‘vote’). Demonstrate the classification process using the Random Forest algorithm on datasets containing a
large number of attributes.
4. Perform classification using the WEKA toolkit, Part 2. Demonstrate the classification process using the J48
algorithm on a mixed-type dataset after discretizing its numeric attributes. Apply cross-validation with various
numbers of folds and compare the accuracy of the results.
5. Perform clustering in WEKA.
Apply a hierarchical clustering algorithm on a numeric dataset and estimate cluster quality. Exploratory
6. Perform classification using the WEKA toolkit. Demonstrate the classification process using the ID3 algorithm on a
categorical dataset (weather.arff). Demonstrate the classification process using the naïve Bayes algorithm on a
categorical dataset (‘vote’). Demonstrate the classification process using the Random Forest algorithm on datasets
containing a large number of attributes.
7. Create and load the ‘iris.csv’ file as given below, and display the name and type of each column. Find statistics
such as min, max, range, mean, median, variance, and standard deviation for each column of data. Repeat the above
for the ‘mtcars.csv’ dataset.
 sepallength sepalwidth petallength petalwidth irisClass
          4.9             3             1.4          0.2 Iris-setosa
          4.7           3.2             1.3          0.2 Iris-setosa
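
One possible approach to question 7, as a minimal sketch. It assumes only the two sample rows printed above (a full
iris.csv would hold all 150 rows), writes them to a temporary path chosen here for illustration, and restricts the
statistics to the numeric columns, since they are undefined for irisClass. The names iris_rows, csv_path, col_types,
and col_stats are illustrative.

```r
# Two sample rows copied from the question; a real iris.csv would hold the full dataset.
iris_rows <- data.frame(
  sepallength = c(4.9, 4.7),
  sepalwidth  = c(3.0, 3.2),
  petallength = c(1.4, 1.3),
  petalwidth  = c(0.2, 0.2),
  irisClass   = c("Iris-setosa", "Iris-setosa")
)

# Create the CSV in a temporary directory (illustrative path), then load it back.
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris_rows, csv_path, row.names = FALSE)
iris_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Name and type of each column.
col_types <- sapply(iris_df, class)
print(col_types)

# Summary statistics, one column of results per numeric attribute.
numeric_cols <- iris_df[sapply(iris_df, is.numeric)]
col_stats <- sapply(numeric_cols, function(x) {
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), median = median(x), variance = var(x), sd = sd(x))
})
print(col_stats)
```

The same steps should carry over to mtcars.csv by changing the data written to, and the path read by, read.csv.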
8. Create and load the ‘iris.csv’ file as given below. Write an R program to normalize the variables to a 0 to 1 scale
using min-max normalization.
 sepallength sepalwidth petallength petalwidth irisClass
          4.9              3            1.4          0.2 Iris-setosa
          4.7            3.2            1.3          0.2 Iris-setosa
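
A minimal sketch of min-max normalization for question 8, which maps each value x to (x - min) / (max - min). As an
assumption, R's built-in iris data frame stands in for the loaded iris.csv; its column names (Sepal.Length, ...) differ
from the header above. The helper min_max and the guard for constant columns are illustrative choices.

```r
# Assumption: the built-in iris data frame stands in for read.csv("iris.csv").
iris_df <- iris

# Min-max normalization: rescales a numeric vector to the interval [0, 1].
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column has max == min; return zeros rather than divide by zero.
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Normalize only the numeric columns; the class label is left untouched.
is_num <- sapply(iris_df, is.numeric)
iris_norm <- iris_df
iris_norm[is_num] <- lapply(iris_df[is_num], min_max)

print(summary(iris_norm))
```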
9. Create and load the ‘iris.csv’ file as given below. Generate histograms for each feature (sepal length, sepal
width, petal length, petal width), and generate scatter plots for every pair of variables, showing each species in a
different color.
  sepallength sepalwidth petallength petalwidth irisClass
           4.9             3            1.4          0.2 Iris-setosa
           4.7           3.2            1.3          0.2 Iris-setosa
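
A minimal sketch for question 9 using base R graphics. As an assumption, the built-in iris data frame stands in for the
loaded iris.csv, and the three colors in palette_cols are an arbitrary choice. When run non-interactively, R writes the
plots to its default graphics file instead of a window.

```r
# Assumption: the built-in iris data frame stands in for read.csv("iris.csv").
num <- iris[sapply(iris, is.numeric)]

# One histogram per numeric feature.
for (nm in names(num)) {
  hist(num[[nm]], main = paste("Histogram of", nm), xlab = nm)
}

# One color per species (arbitrary palette), indexed by the factor's integer codes.
palette_cols <- c("red", "green3", "blue")
species_cols <- palette_cols[as.integer(iris$Species)]

# Scatter plot matrix: every pair of variables, colored by species.
pairs(num, col = species_cols, pch = 19, main = "Iris: pairwise scatter plots")

# Print the species-to-color mapping in place of a legend.
print(setNames(palette_cols, levels(iris$Species)))
```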
10. Create and load the ‘iris.csv’ file as given below. Generate box plots for each of the numerical attributes.
Identify the attribute with the highest variance.
 sepallength     sepalwidth   petallength   petalwidth   irisClass
         4.9              3           1.4          0.2   Iris-setosa
         4.7            3.2           1.3          0.2   Iris-setosa
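
A minimal sketch for question 10. As an assumption, the built-in iris data frame stands in for the loaded iris.csv, so
the attribute names follow R's convention (Sepal.Length, ...) and not the header above. The names col_var and highest
are illustrative.

```r
# Assumption: the built-in iris data frame stands in for read.csv("iris.csv").
numeric_cols <- iris[sapply(iris, is.numeric)]

# Side-by-side box plots, one per numeric attribute.
boxplot(numeric_cols, main = "Iris: numeric attributes", ylab = "Measurement (cm)")

# Sample variance of each attribute, and the name of the largest.
col_var <- sapply(numeric_cols, var)
highest <- names(which.max(col_var))

print(col_var)
cat("Attribute with the highest variance:", highest, "\n")
```

The box with the widest spread in the plot should agree with the attribute reported by which.max.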
11. Study homogeneous and heterogeneous data structures such as vector, matrix, array, list, and data frame in R.
12. Create and load the sampledata.csv file as given below. Study homogeneous and heterogeneous data structures such
as vector, matrix, array, list, and data frame in R.
 name          age        city                    salary       is_employed
 Alice               25   New York                    50000                   TRUE
 Bob                 32   London                      65000                   TRUE
 Charlie             28   Paris                       58000                   FALSE
 David               40   Tokyo                       72000                   TRUE
 Eve                 22   New York                    48000                   FALSE
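
A minimal sketch of the five structures named in question 12, built from the sample rows above. Vectors, matrices, and
arrays are homogeneous (one type throughout); lists and data frames are heterogeneous. The data frame is constructed
in memory here instead of being read from sampledata.csv, and all variable names are illustrative.

```r
people   <- c("Alice", "Bob", "Charlie", "David", "Eve")
ages     <- c(25, 32, 28, 40, 22)
salaries <- c(50000, 65000, 58000, 72000, 48000)

# Vector: homogeneous, one dimension. Mixing types coerces everything to one type.
mixed <- c(25, "Alice", TRUE)
print(class(mixed))

# Matrix: homogeneous, two dimensions.
pay_matrix <- matrix(c(ages, salaries), ncol = 2,
                     dimnames = list(people, c("age", "salary")))

# Array: homogeneous, any number of dimensions (here 2 x 3 x 4).
cube <- array(1:24, dim = c(2, 3, 4))

# List: heterogeneous, elements may differ in type and length.
alice <- list(name = "Alice", age = 25, is_employed = TRUE)

# Data frame: heterogeneous columns of equal length, as in sampledata.csv.
sample_df <- data.frame(
  name = people, age = ages,
  city = c("New York", "London", "Paris", "Tokyo", "New York"),
  salary = salaries,
  is_employed = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
str(sample_df)
```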
13. Load the default iris dataset in RStudio, and write an R program that uses the ‘apply’ family of functions to
define a normalization function and apply it to each numeric column of the iris dataset, transforming the values so
that they are centered around 0 (z-score normalization).
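
A minimal sketch for question 13. The z-score of a value x is (x - mean) / sd, so each transformed column has mean 0
and standard deviation 1. The helper name z_score is illustrative; lapply keeps the result as a data frame, while
apply over columns (MARGIN = 2) returns a matrix with the same values.

```r
# z-score normalization: center on the mean, scale by the standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

# sapply finds the numeric columns of the built-in iris dataset.
is_num <- sapply(iris, is.numeric)

# lapply normalizes each numeric column and preserves the data frame structure.
iris_z <- iris
iris_z[is_num] <- lapply(iris[is_num], z_score)

# apply with MARGIN = 2 does the same column by column, returning a matrix.
z_mat <- apply(iris[, is_num], 2, z_score)

# Check: column means are (numerically) 0 and standard deviations are 1.
print(round(sapply(iris_z[is_num], mean), 10))
print(sapply(iris_z[is_num], sd))
```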
14. Write about the features, uses, and basic syntax for working with the RStudio tool.
 Write about the WEKA tool in detail.