
KIET International Journal of Intelligent Computing and Informatics,

Vol. 1, Issue 1, January 2014

Implementation of Apriori Algorithm using WEKA

Ajay Kumar Shrivastava
Associate Professor
KIET Group of Institutions, Ghaziabad
ajay@kiet.edu

R. N. Panda
Associate Professor
KIET Group of Institutions, Ghaziabad
rabi.panda@kiet.edu

Abstract- In today's fast-moving world, information is central to every aspect of life. It can be analyzed to support decision making, but the sheer size of modern datasets makes it difficult to analyze a database and extract useful information from it. Association rules are used to extract useful information from large databases, and the Apriori algorithm is one of the most useful algorithms for association rule mining. This study explains how to run the Apriori algorithm in WEKA. A new dataset was created for the study as an ARFF file and tested in WEKA.

Keywords- Apriori algorithm; association rules; data mining; Weka.

I. INTRODUCTION

In today's world, globalization is the main feature of any environment. Everyone has to be up to date and fast, and information is the main element for that. To survive in this world, it is a basic need to store and use information, which means preparing a proper database or dataset to analyze.

Storing and using a database is not the issue. The real problem is finding the relevant data for a particular purpose, or analyzing only the meaningful part, within the junkyard of a large database.

To solve this problem, data mining is used to extract the desired information. Useful information is extracted from large databases in the form of association rules. Many algorithms have been developed to extract association rules from large databases; Apriori is the most popular of them [1].

Many tools are available for running the Apriori algorithm. WEKA is an open source software tool that implements machine learning algorithms [2]. In this study, the dataset was created as an ARFF file and tested with the Apriori algorithm in WEKA.

II. WEKA [WAIKATO ENVIRONMENT FOR KNOWLEDGE ANALYSIS]

WEKA is a suite of tools for data mining that includes an implementation of association rules. It is a collection of machine learning algorithms for data mining tasks, which can be applied directly to a dataset or called from your own Java code.

The suite covers data preprocessing, classification, regression, clustering, association rules and visualization, and it can be extended with new machine learning schemes. WEKA 3.6.5 was used in this study. The following tools are available in Weka.

Explorer is used for exploring the dataset on which the operations are to be performed. Experimenter is used to perform experiments or statistical tests on the dataset. Knowledge Flow provides the same functionality as Explorer but with a drag-and-drop interface, and it supports incremental learning. Simple CLI provides a simple command-line interface that allows direct execution of Weka commands on operating systems that do not provide their own command-line interface [3].

In this study, WEKA performs its operations using the association rule concepts of data mining.

III. ASSOCIATION RULES

Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule is "If a customer buys a dozen eggs, he is 80% likely to also purchase milk". An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent.

Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support measures how often the items of a rule occur together in the data, and confidence measures how often the consequent occurs in the transactions that contain the antecedent. In data mining, association rules are divided into separate categories, and Weka uses them to perform its operations.
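The support and confidence criteria of Section III can be made concrete with a small calculation. The following is a minimal sketch in Python, not part of the original study: it uses the four transactions that the paper lists in Table 4.1, and the function names `support_count` and `confidence` are illustrative choices. Support is expressed here as a transaction count, and confidence as the ratio of the support of the whole rule to the support of its antecedent.

```python
# The four transactions of Table 4.1 (items are numbered 1 to 5).
transactions = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


print(support_count({2, 5}, transactions))    # items 2 and 5 together
print(confidence({2}, {5}, transactions))     # rule: if 2 then 5
print(confidence({3}, {2, 5}, transactions))  # rule: if 3 then 2 and 5
```

On this data, items 2 and 5 occur together in three transactions, the rule "if 2 then 5" holds in every transaction that contains item 2, and the rule "if 3 then 2 and 5" holds in two of the three transactions that contain item 3.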


IV. APRIORI ALGORITHM

The Apriori algorithm is a popular and foundational member of the correlation-based 'data mining kernels' used today. It is used to process data into more useful forms, in particular connections between sets of items.

The Apriori algorithm is divided into three sections, which are fed with the initial frequent item sets:

[Figure: Initial frequent item sets -> Candidate generation -> Candidate pruning -> Support calculation]

Initial frequent item sets are fed into the system, and candidate generation, candidate pruning, and candidate support calculation are executed in turn. The support information is fed back into the candidate generator, and the cycle continues until the final candidate set is determined. A frequent item set is a set of one or more items that occurs often in the database if it consists of one item, and whose items often occur together in the same basket if it consists of more than one item. The cutoff for how often a set must occur before it is included in the candidate set is the support.

The general approach is to implement the Apriori algorithm in the most efficient manner possible, utilizing a minimum of hardware and a minimum of time, as well as ensuring that utilization of the hardware comparators is near 100% [4]. The algorithm is as follows:

• Join Step: Ck is generated by joining Lk-1 with itself.

• Prune Step: Any (k-1)-item set that is not frequent cannot be a subset of a frequent k-item set.

• Pseudo-Code:

    Ck : Candidate item set of size k
    Lk : Frequent item set of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;

After applying the Apriori algorithm to the dataset given in Table 4.1, three items (2, 3 and 5, which occur together in transactions 200 and 300) are found to be associated with each other, with a support value of 2.

Table 4.1: Dataset

    T_ID    Items
    100     1, 3, 4
    200     2, 3, 5
    300     1, 2, 3, 5
    400     2, 5

V. IMPLEMENTATION

In Weka, the basic implementation works on ARFF (Attribute-Relation File Format) files. An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.

ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information [2].

Lines that begin with a % are comments. The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.

i. Header part: The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types.

The relation name is defined as the first line in the ARFF file. The format is:

    @relation <relation-name>

where <relation-name> is a string. The string must be quoted if the name includes spaces.

Attribute declarations take the form of an ordered sequence of @attribute statements.
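The header rules described so far (comment lines starting with %, case-insensitive declarations, and quoted relation names) can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not Weka's own parser: the helper names `classify_arff_line` and `relation_name` are hypothetical, and the code recognizes the declaration keywords without interpreting attribute types.

```python
import re


def classify_arff_line(line):
    """Return 'blank', 'comment', 'relation', 'attribute', 'data' or 'other'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("%"):
        return "comment"
    # Declarations are case insensitive, so compare in lower case.
    keyword = stripped.split(None, 1)[0].lower()
    if keyword in ("@relation", "@attribute", "@data"):
        return keyword[1:]
    return "other"


def relation_name(line):
    """Extract the name from an @relation line, removing surrounding quotes."""
    match = re.match(r"\s*@relation\s+(.+?)\s*$", line, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not an @relation line")
    name = match.group(1)
    # A name that contains spaces must be quoted in the file.
    if len(name) >= 2 and name[0] == name[-1] and name[0] in "'\"":
        name = name[1:-1]
    return name


print(classify_arff_line("% created for this study"))
print(classify_arff_line("@RELATION Computer"))
print(relation_name("@relation 'my data'"))
```

Anything that is neither a comment nor a declaration, such as a row of data values, is reported as 'other' by this sketch.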


The format of each statement is:

    @attribute <attribute-name> <datatype>

Example-

    @relation Computer

    @attribute T_id {100,200,300,400}
    @attribute Num1 {0,1}
    @attribute Num2 {0,1}
    @attribute Num3 {0,1}
    @attribute Num4 {0,1}
    @attribute Num5 {0,1}

ii. Data part: The @data declaration is a single line denoting the start of the data segment in the file. The format is:

    @data

Example-

    @data
    100,1,1,1,0,0
    200,1,1,0,1,1
    300,1,0,1,1,0
    400,1,0,1,0,0

iii. Creation of an ARFF file: Write the header part and the data part of the file, including its relation and attributes, in any plain-text editor such as Notepad, and save the file in the .arff file format.

To view the ARFF file-
a) Open Weka and select the Tools option from the menu.
b) Select the ARFF viewer option from the pop-up menu.
c) Choose the File option in the ARFF viewer window.
d) Open the created ARFF file from its location.

After the ARFF file is opened in the ARFF viewer, it is displayed as shown in Fig. 5.1.

Figure 5.1: Screenshot of ARFF viewer.

iv. Experimental steps to perform the test on the problem:

Step 1:- Open the Explorer application of the WEKA tool.

Step 2:- Choose its 'Preprocess' tab.

Step 3:- Click the 'Open file' button, browse to the ARFF file to be processed, and select it.

Step 4:- After loading the ARFF file, open the 'Associate' tab of the Explorer window.

Step 5:- Click the 'Choose' button to select the algorithm to run (here, the Apriori algorithm).

Step 6:- Click the 'Start' button to run the algorithm on the selected ARFF file.

Step 7:- The result of this operation is displayed in the 'Associator output' box in the Explorer window.

VI. OBSERVATIONS AND RESULTS

When the Apriori algorithm was applied to the Computer.arff file, it produced the best association rules for that dataset. The same operation was then performed on the same dataset using the Weka tool, and the two cases were compared.

i. Open the file in Preprocess segment: When the chosen ARFF file was opened in the Preprocess section, two boxes appeared, as shown in Fig. 6.1.

Figure 6.1: Screenshot of Weka Explorer (Preprocess).
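For reference, the stand-alone run of Apriori on the Computer.arff data that this section compares against Weka can be sketched in a few lines of Python, following the join and prune steps of the pseudo-code in Section IV. This is a minimal sketch, not the authors' implementation. It assumes that a value of 1 means the item is present in the transaction and that the minimum support count is 2; Weka's own handling of the 0 values may differ, so the sketch does not reproduce Weka's output. The names `rows_to_transactions` and `apriori` are illustrative.

```python
from itertools import combinations

# Data rows of the Computer.arff example: T_id, Num1..Num5.
ROWS = [
    "100,1,1,1,0,0",
    "200,1,1,0,1,1",
    "300,1,0,1,1,0",
    "400,1,0,1,0,0",
]
ITEM_NAMES = ["Num1", "Num2", "Num3", "Num4", "Num5"]


def rows_to_transactions(rows):
    """Assumption: a value of 1 means the item is present in the transaction."""
    transactions = []
    for row in rows:
        values = row.split(",")[1:]  # drop the T_id column
        transactions.append(frozenset(
            name for name, value in zip(ITEM_NAMES, values) if value == "1"))
    return transactions


def apriori(transactions, min_support):
    """Level-wise search; returns {item set: support count}."""
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})  # L1
    frequent = dict(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-item sets that have k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-item subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = count(candidates)  # L(k+1)
        frequent.update(level)
        k += 1
    return frequent


frequent = apriori(rows_to_transactions(ROWS), min_support=2)
for itemset, n in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Under these assumptions the largest frequent item sets in the Computer.arff rows are pairs that include Num1. Calling the same `apriori` function on the transactions of Table 4.1 yields the item set {2, 3, 5} with a support count of 2, which agrees with the result stated in Section IV.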


The leftmost box was 'Attributes'. It showed the attribute names of the ARFF file that was currently open, provided checkboxes to select the attributes, and included a button to remove the checks.

The right-side box was 'selected attributes'. Its upper portion showed the values of the selected attribute, with a description of their labels, their counts and other information. Its lower portion showed a graphical representation of the attributes.

ii. Open the file in Associate segment: When the chosen ARFF file was opened in the Associate section, an output box appeared, as shown in Fig. 6.2.

Figure 6.2: Screenshot of Weka Explorer (Associate)

In this segment there was an option to choose the algorithm to be run on the selected ARFF file, after which the Start button had to be pressed to obtain the associator output. The algorithm chosen here was Apriori. Clicking on the Apriori text box opens a dialog box, in which the 'car' option was changed from 'false' to 'true'. This was done because, if the car option is enabled, class association rules are mined instead of (general) association rules. After the Start button was pressed, the Associator output box showed the result, which reported the 10 best association rules, as in Fig. 6.2.

VII. CONCLUSION & FUTURE SCOPE

Two cases were compared: the Apriori algorithm applied to the Computer.arff file directly, and the same algorithm run through the Weka tool. The Apriori algorithm by itself produced the best association rule for the dataset, while the Weka tool produced the 10 best association rules for the same dataset with the same algorithm. One of the rules produced by Weka was the same as the result obtained without Weka, which shows that the algorithm produces the same result in both cases.

The observations and results therefore show that this tool is capable of proper exploration and analysis of a dataset, and that it is helpful for describing the dataset and for supporting future decisions.

The implementation of the Apriori algorithm can be made more compatible and purposeful in the future by implementing new association algorithms in WEKA for other operations and analyses.

REFERENCES
[1] Rakesh Agrawal and Ramakrishnan Srikant, "Fast algorithms for mining association rules in large databases", Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994.
[2] http://www.cs.waikato.ac.nz/ml/index.html.
[3] http://eisc.univalle.edu.co/cursos/web/material/750061M/1/WekaManual.pdf
[4] Michael Hahsler and Sudheer Chelluboina, "Visualizing Association Rules in Hierarchical Groups", 42nd Symposium on the Interface: Statistical, Machine Learning, and Visualization Algorithms (Interface 2011).
