Dataminingg

This lab report by Bibek Khadka for the Bachelor of Information Management program at Tribhuvan University focuses on data mining techniques using the WEKA tool. It covers the implementation of various algorithms such as Apriori, FP-Growth, ID3, Bayesian, and K-Means for data analysis. The report emphasizes the importance of data mining in extracting meaningful insights from large datasets across different industries.

Tribhuvan University

Bachelor of Information Management

“DATA MINING”
At

Brightland College

A LAB REPORT

Submitted to
Department of Bachelor of Information Management
Brightland College, Nepalgunj.

In partial fulfillment of the requirements for the Bachelor of Information Management (BIM)

Submitted by
Bibek Khadka (Symbol No: 12899/20)
TU Registration No: 7-2-543-33-2020

Under the Supervision of

Bhuwan Rai
Tribhuvan University
Bachelor of Information Management, Nepalgunj

Supervisor’s Recommendation

I hereby recommend that this lab report, prepared under my supervision by Bibek Khadka (Symbol
No. 12899/20), be accepted in partial fulfillment of the requirements for the degree of Bachelor of
Information Management (BIM).

……………………….
Supervisor
Bhuwan Rai
Assistant Teacher
Brightland College, Nepalgunj
Acknowledgement

We, the undersigned, take great pleasure in expressing our heartfelt gratitude to all who
have volunteered their time to assist us in achieving project success.

We want to convey our appreciation to everyone who made it possible for us to finish this
project. Our heartfelt gratitude to the BIM department of Brightland College for supplying
us with the necessary information that enabled us to effectively finish our assignment.

This project has been a significant milestone in our professional development, and we are
committed to utilizing the skills and knowledge we have gained to achieve our career goals.
We hope to continue to collaborate with all of you in the future.

Thank you all for your support and encouragement.

Table of Contents
Supervisor’s Recommendation
Introduction to Data Mining
Introduction
1. Learning the Basics of WEKA
2. Implementing the Apriori Algorithm
3. Implementing FP-Growth Algorithm
4. Classifying Using ID3 Algorithm
5. Classifying Data Using Bayesian Algorithms
6. Implementing K-Means Algorithm
Why is Data Mining Important?
Historical Context
Key Concepts in Data Mining
Applications of Data Mining
Challenges in Data Mining
Conclusion
Introduction to Data Mining
In today’s digital world, a massive amount of data is being generated every second—from social media,
banking, e-commerce, healthcare, education, and various other sectors. However, this raw data is often
unstructured, noisy, and overwhelming. Merely collecting data is not useful unless it is analyzed and
interpreted meaningfully. This is where Data Mining plays a crucial role.
Data Mining is the process of discovering patterns, trends, correlations, or other useful information in
large datasets using techniques from machine learning, statistics, and database systems. The term is
often used interchangeably with Knowledge Discovery in Databases (KDD), although strictly speaking
data mining is the pattern-discovery step within the KDD process. The goal is to convert raw data into
meaningful and actionable knowledge.
The KDD process involves several key steps: data cleaning, data integration, data selection, data
transformation, pattern discovery, pattern evaluation, and knowledge presentation. These steps help in
handling noisy, incomplete, and inconsistent data so that accurate and interesting information can be extracted.

Introduction
WEKA (Waikato Environment for Knowledge Analysis) is a popular open-source tool for data mining
and machine learning. It provides a collection of algorithms and tools for data preprocessing,
classification, regression, clustering, and association rule mining. This report covers the implementation
of various data mining algorithms using WEKA, including the Apriori and FP-growth algorithms for
association rule mining, ID3 and Bayesian algorithms for classification, and K-means for clustering.

1. Learning the Basics of WEKA

WEKA provides a graphical user interface for interacting with datasets and applying machine learning
algorithms. Key components include:

● Explorer : For interactive data analysis.
● Experimenter : For performing experiments and evaluating results.
● Knowledge Flow : For visually designing machine learning workflows.
● Command Line Interface : For scripting and batch processing.

Steps to Start Using WEKA:

1. Download and Install : Get WEKA from its [official website](https://www.cs.waikato.ac.nz/ml/weka/).
2. Load Data : Open WEKA and use the `Explorer` to load datasets in formats such as ARFF or CSV.
3. Preprocess Data : Use various preprocessing tools to clean and prepare data.
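
To make the ARFF format mentioned in the Load Data step more concrete, the following sketch writes a tiny ARFF file as text and reads its header back. The relation name, attribute names, and rows are made up for illustration and do not come from this lab; only the `@relation`, `@attribute`, and `@data` keywords belong to the ARFF format itself. The parser is a simplified assumption and ignores features such as quoted values and sparse rows.

```python
# Minimal sketch of the ARFF layout; relation, attributes and rows are illustrative only.
ARFF_TEXT = """@relation weather_demo
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

def parse_arff(text):
    """Return (relation, attribute names, data rows) from simple ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):   # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])   # second token is the name
        elif lower.startswith("@data"):
            in_data = True                              # everything after is data
    return relation, attributes, rows

relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, attributes, len(rows))
```

The header declares each attribute and its type (nominal values in braces, or `numeric`), and every line after `@data` is one instance with values in the declared order.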

2. Implementing the Apriori Algorithm

The Apriori algorithm is used for association rule mining, aiming to find frequent itemsets in
transactional data.

Steps to Implement Apriori in WEKA:

1. Load Dataset : Import a dataset suitable for association rule mining (e.g., `Supermarket.arff`).
2. Select Algorithm : Go to the `Associate` tab.
3. Choose Apriori : Select `Apriori` from the list of association algorithms.
4. Set Parameters : Adjust parameters such as the minimum support (`lowerBoundMinSupport`) and the minimum confidence (`minMetric`, with `metricType` set to Confidence).
5. Run Algorithm : Click `Start` to run the Apriori algorithm.
6. View Results : Analyze the output rules and their support and confidence values.

Dataset: breast-cancer.arff
Algorithm: Apriori
Parameters: Default
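
The idea behind Apriori can be sketched in a few lines of plain Python. This is only an illustration of the level-wise search with support and confidence thresholds, not WEKA's implementation; the transactions are hypothetical toy data, and the function names `support`, `apriori`, and `rules` are chosen here for the example.

```python
from itertools import combinations

# Toy transactions (hypothetical, not taken from any WEKA dataset).
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread", "jam"},
    {"bread", "jam"},
    {"milk", "honey"},
    {"milk", "bread", "honey"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules above the threshold."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]   # conf(A -> B) = sup(A u B) / sup(A)
                if conf >= min_confidence:
                    yield ante, itemset - ante, conf

frequent = apriori(TRANSACTIONS, min_support=0.4)
for ante, cons, conf in rules(frequent, min_confidence=0.9):
    print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

Raising the support threshold shrinks the set of frequent itemsets, while raising the confidence threshold keeps only the strongest rules; the same trade-off applies when tuning the parameters in WEKA.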

3. Implementing FP-Growth Algorithm

The FP-Growth algorithm is another approach to association rule mining that uses a frequent pattern
tree (FP-tree).

Steps to Implement FP-Growth in WEKA:

1. Load Dataset : Use the same or a different transactional dataset.
2. Select Algorithm : Go to the `Associate` tab.
3. Choose FP-Growth : Select `FPGrowth` from the list of association algorithms.
4. Set Parameters : Configure parameters such as the minimum support (`lowerBoundMinSupport`).
5. Run Algorithm : Click `Start` to execute the FP-Growth algorithm.
6. View Results : Review the frequent itemsets and association rules.

Procedure
Step 1: Open the data file (Shopping.arff) in the WEKA Explorer. This procedure assumes that the
required data fields have already been discretized.
Step 2: Click on the "Associate" tab to bring up the interface for the association rule algorithms.
Step 3: Click the "Choose" button and select FPGrowth. (Apriori, not FP-Growth, is the default
associator in WEKA.)
Step 4: To modify parameters such as support or confidence, adjust them in the text box located to
the right of the "Choose" button.

The dataset includes the following attributes for shopping items:

● milk {yes, no}
● bread {yes, no}
● honey {yes, no}
● ghee {yes, no}
● jam {yes, no}
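
The sketch below illustrates how FP-Growth finds frequent itemsets by building an FP-tree and mining it recursively through conditional pattern bases. It is a simplified illustration rather than WEKA's `FPGrowth` code. The baskets reuse the item names listed above, but the rows themselves are invented for this example, and `Node`, `build_tree`, and `fp_growth` are names introduced here.

```python
from collections import defaultdict

# Hypothetical baskets using the item names above; the rows are invented.
BASKETS = [
    ["milk", "bread"],
    ["milk", "bread", "jam"],
    ["bread", "jam", "ghee"],
    ["milk", "honey"],
    ["milk", "bread", "honey"],
]

class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for i in set(items):
            counts[i] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_transactions:
        # Keep frequent items, ordered by descending count (ties broken by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n          # shared prefixes are stored only once
    return header, frequent

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        yield itemset, count
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)

for itemset, count in fp_growth([(b, 1) for b in BASKETS], min_count=2):
    print(sorted(itemset), count)
```

Unlike Apriori, this approach does not generate and test candidate itemsets level by level; it scans the data to build the tree and then works only on the compressed tree structure.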

4. Classifying Using ID3 Algorithm

ID3 (Iterative Dichotomiser 3) is a decision tree algorithm used for classification.

Steps to Implement ID3 in WEKA:

1. Load Dataset : Select the vote.arff file using “Open file” under the “Preprocess” tab.
2. Select Algorithm : Go to the “Classify” tab to classify the data.
3. Choose ID3 : Click on the “Choose” button and select “trees -> J48” (an
implementation of the C4.5 algorithm, which is an extension of ID3).
4. Set Parameters : Adjust settings if necessary.
5. Run Algorithm : Click `Start` to build the decision tree.
6. View Results : Examine the decision tree and classification accuracy. To visualize the tree,
right-click on the entry in the result list and select “Visualize tree”.
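
ID3 chooses the attribute to split on by information gain, which is the reduction in entropy of the class label after the split. The sketch below computes both quantities on a small hypothetical table (the rows and attribute names are invented for illustration). Note that J48/C4.5, which is used in the steps above, ranks attributes by gain ratio rather than plain information gain, so this shows the ID3 criterion only.

```python
from collections import Counter
from math import log2

# Hypothetical training rows; "play" is the class attribute.
ROWS = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain, as ID3 would pick for a split."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

for attr in ("outlook", "windy"):
    print(attr, round(information_gain(ROWS, attr, "play"), 3))
print("split on:", best_attribute(ROWS, ["outlook", "windy"], "play"))
```

ID3 applies this selection recursively: after splitting on the best attribute, it repeats the same calculation on each subset until the subsets are pure or no attributes remain.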

5. Classifying Data Using Bayesian Algorithms

Bayesian algorithms, such as the Naive Bayes classifier, are used for probabilistic classification.

Steps to Implement Naive Bayes in WEKA:

1. Load Dataset : Use a suitable dataset for classification.
2. Select Algorithm : Go to the `Classify` tab.
3. Choose Naive Bayes : Select `NaiveBayes` from the list of classifiers.
4. Set Parameters : Use default settings or adjust parameters.
5. Run Algorithm : Click `Start` to train the model.
6. View Results : Check the classification results and performance metrics.
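
The following sketch shows the core of a Naive Bayes classifier for nominal attributes: it multiplies the class prior by the per-attribute conditional probabilities (here summed as logarithms) and picks the most probable class. The training rows are hypothetical, add-one (Laplace) smoothing is an assumption made for this example, and WEKA's `NaiveBayes` may differ in details such as how numeric attributes are handled.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical training rows; "play" is the class attribute.
ROWS = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

def train(rows, target):
    """Count classes and attribute values per class."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    values = defaultdict(set)             # attribute -> distinct values seen
    for r in rows:
        for attr, val in r.items():
            if attr != target:
                value_counts[(attr, r[target])][val] += 1
                values[attr].add(val)
    return class_counts, value_counts, values

def predict(model, instance):
    """Return the class with the highest posterior score for the instance."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = log(n / total)                       # log prior P(class)
        for attr, val in instance.items():
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(attr, cls)][val] + 1
            denominator = n + len(values[attr])
            score += log(numerator / denominator)    # log P(value | class)
        scores[cls] = score
    return max(scores, key=scores.get)

model = train(ROWS, target="play")
print(predict(model, {"outlook": "sunny", "windy": "yes"}))
print(predict(model, {"outlook": "overcast", "windy": "no"}))
```

The "naive" part is the assumption that attributes are conditionally independent given the class, which is what allows the per-attribute probabilities to be multiplied.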

6. Implementing K-Means Algorithm

K-Means is a clustering algorithm used to partition data into k clusters.

Steps to Implement K-Means in WEKA:

1. Load Dataset : Use a dataset suitable for clustering (e.g., `Iris.arff`).
2. Select Algorithm : Go to the `Cluster` tab.
3. Choose K-Means : Select `SimpleKMeans` from the list of clustering algorithms.
4. Set Parameters : Define the number of clusters (k) and other parameters.
5. Run Algorithm : Click `Start` to perform clustering.
6. View Results : Analyze the clusters and their centroids.
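
The clustering loop behind K-Means can be sketched as follows: assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignments stop changing. The points are hypothetical, and taking the first k points as starting centroids is a simplification chosen to keep the example deterministic; WEKA's `SimpleKMeans` may differ in initialization and distance handling.

```python
# Hypothetical 2-D points forming two visible groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Return (centroids, assignment) after running Lloyd's algorithm."""
    centroids = [list(p) for p in points[:k]]   # naive init: first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:        # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

centroids, assignment = kmeans(POINTS, k=2)
print(assignment)
print(centroids)
```

Because the result depends on the starting centroids, it is common to run the algorithm several times with different initializations and compare the resulting clusters.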

Why is Data Mining Important?
Organizations around the world are using data mining to:

● Understand customer behavior and preferences
● Detect fraud and anomalies
● Forecast business trends
● Improve decision-making
● Recommend products or services
● Analyze social and market trends
● Enhance operational efficiency

In essence, data mining helps turn data into insight, and insight into action.

Historical Context
Data Mining is not a new concept. It has evolved over time as a combination of
several older disciplines such as:
● Statistics: For data analysis and hypothesis testing
● Artificial Intelligence (AI): For learning patterns and making intelligent decisions
● Machine Learning (ML): For building models that can predict future outcomes
● Database Management: For efficient storage and retrieval of data
With the growth of big data and cloud computing, data mining has become a
powerful tool to extract value from vast and complex datasets.

Key Concepts in Data Mining

● Classification: Assigning data to predefined classes (e.g., spam vs. not spam)
● Clustering: Grouping similar data points without predefined labels (e.g., customer segmentation)
● Association Rule Mining: Discovering relationships between variables (e.g., market basket analysis)
● Regression: Predicting continuous values (e.g., predicting house prices)
● Anomaly Detection: Identifying rare or unusual patterns (e.g., fraud detection)
● Sequential Pattern Mining: Finding sequences or patterns in data over time (e.g., web clickstream analysis)

Applications of Data Mining

Data mining has a wide range of applications in different industries:
● Retail: Market basket analysis, sales forecasting, inventory management
● Banking and Finance: Risk assessment, fraud detection, credit scoring
● Healthcare: Disease prediction, patient diagnosis, treatment recommendation
● Education: Student performance analysis, dropout prediction
● Telecommunications: Customer churn prediction, network optimization
● Manufacturing: Fault detection, process optimization

Challenges in Data Mining

Despite its usefulness, data mining also faces several challenges:
● Data Quality: Dealing with missing, incorrect, or noisy data
● Privacy and Security: Protecting sensitive information during analysis
● Scalability: Handling very large datasets efficiently
● Interpretability: Making results understandable and actionable to users
● Ethical Use: Ensuring data mining practices are fair and non-discriminatory

In summary, Data Mining is a powerful and essential technology in the
information age. It allows organizations and individuals to extract meaningful
insights from massive datasets, leading to informed decisions and innovations
across various fields. As the amount of data continues to grow, the importance of
data mining will only increase, making it a vital area of study and research in
computer science, data science, and related fields.

Conclusion
In this lab, we have explored various data mining techniques using WEKA,
including association rule mining with Apriori and FP-Growth algorithms,
classification with ID3 and Bayesian algorithms, and clustering using K-Means.
WEKA provides a user-friendly interface for applying these algorithms, allowing for
effective data analysis and model building.
