Tribhuvan University
Bachelor of Information Management
“DATA MINING”
At
Brightland College
A LAB REPORT
Submitted to
Department of Bachelor of Information Management
Brightland College, Nepalgunj.
In partial fulfillment of the requirements for the Bachelor of Information Management (BIM)
Submitted by
Bibek Khadka (Symbol No: 12899/20)
TU Registration No: 7-2-543-33-2020
Under the Supervision of
Bhuwan Rai
Supervisor’s Recommendation
I hereby recommend that this lab report, prepared under my supervision by “Bibek Khadka” (Symbol
No. 12899/20), be accepted in partial fulfillment of the requirements for the degree of Bachelor of
Information Management (BIM).
……………………….
Supervisor
Bhuwan Rai
Assistant Teacher
Brightland College, Nepalgunj
Acknowledgement
We, the undersigned, take great pleasure in expressing our heartfelt gratitude to all who
have volunteered their time to assist us in achieving project success.
We want to convey our appreciation to everyone who made it possible for us to finish this
project. Our heartfelt gratitude goes to the BIM department of Brightland College for supplying
us with the information we needed to complete our assignment.
This project has been a significant milestone in our professional development, and we are
committed to utilizing the skills and knowledge we have gained to achieve our career goals.
We hope to continue to collaborate with all of you in the future.
Thank you all for your support and encouragement.
Table of Contents
Supervisor’s Recommendation
Introduction to Data Mining
Introduction
1. Learning the Basics of WEKA
2. Implementing the Apriori Algorithm
3. Implementing FP-Growth Algorithm
4. Classifying Using ID3 Algorithm
5. Classifying Data Using Bayesian Algorithms
6. Implementing K-Means Algorithm
Why is Data Mining Important?
Historical Context
Key Concepts in Data Mining
Applications of Data Mining
Challenges in Data Mining
Conclusion
Introduction to Data Mining
In today’s digital world, a massive amount of data is being generated every second—from social media,
banking, e-commerce, healthcare, education, and various other sectors. However, this raw data is often
unstructured, noisy, and overwhelming. Merely collecting data is not useful unless it is analyzed and
interpreted meaningfully. This is where Data Mining plays a crucial role.
Data Mining is the process of discovering patterns, trends, correlations, or useful information from
large datasets using various techniques from machine learning, statistics, and database systems. It is
sometimes called Knowledge Discovery in Databases (KDD). The goal is to convert raw data into
meaningful and actionable knowledge.
Data Mining involves several key steps such as data cleaning, data integration, data selection, data
transformation, pattern discovery, and knowledge presentation. These steps help in handling noisy,
incomplete, and inconsistent data to extract accurate and interesting information.
Introduction
WEKA (Waikato Environment for Knowledge Analysis) is a popular open-source tool for data mining
and machine learning. It provides a collection of algorithms and tools for data preprocessing,
classification, regression, clustering, and association rule mining. This report covers the implementation
of various data mining algorithms using WEKA, including the Apriori and FP-growth algorithms for
association rule mining, ID3 and Bayesian algorithms for classification, and K-means for clustering.
1. Learning the Basics of WEKA
WEKA provides a graphical user interface for interacting with datasets and applying machine learning
algorithms. Key components include:
● Explorer : For interactive data analysis.
● Experimenter : For performing experiments and evaluating results.
● Knowledge Flow : For visually designing machine learning workflows.
● Command Line Interface : For scripting and batch processing.
Steps to Start Using WEKA:
1. Download and Install : Get WEKA from its [official website](https://www.cs.waikato.ac.nz/ml/weka/).
2. Load Data : Open WEKA and use the `Explorer` to load datasets in formats such as ARFF or CSV.
3. Preprocess Data : Use various preprocessing tools to clean and prepare data.
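To make the ARFF format mentioned above more concrete, the following is a minimal Python sketch of how such a file is laid out and read. It is not WEKA's own loader: the function name `parse_arff` and the toy `weather_toy` file are assumptions made for illustration, and the parser only handles the simple case of comma-separated rows without quoting or sparse data.

```python
def parse_arff(text):
    """Parse a small ARFF document (simplified: no quoting, no sparse rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)   # spec is the type or value set
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical toy file, not a dataset used in this report.
SAMPLE = """\
% toy example
@relation weather_toy
@attribute outlook {sunny, overcast, rainy}
@attribute windy {yes, no}
@attribute play {yes, no}
@data
sunny,no,no
overcast,yes,yes
rainy,no,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```

The header section (`@relation`, `@attribute`) describes the columns, and everything after `@data` is one instance per line, which is the structure the Explorer displays in its Preprocess tab.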
2. Implementing the Apriori Algorithm
The Apriori algorithm is used for association rule mining, aiming to find frequent itemsets in
transactional data.
Steps to Implement Apriori in WEKA:
1. Load Dataset : Import a dataset suitable for association rule mining (e.g., `Supermarket.arff`).
2. Select Algorithm : Go to the `Associate` tab.
3. Choose Apriori : Select `Apriori` from the list of association algorithms.
4. Set Parameters : Adjust parameters such as minimum support (`lowerBoundMinSupport`) and minimum confidence (`minMetric`, with `metricType` set to Confidence).
5. Run Algorithm : Click `Start` to run the Apriori algorithm.
6. View Results : Analyze the output rules and their support and confidence values.
Dataset: breast-cancer.arff
Algorithm: Apriori
Parameters: Default
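The idea behind Apriori (grow candidate itemsets level by level, keep only those that meet the minimum support, then derive rules that meet the minimum confidence) can be sketched in a few lines of Python. This is an illustrative sketch, not WEKA's implementation; the function names and the five toy transactions are assumptions and do not come from the breast-cancer dataset used in the lab.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while current:
        # Keep the candidates of size k that reach the minimum support.
        level = {}
        for c in current:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules above the threshold."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]          # support(X ∪ Y) / support(X)
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), s, conf))
    return out


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "jam"},
    {"bread", "jam"},
    {"milk", "honey"},
    {"milk", "bread", "honey"},
]

frequent = apriori(transactions, min_support=0.4)
found_rules = rules(frequent, min_confidence=0.7)
for lhs, rhs, s, conf in found_rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={conf:.2f}")
```

Raising `min_support` shrinks the set of frequent itemsets, while raising `min_confidence` only filters the rules derived from them, which mirrors the two parameters adjusted in the Associate tab.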
3. Implementing FP-Growth Algorithm
The FP-Growth algorithm is another approach to association rule mining that uses a frequent pattern
tree (FP-tree).
Steps to Implement FP-Growth in WEKA:
1. Load Dataset : Use the same or a different transactional dataset.
2. Select Algorithm : Go to the `Associate` tab.
3. Choose FP-Growth : Select `FPGrowth` from the list of association algorithms.
4. Set Parameters : Configure parameters like `MinSupport`.
5. Run Algorithm : Click `Start` to execute the FP-Growth algorithm.
6. View Results : Review the frequent itemsets and association rules.
Procedure
Step 1: Open the data file (Shopping.arff) in the WEKA Explorer. This assumes that the required data
fields have already been discretized.
Step 2: Click on the "Associate" tab to bring up the interface for the association rule algorithms.
Step 3: Click the "Choose" button and select FPGrowth. WEKA's default associator is Apriori, so
FP-Growth has to be selected explicitly.
Step 4: To modify parameters like support or confidence, you can adjust them in the text box located to
the right of the "Choose" button.
The dataset includes the following attributes for shopping items:
● milk {yes, no}
● bread {yes, no}
● honey {yes, no}
● ghee {yes, no}
● jam {yes, no}
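The following Python sketch illustrates how FP-Growth finds frequent itemsets without generating candidates: transactions are inserted into a prefix tree (the FP-tree) with frequent items ordered by descending frequency, and each item is then mined from its conditional pattern base. This is a simplified sketch rather than WEKA's `FPGrowth` code; the class and function names are assumptions, and the five transactions are invented rows that only reuse the item names listed above.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    freq = Counter()
    for items, n in weighted_transactions:
        for i in set(items):
            freq[i] += n
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)                    # item -> nodes holding that item
    for items, n in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for i in ordered:
            child = node.children.get(i)
            if child is None:
                child = Node(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += n
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "jam"},
    {"bread", "jam"},
    {"milk", "honey"},
    {"milk", "bread", "honey"},
]

frequent = dict(fp_growth([(t, 1) for t in transactions], min_count=2))
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here `min_count` is an absolute count rather than a fraction; dividing it by the number of transactions gives the relative minimum support used in the WEKA dialog.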
4. Classifying Using ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is a decision tree algorithm used for classification. At each node it
splits on the attribute with the highest information gain.
Steps to Implement ID3 in WEKA:
1. Load Dataset : Select the vote.arff file using “Open file” under the “Preprocess” tab.
2. Select Algorithm : Go to the “Classify” tab to classify the data.
3. Choose ID3 : Click on the “Choose” button and select “trees -> J48” (an implementation of the
C4.5 algorithm, which is an extension of ID3).
4. Set Parameters : Adjust settings if necessary.
5. Run Algorithm : Click `Start` to build the decision tree.
6. View Results : Examine the decision tree and classification accuracy. To visualize the tree,
right-click the entry in the result list and select “Visualize tree”.
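The attribute selection at the heart of ID3 can be sketched as follows: compute the entropy of the class labels, measure how much each attribute reduces it (information gain), split on the best attribute, and recurse. This is plain ID3 on a hypothetical six-row toy table, not the J48 classifier used in the lab, which additionally applies gain ratio and pruning; all names in the code are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attrs, target):
    """Return a nested dict tree; leaves are class labels."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:        # pure node or nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


# Hypothetical toy data, invented for illustration only.
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
tree = id3(rows, ["outlook", "windy"], "play")
print(f"gain(outlook)={gain_outlook:.3f} gain(windy)={gain_windy:.3f}")
print(tree)
```

On this toy table `outlook` has the larger gain, so it becomes the root, and only the `rainy` branch needs a second split on `windy`.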
5. Classifying Data Using Bayesian Algorithms
Bayesian algorithms, such as the Naive Bayes classifier, are used for probabilistic classification.
Steps to Implement Naive Bayes in WEKA:
1. Load Dataset : Use a suitable dataset for classification.
2. Select Algorithm : Go to the `Classify` tab.
3. Choose Naive Bayes : Select `NaiveBayes` from the list of classifiers.
4. Set Parameters : Use default settings or adjust parameters.
5. Run Algorithm : Click `Start` to train the model.
6. View Results : Check the classification results and performance metrics.
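As a rough illustration of what a Naive Bayes classifier computes, the sketch below multiplies the class prior by the per-attribute conditional probabilities (assuming the attributes are independent given the class) and normalizes the result. It handles nominal attributes only and uses add-one (Laplace) smoothing as an assumption of this sketch; the function names and the toy rows are hypothetical and not taken from the lab's dataset.

```python
from collections import Counter, defaultdict


def train_nb(rows, target):
    """Count classes, attribute values per class, and each attribute's value set."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)           # (attribute, class) -> value counts
    domains = defaultdict(set)                    # attribute -> observed values
    for r in rows:
        for a, v in r.items():
            if a != target:
                value_counts[(a, r[target])][v] += 1
                domains[a].add(v)
    return class_counts, value_counts, domains


def predict_nb(model, instance):
    """Return {class: posterior probability} for one instance."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total                           # prior P(class)
        for a, v in instance.items():
            # Add-one smoothing keeps unseen values from zeroing the product.
            p *= (value_counts[(a, c)][v] + 1) / (n_c + len(domains[a]))
        scores[c] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


# Hypothetical toy data, invented for illustration only.
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

model = train_nb(rows, "play")
probs = predict_nb(model, {"outlook": "sunny", "windy": "yes"})
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

The class with the largest posterior is the prediction; the per-class probabilities correspond to the kind of distribution WEKA can show for each test instance.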
6. Implementing K-Means Algorithm
K-Means is a clustering algorithm used to partition data into k clusters.
Steps to Implement K-Means in WEKA:
1. Load Dataset : Use a dataset suitable for clustering (e.g., `Iris.arff`).
2. Select Algorithm : Go to the `Cluster` tab.
3. Choose K-Means : Select `SimpleKMeans` from the list of clustering algorithms.
4. Set Parameters : Define the number of clusters (k) and other parameters.
5. Run Algorithm : Click `Start` to perform clustering.
6. View Results : Analyze the clusters and their centroids.
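The clustering loop itself can be sketched in Python as follows: pick k starting centroids, assign every point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the centroids stop moving. This is a minimal sketch of the standard (Lloyd's) procedure with squared Euclidean distance, not WEKA's `SimpleKMeans`; the function name, the `seed` parameter, and the six toy points are assumptions for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of numbers into k groups; return (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)             # k distinct points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:            # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Because the starting centroids are chosen at random, different seeds can lead to different results on less clearly separated data, which is why the seed is one of the parameters worth noting when reporting a clustering run.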
Why is Data Mining Important?
Organizations around the world are using data mining to:
● Understand customer behavior and preferences
● Detect fraud and anomalies
● Forecast business trends
● Improve decision-making
● Recommend products or services
● Analyze social and market trends
● Enhance operational efficiency
In essence, data mining helps turn data into insight, and insight into action.
Historical Context
Data Mining is not a new concept. It has evolved over time as a combination of
several older disciplines such as:
● Statistics: For data analysis and hypothesis testing
● Artificial Intelligence (AI): For learning patterns and making intelligent decisions
● Machine Learning (ML): For building models that can predict future outcomes
● Database Management: For efficient storage and retrieval of data
With the growth of big data and cloud computing, data mining has become a
powerful tool to extract value from vast and complex datasets.
Key Concepts in Data Mining
● Classification: Assigning data to predefined classes (e.g., spam vs. not spam)
● Clustering: Grouping similar data points without predefined labels (e.g., customer segmentation)
● Association Rule Mining: Discovering relationships between variables (e.g., market basket analysis)
● Regression: Predicting continuous values (e.g., predicting house prices)
● Anomaly Detection: Identifying rare or unusual patterns (e.g., fraud detection)
● Sequential Pattern Mining: Finding sequences or patterns in data over time (e.g., web clickstream analysis)
Applications of Data Mining
Data mining has a wide range of applications in different industries:
● Retail: Market basket analysis, sales forecasting, inventory management
● Banking and Finance: Risk assessment, fraud detection, credit scoring
● Healthcare: Disease prediction, patient diagnosis, treatment recommendation
● Education: Student performance analysis, dropout prediction
● Telecommunications: Customer churn prediction, network optimization
● Manufacturing: Fault detection, process optimization
Challenges in Data Mining
Despite its usefulness, data mining also faces several challenges:
● Data Quality: Dealing with missing, incorrect, or noisy data
● Privacy and Security: Protecting sensitive information during analysis
● Scalability: Handling very large datasets efficiently
● Interpretability: Making results understandable and actionable to users
● Ethical Use: Ensuring data mining practices are fair and non-discriminatory
In summary, Data Mining is a powerful and essential technology in the
information age. It allows organizations and individuals to extract meaningful
insights from massive datasets, leading to informed decisions and innovations
across various fields. As the amount of data continues to grow, the importance of
data mining will only increase, making it a vital area of study and research in
computer science, data science, and related fields.
Conclusion
In this lab, we have explored various data mining techniques using WEKA,
including association rule mining with Apriori and FP-Growth algorithms,
classification with ID3 and Bayesian algorithms, and clustering using K-Means.
WEKA provides a user-friendly interface for applying these algorithms, allowing for
effective data analysis and model building.