LAB Manual
PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No.07
A.1 Aim:
Implementation of K-means clustering using JAVA or WEKA.
A.2 Prerequisite:
Familiarity with the WEKA tool and programming languages.
A.3 Outcome:
After successful completion of this experiment students will be able to
➢ Use classification and clustering algorithms of data mining.
A.4 Theory:
K-means is one of the simplest unsupervised learning algorithms that solves the well-known
clustering problem. The procedure follows a simple and easy way to partition a given data
set into a certain number of clusters (assume k clusters) fixed a priori. The main idea is to
define k centers, one for each cluster. These centers should be chosen carefully, because
different initial locations lead to different results; a good heuristic is to place them as far
away from each other as possible. The next step is to take each point belonging to the data
set and associate it with the nearest center. When no point is pending, the first step is
completed and an early grouping is done. At this point we need to recalculate k new centroids
as the barycenters (means) of the clusters resulting from the previous step. After we have
these k new centroids, a new binding is done between the same data set points and the nearest
new center. A loop has thus been generated. As a result of this loop, the k centers change
their location step by step until no more changes occur, i.e., the centers no longer move.
Algorithmic steps for k-means clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of
centers.
1) Randomly select ‘c’ cluster centers.
2) Calculate the distance between each data point and cluster centers.
3) Assign each data point to the cluster whose center is nearest.
4) Recalculate the new cluster centers using:
vi = (1/ci) * Σ xj, where the sum runs over all data points xj assigned to the ith cluster
and 'ci' represents the number of data points in the ith cluster.
5) Recalculate the distance between each data point and the newly obtained cluster centers.
6) If no data point was reassigned, stop; otherwise repeat from step 3.
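
The steps above can be expressed as a short, self-contained Java program. The following is
only a minimal sketch for illustration, not the prescribed lab solution: the toy 2-D data set,
the choice k = 2, and the fixed random seed are all assumptions made for this example.

import java.util.Arrays;
import java.util.Random;

public class KMeansDemo {

    // Squared Euclidean distance between two points.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    public static void main(String[] args) {
        // Toy 2-D data set (illustrative values only).
        double[][] x = {
            {1.0, 1.0}, {1.5, 2.0}, {3.0, 4.0},
            {5.0, 7.0}, {3.5, 5.0}, {4.5, 5.0}, {3.5, 4.5}
        };
        int k = 2, n = x.length, dim = x[0].length;

        // Step 1: randomly pick k of the data points as the initial centers.
        Random rnd = new Random(42);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = x[rnd.nextInt(n)].clone();

        int[] assign = new int[n];
        Arrays.fill(assign, -1);        // no point is assigned yet
        boolean changed = true;
        while (changed) {               // steps 2-6: loop until assignments are stable
            changed = false;
            // Steps 2-3: assign each point to the nearest center.
            for (int p = 0; p < n; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist2(x[p], centers[c]) < dist2(x[p], centers[best])) best = c;
                if (assign[p] != best) { assign[p] = best; changed = true; }
            }
            // Step 4: recompute each center as the mean of its assigned points.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int p = 0; p < n; p++) {
                count[assign[p]]++;
                for (int d = 0; d < dim; d++) sum[assign[p]][d] += x[p][d];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)       // keep an empty cluster's old center unchanged
                    for (int d = 0; d < dim; d++) centers[c][d] = sum[c][d] / count[c];
        }

        for (int c = 0; c < k; c++)
            System.out.println("Center " + c + ": " + Arrays.toString(centers[c]));
        System.out.println("Assignments: " + Arrays.toString(assign));
    }
}

In WEKA, the same algorithm is available without coding through the SimpleKMeans clusterer
(Cluster tab of the Explorer, Choose -> SimpleKMeans), where k is set via the numClusters
parameter.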
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on Blackboard or emailed to the concerned
lab in-charge faculty at the end of the practical in case there is no Blackboard access
available.)
Roll No.: Name:
Class: Batch:
Date of Experiment: Date of Submission:
Grade:
B.1 Software Code written by student:
(Paste the K-means clustering code, or the WEKA steps you followed, completed during the
2 hours of practical in the lab here)
B.2 Input and Output:
(Paste the input data set and the clustering output related to your case study in the following
format)
B.3 Observations and learning:
We understood the concept of clustering and how the K-means algorithm iteratively assigns
points to the nearest centroid and updates the centroids until they stabilize.
B.4 Conclusion:
K-means clustering was successfully implemented using WEKA.
B.5 Questions of Curiosity
(To be answered by student based on the practical performed and learning/observations)
Q1: What is clustering? What are the types of clustering? Explain the advantages and
disadvantages of clustering.
Clustering is a type of unsupervised learning method of machine learning.
In unsupervised learning, inferences are drawn from data sets that do not
contain a labelled output variable. It is an exploratory data analysis
technique that allows us to analyze multivariate data sets.
Types:
• Hard Clustering: In hard clustering, each data point either belongs to a
cluster completely or not. For example, if a retail store segments its customers
into 10 groups, each customer is put into exactly one of the 10 groups.
• Soft Clustering: In soft clustering, instead of putting each data point into a
single cluster, a probability or likelihood of that data point belonging to each
cluster is assigned. In the same retail-store scenario, each customer is assigned
a probability of belonging to each of the 10 clusters.
The main advantage of clustering is that it reveals natural groupings in unlabelled data,
making it useful for exploratory analysis, customer segmentation, and data summarization.
Disadvantages of clustering are that the results depend heavily on the chosen algorithm and
its parameters, that cluster quality is hard to evaluate without labels, and that many methods
are sensitive to feature scaling and outliers.
Q2: Give the advantages and disadvantages of K-means clustering.
Advantages of k-means:
• Relatively simple to implement.
• Scales to large data sets.
• Guarantees convergence.
• Can warm-start the positions of centroids.
• Easily adapts to new examples.
• Can be generalized (with suitable modifications) to clusters of different shapes and
sizes, such as elliptical clusters.
Disadvantages of k-means:
• Choosing k manually. Use a plot of loss (within-cluster sum of squares) versus the
number of clusters to find a good k, as discussed under Q3 below.
• Dependence on initial values. For a low k, you can mitigate this dependence by running
k-means several times with different initial values and picking the best result (see the
restart sketch after this list). As k increases, you need advanced versions of k-means to
pick better values of the initial centroids (called k-means seeding). For a full discussion
of k-means seeding see "A Comparative Study of Efficient Initialization Methods for the
K-Means Clustering Algorithm" by M. Emre Celebi, Hassan A. Kingravi, and Patricio A. Vela.
• Clustering data of varying sizes and density. k-means has trouble clustering data where
clusters are of varying sizes and density. To cluster such data, you need to generalize
k-means, for example by using a Gaussian mixture model, which allows clusters of
different sizes and shapes.
• Clustering outliers. Centroids can be dragged by outliers, or outliers might get their
own cluster instead of being ignored. Consider removing or clipping outliers before
clustering.
• Scaling with the number of dimensions. As the number of dimensions increases, a
distance-based similarity measure converges to a constant value between any given
examples. Reduce dimensionality either by using PCA on the feature data or by using
spectral clustering to modify the clustering algorithm.
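
To make the dependence on initial values concrete, the sketch below runs WEKA's
SimpleKMeans several times with different random seeds and keeps the run with the lowest
within-cluster squared error. The file name data.arff, the choice k = 3, and the 10 restarts
are assumptions for illustration only.

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansRestarts {
    public static void main(String[] args) throws Exception {
        // "data.arff" is a placeholder; use any ARFF file without a class attribute.
        Instances data = new DataSource("data.arff").getDataSet();

        SimpleKMeans best = null;
        double bestError = Double.MAX_VALUE;
        for (int seed = 0; seed < 10; seed++) {      // 10 restarts with different seeds
            SimpleKMeans km = new SimpleKMeans();
            km.setNumClusters(3);                    // k is illustrative
            km.setSeed(seed);
            km.buildClusterer(data);
            if (km.getSquaredError() < bestError) {  // keep the lowest-error run
                bestError = km.getSquaredError();
                best = km;
            }
        }
        System.out.println("Best within-cluster squared error: " + bestError);
        System.out.println(best);
    }
}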
Q3: How is the number of clusters chosen?
The optimal number of clusters can be found as follows: run the clustering
algorithm (e.g., k-means clustering) for different values of k, for instance by
varying k from 1 to 10 clusters. For each k, calculate the total within-cluster sum of
squares (WSS). Then plot WSS against k; the location of a bend (an "elbow") in the plot
is generally taken as an indicator of the appropriate number of clusters. A sketch of this
procedure is given below.
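
As a rough illustration of this elbow-method procedure, the following sketch uses WEKA's
SimpleKMeans to print the WSS for k = 1 to 10; the file name data.arff is a placeholder
assumption.

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ElbowMethod {
    public static void main(String[] args) throws Exception {
        // "data.arff" is a placeholder; use any ARFF file without a class attribute.
        Instances data = new DataSource("data.arff").getDataSet();

        // Vary k and record the total within-cluster sum of squares (WSS).
        for (int k = 1; k <= 10; k++) {
            SimpleKMeans km = new SimpleKMeans();
            km.setNumClusters(k);
            km.setSeed(1);                 // fixed seed so runs are comparable
            km.buildClusterer(data);
            System.out.println("k = " + k + "  WSS = " + km.getSquaredError());
        }
        // Plot WSS against k; the bend ("elbow") suggests a good value of k.
    }
}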