Clustering MMD

The document outlines an assignment to cluster datasets using Apache PySpark. Students must perform K-means and bisecting K-means clustering on multiple datasets, test different values of K, and analyze clustering quality and runtime. Results and Python code must be submitted along with a report analyzing the findings.

Uploaded by A N

Assignment 2

Clustering
Mining of Massive Datasets Spring 2024
Due Date: 27th March 2024
Submission: Google Classroom
In this assignment, you have to cluster the provided datasets using Apache PySpark. You have to submit your Python code and a Word document that explains and analyzes your results and findings.

1. Perform K-means clustering on dataset DS1 using your own PySpark code (you can use the code provided in class and modify it according to your requirements).
a. Run K-means for different values of K.
i. For each value of K, run K-means multiple times.
ii. Report your findings: the error of each clustering, the time required, the value of K that gives the best result, and the number of iterations to convergence for each run.
b. Examine the quality of the individual clusters and of the clusterings as a whole.
i. Report the within-cluster sum of squared error (WSSE), the between-cluster sum of squared error (BSSE), and the silhouette coefficient (SC) for each run of K-means. Write your own PySpark code to calculate WSSE, BSSE, and SC.
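
Parts 1a and 1b can be illustrated with a minimal sketch in plain Python (no Spark), assuming small in-memory data. It is not the code provided in class, and all function names are hypothetical. `kmeans` is Lloyd's algorithm with randomly chosen initial centroids and also returns the number of iterations to convergence. BSSE is assumed to be the usual size-weighted form, the sum over clusters of |C_i| * ||c_i - m||^2 where m is the overall mean, so that WSSE + BSSE equals the total sum of squares when each centroid is its cluster mean. The silhouette uses Euclidean distance and costs O(n^2), so on a large dataset it would be computed on a sample. In a PySpark version, the per-point `closest` call would roughly correspond to an `rdd.map` and the per-cluster sums to a `reduceByKey`.

```python
import math
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(p, centroids):
    """Index of the nearest centroid (the per-point 'map' step)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=None):
    """Lloyd's K-means; returns (centroids, labels, iterations used)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    dim = len(points[0])
    for it in range(1, max_iter + 1):
        # Per-cluster coordinate sums and counts (the 'reduceByKey' step).
        sums = defaultdict(lambda: [0.0] * dim)
        counts = defaultdict(int)
        for p in points:
            c = closest(p, centroids)
            counts[c] += 1
            sums[c] = [s + x for s, x in zip(sums[c], p)]
        # New centroid = mean of its points; an empty cluster keeps its old centroid.
        new = [tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
               for i in range(k)]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:  # converged: centroids have (almost) stopped moving
            break
    labels = [closest(p, centroids) for p in points]
    return centroids, labels, it


def wsse(points, labels, centroids):
    """Within-cluster SSE: squared distance of every point to its own centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def bsse(points, labels, centroids):
    """Between-cluster SSE: sum over clusters of |C_i| * ||c_i - overall mean||^2."""
    n, dim = len(points), len(points[0])
    mean = tuple(sum(p[d] for p in points) / n for d in range(dim))
    sizes = defaultdict(int)
    for lab in labels:
        sizes[lab] += 1
    return sum(sizes[i] * sq_dist(c, mean) for i, c in enumerate(centroids))


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    total = 0.0
    for i, p in enumerate(points):
        dist_sum, size = defaultdict(float), defaultdict(int)
        for j, q in enumerate(points):
            size[labels[j]] += 1
            if i != j:
                dist_sum[labels[j]] += math.dist(p, q)
        own = labels[i]
        if size[own] == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist_sum[own] / (size[own] - 1)  # mean distance within own cluster
        b = min(dist_sum[c] / size[c] for c in size if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Reading `time.perf_counter()` before and after each `kmeans` call, and repeating the call with several seeds per value of K, gives the per-run error, runtime, and iteration counts that part 1a asks for.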

2. Perform bisecting K-means clustering on dataset DS1 using your own PySpark code.
a. Run bisecting K-means for different values of K.
i. For each value of K, run bisecting K-means multiple times.
ii. Report your findings: the error of each clustering, the time required, and the value of K that gives the best result.
b. Examine the quality of the individual clusters and of the clusterings as a whole.
i. Report the within-cluster sum of squared error (WSSE), the between-cluster sum of squared error (BSSE), and the silhouette coefficient (SC) for each run of bisecting K-means. Write your own PySpark code to calculate WSSE, BSSE, and SC.
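
Part 2 can be sketched in the same plain-Python style. This minimal version assumes one common variant of bisecting K-means: start with all points in a single cluster, repeatedly remove the cluster with the largest SSE, and replace it with the best (lowest-SSE) of several 2-means splits until K clusters remain. The function names and the `trials` parameter are hypothetical. Spark's documentation describes MLlib's `BisectingKMeans` as bisecting all divisible clusters on the bottom level in each step, so the results in part 4 may differ from this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def sse(pts):
    """Sum of squared distances of pts to their own mean."""
    m = mean(pts)
    return sum(sq_dist(p, m) for p in pts)


def two_means(pts, rng, max_iter=100):
    """Split pts into two groups with plain 2-means (Lloyd's algorithm)."""
    c = rng.sample(pts, 2)
    for _ in range(max_iter):
        groups = ([], [])
        for p in pts:
            groups[0 if sq_dist(p, c[0]) <= sq_dist(p, c[1]) else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. duplicate points); the caller discards it
        new = [mean(groups[0]), mean(groups[1])]
        if new == c:
            break  # centroids stopped moving
        c = new
    return groups


def bisecting_kmeans(points, k, trials=5, seed=None):
    """Return k clusters (lists of points) by repeatedly bisecting the worst cluster."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(idx)
        if len(target) < 2:
            raise ValueError("k exceeds the number of splittable points")
        best = None
        for _ in range(trials):  # keep the lowest-SSE split out of several trials
            a, b = two_means(target, rng)
            if a and b:
                cost = sse(a) + sse(b)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        if best is None:
            raise ValueError("could not split a cluster into two non-empty parts")
        clusters.extend(best[1:])
    return clusters
```

Because each returned cluster's centroid is simply its mean, WSSE, BSSE, and SC for these clusters can be computed with the same formulas as for K-means.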

3. Perform K-means clustering on the given datasets DS2 and DS3 using the KMeans function in PySpark MLlib.
a. Use the silhouette method to find the optimal value of K.
i. Run K-means multiple times with the optimal K. Report your findings: the error of each clustering, the time required, and the number of iterations to convergence for each run.
ii. Report the within-cluster sum of squared error (WSSE), the between-cluster sum of squared error (BSSE), and the silhouette coefficient (SC) for each run of K-means. Use the PySpark MLlib library to calculate WSSE, BSSE, and SC.
b. Run K-means with K greater than the optimal K, then post-process the result to improve the clustering. Post-processing can help when the clusters differ in size, density, or shape.
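
The silhouette method in part 3a amounts to computing the average silhouette coefficient for each candidate K and keeping the K with the highest value; with the DataFrame-based API this is typically done with `pyspark.ml.clustering.KMeans` and `pyspark.ml.evaluation.ClusteringEvaluator`, whose default metric is the silhouette computed with squared Euclidean distance. For part 3b, one common post-processing step, assumed here rather than prescribed by the assignment, is to over-cluster and then greedily merge the pair of clusters whose merge increases the total SSE the least (Ward's criterion, n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2). The plain-Python sketch below shows only this merge step, and its function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def merge_cost(a, b):
    """Increase in total SSE if clusters a and b are merged (Ward's criterion)."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def merge_to_k(clusters, k):
    """Greedily merge the cheapest pair of clusters until only k remain."""
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected by the pop
    return clusters
```

Ward's criterion favors compact clusters; for clusters with irregular shapes, a different inter-cluster distance (for example, the minimum distance between members) may merge more appropriately.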

4. Repeat Part 3 using the bisecting K-means clustering function (BisectingKMeans) provided in PySpark MLlib.

5. Compare the clustering results of K-means and bisecting K-means on all the datasets.

NOTE: Draw different plots to visualize the clustering results, and include the plots in your report.

You might also like

AdityaGaur BDA Exp8 (4 pages)
DWDM Lab All (20 pages)
K-Means Clustering Python Guide (3 pages)
SOLUTION ONLY CODE DWDM - Lab - All (8 pages)
Experiment 4 1 (4 pages)
KMeans Clustering for Universities (9 pages)
MS6711 Data Mining Homework 1: 1.1 Implement K-Means Manually (8 PTS) (6 pages)
Python K-Means Clustering Guide (6 pages)
Clustering Assignment Guide (2 pages)
ML0101EN Clus DBSCN Weather Py v1 (16 pages)
Bil570 hw3 Summer2020 (3 pages)
Assignment 10 (3 pages)
IDM Assignment (15 pages)
Imkpğ (3 pages)
DADV Exp-5 (3 pages)
Application of Linear Algebra (7 pages)
Assignment 2.1 (2 pages)
Seminar 10 (3 pages)
Aiml Assignment 10 (6 pages)
Apriori Algorithm & Clustering Guide (8 pages)
Tutorial 8 (12 pages)
Sklearn Kmeans Dbscan Guide (2 pages)
Data Enggineering (16 pages)
CS7267-HW 1 (2 pages)
MLT 8 KK (2 pages)
51 DA5400 - FML51 - 20250501 ProblemSet06 (4 pages)
ML - LAB 2 - Jupyter Notebook (9 pages)
09-Clustering - Ipynb - Colab (10 pages)
Data Mining Assignment No. 1 (22 pages)
Clustering in Python-Dr. Afsaneh Javadi (8 pages)
Exp 6 (10 pages)
Clustering (43 pages)
Dmaclat4 Merged (46 pages)
Unsupervisd Learning Algorithm (6 pages)
K-Means Clustering for CS Students (30 pages)
K-Means for Data Engineers (5 pages)
Big Data Clustering Assignment (7 pages)
K-Means Algorithm (29 pages)
CS60050 - Machine Learning - Programming Assignment - 3 (5 pages)
COMP 4710 Assignment 1 - Clustering Total Marks (2 pages)
ML Solution (60 pages)
KDD WS 24 25 E4 Clustering I (2 pages)
DMDW Lab8 (3 pages)
SE KMeansClustering (21 pages)
Assignment8 ML) (4 pages)
Major (3 pages)
Baidurya Debnath 4 (37 pages)
Python 4 (7 pages)
20 ENG 016 Assignment 8 (4 pages)
Kunal DSML Laboratory Record - Format (58 pages)
20bcs7635-EXP 10 (5 pages)
HW5 Clustering (50 PTS) : Test Algorithms (5 pages)
Naive Bayes Algorithm (51 pages)
MLFILE (21 pages)
3.unsupervised Learning (9 pages)
1.1 Read The Data and Do Exploratory Data Analysis. Describe The Data Briefly (50 pages)
Unit 3 Unsupervised Learning (9 pages)
ML2 Practical List (80 pages)
Digital Logic Design Exam Key (5 pages)
OS Quiz 1 (Chapters 1-3) Flashcards - Quizlet (12 pages)
Week1 - Parallel and Distributed Computing (46 pages)
Helping Slides Pipelining Hazards Solutions (55 pages)
Chap3 Slides Week4 (42 pages)
Chap2 Slides Week3 (28 pages)
Excel Solver Optimization Report (12 pages)
Numerical Methods1 (3 pages)
Numerical Analysis For Engineer - 2 (8 pages)
Math10 q1 Mod5 Long and Synthetic Division Laila Kiwisen Bgo v2 (19 pages)
Ge330fall09 Dualsimplex Postoptimal11 PDF (18 pages)
Chapter 5 Transportation Problems (40 pages)
Polynomials Practice Worksheet (2 pages)
Class 10 Polynomials WS (15 pages)
Numerical Solutions (2 pages)
Chapter 3 - Solving Problems by Searching Concise (67 pages)
Graph Theory Q5 (2 pages)
Assignment (8 pages)
Wa0005 (4 pages)
Offline 02 CSE318 (6 pages)
HW AP PC Unit 1 MCQs (6 pages)
Linear Programming for Retail & Transport (59 pages)
R22 ML Syllabus (2 pages)
Machine Learning: Backpropagation (24 pages)
MATHS - 6th MAY POLYNOMIALS - Assignment (3 pages)
Quadratic Equation Class 10 TH (15 pages)
Hwdivmono 1 (1 page)
Numerical Analysis: NA Team 2024 (17 pages)
BFS and DFS Implementation Lab (6 pages)
Cbjemapu 02 (8 pages)
Unacademy Class 1 - Wavy Curve Method - Inequality PDF (9 pages)
Type Example Tip Polynomial Radical Rational: Rochie Roasa, TVL Advance 11-Pearl General Mathematics Functions (1 page)
MA5102 Final - 2021 (4 pages)
Mining Association Rules Guide (8 pages)
MCS 21 (4 pages)
Quicksort: Quicksort: Advantages and Disadvantages Quicksort (15 pages)
