Machine Learning for Computer Security
Exercise Sheet 7
Summer term 2024
Exercise 1 Lloyd algorithm
We begin by examining Lloyd’s algorithm as an example of a partition-based clustering
approach. To this end, let us consider a set of points and two centroids, c1 and c2 :
[Figure: scatter plot of the data points together with the two initial centroids c1 and c2.]
1. Which points are assigned to each centroid? Assume that the Euclidean distance is
used to measure the distance between points.
2. What are the new centroids after the first update step?
Solution 1
1.
c1 → (1, 2), (2, 1)
c2 → (4, 3), (7, 3), (6, 2), (7, 1)
2.
c1 = (1.5, 1.5), c2 = (6, 2.25)
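These values follow directly from averaging the coordinates of the points assigned in step 1: c1 = ((1 + 2)/2, (2 + 1)/2) = (1.5, 1.5) and c2 = ((4 + 7 + 6 + 7)/4, (3 + 3 + 2 + 1)/4) = (6, 2.25).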
Exercise 2 K-means clustering
Building on the previous exercise, we now want to implement k-means clustering using
the Lloyd algorithm. On ISIS you can find a template to help you get started. Follow
the steps below (a minimal sketch of the resulting functions is given after the list):
1. First, write a function initialize that returns k 2-dimensional random centroids.
Sample the vectors from the set [xmin, xmax] × [ymin, ymax].
Hint: NumPy provides a function numpy.random.random that you can use.
2. Next, implement a function squared_euclidean_distances that returns an array
distances of squared Euclidean distances from a set of data points Z to a given
centroid c, where distances[i] is the squared distance of data point Zi to the centroid c.
3. To assign each Zi to its closest centroid, implement a function assignment_step.
The function returns a list clusters, in which clusters[i] denotes the cluster
that the data point Zi is assigned to.
Hint: Make use of squared_euclidean_distances.
4. To complete the algorithm, implement a function update_step that updates the
centroid positions by calculating the mean of the vectors of each cluster. The
function returns the centroids as a k × 2 matrix.
5. Finally, run the k-means algorithm for different values of k and N , where k is the
number of centroids and N is the number of Gaussians used in the generation of
the data set.
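The template on ISIS fixes the exact signatures, so the following is only a minimal NumPy sketch of how the four functions could fit together; the parameter names x_min, x_max, y_min, y_max, the convergence check, and the kmeans driver are assumptions rather than part of the template.

import numpy as np


def initialize(k, x_min, x_max, y_min, y_max):
    """Draw k random 2-dimensional centroids from [x_min, x_max] x [y_min, y_max]."""
    c = np.random.random((k, 2))                 # uniform samples in [0, 1)^2
    c[:, 0] = x_min + c[:, 0] * (x_max - x_min)  # rescale x coordinates
    c[:, 1] = y_min + c[:, 1] * (y_max - y_min)  # rescale y coordinates
    return c


def squared_euclidean_distances(Z, c):
    """Squared Euclidean distance of every row of Z (n x 2) to a single centroid c."""
    return np.sum((Z - c) ** 2, axis=1)


def assignment_step(Z, centroids):
    """Index of the closest centroid for every data point."""
    dists = np.stack([squared_euclidean_distances(Z, c) for c in centroids])
    return np.argmin(dists, axis=0)


def update_step(Z, clusters, k):
    """New centroids as the mean of the points in each cluster (k x 2 matrix)."""
    new_centroids = np.empty((k, 2))
    for j in range(k):
        members = Z[clusters == j]
        # An empty cluster would yield a NaN mean; re-drawing a random point is one workaround.
        new_centroids[j] = members.mean(axis=0) if len(members) else Z[np.random.randint(len(Z))]
    return new_centroids


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations: assign, update, repeat until the centroids stop moving."""
    centroids = initialize(k, Z[:, 0].min(), Z[:, 0].max(), Z[:, 1].min(), Z[:, 1].max())
    for _ in range(n_iter):
        clusters = assignment_step(Z, centroids)
        new_centroids = update_step(Z, clusters, k)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Quick sanity check on the six points from Exercise 1 with k = 2.
    Z = np.array([[1, 2], [2, 1], [4, 3], [7, 3], [6, 2], [7, 1]], dtype=float)
    print(kmeans(Z, k=2))

Squared distances suffice for the assignment step because the closest centroid under the squared Euclidean distance is the same as under the plain Euclidean distance, which saves the square root.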
Exercise 3 Linkage clustering
An alternative to partition-based clustering is hierarchical clustering. Assume that
the following points are given:
[Figure: scatter plot of the points a, b, c, d, e, and f in the plane.]
1. Perform a single-linkage clustering using the Euclidean distance to measure the
distance between the clusters. Draw the corresponding dendrogram.
2. Assume you want to obtain three clusters. Which points would be grouped together?
3. If we use complete-linkage for clustering the points instead, how would this affect
the resulting dendrogram?
Solution 3
1. With single-linkage clustering, points are clustered based on the minimum distance
between points in different clusters:
(a), (b), (c), (d), (e), (f)
→ (a, b), (c), (d), (e), (f)
→ (a, b), (c), (d, e), (f)
→ (a, b), (c, d, e), (f)
→ (a, b, c, d, e), (f)
→ (a, b, c, d, e, f)
Thus, the dendrogram merges the points in exactly this order:
[Dendrogram over the leaves a, b, c, d, e, f.]
2. To obtain three clusters, we cut the dendrogram just below the level at which the
clusters (a, b) and (c, d, e) are merged. This results in the clusters (a, b), (c, d, e), and (f).
3. If we use complete linkage for clustering the points instead, the resulting dendrogram
is no longer unique, since c has the same distance to b as to e and the merge order
therefore depends on how this tie is broken.
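These hand computations can be cross-checked with SciPy's hierarchical clustering routines. The coordinates below are placeholders chosen so that single linkage reproduces the merge order above; they are not taken from the original figure, so substitute the real coordinates before drawing conclusions.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Placeholder coordinates for the points a-f (not the values from the exercise sheet).
labels = ["a", "b", "c", "d", "e", "f"]
points = np.array([[1.0, 2.0], [1.5, 2.5], [3.0, 1.0],
                   [4.0, 1.0], [4.8, 1.0], [8.0, 2.0]])

# Single linkage merges the two clusters with the smallest minimum inter-point distance.
Z_single = linkage(points, method="single", metric="euclidean")

# Question 2: cut the tree into three flat clusters.
print(dict(zip(labels, fcluster(Z_single, t=3, criterion="maxclust"))))

# Question 3: complete linkage uses the maximum inter-point distance instead.
Z_complete = linkage(points, method="complete", metric="euclidean")

# Compare the two dendrograms side by side.
fig, (ax1, ax2) = plt.subplots(1, 2)
dendrogram(Z_single, labels=labels, ax=ax1)
dendrogram(Z_complete, labels=labels, ax=ax2)
plt.show()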
Exercise 4 Malware clustering
In this task, we aim to apply the concepts from the previous exercises to categorize
malicious software based on its behavior. To accomplish this, use the dataset and template
provided on ISIS. Each line of the data file contains the Windows registry keys and mutex
names utilized by a malware program during its execution. Several of these programs
belong to the same malware family and can therefore be grouped together.
You can build upon your k-means implementation developed in the previous task.
Please select a suitable feature representation and justify your choice. You can verify
your solution by plotting the clusters and examining common features. The following
hints may guide you through your analysis (a rough end-to-end sketch follows the hints):
1. Think about how to construct a suitable feature space. For instance, you could
follow a bag-of-words approach.
2. Your implemented clustering method only works on 2-dimensional data. To project
your high-dimensional data into a 2-dimensional space, you can use the Principal
Component Analysis (PCA) algorithm implementation provided in the template.
Note, however, that PCA works best on mean-centered data.
3. The template includes some enhancements that you might want to experiment with
(e.g., token filtering/normalization, scaling).
4. In the case of outliers, consider initializing your k-means implementation with
random samples from the dataset as the initial centroids, instead of random values
drawn from the range between the minimum and maximum coordinates.
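Since neither the data file's exact format nor the template's PCA interface is reproduced here, the following is only a rough sketch of hints 1, 2, and 4: it assumes one malware sample per line with whitespace-separated tokens, uses a small SVD-based stand-in for the template's PCA, and the file name malware_behavior.txt is a placeholder.

import numpy as np


def load_tokens(path):
    """Read one sample per line; tokens are assumed to be separated by whitespace."""
    with open(path) as f:
        return [line.split() for line in f if line.strip()]


def bag_of_words(samples):
    """Map every sample to a binary vector over the global token vocabulary (hint 1)."""
    vocab = sorted({tok for sample in samples for tok in sample})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(samples), len(vocab)))
    for i, sample in enumerate(samples):
        for tok in sample:
            X[i, index[tok]] = 1.0
    return X, vocab


def pca_2d(X):
    """Project mean-centered data onto its first two principal components (hint 2).
    This is only a stand-in for the PCA implementation provided in the template."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


if __name__ == "__main__":
    samples = load_tokens("malware_behavior.txt")   # placeholder file name
    X, vocab = bag_of_words(samples)
    Z = pca_2d(X)

    # Hint 4: pick k random data points as initial centroids, which is more robust
    # to outliers; the k-means functions from Exercise 2 can then be run on Z.
    k = 10                                          # number of families is unknown; try several values
    init = Z[np.random.choice(len(Z), size=k, replace=False)]

Using binary presence instead of raw token counts is just one possible bag-of-words variant; token filtering and scaling (hint 3) can be layered on top of this representation.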