Sparkling Water

Short Description

Sparkling Water is a scalable system for detecting, merging, and clustering similar server processes based on interaction logs. Using Apache Spark, MinHash, locality-sensitive hashing (LSH), and time-series hashing (SSH, BSeSH), it efficiently identifies behavior patterns in large server infrastructures for performance optimization, anomaly detection, and system analysis. A detailed description can be found in the report (report.pdf).

Features

- Similarity detection between server processes (name + timing)
- Time-series analysis using SSH and BSeSH
- Merging of equivalent processes
- Clustering using k-means++
- Scalable, distributed log processing via Apache Spark
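
To illustrate the name-similarity idea behind the first feature, here is a minimal pure-Python MinHash sketch. It is not taken from this repository: the function names, the character 3-gram shingling, the 128 hash functions, and the example process names are all assumptions made for illustration, and the project itself runs on Apache Spark.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a (lower-cased) string."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(seed, token):
    # Seeded 64-bit hash; blake2b keeps the result stable across runs.
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=128):
    """One minimum per hash function; tokens must be a non-empty set."""
    return [min(_seeded_hash(seed, t) for t in tokens)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which both signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical process names, not taken from the project's data.
    a = shingles("auth-service-01")
    b = shingles("auth-service-02")
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    print("exact:    ", exact_jaccard(a, b))
    print("estimated:", estimated_jaccard(sig_a, sig_b))
```

The estimate approaches the exact Jaccard similarity as the number of hash functions grows, at the cost of longer signatures.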

Installation

Ensure you have Python 3.7+ and Apache Spark installed.

Install the required Python packages:

pip install -r requirements.txt

Run the Pipelines

The project has two main parts. Each has its own script:

1. Similarity Detection

Run:

python pipeline_part1.py

Input: res/output.txt
Outputs:
- res/part1Observations.txt – similarity analysis
- res/part1Output.txt – merged process candidates
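
The description names LSH as one of the techniques used for similarity detection. As a rough illustration of how LSH can turn MinHash-style signatures into candidate pairs, the sketch below uses the standard banding scheme. It is an assumption-laden example, not the code in pipeline_part1.py: the function name, the band count, and the toy signatures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands):
    """Return pairs of names whose signatures agree on at least one band.

    signatures: dict mapping a name to a list of ints (all the same length).
    bands: number of bands; must divide the signature length evenly.
    """
    sig_len = len(next(iter(signatures.values())))
    if sig_len % bands != 0:
        raise ValueError("signature length must be divisible by bands")
    rows = sig_len // bands

    candidates = set()
    for band in range(bands):
        # Group names by the slice of the signature belonging to this band.
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(name)
        # Every pair sharing a bucket becomes a candidate.
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                candidates.add(pair)
    return candidates


if __name__ == "__main__":
    # Toy signatures (hypothetical values) of length 4, split into 2 bands.
    toy = {
        "proc_a": [1, 2, 3, 4],
        "proc_b": [1, 2, 9, 9],
        "proc_c": [7, 7, 7, 7],
    }
    print(lsh_candidate_pairs(toy, bands=2))  # {('proc_a', 'proc_b')}
```

More bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.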

2. Merging and Clustering

Run:

python pipeline_part2.py

Input: res/part1Output.txt
Output:
- res/part2Observations.txt – final clustering results
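
The clustering step is described as using k-means++. The sketch below shows only the k-means++ seeding rule (each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far) in plain Python. It is an illustration under stated assumptions, not the implementation in pipeline_part2.py: the function names, the fixed random seed, and the 2-D example points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from points using k-means++ seeding."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest center.
        weights = [min(squared_distance(p, c) for c in centers)
                   for p in points]
        if sum(weights) == 0:
            # All points coincide with chosen centers; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming three loose groups.
    pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0),
           (5.1, 4.9), (10.0, 0.0), (9.8, 0.3)]
    print(kmeans_pp_init(pts, k=3))
```

Points already chosen as centers get weight zero, so with distinct input points the seeding never picks the same point twice.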

Authors

Liva van der Velden — Utrecht University
Robin Kollmann — Utrecht University
Simon Menke — Utrecht University

