kyungjunleeme
🎯
Focusing

Organizations

@ens4

Starred repositories

19 stars written in Scala

Apache Spark - A unified analytics engine for large-scale data processing

Scala 43,214 29,167 Updated May 1, 2026

CMAK is a tool for managing Apache Kafka clusters

Scala 11,938 2,488 Updated Aug 2, 2023

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 8,768 2,084 Updated Apr 30, 2026

The leader in Customer Data Infrastructure

Scala 7,010 1,177 Updated Apr 30, 2026

Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"

Scala 5,821 3,030 Updated Dec 11, 2024

Simple and Distributed Machine Learning

Scala 5,226 860 Updated Apr 24, 2026

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules

Scala 4,384 521 Updated Jun 29, 2022

State of the Art Natural Language Processing

Scala 4,133 743 Updated Apr 23, 2026

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,614 583 Updated Apr 30, 2026

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala 1,385 790 Updated Jan 28, 2025

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Scala 951 622 Updated Apr 27, 2026

An open protocol for secure data sharing

Scala 938 224 Updated Apr 29, 2026

Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.

Scala 542 142 Updated Jan 27, 2026

MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka.

Scala 429 78 Updated Feb 22, 2023

A library that provides an in-memory Kafka instance to run your tests against.

Scala 420 46 Updated Apr 27, 2026

A fast Kafka client for ZIO and ZIO Streams

Scala 367 145 Updated Apr 30, 2026

Dicer auto-sharder: Infrastructure for building sharded services

Scala 261 23 Updated Apr 28, 2026

A workflow management system for researchers who heart Unix.

Scala 128 14 Updated Sep 23, 2015

Spark accelerator framework; it enables secondary indices to remote data stores.

Scala 40 50 Updated Feb 11, 2026