Kyungjun Lee kyungjunleeme

🎯

Focusing

Data Engineer

190 followers · 1.5k following

Achievements

x3 x3

Highlights

Developer Program Member

Starred repositories

19 stars written in Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 43,214 stars · 29,167 forks · Updated May 1, 2026

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

Scala · 11,938 stars · 2,488 forks · Updated Aug 2, 2023

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala · 8,768 stars · 2,084 forks · Updated Apr 30, 2026

snowplow / snowplow

The leader in Customer Data Infrastructure

Scala · 7,010 stars · 1,177 forks · Updated Apr 30, 2026

fpinscala / fpinscala

Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"

Scala · 5,821 stars · 3,030 forks · Updated Dec 11, 2024

microsoft / SynapseML

Simple and Distributed Machine Learning

Scala · 5,226 stars · 860 forks · Updated Apr 24, 2026

mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules

Scala · 4,384 stars · 521 forks · Updated Jun 29, 2022

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Scala · 4,133 stars · 743 forks · Updated Apr 23, 2026

awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,614 stars · 583 forks · Updated Apr 30, 2026

databricks / LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala · 1,385 stars · 790 forks · Updated Jan 28, 2025

apache / incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Scala · 951 stars · 622 forks · Updated Apr 27, 2026

delta-io / delta-sharing

An open protocol for secure data sharing

Scala · 938 stars · 224 forks · Updated Apr 29, 2026

Interana / eventsim

Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.

Scala · 542 stars · 142 forks · Updated Jan 27, 2026

mardambey / mypipe

MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka.

Scala · 429 stars · 78 forks · Updated Feb 22, 2023

embeddedkafka / embedded-kafka

A library that provides an in-memory Kafka instance to run your tests against.

Scala · 420 stars · 46 forks · Updated Apr 27, 2026

zio / zio-kafka

A fast Kafka client for ZIO and ZIO Streams

Scala · 367 stars · 145 forks · Updated Apr 30, 2026

databricks / dicer

Dicer auto-sharder: Infrastructure for building sharded services

Scala · 261 stars · 23 forks · Updated Apr 28, 2026

jhclark / ducttape

A workflow management system for researchers who heart Unix.

Scala · 128 stars · 14 forks · Updated Sep 23, 2015

opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.

Scala · 40 stars · 50 forks · Updated Feb 11, 2026

Starred topics

data-processing