Grad Student@AISys, Seoul National University
- Seoul, Korea
- https://aisys.snu.ac.kr/members/JaeyongSong.html
Stars
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
A vector index built on TurboQuant, written in Rust with Python bindings
TeamKorea agent-reasoning solution for the MLSys 2026 scheduling contest (Track B)
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
Spec-driven development (SDD) for AI coding assistants.
[MLSys 2026] RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Pre-indexed code knowledge graph, auto syncs on code changes, for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent — fewer tokens, fewer tool calls, 100% local
📰 Must-read papers and blogs on Speculative Decoding ⚡️
[MLSys '26] GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic…
Skills for Real Engineers. Straight from my .claude directory.
Use claude-code for free in the terminal, the VSCode extension, or Discord, like OpenClaw (voice supported)
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall-free. The result is lower TTFT, lower end-to-end latency, and l…
SwarmIO is an SSD emulation framework for next-generation GPU-centric storage systems research
[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Overleaf CLI, library & MCP server — pull, push, sync, compile LaTeX projects. Use from terminal, import as TypeScript library, or connect AI agents via Model Context Protocol.
The repo has been moved to https://github.com/VectorDB-NTU/RaBitQ-Library. [SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor …
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Unified configuration and drivers for running Linux on Samsung Galaxy Book with complete functionality. Combines galaxy-book2-pro-linux and samsung-galaxybook-extras repositories.
A framework for generating realistic LLM serving workloads
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…