jaeyong-song

Training library for Megatron-based models with bidirectional Hugging Face conversion capability

Python 732 365 Updated Jun 16, 2026

A vector index built on TurboQuant, written in Rust with Python bindings

Python 11,733 1,018 Updated Jun 10, 2026

TeamKorea agent-reasoning solution for the MLSys 2026 scheduling contest (Track B)

Python 5 1 Updated May 26, 2026

Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language

Python 294 20 Updated May 31, 2026

Spec-driven development (SDD) for AI coding assistants.

TypeScript 55,087 3,859 Updated Jun 13, 2026

Fast and Furious AMD Kernels

C++ 433 66 Updated Jun 13, 2026

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 11,983 1,072 Updated Jun 16, 2026

Pre-indexed code knowledge graph, auto syncs on code changes, for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent — fewer tokens, fewer tool calls, 100% local

TypeScript 50,129 3,066 Updated Jun 16, 2026

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,255 80 Updated Jun 2, 2026

[MLSys '26] GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading

Python 5 Updated May 10, 2026

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

Python 3,505 306 Updated Apr 10, 2026

High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]

Cuda 27 5 Updated Apr 22, 2026

First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic…

Python 28 1 Updated May 16, 2026

Skills for Real Engineers. Straight from my .claude directory.

Shell 131,322 11,436 Updated Jun 12, 2026

Use claude-code for free in the terminal, the VSCode extension, or Discord, like OpenClaw (voice supported)

Python 34,834 5,364 Updated Jun 12, 2026

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Rust 32,371 2,384 Updated Jun 16, 2026

Heterogeneous FPGA virtualization

Shell 6 Updated Mar 3, 2026

Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall free. The result is lower TTFT, lower end-to-end latency, and l…

Python 17 2 Updated Mar 9, 2026

SwarmIO is an SSD emulation framework for next-generation GPU-centric storage systems research

C 49 1 Updated May 24, 2026

[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"

Python 117 17 Updated Dec 15, 2025

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python 67,726 5,704 Updated Jun 15, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 109 8 Updated Dec 2, 2025

Overleaf CLI, library & MCP server — pull, push, sync, compile LaTeX projects. Use from terminal, import as TypeScript library, or connect AI agents via Model Context Protocol.

TypeScript 82 16 Updated Jun 13, 2026
JavaScript 165 17 Updated Apr 7, 2026

The repo has been moved to https://github.com/VectorDB-NTU/RaBitQ-Library. [SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor …

C++ 249 38 Updated Apr 22, 2026

An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 193,900 109,957 Updated Jun 8, 2026

Unified configuration and drivers for running Linux on Samsung Galaxy Book with complete functionality. Combines galaxy-book2-pro-linux and samsung-galaxybook-extras repositories.

ASL 17 1 Updated Sep 7, 2025

A framework for generating realistic LLM serving workloads

Python 153 14 Updated May 11, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 488 26 Updated Jun 11, 2026

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…

TypeScript 61,097 5,045 Updated Jun 16, 2026