🚀
Let's rocket
  • KAIST
  • Seoul, Korea

Starred repositories

109 stars written in Python
Python 73 15 Updated May 27, 2025

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

Python 67 3 Updated Jul 14, 2025

[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"

Python 57 3 Updated Jul 8, 2025

Adaptive Parallel PDF Parsing and Resource Scaling Engine

Python 56 14 Updated Oct 23, 2025

Meta-Repository for Bespoke Silicon Group's Manycore Architecture (A.K.A HammerBlade)

Python 43 20 Updated Jun 16, 2025

Experiments in Joint Embedding Predictive Architectures (JEPAs).

Python 43 10 Updated Jan 5, 2024

Python implementation of the TMscore program

Python 43 13 Updated Apr 29, 2024

PyTorchSim is a Comprehensive, Fast, and Accurate NPU Simulation Framework

Python 42 3 Updated Nov 6, 2025

Official PyTorch implementation of "Denoising MCMC for Accelerating Diffusion-Based Generative Models", ICML 2023 Oral Paper

Python 31 4 Updated Sep 14, 2023

A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs)

Python 26 1 Updated Oct 17, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 25 12 Updated Nov 7, 2025
Python 20 4 Updated Oct 21, 2025

Created and enhanced a local LLM training system on Apple Silicon with MLX and Metal API, overcoming the absence of CUDA support. Fine-tuned the Llama3 model on 16 GPUs for streamlined solution of …

Python 20 5 Updated May 29, 2024
Python 19 Updated Nov 5, 2024

KV cache compression via sparse coding

Python 14 2 Updated Oct 26, 2025

[ISCA 2025] Official Implementation of "MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization"

Python 12 1 Updated Oct 30, 2025
Python 12 2 Updated Nov 11, 2024