  • Mangoboost
  • Seoul

A Lightweight Recommendation System

Python 9,288 719 Updated Oct 13, 2025

Framework providing operating system abstractions and a range of shared networking and memory services for common modern heterogeneous platforms.

SystemVerilog 340 100 Updated Apr 9, 2026

Perplexity open source garden for inference technology

Rust 390 36 Updated Dec 25, 2025

Linux Cross-Memory Attach

C 98 38 Updated Feb 18, 2026

Modular RDMA Interface

C++ 112 30 Updated Apr 13, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,804 1,031 Updated Mar 30, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,324 858 Updated Mar 22, 2026

HTML 233 56 Updated Apr 8, 2026

C++ 101 42 Updated Mar 23, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 7,970 1,088 Updated Apr 13, 2026

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 418 40 Updated Aug 13, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,210 398 Updated Jul 11, 2024

LLaMA 2 implemented from scratch in PyTorch

Python 369 71 Updated Sep 25, 2023

[Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path

Python 61 30 Updated Feb 24, 2026

A validation and profiling tool for AI infrastructure

Python 370 85 Updated Apr 8, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,368 183 Updated Mar 12, 2026

Merlin Models is a collection of deep learning recommender system model reference implementations

Python 296 54 Updated May 4, 2024

A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework

C 17 7 Updated Aug 20, 2024

DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

Python 80 23 Updated Apr 7, 2026

Fully open reproduction of DeepSeek-R1

Python 25,981 2,413 Updated Apr 2, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,079 672 Updated Apr 13, 2026

Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.

Go 54 36 Updated Apr 9, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,724 5,318 Updated Apr 13, 2026

An extremely fast Python package and project manager, written in Rust.

Rust 83,170 2,935 Updated Apr 13, 2026

To develop Arm Cortex-M0 based SoCs, from creating high-level functional specifications to design, implementation and testing on FPGA platforms using standard hardware description and software prog…

Verilog 40 9 Updated Dec 24, 2020

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,552 731 Updated Apr 12, 2026