codinggosu
  • Mangoboost
  • Seoul


A Lightweight Recommendation System

Python 9,300 720 Updated Oct 13, 2025

Framework providing operating system abstractions and a range of shared networking and memory services for common modern heterogeneous platforms.

SystemVerilog 348 99 Updated Apr 27, 2026

Perplexity open source garden for inference technology

Rust 402 38 Updated Dec 25, 2025

Linux Cross-Memory Attach

C 98 38 Updated Feb 18, 2026

Modular RDMA Interface

C++ 119 37 Updated Apr 29, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,843 1,038 Updated Mar 30, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,137 950 Updated Apr 24, 2026

HTML 234 55 Updated Apr 8, 2026

C++ 105 42 Updated Mar 23, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,156 1,137 Updated Apr 29, 2026

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 420 43 Updated Aug 13, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,224 398 Updated Jul 11, 2024

LLaMA 2 implemented from scratch in PyTorch

Python 369 71 Updated Sep 25, 2023

[Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path

Python 61 30 Updated Feb 24, 2026

A validation and profiling tool for AI infrastructure

Python 370 86 Updated Apr 27, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,373 185 Updated Mar 12, 2026

Merlin Models is a collection of deep learning recommender system model reference implementations

Python 298 54 Updated May 4, 2024

A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework

C 17 7 Updated Aug 20, 2024

DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

Python 83 25 Updated Apr 24, 2026

Fully open reproduction of DeepSeek-R1

Python 26,011 2,421 Updated Apr 2, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,232 717 Updated Apr 29, 2026

Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.

Go 54 36 Updated Apr 28, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 26,702 5,619 Updated Apr 29, 2026

An extremely fast Python package and project manager, written in Rust.

Rust 84,136 3,009 Updated Apr 29, 2026

To develop Arm Cortex-M0 based SoCs, from creating high-level functional specifications to design, implementation and testing on FPGA platforms using standard hardware description and software prog…

Verilog 41 9 Updated Dec 24, 2020

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,557 743 Updated Apr 29, 2026