sarckk

Organizations

@facebookresearch @pytorch @vllm-project

This repository contains the code to train and evaluate TRIBE v2, a multimodal model for brain response prediction

Jupyter Notebook 2,623 583 Updated May 11, 2026

100M tokens. Infinite compute. Lowest val loss wins.

Python 467 66 Updated May 14, 2026

Nano vLLM

Python 13,472 2,101 Updated Apr 26, 2026

MoE training for Me and You and maybe other people

Python 386 33 Updated Mar 15, 2026

GPU programming related news and material links

2,133 126 Updated Mar 8, 2026

Persist and reuse KV Cache to speed up your LLM.

Python 277 74 Updated May 15, 2026

Source code examples from the Parallel Forall Blog

HTML 1,330 643 Updated Sep 23, 2025

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 211 72 Updated May 17, 2026

PyTorch Single Controller

Rust 1,033 162 Updated May 18, 2026

An app that brings language models directly to your phone.

TypeScript 6,975 699 Updated May 17, 2026

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,100 2,078 Updated Mar 27, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 864 146 Updated May 17, 2026

Accent Smart Picture Frame

Python 207 17 Updated May 7, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,283 1,179 Updated May 18, 2026

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 278 20 Updated Aug 31, 2024

Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset

Python 443 52 Updated Jun 3, 2025

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection

Python 157 16 Updated Feb 20, 2025

Dream 7B, a large diffusion language model

Python 1,238 77 Updated Nov 21, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,335 170 Updated Jan 4, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 27,933 5,958 Updated May 18, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 80,283 16,883 Updated May 18, 2026

Dynamic Memory Management for Serving LLMs without PagedAttention

C 485 41 Updated May 30, 2025

Awesome LLM compression research papers and tools.

1,832 126 Updated Feb 23, 2026

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 13,119 1,460 Updated May 16, 2026

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,217 76 Updated May 11, 2026

A library to analyze PyTorch traces.

Python 517 92 Updated May 13, 2026

A curated list for Efficient Large Language Models

Python 2,009 166 Updated Jun 17, 2025

An implementation of a deep learning recommendation model (DLRM)

Python 4,037 865 Updated Jan 12, 2026

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,097 512 Updated Jan 6, 2026