An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Python 265 21 Updated Apr 17, 2026

CUDA Kernel Benchmarking Library

Cuda 854 103 Updated Apr 14, 2026

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python 452 46 Updated Apr 18, 2026
Python 66 4 Updated Apr 15, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 296 22 Updated Apr 14, 2026

Connect to any agents with WeChat ClawBot.

Go 1,289 153 Updated Apr 1, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,235 118 Updated Mar 19, 2026

OpenClaw-RL: Train any agent simply by talking

Python 5,032 532 Updated Apr 18, 2026

AI agents running research on single-GPU nanochat training automatically

Python 74,152 10,816 Updated Mar 26, 2026

A lightweight inference engine supporting speculative speculative decoding (SSD).

Python 881 65 Updated Mar 22, 2026

A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.

Python 548 37 Updated Apr 7, 2026

A simple, fast and robust program-aware agentic inference system.

Python 267 22 Updated Mar 16, 2026

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 159 126 Updated Apr 18, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 222 37 Updated Apr 14, 2026

A rejection-sampling based distribution alignment method for extreme actor-policy mismatch RL Training

Python 15 1 Updated Feb 11, 2026

FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.

Rust 60 7 Updated Feb 6, 2026
Python 63 5 Updated Feb 5, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 360,055 73,343 Updated Apr 18, 2026

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Python 4,288 316 Updated Jan 14, 2026

DFlash: Block Diffusion for Flash Speculative Decoding

Python 1,860 126 Updated Apr 17, 2026

OpenAI Frontier Evals

Python 1,168 148 Updated Apr 16, 2026

A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.

C++ 187 31 Updated Apr 17, 2026

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

Cuda 437 27 Updated Mar 30, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 635 74 Updated Apr 18, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,019 585 Updated Mar 13, 2026