
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,659 85 Updated Dec 20, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 462 20 Updated Dec 23, 2025

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

Cuda 249 19 Updated Dec 15, 2025

Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton

Python 39 1 Updated Feb 13, 2025

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Python 3,429 124 Updated Nov 27, 2025

Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

Python 7,708 279 Updated Dec 18, 2025

🙌 OpenHands: AI-Driven Development

Python 65,869 8,105 Updated Dec 23, 2025

SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Python 4,003 719 Updated Dec 18, 2025

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 345 45 Updated Apr 22, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 354 40 Updated Dec 16, 2025

A debugging and profiling tool that can trace and visualize Python code execution

Python 7,460 467 Updated Dec 21, 2025

Getting Started with Triton: A Tutorial for Python Beginners

HTML 27 2 Updated Oct 21, 2025

🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.

TypeScript 21,066 928 Updated Dec 15, 2025

Qwen3Guard is a multilingual guardrail model series developed by the Qwen team at Alibaba Cloud.

Python 388 26 Updated Oct 21, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,731 2,877 Updated Dec 23, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,958 358 Updated Dec 23, 2025

PyTorch Single Controller

Rust 932 120 Updated Dec 23, 2025

Kimi CLI is your next CLI agent.

Python 3,669 361 Updated Dec 23, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 726 73 Updated Nov 30, 2025

Contexts Optical Compression

Python 21,554 1,926 Updated Oct 25, 2025
Python 38 4 Updated Oct 12, 2025

Ascend TileLang adapter

C++ 167 47 Updated Dec 23, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,293 355 Updated Dec 23, 2025

Verify the precision of all Kimi K2 API vendors

Python 488 26 Updated Nov 19, 2025

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 833 103 Updated Dec 23, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,157 193 Updated Oct 9, 2025

Super-fast Structured Outputs

Rust 643 43 Updated Dec 1, 2025

A tiny demo of interfacing CUDA via nanobind with a PyTorch tensor

Cuda 7 Updated Dec 24, 2024

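🥢 Cook like Laoxiangji (老乡鸡) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.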

JavaScript 22,593 2,285 Updated Oct 17, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,713 1,357 Updated Dec 17, 2025