Lean & Secure Bot

Go 409 57 Updated Apr 27, 2026
Python 22 5 Updated Dec 11, 2024

Scalable toolkit for efficient model reinforcement

Python 1,640 389 Updated May 20, 2026

Scalable toolkit for efficient model reinforcement

Python 12 3 Updated Jan 27, 2026

DeepConf: Deep Think with Confidence

Python 397 59 Updated May 6, 2026

Towards a Unified View of Large Language Model Post-Training

Python 211 10 Updated Sep 8, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,270 210 Updated May 20, 2026
Python 5 Updated Oct 20, 2024

A very simple GRPO implementation for reproducing R1-like LLM thinking.

Python 1,675 132 Updated Nov 21, 2025

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,435 3,909 Updated May 20, 2026

How to optimize some algorithms in CUDA.

Cuda 2,998 276 Updated May 20, 2026

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 3,137 263 Updated May 18, 2026

This is a large-model inference framework implemented from scratch in C++.

C++ 10 1 Updated Nov 18, 2024

Ollama Function Calling with Search API

Python 11 1 Updated Apr 28, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,045 1,113 Updated May 17, 2026

LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.

Python 881 87 Updated May 10, 2026

A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size

Python 87 9 Updated Sep 5, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,449 165 Updated Mar 20, 2025
Python 331 32 Updated Jul 25, 2024

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 330 37 Updated Jan 29, 2026

Code for Quiet-STaR

Python 740 92 Updated Aug 21, 2024

Implementation of paper Data Engineering for Scaling Language Models to 128K Context

Python 496 31 Updated Mar 19, 2024

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda 1,119 191 Updated May 20, 2026
C 2 1 Updated Oct 30, 2021

Tools for merging pretrained large language models.

Python 7,089 715 Updated May 6, 2026

A flexible and efficient training framework for large-scale alignment tasks

Python 452 39 Updated Oct 23, 2025

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,527 944 Updated May 15, 2026

😇 A PyTorch-like deep learning framework. Just for fun.

Python 157 7 Updated Oct 9, 2023