wangqinsi1


Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 10,944 953 Updated Nov 8, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,056 3,935 Updated Nov 7, 2025

An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.

Python 24,080 3,184 Updated Nov 7, 2025

[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges

2,008 57 Updated Nov 7, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,268 420 Updated Nov 7, 2025
CSS 18 Updated Nov 5, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,686 441 Updated Nov 4, 2025
Python 70 1 Updated Nov 4, 2025

[NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Python 46 6 Updated Nov 3, 2025

[NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Python 9 2 Updated Nov 1, 2025

This is the official Python version of Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.

Python 98 2 Updated Oct 21, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,681 366 Updated Oct 21, 2025

Official code implementation for the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives"

Python 47 6 Updated Oct 19, 2025

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning

Python 104 2 Updated Oct 16, 2025

[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,474 71 Updated Oct 13, 2025

[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

1,291 79 Updated Oct 11, 2025

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Python 153 9 Updated Sep 27, 2025

This is the official Python version of Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals.

Python 77 9 Updated Sep 26, 2025

Fully open reproduction of DeepSeek-R1

Python 25,620 2,401 Updated Sep 8, 2025

"Knock, knock!" "Who's there?" "Dobi."

HTML 17 1 Updated Aug 11, 2025

[ICML 2025] Official Repo for Stability-guided Adaptive Diffusion Acceleration. 🚀🌙Accelerating off-the-shelf diffusion model with a unified stability criterion.

Python 32 4 Updated Jul 24, 2025

ICML 2025 - Impossible Videos

Python 78 8 Updated Jul 23, 2025

This is the official PyTorch implementation of the ICML 2025 paper CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Model

Python 12 2 Updated May 27, 2025

HippoMM: Hippocampal-inspired Multimodal Memory

Python 13 Updated May 22, 2025

Witness the aha moment of VLM with less than $3.

Python 3,977 290 Updated May 19, 2025

Repository for latent Bayesian Kernel Inference

Python 7 1 Updated Apr 1, 2025

A GPU-based Incremental PCA implementation.

Python 31 6 Updated Feb 18, 2025
Python 147 11 Updated Feb 15, 2025

Advanced implementation of DeepSeek-R1 featuring Group Relative Policy Optimization (GRPO) for mathematical reasoning AI. Integrates safe distillation, modular reward systems, and efficient LoRA fi…

Python 13 3 Updated Jan 29, 2025

[ICLR 2025] Automated Design of Agentic Systems

Python 1,447 222 Updated Jan 28, 2025