Python 14 1 Updated Feb 21, 2026
Python 24 1 Updated Jun 18, 2025
Jupyter Notebook 31 3 Updated Feb 26, 2026

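A rehabilitation guide for cervical spondylosis and lumbar disc herniation, providing programmers with simple, reliable recovery guidance.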

Python 3,425 218 Updated Dec 25, 2023
Python 107 6 Updated Jun 10, 2025

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,817 210 Updated Apr 10, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,372 920 Updated Apr 18, 2026

A fork to add multimodal model training to open-r1

Python 1,528 72 Updated Feb 8, 2025

Simple RL training for reasoning

Python 3,847 289 Updated Dec 23, 2025
Python 48 5 Updated Dec 30, 2024

The official implementation of Natural Language Fine-Tuning

Python 54 4 Updated Jan 7, 2025

[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS

Python 1,243 113 Updated Jan 16, 2026

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models

Python 413 27 Updated Jun 25, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 70,270 8,600 Updated Apr 12, 2026

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 329 36 Updated Jan 29, 2026
Python 9 Updated Apr 30, 2025
Python 8 1 Updated Jul 1, 2024
Jupyter Notebook 32 Updated Feb 8, 2024
Python 23 2 Updated Apr 2, 2024
Jupyter Notebook 7 1 Updated Feb 28, 2024
Python 132 23 Updated Mar 18, 2026

PyTorch implementation of DreamerV3, Mastering Diverse Domains through World Models.

Python 11 2 Updated Feb 16, 2024

A distributed deep learning platform

C++ 3,606 1,270 Updated Mar 23, 2026

Simple maze environments using mujoco-py

Python 60 12 Updated Dec 27, 2023

Implementation of Dreamer v3 in PyTorch.

Python 836 213 Updated Mar 8, 2026

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Python 2,374 700 Updated Mar 3, 2026

A curated list of awesome model based RL resources (continually updated)

1,334 76 Updated Dec 20, 2025

Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.

Python 371 35 Updated Mar 16, 2023