😋
enjoy life
  • Shanghai Jiaotong University

Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"

Python 188 29 Updated May 28, 2026

[arxiv 2025] TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation

Jupyter Notebook 69 1 Updated Mar 11, 2026

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 2,236 100 Updated Jun 5, 2025

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,444 120 Updated Apr 17, 2026

A list of Offline to Online RL papers (continually updated)

96 1 Updated Apr 25, 2026

[ACL 2025] Adaptive Retrieval without Self-Knowledge? Bringing Uncertainty Back Home

Python 19 4 Updated May 17, 2025

Scalable RL solution for advanced reasoning of language models

Python 1,863 112 Updated Mar 18, 2025

A Collection of High Quality research papers and open-source projects about LLM-agents

85 14 Updated Nov 1, 2024

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Python 90 7 Updated Mar 23, 2025

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Python 195 22 Updated Mar 27, 2026

Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".

Jupyter Notebook 119 6 Updated Aug 5, 2025

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,961 4,074 Updated Jun 14, 2026

Reinforced Multi-LLM Agents training

Python 86 5 Updated Jan 18, 2026

Reproducing R1 for Code with Reliable Rewards

Python 312 20 Updated May 5, 2025

Shanghai Jiao Tong University LaTeX Thesis Template

TeX 3,802 799 Updated May 20, 2026

Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.

1,662 179 Updated Oct 20, 2025

A repo that lists papers related to LLM-based agents

Python 2,313 149 Updated Jul 12, 2025

Must-read Papers on LLM Agents.

3,047 183 Updated Jun 5, 2026

Benchmarking library for RAG

Jupyter Notebook 273 33 Updated Mar 11, 2026

💡 Awesome RAG: A resource of Retrieval-Augmented Generation (RAG) for LLMs, focusing on the development of technology.

498 27 Updated May 25, 2026

📐 Jekyll theme for building a personal site, blog, project documentation, or portfolio.

HTML 13,523 27,287 Updated Apr 29, 2026

VS Code in the browser

TypeScript 77,947 6,703 Updated Jun 12, 2026

Download all GTMs by the scripts

104 55 Updated Jun 10, 2019

Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models

123 11 Updated Jul 20, 2025

From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓

3,634 211 Updated Apr 20, 2026

Pessimistic Value Iteration for Multi-Task Data Sharing in Offline RL

Python 18 4 Updated Nov 21, 2023

NO TIME TO SLEEP

Python 643 23 Updated May 26, 2024

MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)

Python 2,697 49 Updated Mar 9, 2025

Train transformer language models with reinforcement learning.

Python 18,637 2,791 Updated Jun 14, 2026