MoeKid101

😋

enjoy life

Yang Letian MoeKid101

Exclusive Mudrock (泥岩) fan

5 followers · 13 following

Shanghai Jiao Tong University

Stars

tajwarfahim / maxrl

Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"

Python 188 29 Updated May 28, 2026

TwinAligner / TwinAligner

[arxiv 2025] TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation

Jupyter Notebook 69 1 Updated Mar 11, 2026

XiaomiMiMo / MiMo

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 2,236 100 Updated Jun 5, 2025

open-thought / reasoning-gym

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,444 120 Updated Apr 17, 2026

linhlpv / awesome-offline-to-online-RL-papers

A list of Offline to Online RL papers (continually updated)

96 1 Updated Apr 25, 2026

s-nlp / AdaRAGUE

[ACL 2025] Adaptive Retrieval without Self-Knowledge? Bringing Uncertainty Back Home

Python 19 4 Updated May 17, 2025

PRIME-RL / PRIME

Scalable RL solution for advanced reasoning of language models

Python 1,863 112 Updated Mar 18, 2025

junhua / awesome-llm-agents

A Collection of High Quality research papers and open-source projects about LLM-agents

85 14 Updated Nov 1, 2024

YangLing0818 / SuperCorrect-llm

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Python 90 7 Updated Mar 23, 2025

spiral-rl / spiral

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Python 195 22 Updated Mar 27, 2026

CMU-AIRe / MRT

Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".

Jupyter Notebook 119 6 Updated Aug 5, 2025

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,961 4,074 Updated Jun 14, 2026

ziyuwan / ReMA-public

Reinforced Multi-LLM Agents training

Python 86 5 Updated Jan 18, 2026

ganler / code-r1

Reproducing R1 for Code with Reliable Rewards

Python 312 20 Updated May 5, 2025

sjtug / SJTUThesis

Shanghai Jiao Tong University LaTeX Thesis Template

TeX 3,802 799 Updated May 20, 2026

Dakingrai / awesome-mechanistic-interpretability-lm-papers

250 17 Updated Nov 22, 2024

asinghcsu / AgenticRAG-Survey

Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.

1,662 179 Updated Oct 20, 2025

AGI-Edgerunners / LLM-Agents-Papers

A repo that lists papers related to LLM-based agents

Python 2,313 149 Updated Jul 12, 2025

zjunlp / LLMAgentPapers

Must-read Papers on LLM Agents.

3,047 183 Updated Jun 5, 2026

naver / bergen

Benchmarking library for RAG

Jupyter Notebook 273 33 Updated Mar 11, 2026

liunian-Jay / Awesome-RAG

💡 Awesome RAG: A resource of Retrieval-Augmented Generation (RAG) for LLMs, focusing on the development of technology.

498 27 Updated May 25, 2026

mmistakes / minimal-mistakes

📐 Jekyll theme for building a personal site, blog, project documentation, or portfolio.

HTML 13,523 27,287 Updated Apr 29, 2026

coder / code-server

VS Code in the browser

TypeScript 77,947 6,703 Updated Jun 12, 2026

chanqi4444 / GTM

Download all GTMs by the scripts

104 55 Updated Jun 10, 2019

SuperBruceJia / Awesome-LLM-Self-Consistency

Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models

123 11 Updated Jul 20, 2025

atfortes / Awesome-LLM-Reasoning

From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓

3,634 211 Updated Apr 20, 2026

Baichenjia / UTDS

Pessimistic Value Iteration for Multi-Task Data Sharing in Offline RL

Python 18 4 Updated Nov 21, 2023

JadyXuan / NTTS

NO TIME TO SLEEP

Python 643 23 Updated May 26, 2024

yuweihao / MambaOut

MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)

Python 2,697 49 Updated Mar 9, 2025

huggingface / trl

Train transformer language models with reinforcement learning.

Python 18,637 2,791 Updated Jun 14, 2026
