yyht

28 followers · 7 following

personal-page Public

HTML Updated Apr 16, 2026
T3RL Public
Forked from IcyFish332/T3RL

Python MIT License Updated Apr 15, 2026
Auto-claude-code-research-in-sleep Public
Forked from wanshuiyin/Auto-claude-code-research-in-sleep

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…

Python MIT License Updated Mar 18, 2026
hermes-agent Public
Forked from NousResearch/hermes-agent

The agent that grows with you

Python MIT License Updated Mar 11, 2026
MLEvolve Public
Forked from InternScience/MLEvolve

MLEvolve is an open-source autonomous system for end-to-end machine learning algorithm design and optimization powered by progressive search and experience-driven memory.

Python Updated Mar 9, 2026
redstar Public

Python 2 Updated Mar 1, 2026
OpenClaw-RL Public
Forked from Gen-Verse/OpenClaw-RL

OpenClaw-RL: Personalize openclaw simply by talking to it

TypeScript MIT License Updated Feb 26, 2026
VeriSoftBench Public
Forked from utopia-group/VeriSoftBench

Benchmarking LLMs on Real-World Software Verification in Lean 4

Python MIT License Updated Feb 23, 2026
MARTI Public
Forked from TsinghuaC3I/MARTI

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python MIT License Updated Feb 19, 2026
ML-Master Public
Forked from sjtu-sai-agents/ML-Master

The official implementation of "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning"

Python Updated Jan 16, 2026
qqr Public
Forked from Alibaba-NLP/qqr

qqr is an RL training framework for open-ended agents.

Python Apache License 2.0 Updated Jan 14, 2026
Spectral-Sphere-Optimizer Public
Forked from Unakar/Spectral-Sphere-Optimizer

Spectral Sphere Optimizer

Python Apache License 2.0 Updated Jan 14, 2026
Seed-Prover Public
Forked from ByteDance-Seed/Seed-Prover

Lean Apache License 2.0 Updated Dec 19, 2025
torchforge Public
Forked from meta-pytorch/torchforge

PyTorch-native post-training at scale

Python BSD 3-Clause "New" or "Revised" License Updated Nov 5, 2025
AgentRL Public
Forked from THUDM/AgentRL

Python MIT License Updated Oct 27, 2025
InfiR2 Public
Forked from InfiXAI/InfiR2

Shell Updated Oct 22, 2025
sparsity_in_rl Public
Forked from SagnikMukherjee/sparsity_in_rl

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Python Updated Oct 20, 2025
verl_megatron_practice Public
Forked from ISEEKYAN/verl_megatron_practice

(best/better) practices of megatron on veRL and tuning guide

Shell Apache License 2.0 Updated Sep 26, 2025
RLinf Public
Forked from RLinf/RLinf

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.

Python Apache License 2.0 Updated Sep 22, 2025
debug Public

Updated Sep 4, 2025
Verlog Public
Forked from WentseChen/Verlog

Verlog: A Multi-turn RL framework for LLM agents

Python Apache License 2.0 Updated Aug 16, 2025
openrlhf_async_pipline Public

Python 90 2 Apache License 2.0 Updated Aug 16, 2025
IRL-VLA Public
Forked from IRL-VLA/IRL-VLA

Official repo for IRL-VLA

Apache License 2.0 Updated Aug 13, 2025
ASearcher Public
Forked from inclusionAI/ASearcher

Python Updated Aug 11, 2025
Agent_Foundation_Models Public
Forked from OPPO-PersonalAI/Agent_Foundation_Models

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.

Python Apache License 2.0 Updated Aug 11, 2025
MiroRL Public
Forked from MiroMindAI/MiroRL

MiroRL is an MCP-first reinforcement learning framework for deep research agent.

Python Apache License 2.0 Updated Aug 8, 2025
JAxtar Public
Forked from tinker495/JAxtar

JAxtar is a project with a JAX-native implementation of parallelizeable A* & Q* solver for neural heuristic search research.

Python MIT License Updated Aug 7, 2025
openrlhf_gem Public

Python Apache License 2.0 Updated Aug 5, 2025
terminal-bench-rl Public
Forked from Danau5tin/terminal-bench-rl

GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

Python 1 Updated Jul 31, 2025
vsag Public
Forked from antgroup/vsag

vsag is a vector indexing library used for similarity search.

C++ Apache License 2.0 Updated Jul 29, 2025
