- Stanford University
- Palo Alto, CA
- jiachengmiao.com
- @Jiacheng_Miao
Stars
Train the smallest LM you can that fits in 16MB. Best model wins!
Benchmarking approaches to fine-tune AlphaGenome on lentiMPRA data
AI agents running research on single-GPU nanochat training automatically
⚒ Evolutionary self-improvement for Hermes Agent — optimize skills, prompts, and code using DSPy + GEPA
An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills and subagents, it handles different levels of tasks that could take minute…
Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.
Train transformer language models with reinforcement learning.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
Replication archive for "Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models"
OpenSandbox is a general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes for scenarios like Coding Agents, GUI Agent…
Official PyTorch Implementation for Learning a Generative Meta-Model of LLM Activations
AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods
Hypernetworks that update LLMs to remember factual information
Scaling Preference Data Curation via Human-AI Synergy
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
HumanLM: Simulating Users with State Alignment Beats Response Imitation
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
An ARC-AGI solution using Agentica from Symbolica
🐈 nanobot: The Ultra-Lightweight OpenClaw
Official MCP server implementation for accessing Open Targets Data
Ideas for projects related to Tinker
Reinforcement Learning via Self-Distillation (SDPO)
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞