Train the smallest LM you can that fits in 16MB. Best model wins!

Python 2,486 1,327 Updated Mar 20, 2026

Benchmarking approaches to fine-tune AlphaGenome on lentiMPRA data

Python 5 Updated Mar 19, 2026

AI agents running research on single-GPU nanochat training automatically

Python 431 23 Updated Mar 13, 2026

⚒ Evolutionary self-improvement for Hermes Agent — optimize skills, prompts, and code using DSPy + GEPA

Python 238 17 Updated Mar 9, 2026

AI agents running research on single-GPU nanochat training automatically

Python 45,627 6,341 Updated Mar 16, 2026

Pixel office.

TypeScript 5,059 716 Updated Mar 19, 2026

An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills and subagents, it handles different levels of tasks that could take minute…

Python 32,048 3,881 Updated Mar 20, 2026

Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.

TypeScript 13,366 1,393 Updated Feb 2, 2026

Train transformer language models with reinforcement learning.

Python 17,733 2,571 Updated Mar 20, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,210 899 Updated Mar 20, 2026

AllenAI's post-training codebase

Python 3,641 515 Updated Mar 20, 2026

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 4,264 367 Updated Nov 13, 2025

Replication archive for "Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models"

R 17 Updated Mar 3, 2026

OpenSandbox is a general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes for scenarios like Coding Agents, GUI Agent…

Python 8,882 673 Updated Mar 20, 2026

Official PyTorch Implementation for Learning a Generative Meta-Model of LLM Activations

Jupyter Notebook 72 11 Updated Mar 18, 2026

AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods

TypeScript 26,333 2,781 Updated Mar 20, 2026

Hypernetworks that update LLMs to remember factual information

Python 587 64 Updated Mar 2, 2026

Scaling Preference Data Curation via Human-AI Synergy

145 5 Updated Jul 3, 2025

Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"

Jupyter Notebook 603 54 Updated Oct 7, 2025

HumanLM: Simulating Users with State Alignment Beats Response Imitation

Python 68 8 Updated Feb 27, 2026

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

1,032 40 Updated Mar 15, 2026

Code and Data for Tau-Bench

Python 1,135 187 Updated Mar 18, 2026

An ARC-AGI solution using Agentica from Symbolica

Python 166 15 Updated Feb 12, 2026

"🐈 nanobot: The Ultra-Lightweight OpenClaw"

Python 35,140 5,928 Updated Mar 20, 2026

pip install continualcode

Python 37 3 Updated Feb 10, 2026

Official MCP server implementation for accessing Open Targets Data

Python 25 2 Updated Mar 20, 2026

Ideas for projects related to Tinker

174 9 Updated Nov 6, 2025

Reinforcement Learning via Self-Distillation (SDPO)

Python 663 63 Updated Feb 18, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 327,033 63,274 Updated Mar 20, 2026