Train the smallest LM you can that fits in 16MB. Best model wins!

Python 3,831 2,222 Updated Mar 24, 2026

Benchmarking approaches to fine-tune AlphaGenome on lentiMPRA data

Python 5 Updated Mar 19, 2026

AI agents running research on single-GPU nanochat training automatically

Python 442 23 Updated Mar 13, 2026

⚒ Evolutionary self-improvement for Hermes Agent — optimize skills, prompts, and code using DSPy + GEPA

Python 259 22 Updated Mar 9, 2026

AI agents running research on single-GPU nanochat training automatically

Python 53,802 7,483 Updated Mar 21, 2026

Pixel office.

TypeScript 5,268 756 Updated Mar 23, 2026

An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of tasks that c…

Python 42,481 4,986 Updated Mar 24, 2026

Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.

TypeScript 13,673 1,409 Updated Feb 2, 2026

Train transformer language models with reinforcement learning.

Python 17,772 2,586 Updated Mar 24, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,235 903 Updated Mar 24, 2026

AllenAI's post-training codebase

Python 3,650 515 Updated Mar 24, 2026

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 4,288 369 Updated Nov 13, 2025

Replication archive for "Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models"

R 17 Updated Mar 3, 2026

Secure, Fast, and Extensible Sandbox runtime for AI agents.

Python 9,225 700 Updated Mar 24, 2026

Official PyTorch Implementation for Learning a Generative Meta-Model of LLM Activations

Jupyter Notebook 73 12 Updated Mar 18, 2026

AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods

TypeScript 27,592 2,915 Updated Mar 24, 2026

Hypernetworks that update LLMs to remember factual information

Python 609 65 Updated Mar 2, 2026

Scaling Preference Data Curation via Human-AI Synergy

145 5 Updated Jul 3, 2025

Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"

Jupyter Notebook 604 54 Updated Oct 7, 2025

HumanLM: Simulating Users with State Alignment Beats Response Imitation

Python 69 8 Updated Feb 27, 2026

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

1,040 40 Updated Mar 15, 2026

Code and Data for Tau-Bench

Python 1,140 187 Updated Mar 18, 2026

An ARC-AGI solution using Agentica from Symbolica

Python 167 16 Updated Feb 12, 2026

"🐈 nanobot: The Ultra-Lightweight OpenClaw"

Python 35,958 6,132 Updated Mar 24, 2026

pip install continualcode

Python 37 3 Updated Feb 10, 2026

Official MCP server implementation for accessing Open Targets Data

Python 25 2 Updated Mar 20, 2026

Ideas for projects related to Tinker

173 9 Updated Nov 6, 2025

Reinforcement Learning via Self-Distillation (SDPO)

Python 681 64 Updated Feb 18, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 334,056 65,143 Updated Mar 24, 2026