xdotli

Xiangyi Li xdotli

building BenchFlow.ai

174 followers · 447 following

@benchflow-ai
San Francisco
14:06 (UTC -07:00)
xiangyi.li
@xdotli
in/l1xiangyi

Sponsoring

Achievements

x3 x3 x3

Highlights

Organizations

Lists (1)

React-Native

1 repository

Starred repositories

kessler / gemma-gem

Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine.

TypeScript · 940 stars · 102 forks · Updated May 29, 2026

allenai / olmes

Reproducible, flexible LLM evaluations

Python · 380 stars · 95 forks · Updated Mar 24, 2026

huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python · 25,161 stars · 4,858 forks · Updated Jun 21, 2026

simpler-env / SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)

Jupyter Notebook · 1,096 stars · 193 forks · Updated Dec 20, 2025

DietrichGebert / ponytail

Makes your AI agent think like the laziest senior dev in the room. The best code is the code you never wrote.

JavaScript · 45,664 stars · 2,246 forks · Updated Jun 21, 2026

vercel / eve

The Framework for Building Agents

TypeScript · 2,026 stars · 146 forks · Updated Jun 21, 2026

rohitg00 / ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

Python · 35,304 stars · 5,759 forks · Updated Jun 14, 2026

context-labs / HALO

Hierarchical Agent Loop Optimizer

TypeScript · 891 stars · 67 forks · Updated Jun 21, 2026

Gen-Verse / Open-AgentRL

RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios

Python · 555 stars · 56 forks · Updated Jun 12, 2026

shadcn / improve

Use your most capable model to audit your codebase and write plans for cheaper models to execute.

5,867 stars · 234 forks · Updated Jun 15, 2026

omnigent-ai / omnigent

Omnigent is an open-source AI agent framework and meta-harness: orchestrate Claude Code, Codex, Cursor, Pi, and custom agents — swap harnesses without rewriting, enforce policies and sandboxing, an…

Python · 4,304 stars · 488 forks · Updated Jun 21, 2026

allenai / olmo-eval

Python · 42 stars · 5 forks · Updated Jun 18, 2026

rdi-berkeley / agents-last-exam

Agents' Last Exam

Python · 705 stars · 29 forks · Updated Jun 21, 2026

thinkwee / AgentsMeetRL

Awesome List for Agentic RL

HTML · 1,623 stars · 63 forks · Updated Jun 20, 2026

vllm-project / vime

An LLM post-training framework with vLLM for RL Scaling

Python · 288 stars · 29 forks · Updated Jun 21, 2026

dohooo / helmor

Open-source local workbench for multi-agent software development.

TypeScript · 1,237 stars · 108 forks · Updated Jun 21, 2026

EveryInc / compound-engineering-plugin

Official Compound Engineering plugin for Claude Code, Codex, Cursor, and more

TypeScript · 21,856 stars · 1,607 forks · Updated Jun 21, 2026

smithersai / claude-p

Drop-in replacement for `claude -p` that drives the interactive Claude Code TUI inside an in-process zmux PTY session.

Zig · 386 stars · 34 forks · Updated Jun 17, 2026

dn00 / clarp

Drop-in replacement for claude -p that runs on your Claude Code subscription instead of metered API pricing.

TypeScript · 37 stars · 7 forks · Updated Jun 10, 2026

anthropics / defending-code-reference-harness

Skills for threat modeling, scanning, triage, patching, plus an autonomous scanning harness you can /customize

Python · 6,128 stars · 468 forks · Updated Jun 15, 2026

containers / bubblewrap

Low-level unprivileged sandboxing tool used by Flatpak and similar projects

C · 7,677 stars · 353 forks · Updated Jun 2, 2026

arteemg / autoswarm

Open-source framework for superagents.

Python · 88 stars · 4 forks · Updated Jun 16, 2026

walkinglabs / hands-on-modern-rl

🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.

Python · 3,038 stars · 201 forks · Updated Jun 21, 2026

vals-ai / Valkyrie

Scalable, cloud-native infrastructure for evaluating AI agents across any benchmark.

Python · 10 stars · Updated Jun 19, 2026

agentbeats / agentbeats

Python · 80 stars · 21 forks · Updated Nov 17, 2025

ARA-Labs / Agent-Native-Research-Artifact

A protocol that recasts the primary research object from narrative document to machine-executable knowledge package — so AI agents can navigate, reproduce, and extend published research without re-…

JavaScript · 379 stars · 40 forks · Updated Jun 18, 2026

CloakHQ / CloakBrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

Python · 26,785 stars · 2,110 forks · Updated Jun 21, 2026

GXL-ai / paperclip

Paperclip — search, read, and analyze 8M+ biomedical papers from the command line

Python · 183 stars · 17 forks · Updated May 22, 2026

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python · 14,567 stars · 1,487 forks · Updated Jun 18, 2026