Humphrey HumphreySun98

Hi, I'm Haofei Sun

AI Agents & LLM Infrastructure · Deep Learning for Wireless Sensing · Embedded Systems

Open to full-time SWE / AI Engineer / ML Engineer roles — graduating Dec 2026.

About Me

Engineer who connects hardware signals to intelligent software, and who ships systems and reports results honestly, including when the simple baseline wins. Recently I've landed merged fixes in several leading LLM-infrastructure projects (SGLang, LiteLLM, LangChain), built embedded RTOS firmware that samples BLE RSSI at 77 kHz (3x prior published rates), trained deep-learning models that recover signals lost to aliasing (R² = 0.986 on chirp recovery), and shipped full-stack LLM agents to the Chrome Web Store and to production.

Merged PRs into leading LLM-infrastructure projects: SGLang (~29k★ serving framework), LiteLLM (50k★ gateway), and LangChain (langchain-aws), fixing multi-tenant batching, multi-region endpoint routing, and prompt-encoding bugs (details below)
Built a physics-informed neural network on NVIDIA B200 that reconstructs aliased RF signals with R² = 0.986 on chirp recovery
Wrote custom Zephyr RTOS firmware on the nRF54L15 that samples BLE RSSI at 77 kHz with a <0.01% drop rate
Shipped Archiagents (https://archiagents.com/), an end-to-end AI agent for architectural design that takes project briefs through to IFC4 BIM models and photorealistic renders; owned the engineering implementation and VPS deployment on a 2-person team
Deployed a Claude-powered learning agent on the Chrome Web Store and Hugging Face, with a 4-policy benchmark and a candidly reported finding: a rule-based heuristic outperformed Q-learning on short-horizon tasks
Shipped RepoAgentBench, an open-source toolkit that mines merged PRs into reproducible coding-agent benchmarks; evaluated 4 frontier LLMs through the claude-code and aider adapters with real API spend
Run a production LLM API gateway (https://api.manxuezhida.com) with multi-provider routing, load balancing, and API-key management that serves my downstream products
Summer 2026 intern at Halo Microelectronics, working on a full-stack AI agent system for analog IC design (RAG + agent orchestration)
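The multi-provider routing and load balancing mentioned for the gateway can be sketched as round-robin selection with failover. This is a minimal Python illustration under stated assumptions, not the gateway's code (which runs on Node.js/Express): each provider is modeled as a hypothetical callable that returns a completion or raises, and ProviderPool and AllProvidersFailed are illustrative names.

```python
import itertools
from typing import Callable, Dict, List


class AllProvidersFailed(Exception):
    """Raised when every upstream provider fails for a request."""


class ProviderPool:
    """Round-robin load balancing with failover (hypothetical sketch).

    Each provider is a callable that takes a prompt and returns a completion
    string, or raises an exception on failure.
    """

    def __init__(self, providers: Dict[str, Callable[[str], str]]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = dict(providers)
        self._order: List[str] = list(providers)
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence of starting positions.
        self._start = itertools.cycle(range(len(self._order)))

    def complete(self, prompt: str) -> str:
        start = next(self._start)
        errors: List[str] = []
        # Try every provider at most once, beginning at the round-robin slot.
        for offset in range(len(self._order)):
            name = self._order[(start + offset) % len(self._order)]
            try:
                return self._providers[name](prompt)
            except Exception as exc:  # record the failure, then fail over
                errors.append(f"{name}: {exc!r}")
        raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    def flaky(prompt: str) -> str:
        raise TimeoutError("upstream timeout")

    pool = ProviderPool({
        "claude": lambda p: "claude:" + p,
        "gpt": flaky,
        "gemini": lambda p: "gemini:" + p,
    })
    for _ in range(4):
        print(pool.complete("hi"))
```

Starting each request at the next round-robin position spreads load evenly, while the inner loop retries the remaining providers so that a single upstream outage does not fail the request.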

Interests: LLM serving infrastructure, edge AI, wireless sensing, LLM agents, signal processing, sim-to-real for robotics.

Open Source — LLM Infrastructure Contributions

sgl-project/sglang (~29k★) — high-performance LLM/multimodal inference-serving framework

PR #26971 (merged): Fixed a crash in batched multi-tenant cache routing. GenerateReqInput.extra_key was not indexed per sub-request, so the whole list was passed to RadixKey.child_key() and prefix-cache matching failed with TypeError: unhashable type: 'list'. Added _normalize_extra_key() (scalar broadcast, list-length validation, parallel-sample expansion) plus a regression test covering 6 paths; passed 121 CI checks.
PR #25975 (merged, co-author): Fixed the prefill-delayer monitoring metric. The prefill_delayer_wait_* histogram stayed at 0 because the release path read next_state, which was None; the maintainer adopted the prev_state approach and credited me as co-author.
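The normalization step described for PR #26971 can be illustrated with a small standalone function that turns any accepted extra_key shape into one hashable key per expanded sub-request. This is a hypothetical sketch, not SGLang's actual _normalize_extra_key(): the name normalize_extra_key, its parameters, and the contiguous per-request layout of parallel samples are all assumptions.

```python
from typing import List, Optional, Sequence, Union

ExtraKey = Optional[str]


def normalize_extra_key(
    extra_key: Union[ExtraKey, Sequence[ExtraKey]],
    batch_size: int,
    parallel_samples: int = 1,
) -> List[ExtraKey]:
    """Return one hashable extra key per expanded sub-request.

    Hypothetical illustration, not SGLang's actual _normalize_extra_key().
    """
    if extra_key is None or isinstance(extra_key, str):
        # Scalar broadcast: every request in the batch shares the same key.
        per_request = [extra_key] * batch_size
    else:
        per_request = list(extra_key)
        # List-length validation: exactly one key per request in the batch.
        if len(per_request) != batch_size:
            raise ValueError(
                f"extra_key has {len(per_request)} entries, expected {batch_size}"
            )
    # Parallel-sample expansion: each request fans out into parallel_samples
    # sub-requests, assumed here to be laid out contiguously per request.
    return [key for key in per_request for _ in range(parallel_samples)]
```

Because every returned element is a string or None, each one can be passed individually to a prefix-cache key function that requires a hashable value, which avoids the unhashable-list failure the PR describes.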

BerriAI/litellm (50k★) — LLM gateway/proxy unifying 100+ providers

PR #29707 (merged): Diagnosed a Vertex AI context-caching 404 on multi-region (eu/us) endpoints: the caching path hardcoded the single-region host instead of the multi-region REP host that the inference path already used. Contributed the merged parametrized regression suite that locks in the corrected host-resolution invariant; passed 49 CI checks.

langchain-ai/langchain-aws — AWS/Bedrock integrations for LangChain

PR #1085 (merged): A repo-wide static-analysis pass found json.dumps calls relying on the default ensure_ascii=True across Bedrock converters, tool-schema serializers, and stream parsers, which silently escaped CJK text and emoji to \uXXXX sequences and inflated prompt token cost ~6x. Fixed 11 call sites across 3 modules.
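The ensure_ascii issue is easy to reproduce with Python's standard json module: json.dumps escapes every non-ASCII character by default, and passing ensure_ascii=False keeps those characters as-is. The sketch below measures character-level inflation only; actual token inflation depends on the tokenizer.

```python
import json

text = "你好"  # two CJK characters

escaped = json.dumps(text)                        # default: ensure_ascii=True
preserved = json.dumps(text, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)    # "\u4f60\u597d"
print(preserved)  # "你好"

# Each BMP character becomes a 6-character \uXXXX escape.
body_escaped = len(escaped) - 2    # strip the surrounding quotes
body_preserved = len(preserved) - 2
print(body_escaped / body_preserved)  # 6.0
```

Characters outside the Basic Multilingual Plane, such as most emoji, are escaped as a surrogate pair, so each one expands to 12 ASCII characters.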

RepoAgentBench — my open-source CLI on PyPI for reproducible, contamination-free coding-agent benchmarks.
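One common way to keep a PR-mined benchmark contamination-free is to admit only PRs merged after the evaluated model's training cutoff, so the reference fix cannot have appeared in its training data. The sketch below assumes that rule; MergedPR, contamination_free(), and to_jsonl() are hypothetical names and fields, not RepoAgentBench's actual schema or API.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class MergedPR:
    """Minimal, hypothetical view of a mined PR (not RepoAgentBench's schema)."""
    repo: str
    number: int
    merged_at: datetime


def contamination_free(prs: Iterable[MergedPR], training_cutoff: datetime) -> List[MergedPR]:
    """Keep only PRs merged strictly after the model's training cutoff."""
    return [pr for pr in prs if pr.merged_at > training_cutoff]


def to_jsonl(prs: Iterable[MergedPR]) -> str:
    """Serialize one benchmark task per line (JSON Lines)."""
    return "\n".join(
        json.dumps({"repo": pr.repo, "number": pr.number, "merged_at": pr.merged_at.isoformat()})
        for pr in prs
    )


if __name__ == "__main__":
    cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
    mined = [
        MergedPR("org/a", 1, datetime(2024, 6, 1, tzinfo=timezone.utc)),
        MergedPR("org/b", 2, datetime(2025, 3, 1, tzinfo=timezone.utc)),
    ]
    print(to_jsonl(contamination_free(mined, cutoff)))
```

Comparing timezone-aware datetimes avoids mixing naive and aware timestamps, which Python refuses to order.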

Featured Projects

Project	Description	Stack
Archiagents (https://archiagents.com/)	End-to-end AI agent for architectural design (2-person team). Ingests project briefs plus CAD/DWG/IFC/Revit files, runs a requirements dialogue, generates design schemes, renders photorealistic visualizations (gpt-image-1), and outputs IFC4 BIM models with an embedded Autodesk APS viewer. Multi-LLM backend (Claude / GPT / Gemini); deployed on a custom domain via VPS.	Vercel AI SDK, shadcn/ui, gpt-image-1, Autodesk APS, IFC4
LLM API Gateway (https://api.manxuezhida.com)	Production LLM API proxy serving multiple providers (Claude / GPT / Gemini) with load balancing, API key management, and request routing. Powers SmartStudy Agent, Archiagents, and other downstream products. Deployed on a custom domain via VPS.	Node.js, Express, VPS
SmartStudy Agent (Web · Chrome Extension)	Closed-loop POMDP learning agent with a 4-policy benchmark (Random / Rule-based / LinUCB Bandit / Q-learning) over 30 simulated students × 30 sessions. Candidly reported finding: the rule-based heuristic gains +35% over random versus +18% for Q-learning, so RL is defensible but not dominant in the short-horizon regime. Live on the Chrome Web Store and Hugging Face; 8-page Streamlit UI; 3 pluggable LLM backends.	Python, Claude API, Streamlit, SQLite, Chrome MV3
RepoAgentBench	Open-source CLI that mines merged GitHub PRs into reproducible, contamination-free coding-agent benchmarks. Adapters for claude-code and aider; tested with 4 frontier LLMs (Opus 4.7 / GPT-5.5 / Sonnet 4.6 / Gemini 3.1 Pro) using real API spend.	Python, Click, PyPI, JSONL, GitHub API
NeuroUnfold	Physics-informed deep learning that recovers 406 kHz LoRa chirps from 5.3x-aliased BLE RSSI, reaching R² = 0.986 on chirp recovery. Branch disambiguation enables BLE-only wireless sensing at 5 m.	Python, PyTorch, NumPy
High-Speed BLE RSSI Firmware	Custom Zephyr RTOS firmware on the nRF54L15 that samples at 77 kHz (3x the prior published rate) by bypassing the BLE protocol layer for raw energy detection.	C, Zephyr RTOS, DMA
Agentic Weather Assistant	Full-stack agentic web app with 3-service architecture: React frontend + FastAPI backend (LangChain ReAct agent + LangGraph) + custom MCP microservice wrapping a public REST API. Pydantic-validated typed tool-calling across services.	React, FastAPI, LangChain, LangGraph, MCP
Dual-Stream Gesture Transformer	Real-time hand gesture recognition with a Dual-Stream Spatiotemporal Transformer on MediaPipe skeletons: 557 FPS on GPU (1.79 ms latency) and 88.2% accuracy from 35 labeled samples via sim-to-real training.	Python, PyTorch, MediaPipe
Deep Learning for BLE Sensing	End-to-end super-resolution pipeline recovering wideband LoRa channel responses from narrowband BLE RSSI via progressive sub-pixel convolution.	Python, PyTorch, C
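The aliasing behind NeuroUnfold can be checked with a short calculation. Treating the 406 kHz content as a single real tone sampled at 77 kHz (a simplification: RSSI is an energy measurement and a chirp sweeps in frequency), the tone folds to a 21 kHz alias, and 406 / 77 ≈ 5.3 is consistent with the stated 5.3x figure. Several true frequencies fold onto that same alias, which is one way to read why branch disambiguation is needed; this interpretation and the helper names alias_frequency() and alias_candidates() are illustrative assumptions, not the project's method.

```python
from typing import List


def alias_frequency(f_hz: int, fs_hz: int) -> int:
    """Apparent (folded) frequency of a real tone f_hz sampled at fs_hz."""
    f_mod = f_hz % fs_hz
    # Anything above fs/2 folds back into the first Nyquist zone [0, fs/2].
    return min(f_mod, fs_hz - f_mod)


def alias_candidates(f_alias_hz: int, fs_hz: int, f_max_hz: int) -> List[int]:
    """All true frequencies in [0, f_max_hz] that fold onto the same alias."""
    candidates = set()
    k = 0
    # True frequencies sit at k*fs +/- f_alias for integer k >= 0.
    while k * fs_hz - f_alias_hz <= f_max_hz:
        for f in (k * fs_hz - f_alias_hz, k * fs_hz + f_alias_hz):
            if 0 <= f <= f_max_hz:
                candidates.add(f)
        k += 1
    return sorted(candidates)


if __name__ == "__main__":
    fs, f_true = 77_000, 406_000
    f_alias = alias_frequency(f_true, fs)
    print(f_alias)                                  # 21000
    print(round(f_true / fs, 1))                    # 5.3
    print(alias_candidates(f_alias, fs, 450_000))   # 12 indistinguishable branches
```

Every frequency in the candidate list produces the same sampled sequence, so recovering the 406 kHz branch requires extra information such as a physics prior or a learned model.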

Research & Publications

Robotic Manipulation RL — Sim-to-Real on Franka & xArm (paper in preparation): Contact-rich policy training in Isaac Lab with sim-to-real transfer to physical hardware.
Peer Reviewer, AgentSkills Workshop, ACM CAIS 2026 (ACM Conference on AI and Agentic Systems)
Peer Reviewer, IEEE Wireless Communications Letters
2 Chinese patents accepted on mixed-signal circuit techniques
Provincial Second Prize, China Undergraduate Mathematical Contest in Modeling