I am an AI researcher working on reliable long-horizon AI agents, agentic reinforcement learning, and calibrated post-training.
My research asks a simple question:
How can AI agents know what they don’t know, act under uncertainty, and improve from their own prediction–reality gaps?
I build methods, environments, and evaluation frameworks that turn uncertainty, confidence, and consistency into first-class training signals for reliable and self-improving AI systems.
Homepage · Google Scholar · LinkedIn · X/Twitter · Email
- Agentic RL & Post-training
Calibration-aware on-policy distillation, GRPO/RL training, self-evolving environments, synthetic feedback, and reward/evaluator design for long-horizon agents.
- Alignment, Calibration & Honesty
Uncertainty-aware supervision, confidence calibration, hallucination detection, factuality, scalable oversight, and reliable model behavior.
- Long-horizon Agents & Evaluation
Tool use, planning, trajectory-level evaluation, deep research agents, evidence grounding, failure attribution, and enterprise-scale agent benchmarks.
- Prospective Hindsight
Self-calibrating reinforcement learning via prediction–reality gaps, aligning an agent’s action-time self-belief with verifier outcomes.
- CaOPD: Calibration-aware On-policy Distillation
Decouples capability learning from honest confidence calibration in LLM post-training.
- Agentic Uncertainty Quantification
Turns verbalized uncertainty into active control signals for memory, reflection, and long-horizon execution.
- [ICML2026] Agentic Confidence Calibration
A trajectory-level calibration framework for diagnosing and improving the reliability of long-horizon agents.
- [ACL2026] The Evolving Role of Uncertainty Quantification in Large Language Models
The evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.
For the full list of publications, please see my Google Scholar or homepage.
I am interested in reliable AI agents, agentic RL, post-training, calibration, uncertainty, scalable evaluation, and self-improving AI systems. Feel free to reach out via email or visit my homepage.