zhaoyuzhi

😪

I may be slow to respond.

Yuzhi ZHAO zhaoyuzhi

😪

I may be slow to respond.

Ph.D., CityU EE. B.Eng, HUST EIC (电信卓越班)

148 followers · 28 following

City University of Hong Kong
Hong Kong, China
02:45 (UTC +08:00)
https://zhaoyuzhi.github.io/

Lists (2)

Sort

LLM

10 repositories

MLLM

26 repositories

Stars

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

Python 7,679 736 Updated Jun 15, 2026

ShandaAI / AlayaRenderer

Generative World Renderer: an AI-native Renderer for Games and Virtual Worlds. 面向游戏与虚拟世界的AI原生渲染引擎

Python 632 10 Updated May 5, 2026

AGI-Eval-Official / PRDBench

Python 37 4 Updated May 29, 2026

tanweai / pua

你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候，对你的期望是很高的。一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript 18,289 1,103 Updated Jun 12, 2026

Weiyun1025 / verl-internvl

Python 52 8 Updated Oct 20, 2025

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,033 79,323 Updated Jun 16, 2026

QwenLM / Qwen3-VL-Embedding

Python 1,290 108 Updated Apr 8, 2026

simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

Python 11,856 1,401 Updated May 13, 2026

maze-agent / Maze

A distributed framework for LLM agents

Python 533 14 Updated Jun 16, 2026

Endlinc / VP-Bench

Official Repo for AAAI 2026 paper, VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models.

Python 7 Updated Dec 2, 2025

THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python 3,501 262 Updated Feb 8, 2026

harbor-framework / terminal-bench

A benchmark for LLMs on complicated tasks in the terminal

Python 2,365 542 Updated Jan 22, 2026

sierra-research / tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Python 1,361 354 Updated Jun 11, 2026

Mondrian-He / awesome-emnlp-2025-artist

This is a repository dedicated to high quality figures from EMNLP 2025 long papers.

52 6 Updated Dec 15, 2025

Libr-AI / do-not-answer

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Jupyter Notebook 329 29 Updated Jun 7, 2024

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 22,004 4,085 Updated Jun 16, 2026

open-compass / MMBench-GUI

Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent with a hierarchical manner across multiple platforms, includi…

Python 112 5 Updated Sep 8, 2025

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 23,295 2,150 Updated Jan 27, 2026

tongjingqi / Game-RL

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning

Python 154 2 Updated Jun 1, 2026

zhaochen0110 / Awesome_Think_With_Images

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,484 46 Updated Mar 9, 2026

Kwai-Kolors / Kolors

Kolors Team

Python 4,607 354 Updated Nov 13, 2024

GAIR-NLP / LIMI

LIMI: Less is More for Agency

Python 162 7 Updated Oct 14, 2025

ChunmingHe / awesome-diffusion-models-in-low-level-vision

A Repository for Diffusion-Model-related Papers in Low-level Vision

555 12 Updated Feb 23, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 19,418 1,489 Updated Feb 27, 2026

OpenGVLab / ScaleCUA

[ICLR 2026 Oral] ScaleCUA is the open-sourced computer use agents that can operate on cross-platform environments (Windows, macOS, Ubuntu, Android).

Python 1,115 79 Updated Jan 7, 2026

MoonshotAI / Kimi-K2

Kimi K2 is the large language model series developed by Moonshot AI team

10,866 853 Updated Jan 21, 2026

open-compass / VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python 4,223 723 Updated Jun 15, 2026

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.

Python 7,093 790 Updated Jun 15, 2026

AngusDujw / Bottom-Up-Agent

Python 203 16 Updated Oct 10, 2025

acl-org / acl-style-files

Official style files for papers submitted to venues of the Association for Computational Linguistics

BibTeX Style 1,886 373 Updated Nov 13, 2025