- The Hong Kong University of Science and Technology
- Hong Kong SAR, China
Stars
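LLM-powered stock analysis system for A-share, Hong Kong, and US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, running on a schedule at zero cost, entirely on free resources.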
A live reading list for LLM data synthesis (Updated to July, 2025).
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
The agent that grows with you
A benchmark for LLMs on complicated tasks in the terminal
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
SkillsBench evaluates how well skills work and how effective agents are at using them.
Harbor is a framework for running agent evaluations and creating and using RL environments.
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
Salesforce Enterprise Deep Research
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents for general tool-use.
A Survey of Reinforcement Learning for Large Reasoning Models
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
A Chinese-language financial trading framework based on multi-agent LLMs - the enhanced Chinese edition of TradingAgents
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity. Local and Free Alternative to Claude Cowork.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.