Stars
A unified AI model hub for aggregation & distribution. It supports cross-converting various LLMs into OpenAI-compatible, Claude-compatible, or Gemini-compatible formats. A centralized gateway for p…
⚒ Evolutionary self-improvement for Hermes Agent — optimize skills, prompts, and code using DSPy + GEPA
Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.
NAACL2025 - Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Your Personal AI Assistant; easy to install, deploy on your own machine or on the cloud; supports multiple chat apps with easily extensible capabilities.
Build personal agents and enterprise AI workforces that plan, delegate, use tools, and deliver real work — without brittle workflows.
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
TradingAgents: Multi-Agents LLM Financial Trading Framework
Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud (通义点金: Alibaba Cloud's financial LLM)
An autonomous agent for deep financial research
Lightweight, open-source AI agent for your tools, chats, and workflows.
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
ReMe: Memory Management Kit for Agents - Remember Me, Refine Me.
AgentEvolver: Towards Efficient Self-Evolving Agent System
The official codebase for our paper, FLEX: Continuous Agent Evolution via Forward Learning from Experience.
🌎💪 BrowserGym, a Gym environment for web task automation
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
An RL Recipe for Building Agentic LLMs via Self-Imitation on Long-Horizon Agentic Tasks
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels (easy/hard) across eight real-life scenarios.
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.