Stars
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
A bridge between Streamable HTTP and stdio MCP transports
SDE-Harness (Scientific Discovery Evaluation Framework)
Agentic AI research papers, benchmarks, frameworks, and tools curated across 24 domains.
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
Therapeutics Commons (TDC): Multimodal Foundation for Therapeutic Science
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)
TS-DAR identifies transition states of protein conformational changes from MD simulations using hyperspherical embeddings in the latent space.
A Chemistry Toolkit that turns your AI assistant into a Chemistry coscientist.
Awesome GUI Agent Paper List
High-accuracy RAG for answering questions from scientific documents with citations
Machine learning software for extracting information from scholarly documents
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
A curated list of Python packages related to chemistry
Awesome-Biomolecule-Language-Cross-Modeling: a curated list of resources for the paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey"
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
Scientific Large Language Models: A Survey on Biological & Chemical Domains
Official code repo for the paper "LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset"
Ongoing research training transformer models at scale
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.