Stars
An automated data pipeline scaling RL to pretraining levels
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Official code and dataset for our ICCV 2025 paper: MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL 2025.
Bringing BERT into modernity via both architecture changes and scaling
A curated list of resources for Document Understanding (DU) topic
Creative Preference Optimization
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
Build, enrich, and transform datasets using AI models with no code
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
A lightweight, powerful framework for multi-agent workflows
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation
Exposure-slot: Exposure-centric representation learning with Slot-in-Slot Attention for Region-aware Exposure Correction, Computer Vision and Pattern Recognition (CVPR), 2025.
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.