LLM-OS-Models is an organization for building and evaluating model capabilities that make up an LLM operating-system stack.

- docs: central documentation and experiment cards
- llm-os-eval-core: shared evaluation schemas, runners, graders, and reporters
- Terminal: terminal-agent training and Terminal-Bench evaluation
- MD-Retrieval: Markdown retrieval and grounded answer evaluation
- Tool-Call: tool-calling, schema validation, and execution evaluation
- Text2SQL: document-grounded text-to-SQL evaluation and training
- Coding-Agent: repo-grounded coding-agent evaluation and training
- DocAI-OCR: OCR and structured document parsing for downstream agents
- Deep-Research: research-agent evaluation and external comparison track

T_base, T_sft, S_base, S_sft, S_kld