Support Ollama as OpenAI-compatible backend (LEON_SGLANG_BASE_URL + model tags) #566

@githubphadnis

GitHub issue draft (copy into leon-ai/leon)

Title: Support Ollama as OpenAI-compatible backend (LEON_SGLANG_BASE_URL + model tags)

Labels (suggested): enhancement, llm


Summary

Many self-hosters run Leon with Ollama using its OpenAI-compatible API (/v1/chat/completions) instead of a native SGLang server. Today this combination fails or misbehaves in several places because Leon treats sglang/<name> as a local filesystem model and enables thinking/reasoning that Ollama rejects.

We would like first-class support (or documented configuration) for:

LEON_LLM=sglang/llama3.2
LEON_SGLANG_BASE_URL=http://127.0.0.1:11434/v1

Problems observed (Leon develop, 2026-05)

1. Wrong model sent to the API

  • resolveConfiguredLLMTarget('sglang/llama3.2') resolves to a path under $LEON_HOME/models/ (see llm-routing.ts, LOCAL_PROVIDERS).
  • SGLangLLMProvider passes that path as the OpenAI-compatible model field.
  • Ollama expects the tag llama3.2; given the path instead, it returns 400 invalid model name (or similar).

2. Thinking / reasoning not supported by Ollama

  • ReAct phases default to reasoningMode: 'on' (react-llm-duty/phase-policy.ts).
  • OpenAI-compatible provider options can set reasoningEffort: 'high' (ai-sdk-remote-llm-provider.ts).
  • Ollama returns: "llama3.2" does not support thinking.
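
For illustration, a caller could detect this rejection and retry once with reasoning turned off. The sketch below is not Leon's code: ChatOptions, isThinkingUnsupported and callWithThinkingFallback are hypothetical names, and the match relies only on the error text quoted above, which other backends may word differently.

```typescript
// Hypothetical option shape; Leon's real provider options may differ.
interface ChatOptions {
  reasoningMode: 'on' | 'off'
}

type ChatCall<T> = (options: ChatOptions) => Promise<T>

// Matches the error text reported by Ollama in this issue.
function isThinkingUnsupported(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error)
  return /does not support thinking/i.test(message)
}

// Calls the backend once; if it rejects thinking, retries once with reasoning off.
async function callWithThinkingFallback<T>(
  call: ChatCall<T>,
  options: ChatOptions
): Promise<T> {
  try {
    return await call(options)
  } catch (error) {
    if (options.reasoningMode === 'on' && isThinkingUnsupported(error)) {
      return call({ ...options, reasoningMode: 'off' })
    }
    // Any other failure (or a failure with reasoning already off) is rethrown.
    throw error
  }
}
```

Any other error, such as the invalid model name from problem 1, passes through unchanged, so the fallback does not mask unrelated failures.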

3. Docker / LEON_HOME mismatches (deployment)

  • Build trains NLP assets under ~/.leon; runtime compose often sets LEON_HOME=/data/leon.
  • pnpm start expects managed Node at $LEON_HOME/bin/node/bin/node (skipped when GITHUB_ACTIONS=true during install).
  • Missing leon-skill-list.nlp under runtime LEON_HOME unless copied/seeded.

Container stacks exacerbate these problems, but (1) and (2) are core provider issues.

Proposed direction

  1. Remote tag mode: If sglang/<tag> is not an on-disk model under LLM_DIR_PATH, treat <tag> as the remote API model name when LEON_SGLANG_BASE_URL is configured (Ollama, LiteLLM, etc.).
  2. Capability flag: For providers/backends without thinking support, force disableThinking / reasoningMode: 'off' for ReAct and retry on "does not support thinking" (generalize existing tool_choice + thinking retry).
  3. Docs: Short “Ollama + Docker” section in configuration docs (env vars, no thinking, model pull).
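
A minimal sketch of the remote tag mode from item 1, assuming the resolver can be handed an existence check (in practice something like fs.existsSync). ResolvedTarget and resolveSGLangTarget are hypothetical names, not the current API of llm-routing.ts.

```typescript
// Hypothetical result shape; Leon's real routing types may differ.
type ResolvedTarget =
  | { kind: 'local'; model: string } // model is an on-disk path
  | { kind: 'remote'; model: string } // model is the tag sent to the API

function resolveSGLangTarget(
  configured: string, // e.g. 'sglang/llama3.2'
  llmDirPath: string, // directory that holds on-disk models
  baseURL: string | undefined, // value of LEON_SGLANG_BASE_URL, if set
  exists: (path: string) => boolean // injected so the sketch stays testable
): ResolvedTarget {
  const name = configured.replace(/^sglang\//, '')
  const localPath = `${llmDirPath.replace(/\/+$/, '')}/${name}`

  // An on-disk model keeps today's behavior.
  if (exists(localPath)) {
    return { kind: 'local', model: localPath }
  }
  // Otherwise, with a base URL configured, pass the tag through unchanged.
  if (baseURL) {
    return { kind: 'remote', model: name }
  }
  throw new Error(`Model "${name}" not found locally and no base URL is configured`)
}
```

With LEON_LLM=sglang/llama3.2 and no matching directory on disk, this would send "llama3.2" as the model field instead of a filesystem path.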

Reference implementation (downstream)

We maintain a working stack and patch set here (interim, not ideal long-term):

  • Repo: https://github.com/githubphadnis/leon-ei
  • Patches applied at image build: docker/leon/apply-leon-ei-patches.mjs
  • Upstream package in that repo: upstream/leon-ollama/ (this issue text + PR proposal)

Happy to contribute a proper PR against develop if the approach above aligns with maintainers’ intent for SGLang vs generic OpenAI-compatible endpoints.

Environment

  • Leon: develop (2.0 dev preview)
  • Ollama: ollama/ollama:latest, model llama3.2
  • Direct API test: POST /v1/chat/completions with "model":"llama3.2" succeeds in ~5s on modest CPU hardware
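
The direct API test can be reproduced with a sketch like the one below. It assumes Node 18+ (global fetch) and the base URL from the summary; buildChatRequest and sendChat are hypothetical helpers written for this issue, not part of Leon or Ollama.

```typescript
interface ChatRequest {
  url: string
  init: { method: 'POST'; headers: Record<string, string>; body: string }
}

// Builds the OpenAI-compatible request with the plain tag as the model field.
function buildChatRequest(baseURL: string, model: string, prompt: string): ChatRequest {
  return {
    url: `${baseURL.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }]
      })
    }
  }
}

// Sends the request and surfaces the backend's error text on failure.
async function sendChat(request: ChatRequest): Promise<unknown> {
  const response = await fetch(request.url, request.init)
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${await response.text()}`)
  }
  return response.json()
}
```

Usage against a running Ollama instance: sendChat(buildChatRequest('http://127.0.0.1:11434/v1', 'llama3.2', 'Hello')).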
