GBakalkinOAI/codex-hybrid


codex-hybrid

Broad global goal: automate a workflow that maps out the scope of narrow, file-scoped, low-risk tasks that the orchestrator (gpt-5.4) can delegate to a given local LLM while still getting essentially the same solution quality, thereby conserving precious gpt-5.4 tokens.

Rationale: local LLM tokens are essentially free, so it is a win to spend many of them to save a few gpt-5.4 tokens, as long as solution quality does not suffer.

What counts as full success: parity in problem-solving quality (with A>2 and B>3) between these two separate Codex setups:

  • current: powered by A*X smart tokens from the best model available via OpenAI Plus plan
  • hybrid-local: powered by X smart orchestrator tokens plus B*X "free but dumber" local LLM tokens for delegated subtasks
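Here A is read as the ratio of smart-token budgets between the two setups, and B as the number of local tokens spent per orchestrator token. This reading is our interpretation of the two bullets above, not something they spell out. Under it, the success condition implies the following smart-token saving:

```latex
\begin{aligned}
\text{current:}\quad      & S_{\text{cur}} = A X \ \text{smart tokens} \\
\text{hybrid-local:}\quad & S_{\text{hyb}} = X \ \text{smart tokens} + B X \ \text{local tokens} \\
\text{smart-token saving:}\quad & 1 - \frac{S_{\text{hyb}}}{S_{\text{cur}}} = 1 - \frac{1}{A} > \frac{1}{2} \qquad (A > 2)
\end{aligned}
```

For example, with A = 3 and B = 4, the current setup spends 3X smart tokens. The hybrid setup spends X smart tokens plus 4X local tokens, which saves 2X smart tokens, or two thirds of the smart budget.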

Inspirational real GitHub repos that GPT-5.4 found

  • disler/aider-mcp-server is the best match to "Master agent delegates grunt work to a cheaper/alternate coding agent." Its README says it “allows Claude Code to offload AI coding tasks to Aider,” with the explicit goal of reducing costs and using the stronger model more orchestratively. It exposes the delegate as an MCP server, passes an --editor-model, and includes tests for the coding tool path.

  • Aider-AI/aider plus bjodah/local-aider is the best proven split-model editing pattern. Aider’s official docs describe architect/editor mode: one model proposes the solution, then a second model turns that proposal into concrete file edits. Their docs also say architect mode often improves reliability for weaker editors, and their troubleshooting guide says local quantized models often have editing problems, recommending architect mode and sometimes “whole” edit format as mitigations. The local-aider proof-of-concept shows this pattern with local models on one consumer GPU, using a local reasoning architect and a separate local editor model. It is the clearest existing evidence that reasoning and editing should often be split when local models are flaky at edits.

  • can1357/oh-my-pi is the best open-source example of delegated subagents with per-agent model overrides. Its README advertises per-agent model overrides, and the development docs describe a task tool that delegates work to child agent sessions, resolves model overrides per agent, and aggregates results. That is highly relevant to our longer-term goal of turning this into a reusable evaluation skill, because it shows the architecture we want: planner/orchestrator on one model, executor(s) on another, with explicit delegation boundaries and observability.

  • ToDo: check if the above repos give enough inspiration for our own task.

Machine / setup

  • OS: Ubuntu 24.04 LTS

  • GPU: NVIDIA Quadro RTX 4000, 8 GB VRAM

  • I do not want experiments to overwrite or destabilize my normal Codex setup powered solely by gpt-5.4

  • User dansa302 initializes the local LLM and live logging outside the Codex sandbox using the helper script ./tools/start_local_lmstudio_model.sh.

    • Codex is expected to improve this script to increase local LLM fitness for delegated tasks
    • before editing this script Codex should stop, explain the proposed changes, and wait for user approval
    • after editing this script Codex should stop, explain how to re-run the modified script, wait for the user to reload the local LLM, and then verify the result
  • We start by investigating . Later we will also investigate ibm/granite-4-h-tiny, qwen/qwen3.5-9b, and possibly other LLMs.

  • Codex CLI operates as the unprivileged user ai2 using the free local LLM, which can also be queried directly like this:

MODEL="${MODEL:-google/gemma-4-e4b}"
CTX="${CTX:-16384}"
curl -fsS http://127.0.0.1:1234/v1/responses \
  -H 'Content-Type: application/json' \
  -d "{
    \"model\": \"$MODEL\",
    \"input\": \"Reply with exactly: MODEL_OK\",
    \"max_output_tokens\": $CTX
  }"
  • Lean logs help triage failures: clear them, send a probe request, then read both files:
truncate -s 0 /home/ai2/lmstudio-codex-trace/lms-*.jsonl
curl -fsS http://127.0.0.1:1234/v1/responses \
  -H 'Content-Type: application/json' \
  -d "{
    \"model\": \"$MODEL\",
    \"input\": \"List all tools you can call\",
    \"max_output_tokens\": $CTX
  }"
cat /home/ai2/lmstudio-codex-trace/lms-model.jsonl
cat /home/ai2/lmstudio-codex-trace/lms-server.jsonl
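Both requests above splice $MODEL and the prompt directly into a JSON string. They also set max_output_tokens to the full context size $CTX, although the prompt and the output generally share the same context window. Below is a minimal sketch of a safer request builder. It assumes POSIX sed and awk. The names json_escape, build_request and MAX_OUT are hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash
# Sketch only: JSON-safe request body for the local /v1/responses endpoint.
# json_escape, build_request and MAX_OUT are hypothetical names.

# Escape backslashes and double quotes, and turn newlines into \n,
# so arbitrary prompt text can be embedded in a JSON string.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

# Print a /v1/responses body: model, input prompt, and a separate output cap
# (kept independent of CTX so the prompt still fits in the context window).
build_request() {
  local model=$1 input=$2 max_out=$3
  printf '{"model": "%s", "input": "%s", "max_output_tokens": %d}\n' \
    "$(json_escape "$model")" "$(json_escape "$input")" "$max_out"
}
```

To use it, set MAX_OUT="${MAX_OUT:-4096}" and pass `-d "$(build_request "$MODEL" "Reply with exactly: MODEL_OK" "$MAX_OUT")"` to the same curl command. The value 4096 is an arbitrary placeholder, not a tuned setting.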

Known issue already established

There is a recurring upstream problem: local/custom providers may not expose the edit contract the way Codex expects, especially around apply_patch and tool-calling behavior. Similar reports already exist upstream, so we do not need to file a new bug report right now.

What to delegate to the local LLM

When a task is narrow, file-scoped, and low-risk, first try the local worker: ./tools/local_free_helper_codex.sh "<task>"

Start conservatively, using the local worker only for:

  • explicit file paths
  • small edits
  • repetitive grunt work
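The "explicit file paths" rule can be checked mechanically before delegating. Below is a rough sketch. It assumes that file paths appear in the task text as whitespace-separated words. names_existing_file is a hypothetical helper. Passing the check is necessary but not sufficient for a task to count as narrow and low-risk.

```shell
#!/usr/bin/env bash
# Hypothetical pre-delegation gate: succeed only if the task text names at
# least one existing regular file (relative to the current directory).
names_existing_file() {
  local -a words
  local w
  # Split on whitespace without glob expansion; read returns 1 at EOF.
  read -r -d '' -a words <<< "$1" || true
  for w in "${words[@]}"; do
    w=${w%[,.:;]}            # drop one trailing punctuation character
    [ -f "$w" ] && return 0
  done
  return 1
}
```

For example, `names_existing_file "$task" && ./tools/local_free_helper_codex.sh "$task"`. If the check fails, the task stays with the orchestrator.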

After the worker returns, always verify:

  • which files actually changed
  • whether the result is close enough to the orchestrator's intent
  • whether the worker claimed success without changing files

If verification fails, stop delegating and report failure. Do not let the local worker choose broad plans or architecture.
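Below is a minimal sketch of the first and third checks, assuming GNU coreutils and findutils. It hashes every file in the task directory before and after the worker runs, so it works whether or not the directory is a git checkout. The names snapshot and run_and_report_changes are hypothetical. The exit-code convention is also an assumption: 2 means the command failed, and 1 means it succeeded but changed nothing. Judging closeness to the orchestrator's intent still needs the orchestrator's own review.

```shell
#!/usr/bin/env bash
# Sketch: detect which files a delegated worker actually changed.
# Assumes GNU tools (sha256sum, sort -z, xargs -0 -r); names are hypothetical.

# Print "<sha256>  <relative path>" for every regular file under $1, sorted.
snapshot() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# Run a command (e.g. the local worker) and print the paths it added,
# modified, or deleted under DIR. Returns 2 if the command itself failed,
# 1 if it "succeeded" without changing any file.
run_and_report_changes() {
  local dir=$1; shift
  local before after changed
  before=$(snapshot "$dir")
  "$@" || return 2
  after=$(snapshot "$dir")
  changed=$({ diff <(printf '%s\n' "$before") <(printf '%s\n' "$after") || true; } \
    | sed -nE 's/^[<>] [0-9a-f]{64}  //p' | sort -u)
  if [ -z "$changed" ]; then
    echo "worker reported success but changed no files" >&2
    return 1
  fi
  printf '%s\n' "$changed"
}
```

A typical call would be `run_and_report_changes "$TASK_DIR" ./tools/local_free_helper_codex.sh "<task>"`. Here $TASK_DIR is a placeholder for wherever the worker is expected to edit.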

Benchmarking delegation to local LLM

Each folder ./tests/<task_id> holds one simple task as:

  • setup.sh: a script that creates a fresh task instance in ./results/<time_stamp>_<task_id>
  • prompt.txt: the prompt to delegate to the local LLM via ./tools/local_free_helper_codex.sh, along with
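A minimal runner for this layout might look like the sketch below, which rests on assumptions the current scripts do not confirm:

  • setup.sh accepts the new instance directory as its first argument
  • the worker runs inside that directory, with the contents of prompt.txt as its only argument
  • HELPER and run_benchmarks are hypothetical names

```shell
#!/usr/bin/env bash
# Sketch of a benchmark loop over ./tests/<task_id>; names are hypothetical.
# ASSUMPTIONS: setup.sh takes the instance dir as $1, and the worker runs
# inside that dir with the prompt text as its only argument.
run_benchmarks() {
  local root=${1:-.}
  local helper=${HELPER:-./tools/local_free_helper_codex.sh}
  local ts task_dir task_id inst prompt
  # Make a relative helper path absolute before cd-ing into instance dirs.
  case $helper in /*) ;; *) helper="$PWD/$helper" ;; esac
  ts=$(date +%Y%m%d_%H%M%S)
  for task_dir in "$root"/tests/*/; do
    [ -d "$task_dir" ] || continue      # no tests: the glob stays literal
    task_id=$(basename "$task_dir")
    inst="$root/results/${ts}_${task_id}"
    mkdir -p "$inst"
    bash "$task_dir/setup.sh" "$inst" || { echo "$task_id: setup failed" >&2; continue; }
    prompt=$(cat "$task_dir/prompt.txt")
    (cd "$inst" && "$helper" "$prompt") || echo "$task_id: worker failed" >&2
    echo "$inst"                        # instance dir, for later verification
  done
}
```

Each printed instance directory can then be checked against the verification list in the previous section.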

About

Codex CLI: GPT-5.4 orchestrating local Gemma 4 as a free helper
