Skip to content

Add optional automatic tool naming#51

Open
noobydp wants to merge 8 commits into
tracefinity:mainfrom
noobydp:codex/auto-tool-names
Open

Add optional automatic tool naming#51
noobydp wants to merge 8 commits into
tracefinity:mainfrom
noobydp:codex/auto-tool-names

Conversation

@noobydp

@noobydp noobydp commented May 31, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add optional automatic naming for traced tool polygons using short JSON responses from Ollama, Gemini, OpenRouter, or hosted provider fallback.
  • Run naming in the background with generic tool N labels first, then stream returned names into the Step 4 Save UI one by one.
  • Allow users to manually rename tools before saving, cancel pending naming, retry failed naming for still-generic labels, and save immediately without waiting.
  • Warm Ollama earlier with a small image payload so the vision path is initialized before the first real crop request.
  • Guard against stale background naming tasks mutating a newer re-trace, and document the new configuration and retry endpoint.

Why

The trace workflow should stay usable even when local VLM naming is slow or unavailable. Previously, cold Ollama requests could leave users staring at generic names, and late background results could conflict with manual edits or newer trace state. This keeps auto naming opportunistic: useful when it lands, harmless when it does not.

Validation

  • python -m pytest backend\tests using the existing backend test venv: 90 passed
  • npm exec -- tsc --noEmit
  • npm run build
  • git diff --check

Known existing issue: npm run lint currently fails before linting because the script invokes next lint, which is not valid in this Next 16 setup.

Codex Disclaimer

This PR was drafted with assistance from OpenAI Codex.

noobydp added 7 commits May 4, 2026 10:23
# Conflicts:
#	frontend/src/components/ToolEditor.tsx
Add configurable background tool labeling for traced polygons, with Ollama and hosted provider support, trace-page polling, save-time cancellation of pending labels, docs, and focused backend tests. Also document ONNX CUDA memory controls added during this worktree.
@noobydp noobydp changed the title [codex] Add optional automatic tool naming Add optional automatic tool naming May 31, 2026
@noobydp noobydp marked this pull request as ready for review May 31, 2026 15:43
@noobydp

noobydp commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

Update after conflict-reduction pass:

  • Added Reduce auto-naming merge conflicts (e18347a).
  • Moved background naming task logic out of routes.py into backend/app/services/tool_label_tasks.py, reducing route-file churn while keeping the same naming behavior.
  • Adjusted the trace-page tool list to preserve the existing pencil/edit-label interaction style while retaining naming status, cancel, retry, polling, and save-time cancellation behavior.
  • Made the API docs additive around automatic naming behavior.

Validation for this follow-up:

  • backend\venv\Scripts\python.exe -m py_compile backend\app\api\routes.py backend\app\services\tool_label_tasks.py backend\app\services\tool_labeler.py backend\tests\test_tool_labeler.py
  • frontend\node_modules\.bin\tsc.cmd -p frontend\tsconfig.json --noEmit

Note: pytest was not available in this worktree venv or system Python during this follow-up check.

Pairwise merge status after this update:

@noobydp

noobydp commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

The E2E failure is caused by a selector expectation that no longer matches the trace selection UI.

The failing test is frontend/e2e/happy-path.spec.ts in verify trace results, where it looks for buttons named /^Include tool \d+$/. This PR changed that sidebar row so the whole detected-tool row is the include/select control, while the only nested button is now the pencil button for editing the tool name (aria-label="Edit ..."). That was intentional: the UI now keeps the existing pencil/edit-label interaction while adding automatic naming status and editable generated labels.

Could the test be updated to select tools through the current row interaction, or better, through a stable test id / accessible selector that represents the selectable tool row instead of assuming each include control is a button named Include tool N?

Code and PR drafted with Codex.

@jasonmadigan jasonmadigan left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for putting this together. the implementation is thorough and the UX flow (generic labels first, stream in names, allow manual override) is well thought out.

a few things to discuss before we move forward on the code:

direction concern

tool naming/categorisation should be optional and pluggable, not tied to a specific inference approach. this PR bakes in four provider backends (Ollama, Gemini, OpenRouter, hosted fallback) with warmup, polling, retry, and background task infrastructure. that's a lot of surface area for what should be one possible implementation behind a simple interface. I'd rather see a clean abstraction (e.g. a ToolNamer protocol with a single name(image) -> str method) that this ML approach can plug into, alongside simpler alternatives (manual-only, filename-based, etc).

code-level findings (for when the direction is agreed)

  • 3 of 21 new tests fail -- test_background_labeling_* tests pass "default" (string) where SessionStore is expected. the outer try/except swallows the AttributeError, so some tests pass by accident
  • asyncio.create_task() with no stored reference -- task can be silently GC'd
  • multiple ToolLabeler() instantiations per request (re-reads settings each time)
  • os.getenv() alongside pydantic-settings creates double source of truth for config
  • 500ms polling (40 attempts) -- consider SSE or a lightweight status endpoint

happy to discuss the architecture direction. I've added a DESIGN.md to capture these kinds of decisions so we're aligned before PRs get too far along.

@noobydp

noobydp commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Hey, sorry for this one. I intended to keep it optional/pluggable but the implementation got complicated over time and I didn't consider how impactful it was by the time I finished.

I'll rework this and follow your direction and fix those other things.

Do you want me to share extra info before I re-submit a PR, or push to my fork or something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants