Add optional automatic tool naming#51
Conversation
# Conflicts: # frontend/src/components/ToolEditor.tsx
Add configurable background tool labeling for traced polygons, with Ollama and hosted provider support, trace-page polling, save-time cancellation of pending labels, docs, and focused backend tests. Also document ONNX CUDA memory controls added during this worktree.
|
Update after conflict-reduction pass:
Validation for this follow-up:
Note: Pairwise merge status after this update:
|
|
The E2E failure is caused by a selector expectation that no longer matches the trace selection UI. The failing test is Could the test be updated to select tools through the current row interaction, or better, through a stable test id / accessible selector that represents the selectable tool row instead of assuming each include control is a button named Code and PR drafted with Codex. |
jasonmadigan
left a comment
There was a problem hiding this comment.
thanks for putting this together. the implementation is thorough and the UX flow (generic labels first, stream in names, allow manual override) is well thought out.
a few things to discuss before we move forward on the code:
direction concern
tool naming/categorisation should be optional and pluggable, not tied to a specific inference approach. this PR bakes in four provider backends (Ollama, Gemini, OpenRouter, hosted fallback) with warmup, polling, retry, and background task infrastructure. that's a lot of surface area for what should be one possible implementation behind a simple interface. I'd rather see a clean abstraction (e.g. a ToolNamer protocol with a single name(image) -> str method) that this ML approach can plug into, alongside simpler alternatives (manual-only, filename-based, etc).
code-level findings (for when the direction is agreed)
- 3 of 21 new tests fail --
test_background_labeling_*tests pass"default"(string) whereSessionStoreis expected. the outertry/exceptswallows theAttributeError, so some tests pass by accident asyncio.create_task()with no stored reference -- task can be silently GC'd- multiple
ToolLabeler()instantiations per request (re-reads settings each time) os.getenv()alongside pydantic-settings creates double source of truth for config- 500ms polling (40 attempts) -- consider SSE or a lightweight status endpoint
happy to discuss the architecture direction. I've added a DESIGN.md to capture these kinds of decisions so we're aligned before PRs get too far along.
|
Hey, sorry for this one. I intended to keep it optional/pluggable but the implementation got complicated over time and I didn't consider how impactful it was by the time I finished. I'll rework this and follow your direction and fix those other things. Do you want me to share extra info before I re-submit a PR, or push to my fork or something? |
Summary
tool Nlabels first, then stream returned names into the Step 4 Save UI one by one.Why
The trace workflow should stay usable even when local VLM naming is slow or unavailable. Previously, cold Ollama requests could leave users staring at generic names, and late background results could conflict with manual edits or newer trace state. This keeps auto naming opportunistic: useful when it lands, harmless when it does not.
Validation
python -m pytest backend\testsusing the existing backend test venv: 90 passednpm exec -- tsc --noEmitnpm run buildgit diff --checkKnown existing issue:
npm run lintcurrently fails before linting because the script invokesnext lint, which is not valid in this Next 16 setup.Codex Disclaimer
This PR was drafted with assistance from OpenAI Codex.