Tags: janhq/jan
chore: update llamacpp settings (#8168)

* chore: wire llamacpp settings end-to-end in router mode

Per-model sidebar (ctx_len, ngl, chat_template, batch_size, cpu_moe, n_cpu_moe, no_kv_offload, override_tensor, offload_mmproj) now persists to model.yml and regenerates the router preset so values actually reach inference; provider-level (Settings → Providers → llama.cpp) settings now trigger a debounced router restart on change rather than dying in the in-memory config.

Global [*] preset emits the previously-missing threads, n-predict, ubatch-size, device, split-mode, main-gpu, no-mmap, mlock, rope-*, context-shift, cache-ram, cache-reuse, swa-full, keep keys with default-skip guards so the preset stays minimal.

Sampling defaults from the per-model sidebar now flow through each chat completion via the custom fetch parameter merge (assistant params still override).

Adds Tier 1 llamacpp-only samplers (typical_p, top_n_sigma, dynatemp_*, xtc_*, dry_* family, ignore_eos), gated by LLAMACPP_ONLY_PARAM_KEYS so they're stripped from non-llamacpp request bodies.

Cleanup: drops the unused _overrideSettings / bypassAutoUnload slots from llamacpp's load() override; deletes the deprecated defrag-thold control upstream marked DEPRECATED; passes --no-ui (b9222+) or --no-webui (older) to llama-server based on the backend build number instead of hard-coding in the Rust plugin.

* chore: scrub orphan defrag_thold setting from persisted provider state

Bump useModelProvider persist version to 14 with a migration that drops the deprecated `defrag_thold` entry from `provider.settings` for the llamacpp provider. Companion to the deletion of the control from settings.json + the preset emitter; without this, localStorage carries the dead JSON forever.
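
The v14 scrub described above could look roughly like the sketch below. The state shape (`PersistedState`, `Provider`, `ProviderSetting`) and the function name `migrateProviders` are hypothetical, chosen for illustration only; the sketch merely follows the `(state, version)` signature that a Zustand `persist` migrate callback receives and is not the actual code from this change.

```typescript
// Hypothetical shapes: the real persisted state in Jan may differ.
interface ProviderSetting {
  key: string
  [field: string]: unknown
}

interface Provider {
  provider: string
  settings: ProviderSetting[]
}

interface PersistedState {
  providers: Provider[]
}

// Setting key named in the commit message.
const DEPRECATED_KEY = 'defrag_thold'

// Drops the deprecated entry from the llamacpp provider only.
// States already at version 14 or later are returned untouched.
function migrateProviders(state: PersistedState, version: number): PersistedState {
  if (version >= 14) return state
  return {
    ...state,
    providers: state.providers.map((p) =>
      p.provider === 'llamacpp'
        ? { ...p, settings: p.settings.filter((s) => s.key !== DEPRECATED_KEY) }
        : p
    ),
  }
}
```

Filtering by provider name keeps the migration narrow: other providers are passed through unchanged, even if they happen to carry a setting with the same key.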
* fix: remove auto-increase context size, persist manual ctx_len bump

The auto-increase-on-context-error behavior never worked in router mode because writing the per-model setting to Zustand alone never reached the running llama-server: the next reload used the OLD preset ctx-size and the request still exceeded the cap, looping until the attempt counter exhausted.

Removes the auto-increase setting + the useEffect that triggered it. Keeps the manual "Increase Context Size" button in the error banner and wires it through `updateModelSettings` so model.yml is rewritten and the router restarts before the regenerate kicks off, so the reload now picks up the new context window.

Migration v15 strips the orphan `auto_increase_ctx_len` entry from persisted model settings; the v10 block that inserted it is now a no-op stub kept for the migration chain.

* fix: regenerate router preset after llamacpp model rename

`update()` (model rename) wrote a new `model.yml` under the renamed id but left `router.preset.ini` referencing the OLD id, so a subsequent `POST /models/load <new-id>` returned 404 until something else triggered a preset regen. Call `startRouter()` after the yaml write so the preset reflects the new id immediately.

* fix: hide stale auto_increase_ctx_len entry in model sidebar

Render-time guard in ModelSetting belt-and-suspenders the v15 migration: if a user's persisted state lands on version 15+ with the orphan entry still present (e.g. they ran an interim build before the migration shipped), the sidebar would surface the now-dead control. Skip it unconditionally at render.

* fix: route "Growing the Mind..." partial continuation through manual button

The in-band auto-increase on `finishReason === 'length'` was the second half of the auto-increase behavior: the useChat `onFinish` handler silently fired `handleContextSizeIncrease` whenever the model stopped at the context cap, leaking auto-behavior we'd otherwise removed.
Now onFinish saves the partial output into a ref and surfaces the standard error banner. Clicking "Increase Context Size" consumes the saved partial (via setContinueFromContent + setPendingContinueMessage) before regenerating, so the resume-from-partial UX and the "Growing the Mind..." shimmer still work; they're just user-opted-in. The pending partial is cleared on the next streaming transition so a stale capture can't bleed into an unrelated regenerate.

* fix: harden llamacpp settings persistence

* fix: guard ctx_len migration nullability

* fix: backfill llamacpp router model settings on upgrade

* fix: drop stale ctx_len=8192 override in llamacpp provider template

The template layer in TauriProvidersService hard-coded ctx_len to 8192 on every getProviders() call, clobbering migration v12 (which resets the stale 8192 default to '' so llama.cpp picks auto-fit / model default) and forcing every new llamacpp model to render 8192 regardless of intent. That bogus value then flowed into model.yml and the router preset.

* fix: persist llamacpp setting edits and debounce router restart

`debouncedStopModel` was created fresh on every render. The cleanup effect (with `debouncedStopModel` in its deps) then fired on every render, cancelling both pending debounces, so typing a new ctx_len updated Zustand but never reached `model.yml`, and the router kept serving the model with the stale context size.

- Memoize `debouncedStopModel` so its timer survives re-renders.
- Flush (not cancel) on unmount so a half-typed edit still lands when the settings sheet closes.
- Bump the persist/router-restart debounce from 600ms to 1500ms so typing a multi-digit value doesn't restart the router per keystroke.

* fix: skip ctx-size in router preset when auto-fit is on

When `fit = on` (default), llama-server is supposed to size the context to fit available VRAM.
But the preset was still emitting `ctx-size` from `model.yml` (and from the provider-level config), which overrode fit and left users with whatever stale ctx_size happened to be on disk. Gate both the [*] and per-model `ctx-size` lines on `fit === false` so auto-fit owns the sizing whenever it's enabled.
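
As a rough illustration of the `fit === false` gate described above, the sketch below emits one preset section. The `PresetConfig` shape, the function name `emitPresetSection`, and the `key = value` line format are all assumptions made for the example; the real emitter and the layout of `router.preset.ini` may differ.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface PresetConfig {
  fit: boolean
  ctxSize?: number
  threads?: number
}

// Builds one preset section, e.g. "[*]" or a per-model section.
function emitPresetSection(name: string, cfg: PresetConfig): string {
  const lines: string[] = [`[${name}]`]

  // Unrelated keys are emitted whenever they are set.
  if (cfg.threads !== undefined) {
    lines.push(`threads = ${cfg.threads}`)
  }

  // ctx-size is emitted only when auto-fit is off, so a stale value
  // on disk cannot override the size that auto-fit would choose.
  if (cfg.fit === false && cfg.ctxSize !== undefined) {
    lines.push(`ctx-size = ${cfg.ctxSize}`)
  }

  return lines.join('\n')
}
```

With `fit: true`, a stored `ctxSize` of 8192 is simply ignored and the section contains no `ctx-size` line; flipping `fit` to `false` makes the same stored value appear in the output.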