Tags: unslothai/unsloth

v0.1.45-beta

Unsloth v0.1.45-beta. PyPI release 2026.6.2.

v0.1.44-beta

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Fix macOS Apple Silicon installs resolving torch against x86_64 (#5976)

* Fix macOS Apple Silicon installs that resolve torch against x86_64

On Apple Silicon, `uv venv --python 3.13` can reuse a cached x86_64
(Rosetta) CPython, often because uv itself is an x86_64 build. The
resulting venv reports macosx_*_x86_64 to the wheel resolver, but PyTorch
has shipped no macOS x86_64 wheels since 2.2.2, so the torch install fails
with "no wheels with a matching platform tag (macosx_..._x86_64)".

Two changes, both scoped to macOS arm64 and additive (no other install
path is affected):

- Create the venv with an arch-explicit `cpython-X.Y-macos-aarch64-none`
  request on Apple Silicon (no --python override), so uv cannot fall back
  to a cached x86_64 interpreter.
- Harden the existing x86_64 venv guard: when the venv python cannot be
  executed (x86_64 binary on a Mac without Rosetta), the platform.machine()
  probe returns empty and the recreate was silently skipped. Fall back to
  reading the binary's Mach-O arch via lipo/file so migrated or
  pre-existing x86_64 venvs are still recreated as arm64.

* Harden arm64 static-arch fallback: file -L and set -e safety

Address review feedback on the lipo/file fallback:
- uv symlinks the venv's bin/python to the base interpreter; plain `file`
  reports the symlink ("symbolic link to ...") and the arch substring never
  matches. Use `file -L` to dereference (lipo already follows the link).
- Append `|| true` so the command substitution cannot abort the installer
  under set -e on a Mac that has neither lipo nor file.
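
The static-arch fallback described above can be sketched in Python. This is an illustration only, not the installer's shell code: the function names `parse_arch` and `static_arch` are hypothetical, and the use of `lipo -archs` is an assumption (the commit message names only `lipo` and `file -L`, not their exact flags).

```python
import shutil
import subprocess


def parse_arch(tool_output: str) -> str:
    """Return 'arm64', 'x86_64', or '' from lipo/file output text.

    Hypothetical helper: a binary that contains an arm64 slice is
    treated as arm64; only a pure x86_64 binary reports 'x86_64'.
    """
    if "arm64" in tool_output:
        return "arm64"
    if "x86_64" in tool_output:
        return "x86_64"
    return ""


def static_arch(python_path: str) -> str:
    """Read the binary's arch without executing it.

    Tries lipo first, then `file -L` (which dereferences the venv's
    bin/python symlink). Returns '' when neither tool is available or
    neither reports an arch, mirroring the `|| true` safety in the
    commit: a missing tool must never abort the caller.
    """
    commands = (
        ["lipo", "-archs", python_path],  # assumed flag choice
        ["file", "-L", python_path],
    )
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            continue
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, check=False
            )
        except OSError:
            continue
        arch = parse_arch(result.stdout)
        if arch:
            return arch
    return ""
```

Under these assumptions, a venv whose `static_arch` result is `x86_64` on an Apple Silicon host would be the one the guard recreates as arm64.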

---------

Co-authored-by: danielhanchen <michaelhan2050@gmail.com>

v0.1.431-beta

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Studio: clearer MCP server validation when stdio is disabled (#5928)

Improve the rejection message when an MCP server address is not an http(s) URL. It now points to the expected http(s):// form with an example, and only mentions that local commands are disabled when the value contains whitespace (a reliable command signal), since a lone token may just be a scheme-less URL. Wording is host-scoped rather than desktop-only because self-hosted hosts can opt in via an env var. Backend only, with tests; accepted input is unchanged.
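
A minimal sketch of the message selection described above, assuming a hypothetical function name `mcp_address_error` and illustrative wording. The real backend's function names, message text, and example URL are not given in the commit message and are not reproduced here.

```python
from typing import Optional
from urllib.parse import urlparse


def mcp_address_error(value: str) -> Optional[str]:
    """Return None for an http(s) URL, else a rejection message.

    Hypothetical helper; the message strings are illustrative only.
    """
    candidate = value.strip()
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return None

    message = (
        "MCP server address must be an http(s):// URL, "
        "for example https://example.com/mcp."
    )
    # Whitespace is treated as a reliable sign of a local command line.
    # A lone token may just be a scheme-less URL, so it gets no such hint.
    if any(ch.isspace() for ch in candidate):
        message += " Local commands are disabled on this host."
    return message
```

The hint is host-scoped ("on this host") rather than desktop-only, matching the wording rationale in the commit message.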

v0.1.43-beta

test(install): cover the Apple Silicon venv arch rebuild guard

Extracts the real guard block from install.sh and asserts: clean arm64 venv
untouched, x86_64 venv rebuilt as arm64, the x86_64-then-3.13.8 corner case,
arm64 3.13.8 downgrade preserved, --python skip, and Intel/Rosetta no-op.

Co-authored-by: Ramakrishna Bachu <ramankrishna10@gmail.com>

v0.1.405-beta

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
studio: engage draft-mtp on vision MTP GGUFs (drop incorrect vision gate) (#5560)

* studio: engage draft-mtp on vision MTP GGUFs

The draft-mtp auto-promotion in LlamaCppBackend.load_model was gated on
not effective_is_vision, and the spec-emit branch repeated the same
guard. Every Unsloth -MTP GGUF repo ships an mmproj projector, so
effective_is_vision was always True for those repos and the MTP speedup
silently never engaged out of the box.

llama.cpp #22673 explicitly states MTP is compatible with vision input.
The bundled b9204 server happily loads both: a manual run with
--mmproj ... --spec-type draft-mtp --spec-draft-n-max 6 logs
"loaded multimodal model" followed by
"adding speculative implementation 'draft-mtp'".

Drop the vision gate from both sites and rewrite the matching short
circuit in _already_in_target_state so reload checks reach the auto
promotion path on vision MTP loads. Add three regression tests: vision MTP
match (auto and default) and a non-MTP vision repo left unaffected.

Verified on a B200 with unsloth/Qwen3.6-35B-A3B-MTP-GGUF:UD-Q4_K_XL:
base decode 179.7 t/s vs MTP decode 253.8 t/s, draft acceptance 0.57,
1.41x speedup on a 255 token completion. mmproj still loads and image
input remains available.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: prefer Qwen3.5 -MTP GGUF variants in default model lists

With the vision gate dropped in the previous commit, draft-mtp now
auto-engages on -MTP GGUF repos out of the box. Swap the four Qwen3.5
recommended entries in DEFAULT_MODELS_GGUF and DEFAULT_MODELS_STANDARD
to their -MTP-GGUF counterparts so new users get the speedup by default:

  unsloth/Qwen3.5-4B-GGUF        -> unsloth/Qwen3.5-4B-MTP-GGUF
  unsloth/Qwen3.5-9B-GGUF        -> unsloth/Qwen3.5-9B-MTP-GGUF
  unsloth/Qwen3.5-35B-A3B-GGUF   -> unsloth/Qwen3.5-35B-A3B-MTP-GGUF
  unsloth/Qwen3.5-0.8B-GGUF      -> unsloth/Qwen3.5-0.8B-MTP-GGUF

All four HF repos exist (HEAD 200) and ship the same UD-Q4_K_XL quant
layout as the non-MTP variants. Non-Qwen3.5 entries are untouched.

* bump version to 2026.5.4

Picks up the studio MTP vision-gate fix and the Qwen3.5 -MTP default
swap in this PR.

* studio: prefer Qwen3.6-35B-A3B-MTP-GGUF in default model lists

Same rationale as the previous Qwen3.5 swap. The Qwen3.6 MTP variant
exists at unsloth/Qwen3.6-35B-A3B-MTP-GGUF (HF HEAD 200) and now
auto-engages draft-mtp out of the box with the gate fix.

* studio: drop --spec-draft-n-max from 6 to 3 for draft-mtp

n=6 is too greedy: on Qwen3.6 the draft has to guess 6 tokens ahead
and acceptance crashes to ~0.45, leaving only ~14% throughput gain.

PR ggml-org/llama.cpp#22673's author benched n=3 at ~0.72 acceptance
and 2 to 3x speedup on the same Qwen3.6 family, and the README sample
command uses n=2 or n=3. Match that.

CPU/Mac branch already uses n=3, so this aligns both paths.

* studio: set --spec-draft-n-max back to 6 for draft-mtp on GPU

Reverts the n=3 tuning. n=6 is the original default; user-side comparisons
hold the larger draft window steady so the toggle (next commit) is the
primary on/off lever.

* studio: add Speculative Decoding toggle under Max Tokens

Adds a top-level kill switch (panel-switch under Max Tokens, mirroring
Auto-Healing Tool Calls) that forces the /load request's
speculative_type to "off" when disabled. The backend "off" branch in
LlamaCppBackend.load_model skips both the draft-mtp auto-promotion and
the spec-emit branch, so neither --spec-type draft-mtp nor
--spec-default reaches llama-server.

Wiring:

- chat-runtime-store: new speculativeDecodingEnabled bool, default
  true, persisted to localStorage under unsloth_speculative_decoding,
  plus a setSpeculativeDecodingEnabled setter.
- chat-settings-sheet: SpeculativeDecodingToggle rendered immediately
  beneath the Max Tokens slider for non-external models.
- use-chat-model-runtime: when speculativeDecodingEnabled is false,
  override speculative_type to "off" in the loadModel call so the
  switch wins over any pre-existing speculativeType state (including
  the existing per-model toggle in Model Settings).

Verified end to end on unsloth/Qwen3.6-35B-A3B-MTP-GGUF:UD-Q4_K_XL:
toggle ON emits --spec-type draft-mtp --spec-draft-n-max 6; toggle
OFF emits zero --spec-* flags on the same MTP GGUF.

* studio: relocate Speculative Decoding toggle into Model Settings

Move the toggle out from under Max Tokens and back into the Model
Settings section, directly beneath KV Cache Dtype, where the existing
Apply/Reset workflow already drives a reload on dirty. This way, flipping
the switch in the UI actually takes effect: the section becomes dirty and
Apply re-runs /load with the new speculative_type.

Drop the !currentModelIsMultimodal gate so vision MTP GGUFs can also
disable speculative decoding from the UI.

Switch the toggle's off-value from null to "off" so the backend's "off"
short-circuit fires for MTP models too (null normalises to None which
re-triggers the draft-mtp auto-promotion).
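
Why null and "off" behave differently can be sketched as follows. This is a simplified model, not the code in LlamaCppBackend.load_model: the function name `spec_args` and its parameters are hypothetical, and only the flags quoted in this commit message are used.

```python
from typing import List, Optional


def spec_args(
    is_mtp_repo: bool,
    speculative_type: Optional[str],
    draft_n_max: int = 6,
) -> List[str]:
    """Return the speculative-decoding flags for a load request.

    Hypothetical model of the behaviour described in the commit
    message. Note that vision support plays no part in the decision.
    """
    # Explicit "off" short-circuits before any auto-promotion.
    if speculative_type == "off":
        return []

    # None (a null from the UI) means "no preference", so an MTP repo
    # is auto-promoted to draft-mtp.
    if speculative_type is None and is_mtp_repo:
        speculative_type = "draft-mtp"

    if speculative_type == "draft-mtp":
        return [
            "--spec-type", "draft-mtp",
            "--spec-draft-n-max", str(draft_n_max),
        ]
    return []
```

In this model a toggle that sends null cannot disable speculative decoding on an MTP repo, because None falls through to the auto-promotion branch; sending "off" is the only value that yields no flags.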

Tooltip now reads "Faster generation with 0% accuracy hit".

Remove the now-redundant speculativeDecodingEnabled bool + setter from
the runtime store and the load-time override in use-chat-model-runtime;
the toggle binds directly to speculativeType.

* studio: restore OOM/TIGHT badge on recommended GGUF rows

The recommended-list row passed vramStatus=null for any GGUF repo
because the existing useRecommendedModelVram hook reads safetensors
totals from HF model info, which GGUF-only repos do not expose. As a
result, an OOM Q-quant repo would render with only a "GGUF" badge and
no visual signal that nothing in it fits.

Add useGgufRecommendedFit: per repo, fetch the variant list via the
existing /api/models/gguf-variants endpoint, take the smallest
variant's size_bytes, and classify with the same 0.7*GPU + 0.7*RAM
thresholds as GgufVariantExpander. Session-scoped cache + in-flight
dedup so a repo is requested at most once.

Wire the result into the three GGUF row sites in pickers.tsx so OOM
and TIGHT badges show on the collapsed cards.

* Revert "studio: restore OOM/TIGHT badge on recommended GGUF rows"

This reverts commit 07793b1240df72b13e51d6dc15f63c4ee8c6cba9.

The new useGgufRecommendedFit hook was treating the symptom. PR #5561
identified the real root cause: useGpuInfo was calling /api/system
with plain fetch instead of authFetch, so the session-auth check
failed silently and gpu.available stayed false everywhere. With no
GPU info, every fit check (variant expander, recommended carousel)
fell back to "no signal" and dropped the OOM/TIGHT badges.

Reverting the over-engineered hook and applying the authFetch fix
in the next commit, which restores the existing badges with one line.

* chore: replace qwen suggested with MTP variant

* fix: restore GPU info auth for GGUF fit badges

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: imagineer99 <samleejackson0@gmail.com>

v0.1.40-beta

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
studio/chat: release stuck IME flag when compositionend never fires (#5551)

* studio/chat: release stuck IME flag when compositionend never fires

Chrome on Windows talking to a WSL-hosted Studio (issue #5546) fires
compositionstart + compositionupdate but no compositionend after the
IME commits. The earlier hardening in #5327 cleared the stale flag on
the next non-composing input event, which never arrives in this
sequence, so composingRef stays true forever and the Send button stays
disabled even though the committed CJK text is already in the textarea.

Add a watchdog in both useImeComposerInputHandlers (main + edit
composer) and SharedComposer (compare mode) that runs the same reset
the missing compositionend would have done. The timer is rearmed on
every compositionupdate and on every non-composing input so it only
fires when the IME pipeline has actually gone quiet — normal candidate
selection keeps it alive, the WSL stuck case lets it expire.

Extends the existing IME Playwright smoke with a stuck-compositionend
repro and adds a static guard so the watchdog can't be removed without
the regression tests catching it.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio/chat: re-pin composing flag on IME keydown to close #5546 watchdog gap

The stuck-compositionend watchdog (PR #5551) releases composingRef after
2500 ms of IME silence so Send unwedges in the WSL+Chrome case. The same
release also fires during a long candidate-window pause in healthy IMEs,
which lets a subsequent IME-confirm Enter slip preedit text through
handleSubmit (main composer) or click-Send through send() (compare composer).

Add a keydown gate to both composers: when the browser still reports
nativeEvent.isComposing or keyCode 229, re-pin composingRef and cancel
any pending watchdog so the next form-submit / send() guard refuses.
The Send button stays visually enabled (avoids re-introducing the
stuck-UI bug) but the submit path is blocked until a real compositionend
or non-composing input arrives. Mirrors the existing isComposing guard
shape in shared-composer.onKeyDown.

Tests:
- tests/studio/test_composer_rtl_bidi_attribute.py: two new static
  guards asserting the keydown gate wiring in both composer files.
- tests/studio/playwright_chat_ime_i18n.py: new section 6c repro that
  fires the IME-confirm keydown after the watchdog has cleared, then
  triggers form.requestSubmit() and asserts the preedit text is not
  cleared (would indicate a leaked submit).

Verified across Chromium / Firefox / WebKit via a side-by-side pre-PR
vs post-PR simulation (54 scenarios, zero pageerror or console.error).
The #5546 stuck-end repro still passes (Send re-enables 2.5-3 s after
the silent commit) and the new keydown-repin probe confirms the submit
gate refuses on all three engines.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio/chat: re-arm IME watchdog after keydown re-pin (Codex P1)

The keydown re-pin added in 2c3c979 closed the watchdog-race for
healthy IMEs, but on the same WSL+Chrome no-compositionend path this
PR targets it would re-lock Send permanently: setting composingRef=true
and only *clearing* the watchdog leaves the flag pinned forever if no
follow-up compositionend or non-composing input ever arrives.

Swap clearStuckTimer/clearStuckImeTimer for refreshStuckTimer/
refreshStuckImeTimer in both composer keydown gates so the watchdog
fires once more after every IME keypress. Same visual contract — Send
stays enabled — the submit gate just keeps a 2.5s window before
re-releasing instead of staying locked.
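
The watchdog behaviour after this change can be modelled with an explicit clock. The sketch below is a Python simulation for illustration only, not the TypeScript in the composers: the class name `ImeGuard` and its methods are hypothetical, and timestamps are passed in by hand instead of using real timers.

```python
from dataclasses import dataclass
from typing import Optional

WATCHDOG_MS = 2500  # silence window named in the commit message


@dataclass
class ImeGuard:
    """Hypothetical model of the composing flag plus its watchdog."""

    composing: bool = False
    deadline_ms: Optional[int] = None

    def _refresh(self, now_ms: int) -> None:
        # Re-arm the watchdog instead of merely clearing it.
        self.deadline_ms = now_ms + WATCHDOG_MS

    def on_composition_start(self, now_ms: int) -> None:
        self.composing = True
        self._refresh(now_ms)

    def on_composition_update(self, now_ms: int) -> None:
        self.composing = True
        self._refresh(now_ms)

    def on_composition_end(self) -> None:
        self.composing = False
        self.deadline_ms = None

    def on_keydown(self, now_ms: int, is_composing: bool, key_code: int = 0) -> None:
        # Re-pin while the browser still reports an active IME, and
        # refresh the timer so the flag cannot stay pinned forever.
        if is_composing or key_code == 229:
            self.composing = True
            self._refresh(now_ms)

    def can_submit(self, now_ms: int) -> bool:
        # The watchdog releases the flag once the IME has gone quiet.
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self.composing = False
            self.deadline_ms = None
        return not self.composing
```

A short trace of the stuck case: after `on_composition_start(0)` and `on_composition_update(100)` with no compositionend, `can_submit(2000)` is False and `can_submit(2600)` is True. A later `on_keydown(3000, True)` blocks submission again, and `can_submit(5500)` is True once the re-armed window expires.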

Extends the playwright IME smoke with section 6d: clears composing via
the watchdog, fires an IME keydown, then waits past the re-armed
watchdog window and asserts the form submit actually flushes the
textarea. Two new static guards in test_composer_rtl_bidi_attribute
lock the refresh call into both keydown handlers.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>