docs(gguf): extend Encoding slot to support percentage-mixed recipes #1489

Draft
mishig25 wants to merge 1 commit into ggml-org:master from mishig25:mishig/gguf-naming-percentage-mix

mishig25 (Contributor) commented May 13, 2026

Extends the <Encoding> slot to allow a multi-component recipe form for files with a genuinely mixed byte distribution (e.g. asymmetric MoE quants where routed experts use a different ggml type than attention projections). Single-token Encodings are unchanged.

New form

DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                Encoding: ~55% IQ2_XXS, ~34% Q2_K, ~7% Q8_0, ~3% F16
  • <pct> is 1–3 digits; components are listed in descending order of byte share.
  • Only components above ~2% should appear; the sum need not equal 100.
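
To make the recipe form concrete, here is a minimal sketch of how a consumer might split such an Encoding into its components. The names `parseEncodingRecipe` and `isDescending` are hypothetical and are not taken from the spec or from the huggingface.js parser. The helper is also stricter than the validator regex, since it requires every token, including the first, to have the `<pct><quant>` shape.

```javascript
// Hypothetical helper (not part of this PR): split a recipe-form Encoding
// into { pct, quant } components. Assumes each token is <pct><quant>, where
// <pct> is 1-3 digits and <quant> starts with an uppercase letter.
const COMPONENT = /^(\d{1,3})([A-Z]\w*)$/;

function parseEncodingRecipe(encoding) {
  const components = [];
  for (const part of encoding.split("-")) {
    const m = COMPONENT.exec(part);
    if (m === null) {
      // Not a <pct><quant> token: treat the input as an ordinary
      // single-token Encoding such as Q4_K_M or F16.
      return null;
    }
    components.push({ pct: Number(m[1]), quant: m[2] });
  }
  return components;
}

// Checks the ordering convention: components listed descending by byte share.
function isDescending(components) {
  return components.every((c, i) => i === 0 || components[i - 1].pct >= c.pct);
}

const recipe = parseEncodingRecipe("55IQ2_XXS-34Q2_K-07Q8_0-03F16");
console.log(recipe);               // four components, 55% IQ2_XXS first
console.log(isDescending(recipe)); // true
console.log(parseEncodingRecipe("Q4_K_M")); // null (single-token Encoding)
```

Because only components above ~2% are listed, the sketch deliberately does not check that the percentages sum to 100.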

Regex change

-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+)
+(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*)

Verified backward-compatible against every existing example in the doc — Encoding/Type/Shard captures remain identical.
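
A quick way to see the effect of the change in isolation is to anchor the old and new Encoding sub-patterns and run them against bare Encoding strings. This is only a sketch: the `^`/`$` anchors and the variable names are added here for illustration, whereas in the real validator the sub-pattern sits inside the full filename regex.

```javascript
// Old and new Encoding sub-patterns from the diff, anchored for standalone
// testing (the anchors are an illustration aid, not part of the spec regex).
const oldEncoding = /^(?!LoRA|vocab|MTP)[\w_]+$/;
const newEncoding = /^(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*$/;

const samples = ["Q4_K_M", "F16", "55IQ2_XXS-34Q2_K-07Q8_0-03F16", "MTP"];

for (const s of samples) {
  console.log(s, "old:", oldEncoding.test(s), "new:", newEncoding.test(s));
}
// Q4_K_M and F16 are accepted by both patterns.
// The recipe form is rejected by the old pattern and accepted by the new one.
// MTP is rejected by both because of the negative lookahead.
```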

Originating use case

huggingface.co/antirez/deepseek-v4-gguf — DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and the rest at Q8_0/F16. No single LLAMA_FTYPE_MOSTLY_* value captures this split honestly. Parser implementation in huggingface.js#2170.

For files where the byte distribution is genuinely mixed — common in
asymmetric MoE quantization where routed experts use a different ggml
type than attention projections — a single LLAMA_FTYPE_MOSTLY_* label
in the Encoding slot can be misleading (it captures the majority by
bytes but says nothing about the rest of the recipe).

Extends the Encoding slot grammar to allow a hyphen-joined sequence
of `<pct><quant>` tokens listed in descending order of byte share,
e.g. `55IQ2_XXS-34Q2_K-07Q8_0-03F16`. Only components above ~2%
should appear, so the sum need not equal 100. Single-token Encodings
remain unchanged, so all existing filenames are still valid under
the new regex.
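
As a rough illustration of the ordering and threshold rules above, a recipe string could be assembled from per-type byte totals along the following lines. The helper name `formatEncodingRecipe`, the use of `Math.round`, the two-digit zero padding (inferred from `07` and `03` in the example), and the byte counts themselves are all assumptions made for this sketch; the numbers are placeholders chosen to reproduce the example, not measurements of the actual file.

```javascript
// Hypothetical helper: build a recipe-form Encoding from per-type byte totals.
// Assumptions: round to the nearest percent, keep components strictly above
// the threshold, sort descending by share, zero-pad <pct> to two digits.
function formatEncodingRecipe(bytesByType, thresholdPct = 2) {
  const total = Object.values(bytesByType).reduce((a, b) => a + b, 0);
  if (total <= 0) throw new Error("no tensor bytes given");
  return Object.entries(bytesByType)
    .map(([quant, bytes]) => ({ quant, pct: Math.round((bytes * 100) / total) }))
    .filter((c) => c.pct > thresholdPct)
    .sort((a, b) => b.pct - a.pct)
    .map((c) => String(c.pct).padStart(2, "0") + c.quant)
    .join("-");
}

// Placeholder byte totals (illustrative only).
const example = { Q8_0: 700, IQ2_XXS: 5500, F32: 100, Q2_K: 3400, F16: 300 };
console.log(formatEncodingRecipe(example)); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
```

A real tool would presumably fall back to the ordinary single-token label when only one component clears the threshold.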

Originating use case: `huggingface.co/antirez/deepseek-v4-gguf`
(DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and
attention projections + shared experts at Q8_0).

Changes:
- Encoding section gains a paragraph describing the recipe form, the
  ordering convention (descending by byte share), and the threshold
  (only components >~2% should appear).
- New example added showing a full filename in spec-compliant form.
- Validator regex (in both the prose and the Node.js block) extended
  to accept the multi-component form. Verified backward-compat
  against every existing example in the doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mishig25 force-pushed the mishig/gguf-naming-percentage-mix branch from 4dd2b87 to 2dc1573 on May 13, 2026 09:48

mishig25 (Contributor, Author) commented

Verified the regex locally against the full example set in the doc — all parse correctly:

const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;
✓ Mixtral-8x7B-v0.1-KQ2.gguf
    BaseName = Mixtral | SizeLabel = 8x7B | Version = v0.1
    Encoding = KQ2 | Type = undefined | Shard = undefined
✓ Hermes-2-Pro-Llama-3-8B-v1.0-F16.gguf
    BaseName = Hermes-2-Pro-Llama-3 | SizeLabel = 8B | Version = v1.0
    Encoding = F16 | Type = undefined | Shard = undefined
✓ Grok-100B-v1.0-Q4_0-00003-of-00009.gguf
    BaseName = Grok | SizeLabel = 100B | Version = v1.0
    Encoding = Q4_0 | Type = undefined | Shard = 00003-of-00009
✓ Qwen3-27B-v1.0-Q4_K_M-MTP.gguf
    BaseName = Qwen3 | SizeLabel = 27B | Version = v1.0
    Encoding = Q4_K_M | Type = MTP | Shard = undefined
✓ DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf
    BaseName = DeepSeek-V4-Flash | SizeLabel = 256x8.4B | Version = v1.0
    Encoding = 55IQ2_XXS-34Q2_K-07Q8_0-03F16 | Type = undefined | Shard = undefined
✓ DeepSeek-V4-Flash-256x8.4B-v1.0-95Q4_K-04Q8_0.gguf
    BaseName = DeepSeek-V4-Flash | SizeLabel = 256x8.4B | Version = v1.0
    Encoding = 95Q4_K-04Q8_0 | Type = undefined | Shard = undefined

Notes on what the trailing repeated group (?:-\d{1,3}[A-Z][\w_]*)* does and doesn't absorb:

  • Single-token Encodings (KQ2, F16, Q4_0, Q4_K_M) — the trailing pattern matches 0 times, Encoding ends at the same boundary as before.
  • Type-slot collisions (-MTP after Encoding) — MTP has no leading digit, so the trailing pattern can't absorb it; Type slot still captures cleanly.
  • Shard-slot collisions (-00003-of-00009 after Encoding) — \d{1,3} can consume up to 3 digits of 00003, but whichever length it tries, the next char is another 0, not [A-Z]. Every backtracking attempt through \d{1,3} fails, so the group matches 0 times and the Shard slot still captures cleanly.
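
The boundary cases above can be spot-checked with a reduced version of the validator that keeps only the tail from Version onward. This is a sketch for illustration: the `tail` regex and the `slots` helper are not part of the doc, and the third filename (recipe plus shard suffix) is a hypothetical combination that does not appear among the doc's examples.

```javascript
// Reduced tail of the validator (Version, Encoding, Type, Shard), for
// illustration only. The full regex in the doc also validates BaseName,
// SizeLabel and FineTune.
const tail = /-(?<Version>v\d+(?:\.\d+)*)(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

function slots(filename) {
  const m = tail.exec(filename);
  if (m === null) return null;
  const { Encoding, Type, Shard } = m.groups;
  return { Encoding, Type, Shard };
}

// Type slot: "-MTP" has no leading digit, so the repeated group skips it.
console.log(slots("Qwen3-27B-v1.0-Q4_K_M-MTP.gguf"));
// Shard slot: "-00003" never reaches an uppercase letter after 1-3 digits.
console.log(slots("Grok-100B-v1.0-Q4_0-00003-of-00009.gguf"));
// Hypothetical filename: recipe Encoding followed by a shard suffix.
console.log(slots("DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16-00001-of-00004.gguf"));
```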

One pre-existing inconsistency I noticed unrelated to this PR: the doc lists Hermes-2-Pro-Llama-3-8B-F16.gguf (without -v1.0-) but the regex requires the Version slot, so it doesn't actually match that filename. I tested with the explicit-version form. Happy to fix the example in a follow-up if useful.
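
For anyone who wants to reproduce that observation, the check is roughly the following, using the regex exactly as posted above. The variable names other than `ggufRegex` are placeholders.

```javascript
// Same validator regex as posted earlier in this comment.
const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// The form listed in the doc has no Version slot, so the regex rejects it.
const withoutVersion = ggufRegex.exec("Hermes-2-Pro-Llama-3-8B-F16.gguf");
console.log(withoutVersion); // null

// The explicit-version form used in the verification run matches.
const withVersion = ggufRegex.exec("Hermes-2-Pro-Llama-3-8B-v1.0-F16.gguf");
console.log(withVersion.groups.BaseName, withVersion.groups.Encoding);
```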
