docs(gguf): extend Encoding slot to support percentage-mixed recipes by mishig25 · Pull Request #1489 · ggml-org/ggml

mishig25 · 2026-05-13T09:43:24Z

Extends the <Encoding> slot to allow a multi-component recipe form for files with a genuinely mixed byte distribution (e.g. asymmetric MoE quants where routed experts use a different ggml type than attention projections). Single-token Encodings are unchanged.

New form

DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                Encoding: ~55% IQ2_XXS, ~34% Q2_K, ~7% Q8_0, ~3% F16

<pct> is 1–3 digits, components listed descending by byte share.
Only components above ~2% should appear; the sum need not equal 100.
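
As an illustration of the rules above, a recipe string could be derived from per-type byte totals roughly as follows. This is a minimal sketch, not code from the PR: the helper name `buildEncodingRecipe`, the rounding to the nearest integer, the two-digit zero padding (inferred from `07` and `03` in the example), the fallback to a single-token Encoding, and the byte counts are all assumptions made for illustration.

```javascript
// Hypothetical helper, not part of the PR: builds a recipe string from
// per-type byte totals. Rounding and zero padding are assumptions.
function buildEncodingRecipe(bytesByType, thresholdPct = 2) {
  const total = Object.values(bytesByType).reduce((sum, n) => sum + n, 0);
  const components = Object.entries(bytesByType)
    .map(([quant, bytes]) => ({ quant, pct: (bytes * 100) / total }))
    .filter((c) => c.pct > thresholdPct) // keep only components above ~2%
    .sort((a, b) => b.pct - a.pct); // descending by byte share

  // Assumption: a file dominated by one type keeps the single-token form.
  if (components.length < 2) {
    return components.length === 1 ? components[0].quant : "";
  }
  return components
    .map((c) => String(Math.round(c.pct)).padStart(2, "0") + c.quant)
    .join("-");
}

// Made-up byte counts chosen to reproduce the example above.
const recipe = buildEncodingRecipe({
  Q8_0: 70,
  IQ2_XXS: 550,
  F32: 10,
  Q2_K: 340,
  F16: 30,
});
console.log(recipe); // prints "55IQ2_XXS-34Q2_K-07Q8_0-03F16"
```

The F32 share (1%) falls below the threshold and is dropped, which is why the listed percentages sum to 99 rather than 100.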

Regex change

-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+)
+(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*)

Verified backward-compatible against every existing example in the doc — Encoding/Type/Shard captures remain identical.
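
The effect of the change can also be checked on the Encoding slot in isolation. The sketch below anchors the old and new sub-patterns with `^` and `$` (an assumption for illustration; in the validator they are embedded in the full filename regex) and compares them on a few sample strings, of which `Q4_K-extra` is a made-up negative case.

```javascript
// Old and new Encoding sub-patterns from the diff above, anchored so they
// can be tested on the Encoding slot alone (the anchoring is illustrative).
const oldEncoding = /^(?!LoRA|vocab|MTP)[\w_]+$/;
const newEncoding = /^(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*$/;

const samples = [
  "Q4_K_M", // single token: accepted by both
  "F16", // single token: accepted by both
  "55IQ2_XXS-34Q2_K-07Q8_0-03F16", // recipe: accepted only by the new pattern
  "MTP", // reserved Type token: rejected by both
  "Q4_K-extra", // hyphenated suffix without <pct>: rejected by both
];

const results = samples.map((s) => ({
  sample: s,
  old: oldEncoding.test(s),
  new: newEncoding.test(s),
}));
console.table(results);
```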

Originating use case

huggingface.co/antirez/deepseek-v4-gguf — DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and the rest at Q8_0/F16. No single LLAMA_FTYPE_MOSTLY_* value captures this split honestly. Parser implementation in huggingface.js#2170.
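
For readers who want to see what consuming such an Encoding might look like, here is a minimal sketch of splitting a captured Encoding into its components. It is not the huggingface.js implementation referenced above; the names `parseEncodingRecipe` and `isDescending`, the `{ pct, quant }` shape, and the choice to throw on malformed tokens are assumptions for illustration.

```javascript
// Hypothetical helpers, not taken from the PR or from huggingface.js.
// One recipe component: 1-3 digits followed by a type name that starts
// with an uppercase letter, mirroring \d{1,3}[A-Z][\w_]* in the regex.
const COMPONENT = /^(?<pct>\d{1,3})(?<quant>[A-Z]\w*)$/;

function parseEncodingRecipe(encoding) {
  const tokens = encoding.split("-");
  if (tokens.length < 2) {
    // Single-token Encoding such as "Q4_K_M": no percentages to extract.
    return [{ pct: null, quant: encoding }];
  }
  return tokens.map((token) => {
    const m = COMPONENT.exec(token);
    if (!m) throw new Error(`not a <pct><quant> token: ${token}`);
    return { pct: Number(m.groups.pct), quant: m.groups.quant };
  });
}

// Checks the ordering convention: components descend by byte share.
function isDescending(components) {
  return components.every((c, i) => i === 0 || components[i - 1].pct >= c.pct);
}

const parsed = parseEncodingRecipe("55IQ2_XXS-34Q2_K-07Q8_0-03F16");
console.log(parsed);
console.log(isDescending(parsed)); // prints true
```

Because only components above ~2% are listed, a consumer should not assume the parsed percentages add up to 100; in this example they sum to 99.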

For files where the byte distribution is genuinely mixed (common in asymmetric MoE quantization, where routed experts use a different ggml type than attention projections), a single LLAMA_FTYPE_MOSTLY_* label in the Encoding slot can be misleading: it captures the majority by bytes but says nothing about the rest of the recipe.

Extends the Encoding slot grammar to allow a hyphen-joined sequence of `<pct><quant>` tokens listed in descending order of byte share, e.g. `55IQ2_XXS-34Q2_K-07Q8_0-03F16`. Only components above ~2% should appear, so the sum need not equal 100. Single-token Encodings remain unchanged, so all existing filenames are still valid under the new regex.

Originating use case: `huggingface.co/antirez/deepseek-v4-gguf` (DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and attention projections + shared experts at Q8_0).

Changes:
- Encoding section gains a paragraph describing the recipe form, the ordering convention (descending by byte share), and the threshold (only components >~2% should appear).
- New example added showing a full filename in spec-compliant form.
- Validator regex (in both the prose and the Node.js block) extended to accept the multi-component form.

Verified backward-compat against every existing example in the doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mishig25 · 2026-05-13T09:53:18Z

Verified the regex locally against the full example set in the doc — all parse correctly:

const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

✓ Mixtral-8x7B-v0.1-KQ2.gguf
    BaseName = Mixtral | SizeLabel = 8x7B | Version = v0.1
    Encoding = KQ2 | Type = undefined | Shard = undefined
✓ Hermes-2-Pro-Llama-3-8B-v1.0-F16.gguf
    BaseName = Hermes-2-Pro-Llama-3 | SizeLabel = 8B | Version = v1.0
    Encoding = F16 | Type = undefined | Shard = undefined
✓ Grok-100B-v1.0-Q4_0-00003-of-00009.gguf
    BaseName = Grok | SizeLabel = 100B | Version = v1.0
    Encoding = Q4_0 | Type = undefined | Shard = 00003-of-00009
✓ Qwen3-27B-v1.0-Q4_K_M-MTP.gguf
    BaseName = Qwen3 | SizeLabel = 27B | Version = v1.0
    Encoding = Q4_K_M | Type = MTP | Shard = undefined
✓ DeepSeek-V4-Flash-256x8.4B-v1.0-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf
    BaseName = DeepSeek-V4-Flash | SizeLabel = 256x8.4B | Version = v1.0
    Encoding = 55IQ2_XXS-34Q2_K-07Q8_0-03F16 | Type = undefined | Shard = undefined
✓ DeepSeek-V4-Flash-256x8.4B-v1.0-95Q4_K-04Q8_0.gguf
    BaseName = DeepSeek-V4-Flash | SizeLabel = 256x8.4B | Version = v1.0
    Encoding = 95Q4_K-04Q8_0 | Type = undefined | Shard = undefined

Notes on what the trailing (?:-\d{1,3}[A-Z][\w_]*)* repetition group does and doesn't absorb:

Single-token Encodings (KQ2, F16, Q4_0, Q4_K_M) — the trailing pattern matches 0 times, Encoding ends at the same boundary as before.
Type-slot collisions (-MTP after Encoding) — MTP has no leading digit, so the trailing pattern can't absorb it; Type slot still captures cleanly.
Shard-slot collisions (-00003-of-00009 after Encoding): \d{1,3} can consume at most three of the five digits in 00003, and whichever length it tries, the next character is another digit, not [A-Z]. The group therefore fails after backtracking through \d{1,3}, and the Shard slot still captures cleanly.
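
These boundary cases can be reproduced without the full filename regex. The sketch below keeps only the Encoding, Type and Shard slots from the regex above and anchors them with `^` and `$`; the reduced pattern, the `slots` helper, and the third input (a recipe followed by a shard suffix, which is not among the cases listed above) are assumptions added for illustration.

```javascript
// Reduced pattern covering only the Encoding, Type and Shard slots, copied
// from the full regex above; the leading ^ and trailing $ are added here.
const tailSlots =
  /^(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*)(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?$/;

// Hypothetical helper: returns the named captures, or null on no match.
function slots(tail) {
  const m = tailSlots.exec(tail);
  return m ? { ...m.groups } : null;
}

// Type-slot collision: "-MTP" has no leading digit, so it stays in Type.
console.log(slots("Q4_K_M-MTP"));
// Shard-slot collision: "-00003" never reaches [A-Z], so it stays in Shard.
console.log(slots("Q4_0-00003-of-00009"));
// Recipe followed by a shard suffix: the recipe ends before the shard.
console.log(slots("95Q4_K-04Q8_0-00001-of-00002"));
```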

One pre-existing inconsistency I noticed unrelated to this PR: the doc lists Hermes-2-Pro-Llama-3-8B-F16.gguf (without -v1.0-) but the regex requires the Version slot, so it doesn't actually match that filename. I tested with the explicit-version form. Happy to fix the example in a follow-up if useful.

mishig25 force-pushed the mishig/gguf-naming-percentage-mix branch from 4dd2b87 to 2dc1573 Compare May 13, 2026 09:48

mishig25 wants to merge 1 commit into ggml-org:master from mishig25:mishig/gguf-naming-percentage-mix
