docs(gguf): extend Encoding slot to support percentage-mixed recipes #1489
Draft
mishig25 wants to merge 1 commit into
For files where the byte distribution is genuinely mixed (common in asymmetric MoE quantization, where routed experts use a different ggml type than attention projections), a single `LLAMA_FTYPE_MOSTLY_*` label in the Encoding slot can be misleading: it captures the majority by bytes but says nothing about the rest of the recipe.

This PR extends the Encoding slot grammar to allow a hyphen-joined sequence of `<pct><quant>` tokens listed in descending order of byte share, e.g. `55IQ2_XXS-34Q2_K-07Q8_0-03F16`. Only components above ~2% should appear, so the percentages need not sum to 100. Single-token Encodings are unchanged, so all existing filenames remain valid under the new regex.

Originating use case: `huggingface.co/antirez/deepseek-v4-gguf` (DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and attention projections + shared experts at Q8_0).

Changes:
- The Encoding section gains a paragraph describing the recipe form, the ordering convention (descending by byte share), and the threshold (only components above ~2% should appear).
- A new example shows a full filename in spec-compliant form.
- The validator regex (in both the prose and the Node.js block) is extended to accept the multi-component form.

Verified backward compatibility against every existing example in the doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
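The recipe rules described above (descending by byte share, a ~2% threshold, no requirement to sum to 100) can be sketched as a small formatter. `formatEncodingRecipe` is a hypothetical helper, not part of the spec or of any converter; it assumes per-type tensor byte totals are already known, rounds with `Math.round`, and zero-pads to two digits to mirror the `07Q8_0` / `03F16` tokens in the example.

```javascript
// Hypothetical helper (not from the spec or any converter): turns per-type
// tensor byte totals into a multi-component Encoding recipe.
// Assumptions: shares are rounded with Math.round, zero-padded to two digits
// (mirroring "07Q8_0" in the example), and only shares strictly above the
// threshold are kept.
function formatEncodingRecipe(bytesByType, thresholdPct = 2) {
  const entries = Object.entries(bytesByType);
  const total = entries.reduce((sum, [, bytes]) => sum + bytes, 0);
  if (total <= 0) {
    throw new Error("no tensor bytes to describe");
  }

  return entries
    .map(([quant, bytes]) => ({ quant, pct: (100 * bytes) / total }))
    // Drop minor components; the remaining shares need not sum to 100.
    .filter(({ pct }) => pct > thresholdPct)
    // Descending order of byte share, largest component first.
    .sort((a, b) => b.pct - a.pct)
    .map(({ quant, pct }) => `${String(Math.round(pct)).padStart(2, "0")}${quant}`)
    .join("-");
}

// Hypothetical byte counts; the 1% F32 share falls below the threshold.
console.log(
  formatEncodingRecipe({ IQ2_XXS: 550, Q2_K: 340, Q8_0: 70, F16: 30, F32: 10 }),
); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
```

Filtering before rounding means a 2.4% share is kept and written as `02`; whether the threshold should apply to the rounded value instead is left open here.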
Branch updated from 4dd2b87 to 2dc1573.
Comment by the PR author:
Verified the regex locally against the full example set in the doc; all parse correctly:

```javascript
const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;
```

Notes on what the trailing
One pre-existing inconsistency I noticed unrelated to this PR: the doc lists |
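To see the regex from the comment at work, the sketch below runs it over two hypothetical filenames (a recipe-style name and a sharded single-label name) and splits the Encoding capture into components. `parseEncoding`, `isDescending`, and `parseGGUFFilename` are illustrative helpers, not the huggingface.js parser; reading an Encoding as a recipe only when every token starts with a 1–3 digit percentage is an assumption.

```javascript
// Validator regex copied from the comment above.
const ggufRegex = /^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+(?:-\d{1,3}[A-Z][\w_]*)*))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$/;

// Hypothetical helper: if every hyphen-separated token is <pct><quant>
// (1-3 digits, then an uppercase letter), the Encoding is read as a recipe;
// a lone token such as "Q4_0" is read as a plain label with pct = null.
function parseEncoding(encoding) {
  const tokens = encoding.split("-");
  const matches = tokens.map((token) => /^(\d{1,3})([A-Z]\w*)$/.exec(token));
  if (matches.every((m) => m !== null)) {
    return matches.map((m) => ({ pct: Number(m[1]), quant: m[2] }));
  }
  if (tokens.length === 1) {
    return [{ pct: null, quant: encoding }];
  }
  throw new Error(`unrecognized Encoding: ${encoding}`);
}

// Components should be listed by descending byte share; equal neighbours are
// tolerated here because rounded percentages can tie.
function isDescending(components) {
  return components.every((c, i) => i === 0 || components[i - 1].pct >= c.pct);
}

// Hypothetical wrapper returning only the slots this PR touches.
function parseGGUFFilename(filename) {
  const match = ggufRegex.exec(filename);
  if (!match) return null;
  const { Encoding = null, Type = null, Shard = null } = match.groups;
  return {
    Encoding,
    Type,
    Shard,
    components: Encoding ? parseEncoding(Encoding) : [],
  };
}

const mixed = parseGGUFFilename("Mixtral-8x7B-v0.1-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf");
console.log(mixed.Encoding); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
console.log(mixed.components.map((c) => `${c.pct}% ${c.quant}`).join(", "));
// 55% IQ2_XXS, 34% Q2_K, 7% Q8_0, 3% F16
console.log(isDescending(mixed.components)); // true

const sharded = parseGGUFFilename("Grok-100B-v1.0-Q4_0-00003-of-00009.gguf");
console.log(sharded.Encoding, sharded.Shard); // Q4_0 00003-of-00009
```

Note that the regex itself leaves the first Encoding token unrestricted (`[\w_]+`), so a value like `Q4_0-34Q2_K` passes the validator; this sketch rejects it in `parseEncoding` rather than guessing what it means.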
Extends the `<Encoding>` slot to allow a multi-component recipe form for files with a genuinely mixed byte distribution (e.g. asymmetric MoE quants where routed experts use a different ggml type than attention projections). Single-token Encodings are unchanged.

New form
`<pct>` is 1–3 digits, and components are listed in descending order of byte share.

Regex change

Verified backward-compatible against every existing example in the doc: the Encoding, Type, and Shard captures remain identical.
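The backward-compatibility claim can be reproduced mechanically: compile the validator with and without the new recipe suffix and compare the Encoding, Type, and Shard captures. In this sketch the "base" pattern is derived by deleting the suffix `(?:-\d{1,3}[A-Z][\w_]*)*` (keeping `MTP` in the Type list), so it approximates rather than reproduces the pre-PR regex; the filenames and the `captures` / `sameCaptures` helpers are hypothetical.

```javascript
// Build the validator from one template, with and without the recipe suffix.
// The "base" pattern is derived by deleting the suffix, so it only
// approximates the pre-PR regex.
const head = String.raw`^(?<BaseName>[A-Za-z0-9\s]*(?:(?:-(?:(?:[A-Za-z\s][A-Za-z0-9\s]*)|(?:[0-9\s]*)))*))-(?:(?<SizeLabel>(?:\d+x)?(?:\d+\.)?\d+[A-Za-z](?:-[A-Za-z]+(\d+\.)?\d+[A-Za-z]+)?)(?:-(?<FineTune>[A-Za-z0-9\s-]+))?)?-(?:(?<Version>v\d+(?:\.\d+)*))(?:-(?<Encoding>(?!LoRA|vocab|MTP)[\w_]+`;
const recipeSuffix = String.raw`(?:-\d{1,3}[A-Z][\w_]*)*`;
const tail = String.raw`))?(?:-(?<Type>LoRA|vocab|MTP))?(?:-(?<Shard>\d{5}-of-\d{5}))?\.gguf$`;

const baseRegex = new RegExp(head + tail);
const recipeRegex = new RegExp(head + recipeSuffix + tail);

// Returns the three slots whose captures should not change, or null on no match.
function captures(regex, filename) {
  const m = regex.exec(filename);
  if (!m) return null;
  const { Encoding = null, Type = null, Shard = null } = m.groups;
  return { Encoding, Type, Shard };
}

function sameCaptures(filename) {
  return (
    JSON.stringify(captures(baseRegex, filename)) ===
    JSON.stringify(captures(recipeRegex, filename))
  );
}

// Hypothetical single-label filenames: captures should be identical.
for (const filename of [
  "Mixtral-8x7B-v0.1-KQ2.gguf",
  "Grok-100B-v1.0-Q4_0-00003-of-00009.gguf",
  "Mixtral-8x7B-v0.1-Q4_0-LoRA.gguf",
]) {
  console.log(filename, sameCaptures(filename)); // true for each
}

// The recipe form is accepted only once the suffix is present.
const recipeName = "Mixtral-8x7B-v0.1-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf";
console.log(captures(baseRegex, recipeName)); // null
console.log(captures(recipeRegex, recipeName).Encoding); // 55IQ2_XXS-34Q2_K-07Q8_0-03F16
```

Because the suffix requires a digit immediately after the hyphen, it cannot consume a `-LoRA` / `-vocab` / `-MTP` Type or a five-digit `-00003-of-00009` Shard, which is why those captures stay put.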
Originating use case
`huggingface.co/antirez/deepseek-v4-gguf`: DeepSeek V4 Flash shipped with routed experts at IQ2_XXS/Q2_K and the rest at Q8_0/F16. No single `LLAMA_FTYPE_MOSTLY_*` value captures this split honestly. Parser implementation in huggingface.js#2170.