tests : add OLMoE-sized K-quant shapes to test_mul_mat_id (ref #1506) #1518
Open
devYRPauli wants to merge 1 commit into
…rg#1506)

The existing mul_mat_id matrix tops out at m=512, k=256 with n_mats in {4,8}, n_used in {1,2,4}. Production MoE models use much larger shapes (OLMoE-1B-7B is 64 experts, top-8 routing, with gate/up projection out=1024 in=2048 and down projection out=2048 in=1024). Add those shapes plus siblings as preventive regression coverage.

Cases added (all currently pass on master, Metal + CPU agree):

* Reporter shape from ggml-org#1506 for Q4_K, Q5_K, Q6_K
* OLMoE real topology for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K x n in {1,32,128}
* Super-block IQ-quants (IQ2_XS, IQ3_S, IQ4_NL, IQ4_XS)
* b_transposed=true siblings for Q4_K, Q5_K, Q6_K
devYRPauli (Contributor, Author) commented:

Closing this one for now. I hadn't realised new contributors are asked to keep to a single open PR, so I'm trimming mine down. Happy to reopen this once I've had a PR merged. Sorry for the extra noise!
devYRPauli (Contributor, Author) commented:

Reopening now that my first PR has landed, as promised when I trimmed down to a single open PR. Thanks for your patience!
The existing `test_mul_mat_id` matrix tops out at `m=512, k=256` with `n_mats ∈ {4,8}, n_used ∈ {1,2,4}`. Production MoE models use much larger shapes: OLMoE-1B-7B is 64 experts with top-8 routing (`gate/up` projection `out=1024 in=2048`, `down` projection `out=2048 in=1024`). This PR adds those shapes plus a few siblings as preventive regression coverage.

Cases added (all currently pass on master, Metal + CPU agree):

- Reporter shape from #1506 (`n_mats=8, n_used=2, m=2048, n=1, k=4096`) for `Q4_K, Q5_K, Q6_K`.
- OLMoE real topology (`n_mats=64, n_used=8`): `gate/up (1024, n, 2048)` and `down (2048, n, 1024)` for `Q2_K, Q3_K, Q4_K, Q5_K, Q6_K` × `n ∈ {1, 32, 128}`.
- Super-block IQ-quants (`IQ2_XS, IQ3_S, IQ4_NL, IQ4_XS`).
- `b_transposed=true` siblings at OLMoE shape for `Q4_K, Q5_K, Q6_K`.

Note: #1506 reports `mul_mat_id` corruption with K-quants at OLMoE shape, but the bug does not reproduce on current master with these tests. Verification methodology is posted on the issue. The shapes are added here as preventive coverage so that any future regression at MoE-scale shapes shows up in CI.
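
To make the OLMoE-topology bullet concrete, here is a minimal standalone sketch of how that case matrix expands. It is not the code in this PR: `MoeCase`, `olmoe_cases`, and `k_is_superblock_aligned` are hypothetical names used only for illustration, and the `(m, n, k)` order is assumed to be `(out, n, in)`, matching the `gate/up (1024, n, 2048)` notation above. The alignment check reflects that K-quants pack weights in super-blocks of 256 values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical descriptor for one mul_mat_id test case (not the ggml test struct).
struct MoeCase {
    std::string type_a;   // quantization type name of the expert weights
    int         n_mats;   // number of experts
    int         n_used;   // experts selected per token (top-k routing)
    int64_t     m, n, k;  // assumed order: output rows, tokens, input columns
};

// Enumerate the OLMoE-topology cases listed in the description:
// 5 K-quant types x 3 token counts x 2 projections.
static std::vector<MoeCase> olmoe_cases() {
    const char *  types[] = {"Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K"};
    const int64_t ns[]    = {1, 32, 128};

    std::vector<MoeCase> cases;
    for (const char * t : types) {
        for (int64_t n : ns) {
            cases.push_back({t, 64, 8, 1024, n, 2048}); // gate/up: out=1024, in=2048
            cases.push_back({t, 64, 8, 2048, n, 1024}); // down:    out=2048, in=1024
        }
    }
    return cases;
}

// K-quant super-blocks hold 256 values, so the inner dimension k
// must be a multiple of 256 for the weights to quantize cleanly.
static bool k_is_superblock_aligned(const MoeCase & c) {
    return c.k % 256 == 0;
}
```

Under these assumptions the OLMoE bullet alone expands to 30 cases, and every `k` used (1024, 2048, and the reporter shape's 4096) is a multiple of 256.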