tests : add OLMoE-sized K-quant shapes to test_mul_mat_id (ref #1506) by devYRPauli · Pull Request #1518 · ggml-org/ggml

devYRPauli · 2026-05-28T18:31:18Z

The existing test_mul_mat_id matrix tops out at m=512, k=256 with n_mats ∈ {4,8}, n_used ∈ {1,2,4}. Production MoE models use much larger shapes — OLMoE-1B-7B is 64 experts with top-8 routing (gate/up projection out=1024 in=2048, down projection out=2048 in=1024). This PR adds those shapes plus a few siblings as preventive regression coverage.
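To put the size gap in numbers, here is a small sketch that counts the stacked expert weight elements (n_mats × m × k) for each shape. It assumes the largest existing case pairs n_mats=8 with m=512, k=256, which is an upper bound rather than a combination confirmed from the test file. The helper name `expert_weight_elements` is hypothetical.

```python
def expert_weight_elements(n_mats: int, m: int, k: int) -> int:
    """Number of weight elements in the stacked expert tensor (n_mats matrices of m x k)."""
    return n_mats * m * k

# Upper bound for the existing matrix (assumed pairing of the maxima quoted above).
existing_max = expert_weight_elements(n_mats=8, m=512, k=256)

# OLMoE-1B-7B projections as described in this PR: 64 experts.
olmoe_gate_up = expert_weight_elements(n_mats=64, m=1024, k=2048)
olmoe_down = expert_weight_elements(n_mats=64, m=2048, k=1024)

ratio = olmoe_gate_up // existing_max
print(existing_max, olmoe_gate_up, olmoe_down, ratio)
```

Under that assumption each OLMoE projection tensor holds 128 times as many weight elements as the largest existing case.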

Cases added (all currently pass on master, Metal + CPU agree):

* Reporter shape from #1506, "mul_mat_id produces wrong output with K-quantized source weights" (n_mats=8, n_used=2, m=2048, n=1, k=4096), for Q4_K, Q5_K, Q6_K.
* OLMoE-1B-7B real topology (n_mats=64, n_used=8): gate/up (1024, n, 2048) and down (2048, n, 1024) for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K × n ∈ {1, 32, 128}.
* Same topology for IQ-quants (IQ2_XS, IQ3_S, IQ4_XS, plus IQ4_NL, which uses 32-element blocks rather than 256-element super-blocks).
* b_transposed=true siblings at OLMoE shape for Q4_K, Q5_K, Q6_K.
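The first two groups of cases are fully specified above, so their case matrix can be sketched as a plain enumeration. This is an illustration of the combinations only, not the code added to the test file; the dictionary layout and the helper `olmoe_cases` are hypothetical. The sketch also checks that every k is a multiple of 256, the K-quant super-block size in ggml.

```python
from itertools import product

QK_K = 256  # K-quant super-block size; k must be a multiple of it

K_QUANTS = ["Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K"]
OLMOE = {"n_mats": 64, "n_used": 8}
PROJECTIONS = {"gate_up": (1024, 2048), "down": (2048, 1024)}  # name -> (m, k)
BATCHES = [1, 32, 128]  # values of n

def olmoe_cases(types, batches=BATCHES, b_transposed=False):
    """Cross product of quant type x projection x batch size at OLMoE topology."""
    cases = []
    for qtype, (proj, (m, k)), n in product(types, PROJECTIONS.items(), batches):
        cases.append({"type": qtype, "proj": proj, "m": m, "n": n, "k": k,
                      "b_transposed": b_transposed, **OLMOE})
    return cases

# Reporter shape from #1506: one shape, three K-quant types.
reporter_cases = [
    {"type": qtype, "n_mats": 8, "n_used": 2, "m": 2048, "n": 1, "k": 4096,
     "b_transposed": False}
    for qtype in ["Q4_K", "Q5_K", "Q6_K"]
]

k_quant_cases = olmoe_cases(K_QUANTS)

# Every K-quant case must have k aligned to the super-block size.
assert all(c["k"] % QK_K == 0 for c in reporter_cases + k_quant_cases)
print(len(reporter_cases), len(k_quant_cases))
```

The IQ-quant and b_transposed groups could reuse the same helper, but the description does not spell out which batch sizes they cover, so they are left out of the sketch.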

Note: #1506 reports mul_mat_id corruption with K-quants at OLMoE shape, but the bug does not reproduce on current master with these tests. Verification methodology posted on the issue. Adding the shapes here as preventive coverage so any future regression at MoE-scale shapes shows up in CI.
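For readers unfamiliar with the operation under test, the following is a minimal pure-Python sketch of an indexed (per-token expert-selected) matrix multiply, plus a normalized mean squared error comparison of the kind used to judge whether two backends agree. The nested-list layout, the function names, and the sample values are assumptions made for illustration; this is neither ggml's implementation nor the verification procedure posted on the issue.

```python
def mul_mat_id_ref(experts, b, ids):
    """Reference indexed matmul.

    experts[e][row][col]: one weight matrix per expert
    b[token][slot][col]:  one input vector per (token, slot)
    ids[token][slot]:     which expert each slot uses
    returns out[token][slot][row]
    """
    out = []
    for t, slots in enumerate(ids):
        token_out = []
        for s, e in enumerate(slots):
            x = b[t][s]
            token_out.append([sum(w * v for w, v in zip(row, x)) for row in experts[e]])
        out.append(token_out)
    return out

def nmse(ref, got):
    """Normalized mean squared error: sum((ref - got)^2) / sum(ref^2)."""
    num = sum((r - g) ** 2 for r, g in zip(ref, got))
    den = sum(r * r for r in ref)
    return num / den

# Two tiny experts: identity and 2 x identity.
experts = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
b = [[[1, 2], [3, 4]]]   # one token, two slots
ids = [[1, 0]]           # slot 0 -> expert 1, slot 1 -> expert 0

out = mul_mat_id_ref(experts, b, ids)
flat = [v for tok in out for slot in tok for v in slot]

# A second "backend" result with one perturbed element.
other = flat[:-1] + [flat[-1] + 0.5]
print(out, nmse(flat, flat), nmse(flat, other))
```

A regression at MoE-scale shapes would show up as an error well above whatever threshold the test harness applies, while matching backends give an error near zero.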

…rg#1506)

The existing mul_mat_id matrix tops out at m=512, k=256 with n_mats in {4,8}, n_used in {1,2,4}. Production MoE models use much larger shapes (OLMoE-1B-7B is 64 experts, top-8 routing, with gate/up projection out=1024 in=2048 and down projection out=2048 in=1024). Add those shapes plus siblings as preventive regression coverage.

Cases added (all currently pass on master, Metal + CPU agree):

* Reporter shape from ggml-org#1506 for Q4_K, Q5_K, Q6_K
* OLMoE real topology for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K x n in {1,32,128}
* Super-block IQ-quants (IQ2_XS, IQ3_S, IQ4_NL, IQ4_XS)
* b_transposed=true siblings for Q4_K, Q5_K, Q6_K

devYRPauli · 2026-05-29T18:14:38Z

Closing this one for now — I hadn't realised new contributors are asked to keep to a single open PR, so I'm trimming mine down. Happy to reopen this once I've had a PR merged. Sorry for the extra noise!

devYRPauli · 2026-06-11T00:07:42Z

Reopening now that my first PR has landed, as promised when I trimmed down to a single open PR. Thanks for your patience!

devYRPauli mentioned this pull request May 28, 2026

mul_mat_id produces wrong output with K-quantized source weights #1506

Closed

devYRPauli closed this May 29, 2026

devYRPauli reopened this Jun 10, 2026

devYRPauli wants to merge 1 commit into ggml-org:master from devYRPauli:mul-mat-id-k-quant-moe-shapes