
tests : add OLMoE-sized K-quant shapes to test_mul_mat_id (ref #1506) #1518

Open
devYRPauli wants to merge 1 commit into ggml-org:master from devYRPauli:mul-mat-id-k-quant-moe-shapes



@devYRPauli

Contributor

The existing test_mul_mat_id matrix tops out at m=512, k=256 with n_mats ∈ {4,8}, n_used ∈ {1,2,4}. Production MoE models use much larger shapes — OLMoE-1B-7B is 64 experts with top-8 routing (gate/up projection out=1024 in=2048, down projection out=2048 in=1024). This PR adds those shapes plus a few siblings as preventive regression coverage.
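This note is for readers new to the op. mul_mat_id is the routed matrix multiply behind MoE layers. Each token's activations are multiplied only by the n_used expert matrices that an ids tensor selects for that token. The code below is a simplified float reference of those semantics, written as a sketch under stated assumptions. It uses dense row-major buffers and ignores ggml's tensor layout, strides, broadcasting and quantization. The function name mul_mat_id_ref is hypothetical and is not part of ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for the semantics of a routed (MoE) matmul.
// Float only, dense row-major buffers; ignores ggml's tensor layout,
// strides, broadcasting and quantization.
//   weights: n_mats experts, each m x k       -> weights[(e * m + r) * k + c]
//   input:   n tokens, each a k-vector        -> input[t * k + c]
//   ids:     n_used expert ids per token      -> ids[t * n_used + s]
//   result:  one m-vector per (token, slot)   -> out[(t * n_used + s) * m + r]
std::vector<float> mul_mat_id_ref(const std::vector<float>& weights,
                                  const std::vector<float>& input,
                                  const std::vector<int>& ids,
                                  int n_mats, int n_used, int m, int n, int k) {
    std::vector<float> out(static_cast<std::size_t>(n) * n_used * m, 0.0f);
    for (int t = 0; t < n; ++t) {
        for (int s = 0; s < n_used; ++s) {
            // Expert selected for token t in routing slot s.
            const int e = ids[static_cast<std::size_t>(t) * n_used + s];
            assert(e >= 0 && e < n_mats && "expert id out of range");
            for (int r = 0; r < m; ++r) {
                float acc = 0.0f;  // dot(row r of expert e, token t)
                for (int c = 0; c < k; ++c) {
                    acc += weights[(static_cast<std::size_t>(e) * m + r) * k + c] *
                           input[static_cast<std::size_t>(t) * k + c];
                }
                out[(static_cast<std::size_t>(t) * n_used + s) * m + r] = acc;
            }
        }
    }
    return out;
}
```

The shapes in this PR run the same computation at scale. The reporter case uses n_mats=8, n_used=2, m=2048, n=1, k=4096. The OLMoE cases use n_mats=64, n_used=8, with (m, k) = (1024, 2048) for gate/up and (2048, 1024) for down.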

Cases added (all currently pass on master, Metal + CPU agree):

  • Reporter shape from #1506 ("mul_mat_id produces wrong output with K-quantized source weights"): n_mats=8, n_used=2, m=2048, n=1, k=4096, for Q4_K, Q5_K, Q6_K.
  • OLMoE-1B-7B real topology (n_mats=64, n_used=8): gate/up (m, n, k) = (1024, n, 2048) and down (2048, n, 1024) for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K × n ∈ {1, 32, 128}.
  • Same topology for IQ-quants: the super-block types IQ2_XS, IQ3_S and IQ4_XS, plus IQ4_NL.
  • b_transposed=true siblings at OLMoE shape for Q4_K, Q5_K, Q6_K.
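The old and new shapes differ in one concrete way. K-quants pack weights along k in super-blocks of 256 elements (QK_K in ggml). The previous maximum k=256 therefore gives exactly one super-block per weight row, while k=1024, 2048 and 4096 give 4, 8 and 16. The sketch below enumerates the OLMoE K-quant cases from the list above and counts super-blocks per row. The MoeCase struct, olmoe_cases and super_blocks_per_row are hypothetical names used for illustration. They do not mirror the real test_mul_mat_id constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical description of one mul_mat_id case; field names follow the
// PR text (n_mats, n_used, m, n, k), not necessarily the real test struct.
struct MoeCase {
    std::string type;  // weight quant type name, e.g. "Q4_K"
    int n_mats;        // number of experts
    int n_used;        // experts routed per token (top-k)
    int64_t m, n, k;   // out features, tokens, in features
};

constexpr int64_t kSuperBlock = 256;  // QK_K: elements per K-quant super-block

// Enumerate the OLMoE-1B-7B topology (64 experts, top-8) for the given types.
std::vector<MoeCase> olmoe_cases(const std::vector<std::string>& types) {
    std::vector<MoeCase> cases;
    for (const auto& t : types) {
        for (int64_t n : {1, 32, 128}) {
            cases.push_back({t, 64, 8, 1024, n, 2048});  // gate/up projection
            cases.push_back({t, 64, 8, 2048, n, 1024});  // down projection
        }
    }
    return cases;
}

// Super-blocks per weight row; K-quants require k to be a multiple of 256.
int64_t super_blocks_per_row(int64_t k) {
    assert(k % kSuperBlock == 0);
    return k / kSuperBlock;
}
```

For the five K-quant types this gives 5 types × 3 token counts × 2 projections = 30 cases. The IQ-quant and b_transposed siblings come on top of that.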

Note: #1506 reports mul_mat_id corruption with K-quants at OLMoE shape, but the bug does not reproduce on current master with these tests. Verification methodology posted on the issue. Adding the shapes here as preventive coverage so any future regression at MoE-scale shapes shows up in CI.
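"Metal + CPU agree" means agreement within a tolerance, not bit-exact equality, because backends can accumulate in a different order. As we understand it, test-backend-ops scores each backend against the CPU using a normalized mean squared error (NMSE). The sketch below implements that metric. The threshold is a parameter here because this PR does not state the value test-backend-ops uses for mul_mat_id. The names nmse and backends_agree are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error of a candidate output (e.g. Metal) against
// a reference output (e.g. CPU): sum((ref - out)^2) / sum(ref^2).
double nmse(const std::vector<float>& ref, const std::vector<float>& out) {
    assert(ref.size() == out.size());
    double err = 0.0, norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(out[i]);
        err  += d * d;
        norm += static_cast<double>(ref[i]) * static_cast<double>(ref[i]);
    }
    // All-zero reference: fall back to the absolute squared error.
    return norm > 0.0 ? err / norm : err;
}

// Hypothetical pass/fail check with a caller-chosen tolerance.
bool backends_agree(const std::vector<float>& ref, const std::vector<float>& out,
                    double max_nmse) {
    return nmse(ref, out) <= max_nmse;
}
```

Normalizing by the reference's energy lets one threshold serve both a 1-token case and a 128-token case, whose output magnitudes differ.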

…rg#1506)

The existing mul_mat_id matrix tops out at m=512, k=256 with
n_mats in {4,8}, n_used in {1,2,4}. Production MoE models use much
larger shapes (OLMoE-1B-7B is 64 experts, top-8 routing, with
gate/up projection out=1024 in=2048 and down projection
out=2048 in=1024). Add those shapes plus siblings as preventive
regression coverage.

Cases added (all currently pass on master, Metal + CPU agree):
* Reporter shape from ggml-org#1506 for Q4_K, Q5_K, Q6_K
* OLMoE real topology for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K x n in {1,32,128}
* Super-block IQ-quants (IQ2_XS, IQ3_S, IQ4_NL, IQ4_XS)
* b_transposed=true siblings for Q4_K, Q5_K, Q6_K
@devYRPauli

Contributor Author

Closing this one for now — I hadn't realised new contributors are asked to keep to a single open PR, so I'm trimming mine down. Happy to reopen this once I've had a PR merged. Sorry for the extra noise!

@devYRPauli devYRPauli closed this May 29, 2026
@devYRPauli devYRPauli reopened this Jun 10, 2026
@devYRPauli

Contributor Author

Reopening now that my first PR has landed, as promised when I trimmed down to a single open PR. Thanks for your patience!

