ggml : assert b->ne[3] == 1 in ggml_conv_transpose_2d_p0 (#1448)#1520
Open
devYRPauli wants to merge 1 commit into
Open
ggml : assert b->ne[3] == 1 in ggml_conv_transpose_2d_p0 (#1448)#1520devYRPauli wants to merge 1 commit into
devYRPauli wants to merge 1 commit into
Conversation
The CPU compute kernel for ggml_conv_transpose_2d_p0 (ggml_compute_forward_conv_transpose_2d_impl in ggml-cpu/ops.cpp) only iterates over batch index 0: * the src1 permutation loop iterates ne12 (channels), ne11 (height), ne10 (width) but never ne13 (batch); * the compute loop offsets the output by i2*nb2 (channel) only, not by i3*nb3 (batch); * the work-buffer sizing omits a *ne13 factor. The output 4D shape is computed correctly, so callers see plausible batched output where batch 0 is correct and batches 1..N-1 are silently zero. All existing callers (SAM example, test-conv-transpose.c, test-backend-ops.cpp) pass ne[3] == 1, which is why this has not been caught since the op was introduced in 8da5be2 (Aug 2023). Reject batch > 1 at the API level. Matches the precedent set by ggml_conv_transpose_1d which already enforces a->ne[3] == 1. Implementing batch support in the CPU/Metal/Vulkan kernels is a sensible follow-up if a caller needs it. Credit to @janblumenkamp for the bug analysis, repro, and historical research in ggml-org#1448.
Contributor
Author
|
Trimming down to one open PR to respect the new-contributor guideline — closing this for now and I'll resubmit after my first one lands. Thanks! |
Contributor
Author
|
Reopening now that my first PR has landed, as promised when I trimmed down to a single open PR. Thanks for your patience! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addresses #1448 via Option A from @janblumenkamp's analysis (assert at the API level, no kernel changes).
Bug
ggml_conv_transpose_2d_p0accepts inputs withb->ne[3] > 1and returns a tensor with the correct 4D output shape, but the CPU compute kernel (ggml_compute_forward_conv_transpose_2d_implinggml-cpu/ops.cpp) only iterates batch index 0:src1permutation loop iteratesne12, ne11, ne10but neverne13;i2*nb2(channel) only, not byi3*nb3(batch);*ne13factor.Net effect: batch 0 is correct, batches 1..N-1 are silently zero. Caught only by explicitly comparing across batch elements. The bug has been present since 8da5be2 (Aug 2023) — every existing caller (SAM example,
test-conv-transpose.c,test-backend-ops.cppparameterizations) usesne[3] == 1, so it has never surfaced in CI.Fix
Add
GGML_ASSERT(b->ne[3] == 1)toggml_conv_transpose_2d_p0. Matches the existingGGML_ASSERT(a->ne[3] == 1)precedent inggml_conv_transpose_1d. Turns silent data corruption into an immediate, debuggable abort.Why not implement batch support instead?
Option B (implementing the batch loop in CPU + Metal + Vulkan kernels) is genuinely useful but adds capability for a code path with no current caller. Option A solves the correctness problem and matches existing precedent with minimal risk. If a caller later needs batched
conv_transpose_2d_p0, removing the assert and implementing batch support across backends is a clean follow-up.Verification
ggml_conv_transpose_2d_p0silently ignores batch dimension (ne[3] > 1) #1448 body, 3-batch input with identity-like kernel) — before this change: printsbatch 1: sum_abs = 0.0 <-- BUG. After this change: aborts withGGML_ASSERT(b->ne[3] == 1) failedatggml.c:4882.test-backend-ops test(Metal + CPU) — no regressions. AllCONV_TRANSPOSE_2Dparameterizations usene[3] == 1and continue to pass.No test added
The natural regression test ("assert fires on
ne[3] > 1") requires a death-test pattern (fork + check exit on SIGABRT) sinceGGML_ASSERTaborts the process —test-backend-ops's compute-and-compare model can't host it. Happy to add a separate death-test executable if maintainers prefer; let me know.Credit to @janblumenkamp for the original bug analysis, complete reproducer, and historical research.