
metal: collapse consecutive same-target passes into one encoder #641

Open: benface wants to merge 1 commit into `not-fl3:master` from `benface:metal-merge-passes`



benface (Contributor) commented Jun 13, 2026

Higher-level engines like macroquad issue one `begin_pass` / `end_render_pass` pair per draw call so they can stream batched draws to whatever the active camera's render target is. Each pair was becoming its own `MTLRenderCommandEncoder`, which on tile-based GPUs forces a full color-attachment store (plus, for MSAA, a resolve) on every draw. That is a heavy memory-bandwidth cost, and it visibly tanks frame rate on real devices at Retina resolution with `Conf.sample_count > 1`.
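
A rough, back-of-envelope sketch of why per-draw encoders hurt. The inputs are illustrative assumptions, not figures from this PR's testing: a 2532x1170 drawable, 4 bytes per pixel, 4x MSAA, and 200 draw calls per frame. It also assumes every encoder end writes the full multisampled attachment plus the resolved image, and it ignores load-back at the next encoder's start, depth/stencil, and hardware compression, so treat the output as an order of magnitude only.

```rust
/// Rough model (an assumption, not a measurement): each time an encoder ends,
/// every sample of the color attachment is stored, and with MSAA the
/// single-sample resolve target is written as well.
fn bytes_per_encoder_end(width: u64, height: u64, bytes_per_pixel: u64, sample_count: u64) -> u64 {
    let pixels = width * height;
    // Store of every sample of the color attachment.
    let sample_store = pixels * bytes_per_pixel * sample_count;
    // The resolve writes one value per pixel, and only when MSAA is enabled.
    let resolve = if sample_count > 1 { pixels * bytes_per_pixel } else { 0 };
    sample_store + resolve
}

fn main() {
    // Illustrative inputs (hypothetical, not taken from the PR's measurements).
    let (width, height, bytes_per_pixel, samples) = (2532, 1170, 4, 4);
    let draw_calls: u64 = 200;

    let per_end = bytes_per_encoder_end(width, height, bytes_per_pixel, samples);
    let one_encoder_per_draw = per_end * draw_calls;
    let merged = per_end; // a single encoder for the whole render target

    println!("per encoder end:         {:.1} MB", per_end as f64 / 1e6);
    println!("one encoder per draw:    {:.1} GB/frame", one_encoder_per_draw as f64 / 1e9);
    println!("merged into one encoder: {:.1} MB/frame", merged as f64 / 1e6);
}
```

The absolute numbers are a crude upper bound; what the PR changes is the multiplier, from one store/resolve per draw call to one per render target.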

OpenGL absorbs this pattern transparently because re-binding an FBO doesn't trigger a resolve; Metal's per-encoder load/store actions make the cost explicit, and minimizing encoder count is a standard Metal best practice for exactly this reason.
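
To make "per-encoder load/store actions" concrete, here is a minimal sketch of how a pass action could map onto an attachment's load and store actions. The enums are simplified stand-ins named after Metal's `MTLLoadAction` / `MTLStoreAction` cases, not miniquad's or Metal's real types, and the MSAA store choice is an assumption. The point it illustrates: the load action is chosen when an encoder is created, so a `Clear` cannot be folded into an encoder that is already open, while `Nothing` (load what is there) can.

```rust
// Simplified stand-ins (assumptions): not miniquad's PassAction or Metal's
// MTLLoadAction/MTLStoreAction, just enough to show the mapping.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PassAction {
    Nothing,
    Clear { color: (f32, f32, f32, f32) },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LoadAction {
    Load,  // keep what earlier encoders stored
    Clear, // overwrite with the clear color when the encoder starts
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StoreAction {
    Store,                      // write the attachment back to memory
    StoreAndMultisampleResolve, // write the MSAA samples and the resolved image
}

/// Load/store actions a newly created encoder would be configured with.
fn attachment_actions(action: PassAction, sample_count: u32) -> (LoadAction, StoreAction) {
    let load = match action {
        PassAction::Nothing => LoadAction::Load,
        PassAction::Clear { .. } => LoadAction::Clear,
    };
    let store = if sample_count > 1 {
        StoreAction::StoreAndMultisampleResolve
    } else {
        StoreAction::Store
    };
    (load, store)
}

/// An already-open encoder can only absorb a pass whose load action is Load;
/// a Clear has to be expressed when the encoder is created.
fn can_continue_open_encoder(action: PassAction) -> bool {
    attachment_actions(action, 1).0 == LoadAction::Load
}

fn main() {
    let clear = PassAction::Clear { color: (0.0, 0.0, 0.0, 1.0) };
    println!("{:?}", attachment_actions(clear, 4));
    println!("{:?}", attachment_actions(PassAction::Nothing, 4));
    println!("reuse on Nothing: {}", can_continue_open_encoder(PassAction::Nothing));
    println!("reuse on Clear: {}", can_continue_open_encoder(clear));
}
```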

This PR defers `endEncoding` until the next `begin_pass` actually requires a new encoder (different target, `Clear` action, or `commit_frame`). If the next `begin_pass` continues the same target with `PassAction::Nothing`, it keeps encoding into the existing encoder, so the whole sequence collapses into one Metal pass with a single store/resolve at the real end. Each existing draw call's `apply_pipeline` / `apply_bindings` / `apply_uniforms` already re-sends its full state, so reusing the encoder is correctness-safe.
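
The deferral rule can be sketched as a small state machine. This is a mock under stated assumptions, not the PR's code: `Target`, `DeferredEncoder`, and `encoders_created` are hypothetical names, the mock clears `open` where a real backend would call Metal's `endEncoding`, and `really_end_encoder` only borrows the helper name used in this PR.

```rust
/// Hypothetical render-target identifier (a stand-in for the backend's pass ids).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Target {
    Screen,
    Offscreen(u32),
}

/// Simplified stand-in for miniquad's PassAction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PassAction {
    Nothing,
    Clear,
}

/// Tracks the one render encoder that may be left open between passes.
#[derive(Default)]
struct DeferredEncoder {
    /// Target of the encoder that is still open (its end has been deferred).
    open: Option<Target>,
    /// Encoders created this frame (stand-in for makeRenderCommandEncoder calls).
    encoders_created: usize,
}

impl DeferredEncoder {
    fn begin_pass(&mut self, target: Target, action: PassAction) {
        // Same target and nothing to clear: keep encoding into the open encoder.
        if self.open == Some(target) && action == PassAction::Nothing {
            return;
        }
        // Otherwise the pending encoder really ends and a new one starts.
        self.really_end_encoder();
        self.open = Some(target);
        self.encoders_created += 1;
    }

    fn end_render_pass(&mut self) {
        // Deliberately a no-op: the encoder stays open in case the next pass can reuse it.
    }

    fn really_end_encoder(&mut self) {
        // A real Metal backend would call endEncoding() on the open encoder here.
        self.open = None;
    }

    /// Ends any pending encoder and returns how many encoders this frame used.
    fn commit_frame(&mut self) -> usize {
        self.really_end_encoder();
        std::mem::take(&mut self.encoders_created)
    }
}

fn main() {
    let mut ctx = DeferredEncoder::default();
    // One clearing pass followed by 99 per-draw passes on the same target.
    ctx.begin_pass(Target::Screen, PassAction::Clear);
    ctx.end_render_pass();
    for _ in 0..99 {
        ctx.begin_pass(Target::Screen, PassAction::Nothing);
        ctx.end_render_pass();
    }
    println!("encoders this frame: {}", ctx.commit_frame());
}
```

Alternating between two targets, or issuing `Clear` on every pass, still yields one encoder per pass in this mock, matching the conditions under which the PR starts a new encoder.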

Tested on an iPhone running iOS 27 with macroquad's per-draw-call begin/end pattern at 4× MSAA: the per-frame encoder count drops from one per draw call to one per render target, eliminating the frame lag seen on real devices.

benface force-pushed the `metal-merge-passes` branch from 9cb786f to f07747a on June 13, 2026 15:28

not-fl3 (Owner) commented Jun 15, 2026

Uff, that's a good one! Happy to see the iOS side of miniquad seeing some light!

Thanks for the PR!

benface force-pushed the `metal-merge-passes` branch from f07747a to fd9e269 on June 15, 2026 22:22

Higher-level engines (e.g. macroquad's draw-call loop) issue one
`begin_pass` / `end_render_pass` pair per draw call so they can stream
batched draws to whatever the active camera's render target is. Each
pair was becoming its own `MTLRenderCommandEncoder`, which on tile-
based GPUs forces a full color-attachment store + (for MSAA) resolve
on every draw — heavy memory bandwidth that visibly tanks real-device
frame rate at retina resolution with `Conf.sample_count > 1`.

OpenGL absorbs this pattern transparently because re-binding an FBO
doesn't trigger a resolve; Metal's per-encoder load/store actions
make the cost explicit. Apple's own guidance is to minimize encoder
count for exactly this reason.

Defer `endEncoding` until the next `begin_pass` actually requires a
new encoder (different target, `Clear` action, or `commit_frame`). If
the next `begin_pass` continues the same target with
`PassAction::Nothing`, keep encoding into the existing encoder so the
whole sequence collapses into one Metal pass with a single store /
resolve at the real end. Each existing draw call's
`apply_pipeline` / `apply_bindings` / `apply_uniforms` already
re-sends its full state, so reusing the encoder is correctness-safe.

Tested on an iPhone running iOS 27 with macroquad's per-draw-call
begin/end pattern at 4x MSAA: this drops the per-frame encoder count
from the draw-call count down to one per render target.

benface force-pushed the `metal-merge-passes` branch from fd9e269 to 14a7978 on June 15, 2026 23:06

benface (Contributor, Author) commented Jun 15, 2026

Force-pushed: amended the commit to also call `really_end_encoder()` in `texture_generate_mipmaps`. I hit this on iOS: `MTLDebugCommandBuffer`'s `blitCommandEncoder` fails validation with "encoding in progress" if a render encoder is still open on the same command buffer, a state that the deferred-end mechanism introduced in this PR makes possible. The blit encoder used for mipmap generation was the call site the audit missed. Apologies for the miss.
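
The invariant behind this fix is that a Metal command buffer can have only one active command encoder at a time, so any path that opens a blit encoder must first end a render encoder whose end was deferred. Below is a minimal mock of that rule; `CommandBuffer`, `EncoderKind`, `Context`, `draw`, and the error wording are hypothetical, and only `texture_generate_mipmaps` and `really_end_encoder` are names from this PR.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EncoderKind {
    Render,
    Blit,
}

/// Mock command buffer enforcing Metal's rule: one active encoder at a time.
#[derive(Default)]
struct CommandBuffer {
    active: Option<EncoderKind>,
}

impl CommandBuffer {
    fn begin(&mut self, kind: EncoderKind) -> Result<(), String> {
        if let Some(open) = self.active {
            // Mirrors the validation failure described above; wording is illustrative.
            return Err(format!("encoding in progress: {:?} encoder still open", open));
        }
        self.active = Some(kind);
        Ok(())
    }

    fn end(&mut self) {
        self.active = None;
    }
}

/// Mock backend context whose render encoder end is deferred.
#[derive(Default)]
struct Context {
    cmd: CommandBuffer,
    render_open: bool,
}

impl Context {
    /// Draws lazily open a render encoder and leave it open (the deferred end).
    fn draw(&mut self) -> Result<(), String> {
        if !self.render_open {
            self.cmd.begin(EncoderKind::Render)?;
            self.render_open = true;
        }
        Ok(())
    }

    /// Actually end the pending render encoder, if there is one.
    fn really_end_encoder(&mut self) {
        if self.render_open {
            self.cmd.end();
            self.render_open = false;
        }
    }

    /// Flush the pending render encoder before opening the blit encoder.
    fn texture_generate_mipmaps(&mut self) -> Result<(), String> {
        self.really_end_encoder();
        self.cmd.begin(EncoderKind::Blit)?;
        // A real backend would encode mipmap generation on the blit encoder here.
        self.cmd.end();
        Ok(())
    }
}

fn main() {
    let mut ctx = Context::default();
    ctx.draw().unwrap();
    // Opening a blit encoder without flushing fails while the render encoder is open.
    println!("{:?}", ctx.cmd.begin(EncoderKind::Blit));
    // The fixed path flushes first, so it succeeds.
    println!("{:?}", ctx.texture_generate_mipmaps());
}
```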
