metal: collapse consecutive same-target passes into one encoder #641
Open
benface wants to merge 1 commit into
Force-pushed: 9cb786f to f07747a
Owner
Uff, that's a good one, happy to see the iOS side of miniquad seeing some light! Thanks for the PR!
Force-pushed: f07747a to fd9e269
Higher-level engines (e.g. macroquad's draw-call loop) issue one `begin_pass` / `end_render_pass` pair per draw call so they can stream batched draws to whatever the active camera's render target is. Each pair was becoming its own `MTLRenderCommandEncoder`, which on tile-based GPUs forces a full color-attachment store + (for MSAA) resolve on every draw: heavy memory bandwidth that visibly tanks real-device frame rate at retina resolution with `Conf.sample_count > 1`.

OpenGL absorbs this pattern transparently because re-binding an FBO doesn't trigger a resolve; Metal's per-encoder load/store actions make the cost explicit. Apple's own guidance is to minimize encoder count for exactly this reason.

Defer `endEncoding` until the next `begin_pass` actually requires a new encoder (different target, `Clear` action, or `commit_frame`). If the next `begin_pass` continues the same target with `PassAction::Nothing`, keep encoding into the existing encoder so the whole sequence collapses into one Metal pass with a single store / resolve at the real end. Each existing draw call's `apply_pipeline` / `apply_bindings` / `apply_uniforms` already re-sends its full state, so reusing the encoder is correctness-safe.

Tested on iPhone running iOS 27 with macroquad's per-draw-call begin/end pattern at 4x MSAA: drops the per-frame encoder count from the draw-call count down to one per render target.
Force-pushed: fd9e269 to 14a7978
Contributor
Author
|
Force-pushed: amended the commit to also call |
Higher-level engines like macroquad issue one `begin_pass` / `end_render_pass` pair per draw call so they can stream batched draws to whatever the active camera's render target is. Each pair was becoming its own `MTLRenderCommandEncoder`, which on tile-based GPUs forces a full color-attachment store + (for MSAA) resolve on every draw: heavy memory bandwidth that visibly tanks real-device frame rate at retina resolution with `Conf.sample_count > 1`.

OpenGL absorbs this pattern transparently because re-binding an FBO doesn't trigger a resolve; Metal's per-encoder load/store actions make the cost explicit, and minimizing encoder count is a standard Metal best practice for exactly this reason.

This PR defers `endEncoding` until the next `begin_pass` actually requires a new encoder (different target, `Clear` action, or `commit_frame`). If the next `begin_pass` continues the same target with `PassAction::Nothing`, it keeps encoding into the existing encoder so the whole sequence collapses into one Metal pass with a single store/resolve at the real end. Each existing draw call's `apply_pipeline` / `apply_bindings` / `apply_uniforms` already re-sends its full state, so reusing the encoder is correctness-safe.

Tested on iPhone running iOS 27 with macroquad's per-draw-call begin/end pattern at 4× MSAA: drops the per-frame encoder count from the draw-call count down to one per render target, eliminating real-device frame lag.
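The reuse rule described above can be illustrated with a small, Metal-free model. This is only a sketch of the decision logic, not the code in this PR: `Target`, `Action`, `EncoderTracker`, and `encoders_for_frame` are hypothetical names invented for the example, and no real `MTLRenderCommandEncoder` is created. It just counts how many encoders a sequence of passes would open under the deferred-`endEncoding` rule.

```rust
/// Hypothetical render-target id; `None` stands for the default framebuffer.
type Target = Option<u32>;

/// Simplified stand-in for a pass action (illustrative, not miniquad's type).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Action {
    Nothing,
    Clear,
}

/// Tracks the encoder whose end has been deferred, plus a per-frame count.
#[derive(Default)]
struct EncoderTracker {
    /// Target of the still-open encoder, if any.
    open: Option<Target>,
    /// Number of encoders opened since the last `commit_frame`.
    encoders_created: usize,
}

impl EncoderTracker {
    fn begin_pass(&mut self, target: Target, action: Action) {
        // Reuse only when the target is unchanged and nothing must be cleared.
        let can_reuse = self.open == Some(target) && action == Action::Nothing;
        if can_reuse {
            return;
        }
        // Otherwise end the deferred encoder (the store/resolve point)
        // and open a new one for this pass.
        self.flush();
        self.open = Some(target);
        self.encoders_created += 1;
    }

    fn end_render_pass(&mut self) {
        // Intentionally empty: ending is deferred until the next
        // `begin_pass` or `commit_frame` decides it is really needed.
    }

    /// Ends any open encoder and returns how many were opened this frame.
    fn commit_frame(&mut self) -> usize {
        self.flush();
        std::mem::take(&mut self.encoders_created)
    }

    fn flush(&mut self) {
        self.open = None;
    }
}

/// Replays one frame of begin/end pairs and reports the encoder count.
fn encoders_for_frame(passes: &[(Target, Action)]) -> usize {
    let mut tracker = EncoderTracker::default();
    for &(target, action) in passes {
        tracker.begin_pass(target, action);
        tracker.end_render_pass();
    }
    tracker.commit_frame()
}
```

In this model, a `Clear` followed by any number of `Nothing` passes on the same target yields one encoder, while a target switch or a second `Clear` opens another. The model deliberately ignores everything else a real backend would have to handle when reusing an encoder.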