@rattus128 (Contributor) commented Nov 27, 2025

In flux2 these weights are huge (200MB). Since `plain_tensor` is a throw-away deep copy, perform this multiplication in-place to save VRAM.

This will at least improve (if not fully fix): #10891
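The idea behind the change can be sketched as follows. This is an illustrative numpy sketch, not the PR's actual code (which operates on PyTorch tensors inside ComfyUI's weight-patching path); the function names `scale_weight_copy` and `scale_weight_inplace` are hypothetical:

```python
import numpy as np

def scale_weight_copy(plain_tensor, alpha):
    # Out-of-place: allocates a second buffer the size of the
    # weight, so peak memory is roughly doubled for this tensor.
    return plain_tensor * alpha

def scale_weight_inplace(plain_tensor, alpha):
    # In-place: safe only because plain_tensor is a throw-away
    # deep copy; reuses its buffer, so no extra allocation.
    plain_tensor *= alpha
    return plain_tensor

w = np.ones((4, 4), dtype=np.float32)
out = scale_weight_inplace(w, 2.0)
print(np.shares_memory(out, w))  # True: the original buffer was reused
```

With a ~200MB weight, the out-of-place variant transiently holds two such buffers, which is exactly the peak the PR shaves off.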

Example test conditions:
- RTX 5090
- `--reserve-vram 8.9` (to emulate the 3090's VRAM ceiling)

Before (peak is 23.3GB):

[Screenshot: VRAM usage, 2025-11-27 21-18-10]

After (peak is 23.0GB):

[Screenshot: VRAM usage, 2025-11-27 21-20-05]

@comfyanonymous merged commit 3f382a4 into comfyanonymous:master on Nov 27, 2025 (10 checks passed).