quant ops: Dequantize weight in-place (reduce flux2 VRAM usage) #10935
In flux2 these weights are huge (~200MB each). Since `plain_tensor` is a throw-away deep copy, the dequantization multiply can be done in-place, avoiding the allocation of a second full-size tensor and saving VRAM.
This should at least improve (if not fully fix) #10891.
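For illustration, a minimal sketch of the pattern in PyTorch. The `dequantize` name, dtypes, and scale shape below are assumptions for the example; only `plain_tensor` and the switch from an out-of-place to an in-place multiply reflect the actual change:

```python
import torch

def dequantize(qweight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical dequantize helper showing the in-place pattern."""
    # Casting to the compute dtype already yields a fresh, private copy
    # (the "throw-away deep copy" mentioned above), so mutating it is safe.
    plain_tensor = qweight.to(torch.bfloat16)

    # Before: `plain_tensor * scale` allocated a second weight-sized tensor
    # (~200MB for flux2). After: multiply in-place, reusing the copy's
    # storage, so peak VRAM drops by one weight-sized allocation.
    plain_tensor.mul_(scale)
    return plain_tensor
```

Because no other reference to `plain_tensor` exists at that point, the in-place variant is observationally identical to the out-of-place one; only the peak allocation changes.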
Example test conditions:
- RTX 5090
- `--reserve-vram 8.9` (to emulate a 3090's VRAM ceiling)
Before (peak is 23.3GB):
After (peak is 23.0GB):