Optimisation: Extend the Lut generation to LUT4 elements #2458

WoutLegiest · 2025-12-09T15:34:01Z

We can make the LUT CGGI pipeline more efficient by using 4-input LUTs instead of 3-input LUTs. Since each LUT produces only a single output bit, and the whole program consists only of 4-to-1 LUTs, we cannot overflow a 4-bit ciphertext.
If we use TFHE-rs as the backend, this will lead to more efficient use of PBS (the impact on the Jaxite backend still needs to be checked).
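
To illustrate the overflow argument, here is a minimal plaintext sketch of a 4-to-1 LUT. It is not HEIR or TFHE-rs code: the helper names are hypothetical, and the bit ordering (first input as least significant bit) is an assumption made for the example. It only shows that four single-bit inputs pack into an index below 16, so the index fits a 4-bit message space, and that the lookup yields a single bit.

```python
def lut_index(bits):
    """Pack input bits into an integer, first bit as least significant.

    The bit ordering is an assumption for this sketch.
    """
    index = 0
    for position, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError("inputs must be single bits")
        index |= bit << position
    return index


def eval_lut(truth_table, bits):
    """Return the single output bit stored at the packed index."""
    if truth_table >> (1 << len(bits)):
        raise ValueError("truth table has more than 2**len(bits) entries")
    return (truth_table >> lut_index(bits)) & 1


# A 4-input AND gate: only index 0b1111 (15) maps to 1.
AND4 = 1 << 15

# Enumerate all 16 input combinations and record the largest packed index.
all_inputs = [[(i >> k) & 1 for k in range(4)] for i in range(16)]
max_index = max(lut_index(bits) for bits in all_inputs)
```

Because every LUT output is again a single bit, feeding outputs into the next layer of 4-to-1 LUTs keeps the packed index within the same range.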

For now, would we like to support both lut3 and lut4 elements, for the jaxite and TFHE-rs backends respectively?

j2kun · 2025-12-09T18:52:08Z

Lut4 by default sounds great, but please make the lut width configurable on the yosys optimizer sub-pipeline via a pass option. Maybe it makes sense to just extend the mode pass option, splitting LUT into LUT3 and LUT4?

IIUC if we can limit the yosys optimizer pipeline to generate luts of a given size by configuration, then the rest of the pipeline doesn't need any special configuration and can just lower whatever luts it sees/supports at that stage. I.e., you don't need to touch the jaxite emitter and just let it fail on lut4, and secret-to-cggi can have both lut3 and lut4 patterns added without conflict.
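
A rough sketch of that layering, in plain Python rather than MLIR. All names here are hypothetical and none are HEIR identifiers; the supported widths per backend are assumptions taken from this thread (jaxite failing on lut4) or made up for illustration (the tfhe-rs entry).

```python
# Hypothetical pattern names keyed by LUT width; both widths can coexist.
LOWERING_PATTERNS = {3: "lower_lut3", 4: "lower_lut4"}

# Assumed emitter capabilities, for illustration only.
EMITTER_SUPPORT = {"jaxite": {3}, "tfhe-rs": {3, 4}}


def lower_luts(lut_widths):
    """Lower each LUT with the pattern matching its width.

    No width configuration is needed here: the stage handles whatever
    the optimizer produced.
    """
    return [LOWERING_PATTERNS[width] for width in lut_widths]


def emit(backend, lut_widths):
    """Emit code for a backend, failing on widths it does not support."""
    unsupported = sorted(set(lut_widths) - EMITTER_SUPPORT[backend])
    if unsupported:
        raise NotImplementedError(
            f"{backend} emitter does not support LUT widths {unsupported}"
        )
    return f"{backend}: emitted {len(lut_widths)} LUT ops"
```

In this sketch the only place the width is chosen is upstream, in whatever produces `lut_widths`; the later stages either handle a width or report that they cannot.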

I'm not too worried about having stuff in HEIR that jaxite_bool doesn't support (though adding support in jaxite_bool should be easy). But I know there are other groups (e.g., Cornami) that use the CGGI pipeline and may have limited support for certain LUT sizes.

j2kun · 2025-12-09T20:39:14Z

Also, a long time ago when I was working on lut support in the yosys optimizer, I did some tests to see what was the best tradeoff of lut size vs runtime, and I recall lut5/lut6 being a net negative because you needed too large crypto parameters to support them, and the incremental reduction in circuit size didn't outweigh the added cost of the PBS. But lut3/lut4 were basically the same: lut4 took longer per PBS, but had a smaller circuit, and for the circuits I tried the runtime was similar.

At the time, my tfhe-rs CPU experiments put 3-bit PBS at 15.8ms, up to lut6 at 145ms, while circuit size reduction looked like this (for one example circuit), so 10x runtime for < 0.5x circuit reduction was pretty bad. I don't seem to have the exact lut4 numbers recorded from that experiment, though.
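
The tradeoff can be made concrete with a small back-of-the-envelope sketch. It uses only the two PBS timings quoted above and assumes, as a simplification, that total runtime is the number of LUTs times the per-PBS time (no parallelism, one PBS per LUT). The circuit sizes in the last lines are hypothetical and not from the experiment.

```python
PBS_MS_LUT3 = 15.8   # 3-bit PBS time quoted above
PBS_MS_LUT6 = 145.0  # lut6 PBS time quoted above


def total_runtime_ms(num_luts, pbs_ms):
    """Sequential cost model: one PBS per LUT, no parallelism (assumption)."""
    return num_luts * pbs_ms


def break_even_size_ratio(pbs_small_ms, pbs_large_ms):
    """Fraction the circuit must shrink to for the larger LUT to break even."""
    return pbs_small_ms / pbs_large_ms


slowdown = PBS_MS_LUT6 / PBS_MS_LUT3
ratio = break_even_size_ratio(PBS_MS_LUT3, PBS_MS_LUT6)

# Hypothetical circuit: 1000 lut3 gates versus half as many lut6 gates.
lut3_total = total_runtime_ms(1000, PBS_MS_LUT3)
lut6_total = total_runtime_ms(500, PBS_MS_LUT6)
```

Under this model the lut6 circuit would need to shrink to roughly a ninth of the lut3 circuit's size to break even, so halving the circuit is far from enough.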

j2kun · 2025-12-10T03:58:34Z

lib/Transforms/YosysOptimizer/yosys/techmap_lut4.v

I'd recommend grepping the entire repo (including the .github/workflows) for instances of techmap.v to ensure that this new techmap file is included in all places.

For example, it's included in the release workflows like https://github.com/google/heir/blob/nightly/.github/workflows/nightly.yml
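
A minimal sketch of such a search in Python, as an alternative to a plain grep. The function name is hypothetical; it walks a checkout, includes dot-directories such as .github, skips the .git directory, and ignores files that cannot be read as UTF-8 text.

```python
from pathlib import Path


def find_references(root, needle="techmap.v"):
    """Return (path, line number, line) for every line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the .git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), line_number, line.strip()))
    return hits
```

Calling `find_references(".")` from the repository root lists every place that mentions the existing techmap file, each of which is a candidate for also needing the new one.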

I've refactored the whole repo. Now, by default the LUT3 optimisation will be chosen.

WoutLegiest · 2025-12-22T10:26:31Z

Ooh, very cool to see the past research! I just thought about the ability to run on FPGAs; it would be a 'nice to have' to write an appendix somewhere about the FPGA performance of the current CGGI pipelines (mostly to show how much improvement we can make in the future).

j2kun reviewed Dec 10, 2025

WoutLegiest force-pushed the lut4 branch 3 times, most recently from b3aadc6 to f8d3a8a Compare December 19, 2025 13:54

Add LUT4 def to Yosys + CGGI Dialect

2e14a2e

Working 4lut impl Updated tests Update Emitter Update Emitter Working mode Lut3, Lut4 selector Correct generation of LWE Types

WoutLegiest force-pushed the lut4 branch from f8d3a8a to 2e14a2e Compare December 22, 2025 10:21

WoutLegiest marked this pull request as ready for review December 22, 2025 10:26

WoutLegiest mentioned this pull request Dec 22, 2025

Small update to the TfheRustEmitter to use Array #2495

Open
