[SOT] Add C++ compiled guard lookup #79036

Draft
SigureMo wants to merge 6 commits into PaddlePaddle:develop from cattidea:sot/cpp-guard-lookup

@SigureMo
Member

PR Category

Performance Optimization

PR Types

Performance

Description

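This PR adds a C++ compiled guard and a cross-guard trie lookup to SOT guards, reducing guard overhead on the warm-start cache-hit path.

Main changes:

  • Compile Python guard specs into a C++ CompiledGuard, covering SOT guard scenarios such as type, id, value, length, tensor meta, and layer hook.
  • Add CompiledGuardLookup, which builds a trie keyed by guard op key, reuses common prefixes across multiple cache entries, and returns the cache index directly.
  • Wire the compiled lookup into the executor cache. In strict guard mode the original guard/mirror guard still runs and the returned index is checked for consistency; there is no fallback-to-Python guard logic.
  • Add test/sot/test_compiled_guard.py, covering correctness cases such as hit/miss, layer hook, constructor error, and a lookup with 3 misses followed by a hit on the 4th entry.
  • Add test/sot/benchmark_compiled_guard.py to compare Paddle and Torch guard performance locally and in nightly runs.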
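
To illustrate the prefix-sharing idea behind the trie lookup, here is a minimal pure-Python sketch. It is not the C++ CompiledGuardLookup from this PR: `GuardTrie`, `_Node`, the `(key, predicate)` op representation, and the `evaluations` counter are all hypothetical names introduced for illustration, and ordering between overlapping entries is not modelled.

```python
from typing import Callable, Hashable, Optional

# Hypothetical representation: a guard op is (key, predicate). Ops with the
# same key are assumed to perform the same check, so they can share a node.
GuardOp = tuple[Hashable, Callable[[dict], bool]]


class _Node:
    def __init__(self) -> None:
        # key -> (predicate, child node)
        self.children: dict = {}
        self.cache_index: Optional[int] = None


class GuardTrie:
    def __init__(self) -> None:
        self.root = _Node()
        self.evaluations = 0  # number of predicate calls, for illustration

    def insert(self, ops: list[GuardOp], cache_index: int) -> None:
        """Add one cache entry; shared op prefixes reuse existing nodes."""
        node = self.root
        for key, pred in ops:
            if key not in node.children:
                node.children[key] = (pred, _Node())
            node = node.children[key][1]
        if node.cache_index is None:
            node.cache_index = cache_index

    def lookup(self, frame: dict) -> Optional[int]:
        """Return the cache index of a matching entry, or None on a miss."""
        return self._walk(self.root, frame)

    def _walk(self, node: _Node, frame: dict) -> Optional[int]:
        if node.cache_index is not None:
            return node.cache_index
        for pred, child in node.children.values():
            self.evaluations += 1
            if pred(frame):
                hit = self._walk(child, frame)
                if hit is not None:
                    return hit
        return None


# Four cache entries that share a type check and differ only in length.
is_list = ("type:x", lambda f: isinstance(f["x"], list))

def len_eq(n: int) -> GuardOp:
    return (f"len:x=={n}", lambda f: len(f["x"]) == n)

trie = GuardTrie()
for idx, n in enumerate([1, 2, 3, 4]):
    trie.insert([is_list, len_eq(n)], idx)

hit = trie.lookup({"x": [0, 0, 0, 0]})   # 3 misses, then the 4th entry hits
evals_after_hit = trie.evaluations
miss = trie.lookup({"x": [0] * 5})       # no entry matches
print(hit, evals_after_hit, miss)        # prints: 3 5 None
```

In this toy setup the shared type check runs once per lookup, so the hit on the 4th entry costs 5 predicate calls, where a linear scan over four independent guards would run 8.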

Local verification:

  • prek
  • ninja -C build_312 python/paddle/base/libpaddle.so
  • python -m py_compile ...
  • python test/sot/test_compiled_guard.py
  • SOT_ENABLE_STRICT_GUARD_CHECK=True python test/sot/test_sot_resnet.py
  • python -m unittest test.sot.test_guard_tree test.sot.test_sot_cache
  • python test/sot/test_guard_fastpath_strategy.py
  • python test/sot/benchmark_compiled_guard.py --case resnet18 --resnet-image-size 64 --iterations 10000 --hot-iterations 20 --rounds 7 --compare-torch --multi-cache-count 4 --max-torch-guard-ratio 1.1 --max-torch-multi-lookup-ratio 1.1

ResNet guard benchmark result:

  • Paddle compiled guard only: 2.516 us/check
  • Torch Dynamo guard only: 2.448 us/check
  • Paddle/Torch single guard ratio: 1.03x
  • Paddle compiled trie lookup, 4-cache hit after 3 misses: 2.944 us/lookup
  • Torch Dynamo 4-guard lookup: 4.150 us/lookup
  • Paddle/Torch multi-guard ratio: 0.71x
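
The two ratios above follow directly from the per-check timings. The short check below reproduces the arithmetic; the variable names are illustrative, and treating 1.1 as an upper bound on each ratio is an assumption based on the `--max-torch-guard-ratio` and `--max-torch-multi-lookup-ratio` flag names in the benchmark command, not a description of how the script enforces them.

```python
# Timings in microseconds, copied from the benchmark result list.
paddle_single_us = 2.516
torch_single_us = 2.448
paddle_multi_us = 2.944
torch_multi_us = 4.150

single_ratio = paddle_single_us / torch_single_us
multi_ratio = paddle_multi_us / torch_multi_us

# Assumed meaning of the --max-torch-*-ratio flags: an upper bound of 1.1.
MAX_RATIO = 1.1

print(f"single guard ratio: {single_ratio:.2f}x")  # 1.03x
print(f"multi guard ratio:  {multi_ratio:.2f}x")   # 0.71x
print(single_ratio <= MAX_RATIO and multi_ratio <= MAX_RATIO)  # True
```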

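Does this change affect numerical precision?

No. This change only affects the SOT guard/cache hit-check path and does not change operator computation or model numerics. Under strict guard check, the compiled guard result is verified to match the original guard.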

Add a C++ compiled guard implementation and trie-based lookup for SOT cache entries. Wire it into the executor cache without Python guard fallback, and keep strict guard checking as a mirror path that validates compiled hits against the original guard semantics.
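
The mirror-path idea can be sketched roughly as follows. This is a simplified Python illustration, not the code in this PR: `python_lookup`, `checked_lookup`, and the choice to raise `RuntimeError` on a mismatch are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

Guard = Callable[[dict], bool]
Lookup = Callable[[dict], Optional[int]]


def python_lookup(guards: Sequence[Guard], frame: dict) -> Optional[int]:
    """Reference path: index of the first original guard that passes."""
    for index, guard in enumerate(guards):
        if guard(frame):
            return index
    return None


def checked_lookup(
    compiled_lookup: Lookup,
    guards: Sequence[Guard],
    frame: dict,
    strict: bool,
) -> Optional[int]:
    """Always return the compiled result; in strict mode, also verify it.

    There is no fallback: a mismatch is reported as an error (an assumed
    behaviour here) instead of silently switching to the Python result.
    """
    index = compiled_lookup(frame)
    if strict:
        expected = python_lookup(guards, frame)
        if index != expected:
            raise RuntimeError(
                f"compiled lookup returned {index}, original guards gave {expected}"
            )
    return index


# Two toy cache entries, plus one correct and one deliberately wrong lookup.
guards = [lambda f: f["n"] == 0, lambda f: f["n"] == 1]
good_lookup = lambda f: f["n"] if f["n"] in (0, 1) else None
buggy_lookup = lambda f: 0

print(checked_lookup(good_lookup, guards, {"n": 1}, strict=True))    # 1
print(checked_lookup(buggy_lookup, guards, {"n": 1}, strict=False))  # 0
```

With `strict=True`, the same call on `buggy_lookup` raises instead of returning the wrong index, which is the kind of divergence the mirror path is meant to surface.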

Add focused correctness tests plus a ResNet guard benchmark comparing Paddle compiled guard lookup with Torch Dynamo guard behavior.

Co-authored-by: Codex <noreply@openai.com>
@paddle-bot

paddle-bot Bot commented May 17, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

SigureMo and others added 5 commits May 17, 2026 22:35
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>