Skip to content

[None][fix] pool-qualify KV cache transfer pending keys#15272

Open
chienchunhung wants to merge 2 commits into
NVIDIA:mainfrom
chienchunhung:codex/kv-cache-transfer-pool-key
Open

[None][fix] pool-qualify KV cache transfer pending keys#15272
chienchunhung wants to merge 2 commits into
NVIDIA:mainfrom
chienchunhung:codex/kv-cache-transfer-pool-key

Conversation

@chienchunhung

@chienchunhung chienchunhung commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

This draft fixes pending KV cache transfer tracking so primary and secondary pools no longer collide when they use the same dense memory-pool slot index.

KVCacheIndex::get() strips the secondary-pool bit, but the transfer manager used that dense value as the key for pending read/write CUDA events. That aliases physical locations such as primary slot 0 and secondary slot 0. The change adds a pool-qualified pending-transfer key and routes onboard/offload read/write tracking through it.

Test Coverage

Added a deterministic regression test in kvCacheManagerTest that schedules:

  • primary slot 0 -> secondary slot 1
  • secondary slot 0 -> primary slot 1

The test verifies the transfer manager keeps two pending reads and two pending writes, one per physical (pool, slot) location. This would collapse to one read/write entry with dense-only keys.

Validation

  • git diff --check
  • pre-commit on commit, with clang-format passing

Not yet run: GPU C++ gtest. Suggested command:

./cpp/build/tests/unit_tests/batch_manager/kvCacheManagerTest --gtest_filter=KVCacheManagerTest.KVCacheTransferManager*

Pre-commit note: skipped waive list check, validate-test-lists, and type-check on the successful commit because scripts/check_test_list.py failed in this local hook environment with TypeError: unsupported operand type(s) for |: 'type' and 'NoneType', unrelated to this C++ change.

Summary by CodeRabbit

  • Refactor

    • Improved internal synchronization logic for KV cache transfer operations.
  • Tests

    • Added test infrastructure and unit test for KV cache transfer management, including validation of pending transfer tracking across primary and secondary memory slots.

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung chienchunhung force-pushed the codex/kv-cache-transfer-pool-key branch from f8f1a27 to 1a84767 Compare June 11, 2026 19:33
@chienchunhung

Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #53720 [ run ] triggered by Bot. Commit: 1a84767 Link to invocation

Comment thread cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp Outdated
Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung chienchunhung force-pushed the codex/kv-cache-transfer-pool-key branch from 1a84767 to d88df99 Compare June 12, 2026 03:09
@chienchunhung

Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@chienchunhung chienchunhung marked this pull request as ready for review June 12, 2026 03:11
@chienchunhung chienchunhung requested a review from a team as a code owner June 12, 2026 03:11
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e01b2ca0-f690-471a-b5d6-1848d80a574a

📥 Commits

Reviewing files that changed from the base of the PR and between 82ddf75 and d88df99.

📒 Files selected for processing (3)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheTransferManager.h
  • cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

📝 Walkthrough

Walkthrough

KVCacheTransferManager refactors pending transfer tracking to use pool-qualified memory indices instead of raw block indices. Changes include a new helper method, updated synchronization logic in onboard/offload operations, a test access class, and validation test covering primary and secondary slot distinction.

Changes

Pool-Qualified Pending Transfer Tracking

Layer / File(s) Summary
Header contract and test access interface
cpp/include/tensorrt_llm/batch_manager/kvCacheTransferManager.h
Forward-declares tensorrt_llm::testing::KVCacheTransferManagerTestAccess, declares it as friend, introduces private static getPendingTransferIndex() signature, and clarifies that pool-qualified memory pool index identifies raw blocks for I/O.
Pending transfer index computation
cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
Implements getPendingTransferIndex() helper that computes pool-qualified keys from block memory-pool indices and encodes secondary-pool blocks with a flag.
Onboard/offload synchronization updates
cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
Updates onboard() and offload() methods to use pending-transfer indices instead of raw block indices for mPendingReads/mPendingWrites map operations, CUDA event recording, and synchronization waits. Corrects comments describing event-keying rationale and fixes synchronization comment typos.
Test access infrastructure and pending transfer validation
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
Adds KVCacheTransferManagerTestAccess helper class with query methods for pending read/write state. Introduces test validating that pending transfers correctly distinguish primary and secondary slots through offload/onboard sequences with synchronization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is fully related to the main change: it directly describes the core fix of pool-qualifying KV cache transfer pending keys to resolve slot index collisions.
Description check ✅ Passed The PR description covers all required sections: a clear summary of the problem and solution, test coverage details, and validation steps. All major template sections are addressed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #53767 [ run ] triggered by Bot. Commit: d88df99 Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #53720 [ run ] completed with state ABORTED. Commit: 1a84767

Link to invocation

@chienchunhung

Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #53792 [ run ] triggered by Bot. Commit: d88df99 Link to invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants