[None][feature] Add thinking token budget control#14665
Conversation
Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughThis PR adds per-request thinking token budget enforcement to TensorRT-LLM. A new ChangesThinking Budget Token Limiting
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/unittest/llmapi/test_sampling_params.py (2)
58-63: ⚡ Quick winAdd explicit return annotations to updated test/helper functions.
Please add
-> Noneto the changed test functions and a concrete return annotation forFakeTokenizer.encodeto align with repo typing standards.Suggested patch
-def test_completion_logprobs_assignment_revalidates(): +def test_completion_logprobs_assignment_revalidates() -> None: @@ -def test_thinking_token_budget_unlimited_values(value): +def test_thinking_token_budget_unlimited_values(value: int | None) -> None: @@ -def test_thinking_token_budget_rejects_invalid_values(value): +def test_thinking_token_budget_rejects_invalid_values(value: object) -> None: @@ -def test_chat_thinking_token_budget_request_validation(): +def test_chat_thinking_token_budget_request_validation() -> None: @@ -def test_thinking_budget_logits_processor_forces_end_token(): +def test_thinking_budget_logits_processor_forces_end_token() -> None: @@ -def test_thinking_budget_logits_processor_completes_partial_end_sequence(): +def test_thinking_budget_logits_processor_completes_partial_end_sequence() -> None: @@ -def test_thinking_budget_logits_processor_ignores_partial_end_before_budget(): +def test_thinking_budget_logits_processor_ignores_partial_end_before_budget() -> None: @@ -def test_thinking_budget_logits_processor_ignores_closed_reasoning_block(): +def test_thinking_budget_logits_processor_ignores_closed_reasoning_block() -> None: @@ -def test_add_thinking_budget_logits_processor_uses_reasoning_parser_tokens(): +def test_add_thinking_budget_logits_processor_uses_reasoning_parser_tokens() -> None: class FakeTokenizer: - def encode(self, text, add_special_tokens=False): + def encode(self, text: str, add_special_tokens: bool = False) -> list[int]: del add_special_tokensAs per coding guidelines, “Static type checking with mypy is opt-in by submodule; always annotate functions with return types (use
Noneif function does not return)”.Also applies to: 65-169
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/llmapi/test_sampling_params.py` around lines 58 - 63, Add explicit return type annotations to the modified test and helper functions: append "-> None" to any test functions in the updated block (the tests around request.to_sampling_params checks) and to other test helpers in the 65-169 region, and add a concrete return annotation to FakeTokenizer.encode (e.g., -> List[int] or -> Sequence[int] consistent with project typing) so all functions have explicit return types per the repo's mypy guidance; update imports if needed for typing names.
65-169: QA list updates are unnecessary for this PR scope.These are unittest/API-stability changes only, so no
tests/integration/test_lists/qa/*update is required in this PR.As per coding guidelines, “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/llmapi/test_sampling_params.py` around lines 65 - 169, The PR touches only unit tests and API-stability tests (e.g., test_thinking_token_budget_unlimited_values, test_thinking_budget_logits_processor_*, SamplingParams, add_thinking_budget_logits_processor), so explicitly state in the PR description (or commit message) that QA list updates are unnecessary for this change; add a single-line note in the PR body like “QA list updates unnecessary: changes limited to unittest/API-stability” and optionally add a one-line comment at the top of the modified test module to record that scope for future reviewers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unittest/llmapi/test_sampling_params.py`:
- Around line 58-63: Add explicit return type annotations to the modified test
and helper functions: append "-> None" to any test functions in the updated
block (the tests around request.to_sampling_params checks) and to other test
helpers in the 65-169 region, and add a concrete return annotation to
FakeTokenizer.encode (e.g., -> List[int] or -> Sequence[int] consistent with
project typing) so all functions have explicit return types per the repo's mypy
guidance; update imports if needed for typing names.
- Around line 65-169: The PR touches only unit tests and API-stability tests
(e.g., test_thinking_token_budget_unlimited_values,
test_thinking_budget_logits_processor_*, SamplingParams,
add_thinking_budget_logits_processor), so explicitly state in the PR description
(or commit message) that QA list updates are unnecessary for this change; add a
single-line note in the PR body like “QA list updates unnecessary: changes
limited to unittest/API-stability” and optionally add a one-line comment at the
top of the modified test module to record that scope for future reviewers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 25fd3653-5f10-4ea2-ae12-a2f2d4e96fb2
📒 Files selected for processing (9)
tensorrt_llm/llmapi/__init__.pytensorrt_llm/llmapi/llm.pytensorrt_llm/llmapi/thinking_budget.pytensorrt_llm/sampling_params.pytensorrt_llm/serve/openai_protocol.pytensorrt_llm/serve/openai_server.pytensorrt_llm/serve/responses_utils.pytests/unittest/api_stability/references/sampling_params.yamltests/unittest/llmapi/test_sampling_params.py
|
PR_Github #50686 [ run ] triggered by Bot. Commit: |
nv-guomingz
left a comment
There was a problem hiding this comment.
LGTM for llm api part
|
PR_Github #50686 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #50777 [ run ] triggered by Bot. Commit: |
Signed-off-by: tijyojwad <1127155+tijyojwad@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #50815 [ run ] triggered by Bot. Commit: |
|
PR_Github #50777 [ run ] completed with state |
|
PR_Github #50815 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #50920 [ run ] triggered by Bot. Commit: |
|
PR_Github #50920 [ run ] completed with state |
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.