Skip to content

[None][perf] Remove unnecessary ToPIL() from find_mm_token_lengths#11640

Merged
2ez4bz merged 2 commits into
NVIDIA:mainfrom
yechank-nvidia:mm_token_len
Apr 24, 2026
Merged

[None][perf] Remove unnecessary ToPIL() from find_mm_token_lengths#11640
2ez4bz merged 2 commits into
NVIDIA:mainfrom
yechank-nvidia:mm_token_len

Conversation

@yechank-nvidia

@yechank-nvidia yechank-nvidia commented Feb 23, 2026

Copy link
Copy Markdown
Collaborator

This PR removes ToPIL() from find_mm_token_lengths.
It takes around 5ms per request to convert Tensor to Image.Image object. By removing this, we could obtain around ~3-5% improved TTFT.

Summary by CodeRabbit

Release Notes

  • Refactor

    • Updated image and video input processing to use torch.Tensor instead of PIL Image objects.
    • Modified method signatures in multimodal input processors to accept tensor-based inputs.
    • Removed automatic PIL image conversion in multimodal length computation.
  • Tests

    • Updated image token count tests to align with tensor-based input handling.

@yechank-nvidia yechank-nvidia requested review from a team as code owners February 23, 2026 06:45
@yechank-nvidia yechank-nvidia changed the title [perf][fix] Remove unnecessary ToPIL() from find_mm_token_lengths [None][perf] Remove unnecessary ToPIL() from find_mm_token_lengths Feb 23, 2026
@yechank-nvidia yechank-nvidia self-assigned this Feb 23, 2026
@yechank-nvidia yechank-nvidia added the Multimodal Label for issues & PRs regarding Multimodal related objects label Feb 23, 2026
@yechank-nvidia

Copy link
Copy Markdown
Collaborator Author

/bot run

@coderabbitai

coderabbitai Bot commented Feb 23, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Type annotations are refactored across multiple model files to replace PIL Image.Image parameters with torch.Tensor parameters in image and video processing methods. Dimension extraction logic is updated to derive height and width from tensor shapes instead of PIL image attributes. Supporting framework code and tests are updated accordingly.

Changes

Cohort / File(s) Summary
Model implementations
tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/_torch/models/modeling_nemotron_nano.py, tensorrt_llm/_torch/models/modeling_phi4mm.py
Method signatures updated to accept torch.Tensor instead of PIL image types in get_num_tokens_per_image and get_num_tokens_per_video methods. Dimension extraction changed from PIL attributes to tensor .shape[-2:]. PIL import removed from phi4mm.py.
Input processing framework
tensorrt_llm/inputs/registry.py, tensorrt_llm/inputs/multimodal.py
Base class and utility code updated to work with tensor-based inputs. Removed PIL conversion logic in multimodal.py token counting. Updated dimension extraction to use tensor shapes in registry.py BaseMultimodalInputProcessor.
Tests
tests/unittest/_torch/multimodal/test_find_num_image_tokens.py
Updated test data loading to use tensor format and extract dimensions from tensor shape instead of PIL image size attributes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description is incomplete and does not follow the required template structure. Complete the description section with a clear explanation of what and why, provide test coverage details, and complete the PR Checklist.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main optimization: removing unnecessary ToPIL() conversion from find_mm_token_lengths for performance improvement.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/inputs/registry.py (1)

288-314: ⚠️ Potential issue | 🟠 Major

Type annotation mismatch: video parameter still typed as List[Image.Image].

The type annotation on line 291 specifies video: List[Image.Image], but the implementation on lines 303 and 310 uses tensor-specific attributes (.shape[-2], .shape[-1]). This inconsistency will mislead developers and cause static analysis tools to report false positives.

🐛 Proposed fix
     def get_num_tokens_per_video(
         self,
         *,
-        video: List[Image.Image],
+        video: List[torch.Tensor],
         **kwargs,
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/inputs/registry.py` around lines 288 - 314, The parameter type
for get_num_tokens_per_video is incorrect: update the annotation for video in
get_num_tokens_per_video from List[Image.Image] to a sequence of tensors (e.g.,
Sequence[torch.Tensor] or Sequence[Any] if torch isn't available) so it matches
the implementation that accesses .shape; add the appropriate typing import
(Sequence) and, if using PyTorch, reference torch.Tensor in the annotation and
update the docstring to reflect that frames are tensors, not PIL Images; keep
the function name get_num_tokens_per_video and usages unchanged.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/modeling_mistral.py (1)

281-285: Signature inconsistency with base class BaseMultimodalInputProcessor.

The base class in registry.py (line 270-286) defines the method with keyword-only arguments:

def get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs):

This implementation lacks the *, marker and **kwargs, which could cause issues if callers pass additional keyword arguments or use the method polymorphically through the base class interface.

♻️ Suggested fix to align with base class signature
-    def get_num_tokens_per_image(self, image: torch.Tensor):
+    def get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs):
         h, w = image.shape[-2:]
         ncols, nrows = self.image_processor._image_to_num_tokens(
             Image.new("RGB", (w, h)))
         return ncols * nrows + nrows
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_mistral.py` around lines 281 - 285, The
method get_num_tokens_per_image in modeling_mistral.py must match the
BaseMultimodalInputProcessor signature: change its definition to include the
keyword-only marker and accept arbitrary kwargs (i.e., def
get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs):) so it
supports calls via the base class and extra keywords; keep the current logic
that computes h,w from image.shape[-2:], calls
self.image_processor._image_to_num_tokens(Image.new("RGB", (w, h))) and returns
ncols * nrows + nrows unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unittest/_torch/multimodal/test_find_num_image_tokens.py`:
- Around line 121-122: The test assigns image_width, image_height =
test_image.shape[-2:] but tensor.shape[-2:] yields (height, width), so swap the
assignment to image_height, image_width = test_image.shape[-2:] to match actual
dimensions; locate the usage in the test around load_image and the variables
image_width/image_height (used later in assertions) and update the assignment to
avoid misleading error messages.

---

Outside diff comments:
In `@tensorrt_llm/inputs/registry.py`:
- Around line 288-314: The parameter type for get_num_tokens_per_video is
incorrect: update the annotation for video in get_num_tokens_per_video from
List[Image.Image] to a sequence of tensors (e.g., Sequence[torch.Tensor] or
Sequence[Any] if torch isn't available) so it matches the implementation that
accesses .shape; add the appropriate typing import (Sequence) and, if using
PyTorch, reference torch.Tensor in the annotation and update the docstring to
reflect that frames are tensors, not PIL Images; keep the function name
get_num_tokens_per_video and usages unchanged.

---

Nitpick comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Around line 281-285: The method get_num_tokens_per_image in
modeling_mistral.py must match the BaseMultimodalInputProcessor signature:
change its definition to include the keyword-only marker and accept arbitrary
kwargs (i.e., def get_num_tokens_per_image(self, *, image: torch.Tensor,
**kwargs):) so it supports calls via the base class and extra keywords; keep the
current logic that computes h,w from image.shape[-2:], calls
self.image_processor._image_to_num_tokens(Image.new("RGB", (w, h))) and returns
ncols * nrows + nrows unchanged.

Comment thread tests/unittest/_torch/multimodal/test_find_num_image_tokens.py Outdated
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #36464 [ run ] triggered by Bot. Commit: 26239bd Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #36464 [ run ] completed with state SUCCESS. Commit: 26239bd
/LLM/main/L0_MergeRequest_PR pipeline #28209 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yechank-nvidia

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #36580 [ run ] triggered by Bot. Commit: 26239bd Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #36580 [ run ] completed with state SUCCESS. Commit: 26239bd
/LLM/main/L0_MergeRequest_PR pipeline #28310 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
@yechank-nvidia yechank-nvidia requested a review from a team as a code owner April 24, 2026 06:50
@yechank-nvidia yechank-nvidia requested a review from moraxu April 24, 2026 06:50
@yechank-nvidia

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45358 [ run ] triggered by Bot. Commit: e6a2bb9 Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45358 [ run ] completed with state SUCCESS. Commit: e6a2bb9
/LLM/main/L0_MergeRequest_PR pipeline #35605 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yechank-nvidia

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45403 [ run ] triggered by Bot. Commit: e6a2bb9 Link to invocation

@2ez4bz 2ez4bz enabled auto-merge (squash) April 24, 2026 17:21
@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #45403 [ run ] completed with state SUCCESS. Commit: e6a2bb9
/LLM/main/L0_MergeRequest_PR pipeline #35640 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit a2a7d0b into NVIDIA:main Apr 24, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
…VIDIA#11640)

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Multimodal Label for issues & PRs regarding Multimodal related objects

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants