[None][perf] Remove unnecessary ToPIL() from find_mm_token_lengths#11640
Conversation
|
/bot run |
📝 WalkthroughWalkthroughType annotations are refactored across multiple model files to replace PIL Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/inputs/registry.py (1)
288-314:⚠️ Potential issue | 🟠 MajorType annotation mismatch:
videoparameter still typed asList[Image.Image].The type annotation on line 291 specifies
video: List[Image.Image], but the implementation on lines 303 and 310 uses tensor-specific attributes (.shape[-2],.shape[-1]). This inconsistency will mislead developers and cause static analysis tools to report false positives.🐛 Proposed fix
def get_num_tokens_per_video( self, *, - video: List[Image.Image], + video: List[torch.Tensor], **kwargs, ):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/inputs/registry.py` around lines 288 - 314, The parameter type for get_num_tokens_per_video is incorrect: update the annotation for video in get_num_tokens_per_video from List[Image.Image] to a sequence of tensors (e.g., Sequence[torch.Tensor] or Sequence[Any] if torch isn't available) so it matches the implementation that accesses .shape; add the appropriate typing import (Sequence) and, if using PyTorch, reference torch.Tensor in the annotation and update the docstring to reflect that frames are tensors, not PIL Images; keep the function name get_num_tokens_per_video and usages unchanged.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
281-285: Signature inconsistency with base classBaseMultimodalInputProcessor.The base class in
registry.py(line 270-286) defines the method with keyword-only arguments:def get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs):This implementation lacks the
*,marker and**kwargs, which could cause issues if callers pass additional keyword arguments or use the method polymorphically through the base class interface.♻️ Suggested fix to align with base class signature
- def get_num_tokens_per_image(self, image: torch.Tensor): + def get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs): h, w = image.shape[-2:] ncols, nrows = self.image_processor._image_to_num_tokens( Image.new("RGB", (w, h))) return ncols * nrows + nrows🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_mistral.py` around lines 281 - 285, The method get_num_tokens_per_image in modeling_mistral.py must match the BaseMultimodalInputProcessor signature: change its definition to include the keyword-only marker and accept arbitrary kwargs (i.e., def get_num_tokens_per_image(self, *, image: torch.Tensor, **kwargs):) so it supports calls via the base class and extra keywords; keep the current logic that computes h,w from image.shape[-2:], calls self.image_processor._image_to_num_tokens(Image.new("RGB", (w, h))) and returns ncols * nrows + nrows unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/_torch/multimodal/test_find_num_image_tokens.py`:
- Around line 121-122: The test assigns image_width, image_height =
test_image.shape[-2:] but tensor.shape[-2:] yields (height, width), so swap the
assignment to image_height, image_width = test_image.shape[-2:] to match actual
dimensions; locate the usage in the test around load_image and the variables
image_width/image_height (used later in assertions) and update the assignment to
avoid misleading error messages.
---
Outside diff comments:
In `@tensorrt_llm/inputs/registry.py`:
- Around line 288-314: The parameter type for get_num_tokens_per_video is
incorrect: update the annotation for video in get_num_tokens_per_video from
List[Image.Image] to a sequence of tensors (e.g., Sequence[torch.Tensor] or
Sequence[Any] if torch isn't available) so it matches the implementation that
accesses .shape; add the appropriate typing import (Sequence) and, if using
PyTorch, reference torch.Tensor in the annotation and update the docstring to
reflect that frames are tensors, not PIL Images; keep the function name
get_num_tokens_per_video and usages unchanged.
---
Nitpick comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Around line 281-285: The method get_num_tokens_per_image in
modeling_mistral.py must match the BaseMultimodalInputProcessor signature:
change its definition to include the keyword-only marker and accept arbitrary
kwargs (i.e., def get_num_tokens_per_image(self, *, image: torch.Tensor,
**kwargs):) so it supports calls via the base class and extra keywords; keep the
current logic that computes h,w from image.shape[-2:], calls
self.image_processor._image_to_num_tokens(Image.new("RGB", (w, h))) and returns
ncols * nrows + nrows unchanged.
|
PR_Github #36464 [ run ] triggered by Bot. Commit: |
|
PR_Github #36464 [ run ] completed with state
|
|
/bot run |
|
PR_Github #36580 [ run ] triggered by Bot. Commit: |
|
PR_Github #36580 [ run ] completed with state
|
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
26239bd to
e6a2bb9
Compare
|
/bot run |
|
PR_Github #45358 [ run ] triggered by Bot. Commit: |
|
PR_Github #45358 [ run ] completed with state
|
|
/bot run |
|
PR_Github #45403 [ run ] triggered by Bot. Commit: |
|
PR_Github #45403 [ run ] completed with state |
…VIDIA#11640) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
This PR removes ToPIL() from find_mm_token_lengths.
It takes around 5ms per request to convert Tensor to Image.Image object. By removing this, we could obtain around ~3-5% improved TTFT.
Summary by CodeRabbit
Release Notes
Refactor
Tests