Add test and docs for multimodal tool responses#5448

Merged
qgallouedec merged 4 commits into main from multimodal-tool-responses-doc-and-test on Apr 6, 2026

@qgallouedec qgallouedec commented Apr 3, 2026

Member

Note

Low Risk
Low risk: changes are limited to documentation and an additional unit test covering image-returning tool outputs; no production training logic is modified.

Overview
Adds documentation for multimodal tool outputs in GRPO agent training, showing that a tool can return a list of content blocks, each typed as image or text, and that returned images are injected into subsequent VLM turns.
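
For readers who have not seen the documented format, here is a minimal sketch of what such a tool return value might look like. It is not taken from the PR diff: the tool name `take_screenshot`, the helper `split_blocks`, and the payload keys `"image"` and `"text"` are assumptions that follow the common chat-message convention, and a plain placeholder object stands in for the PIL image that the test uses.

```python
from typing import Any


def take_screenshot(url: str) -> list[dict[str, Any]]:
    """Hypothetical tool that returns one image block and one text block.

    Args:
        url: Address of the page to capture.
    """
    # Placeholder for a real image object; the PR description says the test
    # returns a PIL image here. A bare object keeps this sketch dependency-free.
    fake_image = object()
    return [
        {"type": "image", "image": fake_image},  # assumed payload key: "image"
        {"type": "text", "text": f"Screenshot of {url}"},  # assumed key: "text"
    ]


def split_blocks(blocks: list[dict[str, Any]]) -> tuple[list[Any], str]:
    """Hypothetical helper: separate image payloads from the joined text."""
    images = [b["image"] for b in blocks if b["type"] == "image"]
    text = " ".join(b["text"] for b in blocks if b["type"] == "text")
    return images, text


blocks = take_screenshot("https://example.com")
images, text = split_blocks(blocks)
```

The split into images and text only illustrates the idea that image blocks travel separately into the next VLM turn; how the trainer actually does this is not shown in this conversation.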

Adds a new vision-gated GRPO trainer test (test_training_with_tools_multimodal_response) that mocks generation to trigger tool calls and verifies multimodal tool responses (PIL image + text) train successfully and log expected tool call/failure frequencies.

Reviewed by Cursor Bugbot for commit c40f4f5. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego sergiopaniego left a comment

Member

thanks!!

@qgallouedec qgallouedec merged commit 5c22894 into main Apr 6, 2026
14 checks passed
@qgallouedec qgallouedec deleted the multimodal-tool-responses-doc-and-test branch April 6, 2026 13:10

3 participants