@sxy-trans-n
Collaborator

Performance & Bug Fix: ModelPool Optimization and VLM Detection Improvements

Changes

  • Performance: Unified the VLM detection logic to eliminate redundant getContainer() calls
  • Bug Fix: Added special handling for Gemma DWQ models. These quantized models lose their vision capabilities, so they are now treated as LLMs
  • Feature: Added extraEOSTokens support to work around tokenization bugs in QAT models (e.g., <end_of_turn> in Gemma models)
  • Feature: Added text-only (LM) support for Gemma 3n
  • Optimization: Cached model type detection to avoid repeated registry lookups
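
For readers who want a concrete picture of the detection changes above, here is a minimal sketch rather than the code from this PR. It assumes a Swift codebase, and every name in it (`ModelKind`, `ModelKindDetector`, `registryLookup`, the substring check for "gemma" and "dwq") is hypothetical. It shows a single detection path that treats Gemma DWQ models as LLMs and caches the result per model id, so the registry is consulted at most once per model.

```swift
import Foundation

// Hypothetical model kinds; the real type names are not shown in this PR.
enum ModelKind: Equatable {
    case llm
    case vlm
}

/// Caches one detection result per model id so the (assumed expensive)
/// registry lookup runs at most once for each model.
final class ModelKindDetector {
    private var cache: [String: ModelKind] = [:]
    private let lock = NSLock()
    // Stand-in for the real registry: returns true if the id is a known VLM.
    private let registryLookup: (String) -> Bool
    // Counts how often the registry was actually consulted.
    private(set) var lookupCount = 0

    init(registryLookup: @escaping (String) -> Bool) {
        self.registryLookup = registryLookup
    }

    func kind(for modelID: String) -> ModelKind {
        lock.lock()
        defer { lock.unlock() }

        // Fast path: reuse an earlier answer for this id.
        if let cached = cache[modelID] { return cached }

        let id = modelID.lowercased()
        let kind: ModelKind
        if id.contains("gemma") && id.contains("dwq") {
            // Assumed rule: Gemma DWQ builds have no vision capability,
            // so skip the registry and treat them as plain LLMs.
            kind = .llm
        } else {
            lookupCount += 1
            kind = registryLookup(modelID) ? .vlm : .llm
        }
        cache[modelID] = kind
        return kind
    }
}
```

With a detector built as `ModelKindDetector(registryLookup: { _ in true })`, asking twice for the same placeholder id such as "example/vision-model" consults the registry only once, and an id containing both "gemma" and "dwq" never reaches the registry at all.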

Testing

  • ✅ All 44 unit tests pass
  • ✅ Release build completes successfully (107.26s)
  • ✅ CI validation completed

Impact

  • Faster model loading, because redundant operations are skipped
  • Correct model type detection for quantized variants
  • Better tokenization handling for models with EOS token bugs
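
To illustrate the EOS handling mentioned above, the following sketch shows one way a set of extra EOS tokens could be applied when deciding where generated output ends. It is an assumption about the general technique, not the implementation in this PR: the function `truncateAtEOS` and its parameters are hypothetical, and tokens are modeled as plain strings for readability.

```swift
/// Returns the generated tokens up to, but not including, the first stop token.
/// A stop token is either the model's regular EOS token or any extra EOS token
/// configured to work around a tokenization bug.
func truncateAtEOS(
    _ tokens: [String],
    eosToken: String,
    extraEOSTokens: Set<String>
) -> [String] {
    // Merge the regular EOS token with the configured extras.
    let stopTokens = extraEOSTokens.union([eosToken])

    // Cut the output at the first stop token, if one appears.
    if let end = tokens.firstIndex(where: { stopTokens.contains($0) }) {
        return Array(tokens[..<end])
    }
    return tokens
}

// Usage with placeholder tokens: "<eos>" stands in for the regular EOS token.
let output = truncateAtEOS(
    ["Hello", "!", "<end_of_turn>", "leftover"],
    eosToken: "<eos>",
    extraEOSTokens: ["<end_of_turn>"]
)
// output == ["Hello", "!"]
```

Without the extra token in the set, the same call would return all four tokens, which is the kind of run-on output this feature is meant to prevent.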

Collaborator

@syh-trans-n syh-trans-n left a comment

👏

@sxy-trans-n sxy-trans-n merged commit 2b76c3a into main Jul 24, 2025
2 checks passed
@sxy-trans-n sxy-trans-n deleted the modelpool-vlm-detection-optimization branch July 24, 2025 06:28

3 participants