MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
Example code for fine-tuning multimodal large language models with LLaMA-Factory.
Minimalist implementation of PaliGemma 2 & PaliGemma VLM from scratch
Use PaliGemma to auto-label data for use in training fine-tuned vision models.
PyTorch implementation of PaliGemma 2
Notes for the Vision Language Model implementation by Umar Jamil
PyTorch implementation of Google's PaliGemma VLM with SigLIP image encoder, KV caching, rotary embeddings, and grouped-query attention. Modular, research-friendly, and easy to extend for experimentation.
AI-powered tool that converts text from images into your desired language, using the Gemma vision model together with a multilingual model.
Leverage PaliGemma 2's DOCCI fine-tuned variant capabilities using LitServe.
🌟 Build a PyTorch implementation of Google's PaliGemma model for advanced vision-language tasks, including object detection and segmentation.
Fine-tuning Google PaliGemma for specialized downstream vision-language tasks.
PyTorch implementation of PaliGemma VLM from scratch — image + text understanding using SigLIP and Gemma.
Leverage PaliGemma 2 mix model variant capabilities using LitServe.
PyTorch implementation of Google's PaliGemma vision-language model with VQ-VAE decoder for processing referring expression segmentation outputs. Supports detection, segmentation, VQA, and captioning.