📐 Transcribe handwritten math into accurate LaTeX using a modular Vision-Language Model fine-tuning pipeline for efficient training on consumer GPUs.
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
🌟 Build a PyTorch implementation of Google's PaliGemma model for advanced vision-language tasks, including object detection and segmentation.
🌳 Run multiple isolated Claude Code instances in Docker containers, ensuring automatic branch management and full development environments for simultaneous tasks.
🔍 Explore GEMM: a C/C++ library for efficient matrix multiplication using OpenMP, designed for parallel computing learners and practitioners.
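The core idea behind an efficient GEMM kernel is loop blocking (tiling) so that sub-matrices stay in cache. A minimal conceptual sketch in pure Python, with illustrative names; in the C/OpenMP library described above, the outer block loops would be parallelized with `#pragma omp parallel for`:

```python
# Conceptual sketch of a blocked (tiled) GEMM. Pure Python so the idea
# is easy to follow; a real C/OpenMP kernel would parallelize the outer
# block loops and use contiguous arrays. All names are illustrative.

def gemm_blocked(A, B, block=2):
    """Compute C = A @ B with loop blocking. A is m x k, B is k x n."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for ii in range(0, m, block):          # block over rows of A
        for jj in range(0, n, block):      # block over columns of B
            for kk in range(0, k, block):  # block over the shared dim
                for i in range(ii, min(ii + block, m)):
                    for j in range(jj, min(jj + block, n)):
                        s = C[i][j]
                        for p in range(kk, min(kk + block, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] = s
    return C
```

Blocking does not change the arithmetic, only the order of accumulation, so the result matches a naive triple loop while touching memory in cache-sized chunks.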
👁️ Deploy YOLO11 for efficient computer vision on edge devices, optimized for the Horizon X5 RDK with a streamlined C++ codebase.
FoundationModels chat app tutorial for iOS with on-device LLMs and tool use. Shows on-device inference with the FoundationModels framework and calendar tool use. 🐙
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.
A collection of guides and examples for the Gemma open models from Google.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
Fine-tuning Google's PaliGemma for specialized downstream vision-language tasks.
A production-ready, modular fine-tuning pipeline for converting handwritten mathematical expressions into LaTeX using Google's PaliGemma 3B and QLoRA.
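QLoRA keeps the base model frozen (and quantized to 4-bit) while training only small low-rank adapter matrices, so a 3B model like PaliGemma fits on a consumer GPU. A minimal pure-Python sketch of just the low-rank-adapter math, with illustrative names not taken from any specific repo:

```python
# Minimal sketch of the LoRA update used by QLoRA: the effective weight
# is W + (alpha / r) * B @ A, with W frozen (and 4-bit quantized in real
# QLoRA; plain floats here). B starts at zero, so the adapter begins as
# a no-op and only the small A/B matrices receive gradients.

def lora_forward(x, W, A, B, alpha, r):
    """y = (W + (alpha / r) * B @ A) @ x for a single input vector x.

    W: frozen base weight, shape (out, in)
    A: trainable down-projection, shape (r, in)
    B: trainable up-projection,   shape (out, r), initialized to zeros
    """
    scale = alpha / r
    out_dim, in_dim = len(W), len(W[0])
    y = []
    for o in range(out_dim):
        acc = sum(W[o][i] * x[i] for i in range(in_dim))   # frozen path
        for j in range(r):                                  # adapter path
            acc += scale * B[o][j] * sum(A[j][i] * x[i] for i in range(in_dim))
        y.append(acc)
    return y
```

Because B is zero-initialized, the adapted model reproduces the base model exactly at the start of training; the rank r and scale alpha control adapter capacity.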
PyTorch implementation of Google's PaliGemma vision-language model with VQ-VAE decoder for processing referring expression segmentation outputs. Supports detection, segmentation, VQA, and captioning.
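The vector-quantization step a VQ-VAE decoder consumes replaces each continuous latent vector with the index of its nearest codebook entry; the decoder then reconstructs the segmentation mask from those indices. A hedged pure-Python sketch of that lookup, with illustrative names (a real model does this on batched tensors):

```python
# Sketch of VQ-VAE codebook lookup: map each continuous latent vector
# to the index of its nearest (squared Euclidean distance) entry in a
# learned codebook. Names are illustrative, not from any specific repo.

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest code."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sqdist(z, codebook[i]))
            for z in latents]
```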
Vision-language model fine-tuning notebooks and use cases (MedGemma, PaliGemma, Florence, ...).
PyTorch implementation of Google's PaliGemma VLM with a SigLIP image encoder, KV caching, rotary embeddings, and grouped-query attention. Modular, research-friendly, and easy to extend for experimentation.
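KV caching is the decoding optimization behind fast autoregressive VLM inference: keys and values for already-generated tokens are stored once and reused, so each new token attends over the cache instead of re-encoding the whole prefix. A toy single-head sketch with scalar "embeddings", purely illustrative:

```python
# Toy sketch of KV caching during autoregressive decoding. One head,
# scalar queries/keys/values to keep it minimal; illustrative names.
import math

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        """Append the new (k, v), then attend q over all cached entries."""
        self.keys.append(k)
        self.values.append(v)
        scores = [q * kk for kk in self.keys]         # dot products
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]   # stable softmax
        z = sum(weights)
        return sum(w / z * vv for w, vv in zip(weights, self.values))
```

Grouped-query attention composes with this naturally: several query heads share one cached K/V head, shrinking the cache by the group factor.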
This repository contains the course project for the DI725 lecture.
This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks.