Mini CLIP-style CXR↔report retrieval with prompt ablation, token-length check, and interpretability
Fine-tuned BLIP model on Flickr8k for multimodal image captioning (vision + language).
Multi-modal remote sensing image restoration and fusion foundation model with language prompting.
A real-time image captioning and visual question answering (VQA) system. This project uses computer vision and NLP to generate descriptive captions for images and answer user questions about them.
Vietnamese image captioning pipeline: BLIP + CLIP + NLLB. Gradio demo with BLEU/METEOR evaluation.
Resource-aware X-CLIP baseline on Cholec80: FP16 + grad-accum training and evaluation for surgical video–text localization
This repository hosts the code for Jan Hadl's Master Thesis at TU Wien: GS-VQA, a zero-shot VQA pipeline that uses VLMs for visual perception and ASP for symbolic reasoning.
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
Cross-lingual evaluation of CLIP on Japanese vs English memes — revealing a 7.4% performance gap and sarcasm detection failure
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Real-time AI that sees, understands, and talks about what it sees - like a visual brain.
[ICLR 2026] - Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Mamba for Vision, Perception and Action
[ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Official code of the paper ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling accepted at MICCAI 2024.
Benchmark for evaluating MLLMs as judges of vision-task outputs across intrinsic and tool-mediated settings
MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing
Streamlit App Combining Vision, Language, and Audio AI Models
TrackGPT: Track What You Need in Videos via Text Prompts
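Many of the projects above share the same CLIP-style contrastive backbone: images and texts are encoded into a joint embedding space, and retrieval reduces to ranking by similarity. As a minimal sketch of that pattern (illustrative only, not code from any repository listed here), the snippet below uses the Hugging Face transformers CLIP API with the public openai/clip-vit-base-patch32 checkpoint; the image path and candidate captions are hypothetical placeholders:

```python
# Minimal CLIP-style image<->text retrieval sketch.
# Requires: pip install torch transformers pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image = Image.open("example.jpg")  # hypothetical local image
captions = [  # hypothetical candidate texts to rank against the image
    "a chest X-ray with no acute cardiopulmonary findings",
    "a dog running on a beach",
    "a satellite view of farmland",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image has shape (num_images, num_texts); softmax over
# texts turns the similarity scores into a ranking for this image.
probs = out.logits_per_image.softmax(dim=-1)
best = probs.argmax(dim=-1).item()
print(f"best caption: {captions[best]!r} (p={probs[0, best]:.3f})")
```

Using out.logits_per_text instead scores the text→image direction, i.e. the report→CXR half of a retrieval setup like the first entry above.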