gpt-4v

Here are 27 public repositories matching this topic...

juanchito22-cpu / InternVL-U

Provide unified multimodal models for understanding, reasoning, generation, and editing across text and visual data.

mysql android testing programming-language gui typescript compiler libgdx gpt v semantic-segmentation video-classification vision-language-model gpt-4v vit-22b gpt-4o

Updated Apr 10, 2026
Python

Deathfrosthacker / Accessibility-Text-Enhancer

Star

✨ Enhance web accessibility in real-time with this browser extension that empowers users to identify and fix common issues for a better online experience.

css php wordpress-plugin machine-learning front-end ocr mobile-app wcag openai ui-ux hacktoberfest statamic-addon fediverse tensorflow-lite alt-text fediverse-bot voice-to-text-transcription gpt-4v lmstudio

Updated Apr 10, 2026
JavaScript

open-compass / VLMEvalKit

Star

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

computer-vision evaluation pytorch gemini openai vqa vit gpt multi-modal clip claude openai-api gpt4 large-language-models llm chatgpt llava qwen gpt-4v

Updated Apr 10, 2026
Python

android-com-pl / wp-ai-alt-generator

Sponsor

Star

WordPress plugin that leverages OpenAI's Vision API to automatically generate descriptive alt text for images, enhancing accessibility and SEO.

plugin php wordpress wordpress-plugin ai openai hacktoberfest gpt-4 gpt-4v

Updated Apr 2, 2026
TypeScript

zli12321 / Vision-Language-Models-Overview

Star

A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.

reinforcement-learning clip claude world-models multimodal-models sota-model llava blip2 gpt-4v gemini-pro deepseek vision-language-models qwen-vl llama-vision-model multimodal-benchmarks vision-language-model-applications finevision-pretrain-dataset

Updated Mar 27, 2026
HTML

rutvik29 / vision-llm-pipeline

Star

Vision + LLM pipeline: YOLOv8 object detection, GPT-4V scene understanding, and automated visual QA with streaming API

python computer-vision object-detection multimodal fastapi yolov8 gpt-4v visual-qa

Updated Mar 25, 2026
Python

rutvik29 / multimodal-rag

Star

Production multimodal RAG pipeline: ingests PDFs, images, and tables with GPT-4V understanding and hybrid vector retrieval

python pdf openai multimodal rag hybrid-search langchain chromadb gpt-4v

Updated Mar 25, 2026
Python

ansh1113 / cooking-humanoid-vla

Star

Vision-Language-Action system for humanoid manipulation. Achieves 77% accuracy on real-world cooking videos.

deep-learning transformer clip whisper vla humanoid-robot continual-learning embodied-ai ai2thor gpt-4v

Updated Feb 7, 2026
Python

tianyi-lab / HallusionBench

Star

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark benchmarks lmm hallucination gpt-4 large-language-models llm llava large-vision-language-models vlms gpt-4v

Updated Oct 14, 2025
Python

OpenGVLab / InternVL

Star

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型

image-classification gpt multi-modal semantic-segmentation video-classification image-text-retrieval llm vision-language-model gpt-4v vit-6b vit-22b gpt-4o

Updated Sep 22, 2025
Python

RLHF-V / RLAIF-V

Star

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

chatbot multimodal llava vision-language-learning gpt-4v llava-next rlaif-v minicpm-v cvpr2025

Updated May 14, 2025
Python

jiayuww / SpatialEval

Star

[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs

machine-learning gemini reasoning claude spatial-reasoning multimodal-deep-learning foundation-models large-language-models gpt-4v vision-language-models llama3 gpt-4o

Updated Jan 23, 2025
Python

jameszhou-gl / gpt-4v-distribution-shift

Star

Code for ICLR'24 workshop ME-FoMo-How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

python ai openai clip robustness generalization distribution-shift llava gpt-4v

Updated Oct 18, 2024
Jupyter Notebook

ShareGPT4Omni / ShareGPT4Video

Star

[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"

gpt sora text-to-video large-language-models chatgpt large-vision-language-models large-multimodal-models gpt-4v large-video-language-models

Updated Oct 9, 2024
Python

autodistill / autodistill-gpt-4v

Star

GPT-4V(ision) module for use with Autodistill.

computer-vision object-detection gpt-4 autodistill gpt-4v

Updated Jul 30, 2024
Python

ShareGPT4Omni / ShareGPT4V

Star

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

gpt language-model large-language-models chatgpt instruction-tuning vision-language-model large-vision-language-models gpt4v large-multimodal-models gpt-4v eccv2024

Updated Jul 1, 2024
Python

gutbash / lmm-graph-vision

Star

How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?

data-structures openai vqa visual-question-answering vqa-dataset google-generative-ai gpt-4v gpt-4-vision-preview gemini-pro-vision claude-3

Updated Jun 13, 2024
Python

ShareGPT4Omni / ShareGPT4Omni

Star

ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations

gpt chatgpt large-vision-language-models large-multimodal-models gpt-4v gpt-4o gpt4-omni

Updated Jun 6, 2024

aymenfurter / copilot-insurance-claim-demo

Star

How a Picture of Car Damage Can File Your Insurance Claim

java azure-openai semantic-kernel gpt-4v gpt-4-vision

Updated Apr 30, 2024
Java

davideuler / awesome-assistant-api

Star

Try openai assistant api apps on Google Colab for free. Awesome assistant API Demos!

examples assistant chatgpt function-calling dalle-3 gpt-4v gpt-4-turbo assistant-api vision-gpt

Updated Jan 21, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the gpt-4v topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the gpt-4v topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gpt-4v

Here are 27 public repositories matching this topic...

juanchito22-cpu / InternVL-U

Deathfrosthacker / Accessibility-Text-Enhancer

open-compass / VLMEvalKit

android-com-pl / wp-ai-alt-generator

zli12321 / Vision-Language-Models-Overview

rutvik29 / vision-llm-pipeline

rutvik29 / multimodal-rag

ansh1113 / cooking-humanoid-vla

tianyi-lab / HallusionBench

OpenGVLab / InternVL

RLHF-V / RLAIF-V

jiayuww / SpatialEval

jameszhou-gl / gpt-4v-distribution-shift

ShareGPT4Omni / ShareGPT4Video

autodistill / autodistill-gpt-4v

ShareGPT4Omni / ShareGPT4V

gutbash / lmm-graph-vision

ShareGPT4Omni / ShareGPT4Omni

aymenfurter / copilot-insurance-claim-demo

davideuler / awesome-assistant-api

Improve this page

Add this topic to your repo