🌍 Generate rich, context-aware captions from images by integrating location, events, and dates for more informative and meaningful descriptions.
Updated Feb 8, 2026 · Python
(ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
🖼️ BLIP-powered image-to-text generator achieving a 136.7 CIDEr score on the TextCaps benchmark (142K captions, 28K images). Uses a 129M-parameter model with batch processing (25 images), ZIP export with embedded captions, and conditional captioning. Includes a live Streamlit demo for instant AI-powered captioning.
JoyCaption optimized for Windows
Automatic caption generation for YouTube Shorts and raw videos.
A real-time image captioning and visual question answering (VQA) system. This project uses computer vision and NLP to generate descriptive captions for images and answer user questions about them.
Caption generator using Vision Language Models and vLLM
Qwen Uncensored Image Captioner
Context-Aware Image Captioning with BLIP-2
Transcribe videos and generate captions using Whisper and FFmpeg with a Streamlit UI
Computer Vision Playground ⚡️
A 100% free and open-source AI content automation tool that writes scripts, generates voiceovers, creates videos, and uploads them automatically, for hands-free YouTube growth powered by AI.
A somewhat optimized implementation of several lightweight, popular models
JustInCase is a tool that generates .srt subtitles from any given video or audio file. It uses the Whisper model to generate captions.
A neural network that generates captions for an image using a CNN and an RNN with beam search.
Accessibility-focused SteamVR overlay that improves communication between deaf, hard-of-hearing, and hearing users in VR. It uses AI to show users real-time speech transcription in their 3D space. DISCLAIMER: Voice recognition technology is prone to errors, and this project should not be used as a replacement for a medical hearing aid.
Captionify: Describing Images with AI. An AI-powered image captioning system that uses CNNs and LSTMs to generate human-like captions for images. Trained on the Flickr8k dataset and evaluated with BLEU scores, it bridges computer vision and natural language processing for real-world applications such as accessibility, social media, and e-commerce.
Official Repository of OmniCaptioner
A Python project that extracts audio from video files, transcribes the speech, translates it into a target language, and generates SRT subtitles.
Image To Text with Florence 2