DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Updated Dec 6, 2024 - Python
Content-Based Image Retrieval System
Integrates AWS Bedrock's multimodal capabilities (Claude 3) into the Docling framework for generating image descriptions within document processing pipelines.
NotyVisualScan uses AI to generate descriptions and tags for images in Notion, with automated image uploads.
Python script that converts PDFs into text files; it extracts the text and converts images into text using vision models.
This Python script processes images in a specified folder, sends them to the OpenAI API and saves the responses as text files.
A #markitdown-plugin specialized for tech specs, for example producing enhanced Markdown with image descriptions and more.
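Several of the projects above follow the same folder-to-text pattern: walk a directory of images, ask a vision model for a description, and save each response as a text file. The following is a minimal sketch of that loop, not code from any listed repository. The names `describe_folder` and `describe` are hypothetical, and the model call is left as a caller-supplied function so that no particular API or client library is assumed.

```python
from pathlib import Path
from typing import Callable

# Assumption: these are the image types worth processing; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def describe_folder(folder: str, describe: Callable[[bytes], str]) -> list[Path]:
    """Write one .txt file per image in `folder` and return the written paths.

    `describe` is a hypothetical callable that takes raw image bytes and
    returns a description, e.g. a thin wrapper around a vision model API.
    """
    written = []
    for image_path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files that are not recognized images.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = describe(image_path.read_bytes())
        # photo.png -> photo.txt, saved next to the source image.
        out_path = image_path.with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Passing the describer in as a function keeps the file handling testable without network access. Note that two images sharing a stem (for example `a.jpg` and `a.png`) would write to the same `.txt` file in this sketch.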