Stars
The definitive Web UI for local AI, with powerful features and easy setup.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
WebUI extension for ControlNet
Flet enables developers to easily build realtime web, mobile and desktop apps in Python. No frontend experience required.
Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)
ImageBind: One Embedding Space to Bind Them All
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Optical character recognition for Japanese text, with the main focus being Japanese manga
A flexible, free, and unlimited Python tool to translate between different languages in a simple way using multiple translators.
The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
An LLM-based autonomous agent controlling real-world applications via RESTful APIs
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
Official code for "Style Aligned Image Generation via Shared Attention"
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".