🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python · 3,291 stars · 209 forks · Updated Mar 5, 2024

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python · 2,539 stars · 189 forks · Updated Apr 2, 2025

kha-white / manga-ocr

Optical character recognition for Japanese text, with the main focus being Japanese manga

Python · 2,527 stars · 124 forks · Updated Jun 14, 2025

facebookresearch / blt

Code for the BLT research paper

Python · 2,027 stars · 190 forks · Updated Nov 3, 2025

nidhaloff / deep-translator

A flexible, free, and unlimited Python tool for translating between different languages in a simple way using multiple translators.

Python · 1,942 stars · 237 forks · Updated Jul 23, 2024

nat / natbot

Drive a browser with GPT-3

Python · 1,935 stars · 276 forks · Updated Jun 9, 2024

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,868 stars · 140 forks · Updated Jul 5, 2024

showlab / Show-o

[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,865 stars · 86 forks · Updated Jan 8, 2026

Kav-K / GPTDiscord

A robust, all-in-one GPT interface for Discord: ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!

Python · 1,849 stars · 294 forks · Updated May 30, 2024

ytongbai / LVM

Python · 1,840 stars · 61 forks · Updated Jun 28, 2024

lyuchenyang / Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python · 1,595 stars · 132 forks · Updated Jan 1, 2025

thu-ml / unidiffuser

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Python · 1,472 stars · 92 forks · Updated May 31, 2023

Yifan-Song793 / RestGPT

An LLM-based autonomous agent controlling real-world applications via RESTful APIs

Python · 1,389 stars · 105 forks · Updated Jun 7, 2024

microsoft / X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python · 1,343 stars · 164 forks · Updated Oct 5, 2023

google / style-aligned

Official code for "Style Aligned Image Generation via Shared Attention"

Python · 1,314 stars · 97 forks · Updated Dec 29, 2023

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python · 1,312 stars · 73 forks · Updated Jan 17, 2024

ogkalu2

Stars

meta-llama / llama

oobabooga / text-generation-webui

chenfei-wu / TaskMatrix

lllyasviel / ControlNet

haotian-liu / LLaVA

Mikubill / sd-webui-controlnet

flet-dev / flet

zyddnys / manga-image-translator

facebookresearch / ImageBind

Plachtaa / VALL-E-X

deep-floyd / IF

zai-org / GLM-130B

AILab-CVC / YOLO-World

aiwaves-cn / agents

EvolvingLMMs-Lab / Otter