ogkalu2
57 stars written in Python

Inference code for Llama models

Python 59,112 9,829 Updated Jan 26, 2025

The definitive Web UI for local AI, with powerful features and easy setup.

Python 46,005 5,891 Updated Feb 3, 2026

Let us control diffusion models!

Python 33,617 2,998 Updated Feb 25, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,426 2,724 Updated Aug 12, 2024

WebUI extension for ControlNet

Python 17,880 2,031 Updated Aug 12, 2024

Flet enables developers to easily build realtime web, mobile and desktop apps in Python. No frontend experience required.

Python 15,515 614 Updated Feb 5, 2026

Translate manga/image: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)

Python 9,345 912 Updated Dec 17, 2025

ImageBind: One Embedding Space to Bind Them All

Python 8,958 842 Updated Nov 21, 2025

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,959 783 Updated Feb 11, 2024
Python 7,846 528 Updated Apr 14, 2024

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,676 608 Updated Jul 25, 2023

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 6,196 582 Updated Feb 26, 2025

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Python 5,859 476 Updated Sep 26, 2024

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,291 209 Updated Mar 5, 2024

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,539 189 Updated Apr 2, 2025

Optical character recognition for Japanese text, with the main focus being Japanese manga

Python 2,527 124 Updated Jun 14, 2025

Code for BLT research paper

Python 2,027 190 Updated Nov 3, 2025

A flexible free and unlimited python tool to translate between different languages in a simple way using multiple translators.

Python 1,942 237 Updated Jul 23, 2024

Drive a browser with GPT-3

Python 1,935 276 Updated Jun 9, 2024

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,868 140 Updated Jul 5, 2024

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,865 86 Updated Jan 8, 2026

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

Python 1,849 294 Updated May 30, 2024
Python 1,840 61 Updated Jun 28, 2024

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python 1,595 132 Updated Jan 1, 2025

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Python 1,472 92 Updated May 31, 2023

An LLM-based autonomous agent controlling real-world applications via RESTful APIs

Python 1,389 105 Updated Jun 7, 2024

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,343 164 Updated Oct 5, 2023

Official code for "Style Aligned Image Generation via Shared Attention"

Python 1,314 97 Updated Dec 29, 2023

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python 1,312 73 Updated Jan 17, 2024