Stars
[CVPR 2025 Highlight] Official implementation of "MangaNinja: Line Art Colorization with Precise Reference Following"
(CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[ICLR 2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Code release for "Git Re-Basin: Merging Models modulo Permutation Symmetries"
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
🍃 MINT-1T: A one trillion token multimodal interleaved dataset.
Optical character recognition for Japanese text, with the main focus being Japanese manga
Flet enables developers to easily build realtime web, mobile and desktop apps in Python. No frontend experience required.
GitHub repo analytics tool without the 14-day limit
A flexible, free, and unlimited Python tool to translate between different languages in a simple way using multiple translators.
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
A language learning app to improve speaking and listening skills.
The official implementation of HierSpeech++
A manga translator built with Python
Official code for "Style Aligned Image Generation via Shared Attention"
✨ First-ever CJK (Chinese, Japanese, Korean) Font Recognition and Style Extractor, side project of YuzuMarker
Translate manga/images: one-click translation of the text in all kinds of images https://cotrans.touhou.ai/ (no longer working)
Korean OCR code in Python using the Pororo library
Bubble Blaster removes text from speech bubbles in manga/manhwa, made for scanlation groups.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A repo to evaluate various LLMs' chess-playing abilities.
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents