Stars
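An AI-native PPT generator based on nano banana pro🍌, moving toward a true "Vibe PPT": upload any template image; upload arbitrary materials with intelligent parsing; generate a PPT automatically from a single sentence, an outline, or page descriptions; revise specified regions by verbal instruction; export in one click.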
Thinking with Programming Vision: Towards a Unified View for Thinking with Images
Open-source and strong foundation image recognition models.
A framework for efficient model inference with omni-modality models
A SOTA open-source image editing model that aims for performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
Fully Open Framework for Democratized Multimodal Training
A PyTorch native platform for training generative AI models
🍌 Awesome Prompts; Nano Banana; Banana Pro; Gemini; AI Studio; Prompt Quickly [advanced Sidebar features in development, stay tuned]
GenExam: A Multidisciplinary Text-to-Image Exam
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
HunyuanVideo-1.5: A leading lightweight video generation model
Kandinsky 5.0: A family of diffusion models for Video & Image generation
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
NEO Series: Native Vision-Language Models from First Principles
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Zhejiang University Graduation Thesis LaTeX Template
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Native Multimodal Models are World Learners
🐻 Uniform Discrete Diffusion with Metric Path for Video Generation
🎥 Python and OpenCV-based scene cut/transition detection program & library.
[ICCV'25] Unified Open-World Segmentation with Multi-Modal Prompts
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching.