Stars
[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
[NeurIPS 2025 Spotlight] Official implementation for DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing
[NeurIPS 2025] Improving Video Generation with Human Feedback
[NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark
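A Python-based intelligent analysis tool for China A-shares that combines large language models to provide data-driven investment advice and market insights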
Kronos: A Foundation Model for the Language of Financial Markets
Wan: Open and Advanced Large-Scale Video Generative Models
We write your reusable computer vision tools. 💜
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
Finetuning and inference tools for the CogView4 and CogVideoX model series.
The official homepage of the COCO-Stuff dataset.
Diffusers pipeline for inpainting with any available finetune
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Generative Models by Stability AI
Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" (ICML 2025)
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
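🥢 Cook like Laoxiangji (老乡鸡) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.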
Fully Open Framework for Democratized Multimodal Training
[CVPR 2020] The first large-scale public benchmark dataset for image harmonization. The code used in our paper "DoveNet: Deep Image Harmonization via Domain Verification", CVPR2020. Useful for imag…
PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.
[SIGGRAPH Asia 2024] I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
INFTY Engine: An Optimization Toolkit to Support Continual AI
(ACM TOMM) This is the official code repository for "VM-UNet: Vision Mamba UNet for Medical Image Segmentation".
[npj Digital Medicine] The official repository for "Large-Vocabulary Segmentation for Medical Images with Text Prompts"