[NeurIPS 2025 Spotlight] Demo implementation of MoCha: Towards Movie-Grade Talking Character Synthesis

Python 15 2 Updated Dec 27, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,802 2,592 Updated Mar 5, 2026

Open-source unified multimodal model

Python 5,802 513 Updated Oct 27, 2025

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python 303 14 Updated Apr 23, 2025

Official inference repo for FLUX.1 models

Python 25,401 1,875 Updated Jul 31, 2025

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 8,641 1,119 Updated Sep 14, 2024

"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)

Python 2,705 168 Updated Dec 12, 2023

A generative world for general-purpose robotics & embodied AI learning.

Python 28,490 2,669 Updated Apr 13, 2026

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,964 1,226 Updated Nov 21, 2025

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

Python 110 Updated Apr 16, 2025

[ICCV2025] UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Python 275 13 Updated May 1, 2025

Multimodal Models in Real World

Jupyter Notebook 557 23 Updated Feb 24, 2025

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 1,131 71 Updated Feb 7, 2025

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Python 141 6 Updated Jun 4, 2025

Next-Token Prediction is All You Need

Python 2,395 95 Updated Jan 12, 2026

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Jupyter Notebook 6,524 426 Updated Jun 28, 2024

StoryMaker: Towards consistent characters in text-to-image generation

Python 718 61 Updated Dec 2, 2024

MoVQGAN - model for the image encoding and reconstruction

Jupyter Notebook 264 18 Updated Oct 31, 2023

[ICLR2025] Kolmogorov-Arnold Transformer

Python 856 59 Updated Mar 23, 2025

SEED-Story: Multimodal Long Story Generation with Large Language Model

Python 884 69 Updated Oct 11, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,947 93 Updated Aug 15, 2024

Official PyTorch implementation of StreamV2V.

Python 542 59 Updated Dec 29, 2025

Extract frames and motion vectors from H.264 and MPEG-4 encoded video.

C++ 397 74 Updated Oct 14, 2025

A collection of resources on controllable generation with text-to-image diffusion models.

1,113 32 Updated Dec 31, 2024

[T-PAMI 2025] V3D: Video Diffusion Models are Effective 3D Generators

Python 520 18 Updated Mar 26, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 28,868 2,929 Updated Apr 9, 2026

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,939 880 Updated Jul 18, 2024

[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation

Python 504 17 Updated Jul 2, 2024

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

Python 686 56 Updated Nov 28, 2024