Starred repositories
A Survey of Reinforcement Learning for Large Reasoning Models
Tracking the latest and greatest research papers on text-to-image generation.
Trending projects & awesome papers about data-centric LLM studies.
Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome.
Record the canvas as an image, mp4 video, or gif from the browser
Highcharts Draggable 3D Rotation
How to add stickers to live video streams with Python, teachable machine and OpenCV
Content-Based Video-Music Retrieval using Soft Intra-Modal Structure Constraint
A video-music cross-retrieval model. Training pipeline and deployment API are provided.
A curated list of recent diffusion models for video generation, editing, and various other applications.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Codebase for CVPR2020 A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation
Training and evaluation pipeline for MEG and EEG brain signal encoding and decoding using deep learning. Code for our paper "Decoding speech perception from non-invasive brain recordings" published…
ykk648 / AnimateDiff-I2V
Forked from guoyww/AnimateDiff
AnimateDiff I2V version.
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Official implementation of the paper "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models" (SIGGRAPH Asia 2023)
Generate comic panels using an LLM + SDXL. Powered by Hugging Face 🤗
Public code release for: ColorfulCurves: Palette-Aware Lightness Control and Color Editing via Sparse Optimization (SIGGRAPH 2023) [Ted Chao, Jason Klein, Jianchao Tan, Jose Echevarria, Yotam Gingold]
[CSUR] A Survey on Video Diffusion Models
Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
List of academic resources on Multimodal ML for Music
Transformer-based Conditional Variational Autoencoder for Controllable Story Generation
A simple notebook demonstrating prompt-based music generation via Mubert API