Stars
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
A generative speech model for daily dialogue.
Easily train a good VC model with voice data <= 10 mins!
Real-time face swap for PC streaming or video calls
Rembg is a tool to remove image backgrounds
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
openvpi / DiffSinger
Forked from MoonInTheRiver/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.
Use supervised learning to illuminate the latent space of a GAN for controlled generation and editing
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
pix2pix3D: Generating 3D Objects from 2D User Inputs
A series of Houdini shelf tools that are geared towards game developers!
Age Progression/Regression by Conditional Adversarial Autoencoder
Fast style transfer on your webcam, with pre-trained models at
OpenCV Object Tracking using CamShift algorithm and Unity3d Mashup
Steering wheel and joystick emulation using steamVR
Experiments with OpenCV to detect circular objects, plates, cups, etc.