- London, UK
- https://www.linkedin.com/in/shehrum/
- @shehrum
Stars
Official code for "AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation" (CVPR 2023)
[ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
An open source, self-hosted implementation of the Shotstack API backend
Python SDK for Shotstack, the cloud video editing API
Code and dataset for photorealistic Codec Avatars driven from audio
Code for the SIGGRAPH 2023 conference paper "StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video"
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
This repository contains the code for my master's thesis on Emotion-Aware Facial Animation
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Audio-Visual Speech Separation with Cross-Modal Consistency
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
A PyTorch CUDA extension implementation of instant-ngp (SDF and NeRF), with a GUI.
A Blender add-on for importing a sequence of OBJ meshes as frames
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)
StudioGAN is a PyTorch library providing implementations of representative Generative Adversarial Networks (GANs) for conditional/unconditional image generation.
A repository listing resources to help you prepare for a Data Science/Machine Learning interview. New resources added frequently.
TensorFlow's Visualization Toolkit
VIP cheatsheets for Stanford's CS 230 Deep Learning
A curated list of different papers and datasets in various areas of audio-visual processing
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
Machine Learning and Computer Vision Engineer - Technical Interview Questions
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams (Interspeech'19)
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
FSGAN - Official PyTorch Implementation