Stars
A modular framework for neural networks with Euclidean symmetry
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
Configs and boilerplates for Label Studio's Machine Learning backend
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
FlagGems is an operator library for large language models implemented in the Triton Language.
Code for "Neural 3D Reconstruction in the Wild", SIGGRAPH 2022 (Conference Proceedings)
Python examples on how to use GStreamer within OpenCV. Now with GPU support! 🔥🔥🔥
Perceptual Metrics of Audio - perceptually relevant loss function. DPAM and CDPAM
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
Can CNNs transliterate Pinyin into Chinese characters correctly?
Somiao Pinyin: Train your own Chinese Input Method with Seq2seq Model 搜喵拼音输入法
[NeurIPS 2022, T-PAMI 2023] Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Audio-Visual Speech Separation with Cross-Modal Consistency
A toolbox for Deep Active Learning (DeepAL toolbox), extending previous work at https://github.com/ej0cl6/deep-active-learning.
Tiny optimized Stable-diffusion that can run on GPUs with just 1GB of VRAM. (Beta)
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
Implementation of Post-training Quantization on Diffusion Models (CVPR 2023)
Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021
Python code for Lite Audio-Visual Speech Enhancement.
Implementation of the DiffusionOverDiffusion architecture presented in NUWA-XL, in the form of a ControlNet-like module on top of the ModelScope text2video model, for extremely long video generation.
Automatic detection of multi-speaker fragments with high time resolution