- New York, New York
- https://joannahong.github.io/
Stars
A no-nonsense, no-BS repo showing how data structure code should look in Python: simple and elegant.
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
Audio-visual corruption modeling code for the paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)
pix2tex: Using a ViT to convert images of equations into LaTeX code.
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR2023) and "Visual Context-driven Audio Feature Enhan…
FREE Bootstrap 5 Light Mode Resume/CV Template for Developers
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP2023)
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
A self-supervised learning framework for audio-visual speech
Robust Speech Recognition via Large-Scale Weak Supervision
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
[CVPR 2022] Official PyTorch Implementation for "Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network"
[ICCV 2023] Official PyTorch Implementation for "Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning"
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Official PyTorch implementation of BigVGAN (ICLR 2023)
Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
PyTorch implementation of "Lip to Speech Synthesis with Visual Context Attentional GAN" (NeurIPS2021)
Sound-guided Semantic Image Manipulation - Official PyTorch Code (CVPR 2022)