No-nonsense, no-BS repo for how data structure code should be written in Python: simple and elegant.

Python 3,042 624 Updated Apr 6, 2024

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Python 43 3 Updated Sep 6, 2024

PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"

Python 12 Updated Mar 9, 2024

[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation

Python 31 4 Updated Sep 6, 2024

Audio-visual corruption modeling from the paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)

Python 35 2 Updated Jun 20, 2023

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 16,037 1,269 Updated Jan 18, 2025

PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR2023) and "Visual Context-driven Audio Feature Enhan…

Python 20 2 Updated Apr 3, 2024

FREE Bootstrap 5 Light Mode Resume/CV Template for Developers

SCSS 31 43 Updated Sep 16, 2024

PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)

Python 20 4 Updated Apr 11, 2022

PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP2023)

Python 70 7 Updated Mar 9, 2024

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Python 77 7 Updated Feb 27, 2025

A self-supervised learning framework for audio-visual speech

Python 962 154 Updated Dec 7, 2023

Robust Speech Recognition via Large-Scale Weak Supervision

Python 92,130 11,547 Updated Dec 15, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,892 2,679 Updated Dec 15, 2025

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python 401 34 Updated Sep 11, 2023

PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)

Python 242 33 Updated Feb 3, 2022

Denoising Diffusion Implicit Models

Python 1,754 228 Updated Jul 26, 2024

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,304 1,250 Updated Aug 4, 2025

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Python 343 45 Updated Feb 21, 2022

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,172 736 Updated May 31, 2024

[CVPR 2022] Official PyTorch Implementation for "Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network"

Python 32 4 Updated Mar 13, 2023

[ICCV 2023] Official PyTorch Implementation for "Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning"

Python 32 3 Updated Oct 13, 2023

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Python 321 59 Updated Jul 25, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,157 143 Updated Sep 5, 2024

Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.

Jupyter Notebook 405 53 Updated Jul 7, 2021

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch

Jupyter Notebook 1,630 349 Updated Apr 22, 2024

End-to-End Speech Processing Toolkit

Python 9,647 2,364 Updated Dec 16, 2025

PyTorch implementation of "Lip to Speech Synthesis with Visual Context Attentional GAN" (NeurIPS2021)

Python 25 5 Updated Mar 9, 2024

Sound-guided Semantic Image Manipulation - Official PyTorch Code (CVPR 2022)

Python 80 11 Updated Aug 14, 2023

EE474 Term Project

Python 3 1 Updated Nov 26, 2025