- New York, New York
- (UTC -05:00)
- https://joannahong.github.io/
Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Robust Speech Recognition via Large-Scale Weak Supervision
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
No-nonsense, no-BS repo showing how data structure code should look in Python: simple and elegant.
Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch
Simple TensorFlow cookbook, designed to be easy to use
A Flow-based Generative Network for Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing
This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Implementation of A Style-Based Generator Architecture for Generative Adversarial Networks in PyTorch
A self-supervised learning framework for audio-visual speech
Out of time: automated lip sync in the wild
This is the repository containing code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
A PyTorch implementation of "Neural Speech Synthesis with Transformer Network"
MelGAN vocoder (compatible with NVIDIA/tacotron2)
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Visual Speech Recognition for Multiple Languages
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (ICLR 2020 spotlight)
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network