joannahong's starred repositories

64 stars written in Python

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 156,161 31,963 Updated Feb 5, 2026

Robust Speech Recognition via Large-Scale Weak Supervision

Python 94,195 11,720 Updated Dec 15, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,007 2,694 Updated Jan 23, 2026

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 16,165 1,280 Updated Jan 18, 2025

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python 10,443 1,266 Updated Aug 4, 2025

End-to-End Speech Processing Toolkit

Python 9,717 2,378 Updated Feb 4, 2026

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,334 751 Updated May 31, 2024

Graph Convolutional Networks in PyTorch

Python 5,395 1,224 Updated Sep 20, 2020

No-nonsense, no-BS repo for how data structure code should be in Python: simple and elegant.

Python 3,061 624 Updated Apr 6, 2024

Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch

Python 2,830 629 Updated Nov 6, 2023

Simple, easy-to-use TensorFlow cookbook

Python 2,758 465 Updated Feb 9, 2020

A Flow-based Generative Network for Speech Synthesis

Python 2,335 536 Updated Oct 19, 2023

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Python 2,309 550 Updated Jul 27, 2024

Denoising Diffusion Implicit Models

Python 1,782 230 Updated Jul 26, 2024

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

Python 1,557 280 Updated Feb 9, 2022

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.

Python 1,210 315 Updated Dec 19, 2020

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,181 143 Updated Sep 5, 2024

Implementation of A Style-Based Generator Architecture for Generative Adversarial Networks in PyTorch

Python 1,111 229 Updated Aug 26, 2021

A self-supervised learning framework for audio-visual speech

Python 970 158 Updated Dec 7, 2023

Out of time: automated lip sync in the wild

Python 868 188 Updated Jan 23, 2024

This is the repository containing code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"

Python 714 153 Updated Jul 6, 2023

A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network"

Python 691 140 Updated Nov 8, 2023

MelGAN vocoder (compatible with NVIDIA/tacotron2)

Python 650 114 Updated Oct 3, 2020

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.

Python 635 192 Updated May 27, 2023

Visual Speech Recognition for Multiple Languages

Python 459 72 Updated Aug 17, 2023

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Python 432 102 Updated May 18, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python 401 35 Updated Sep 11, 2023

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (ICLR 2020 spotlight)

Python 351 61 Updated Apr 12, 2020

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Python 347 44 Updated Feb 21, 2022

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Python 321 59 Updated Jul 25, 2024