The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,533 319 Updated May 26, 2026

Audio waveform visualisation, converts any audio to a nice video

Python 336 44 Updated Mar 7, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 729 52 Updated Jun 5, 2025

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,494 795 Updated May 30, 2026

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,623 792 Updated May 31, 2024

Code for the paper "Training Diffusion Models with Reinforcement Learning"

Python 572 35 Updated Jul 5, 2023

Perf monitoring CLI tool for Apple Silicon

Python 4,589 212 Updated Apr 18, 2024

Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍

Shell 25,084 3,590 Updated Jun 14, 2026

Port of OpenAI's Whisper model in C/C++

C++ 50,758 5,665 Updated Jun 15, 2026

LLM inference in C/C++

C++ 116,724 19,616 Updated Jun 16, 2026

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

Python 1,002 101 Updated Apr 2, 2023

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python 455 93 Updated Feb 15, 2025

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Python 7,375 1,298 Updated Mar 16, 2026

Hydra is a framework for elegantly configuring complex applications

Python 10,446 866 Updated Jun 15, 2026

A self-supervised learning framework for audio-visual speech

Python 988 161 Updated Dec 7, 2023

[ECCV 2018] Official code for "Graph R-CNN for Scene Graph Generation"

Python 748 159 Updated Apr 1, 2020

A faster PyTorch implementation of Faster R-CNN

Python 7,857 2,298 Updated May 20, 2022

Lingvo

Python 2,864 450 Updated Jun 12, 2026

Wavenet Autoencoder for Unsupervised speech representation learning (after Chorowski, Jan 2019)

Python 176 23 Updated Sep 16, 2020

VQVAE for Unsupervised Voice Conversion

Python 21 5 Updated Apr 25, 2019

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

C++ 17,601 4,229 Updated Mar 11, 2023

Learning diverse image-to-image translation from unpaired data

Python 856 155 Updated Aug 5, 2020

"Learning To Blend Photos," ECCV 2018

51 1 Updated Sep 11, 2018

Collection of generative models implemented in PyTorch.

Python 2,630 538 Updated Apr 12, 2020

This repository provides state of the art (SoTA) results for all machine learning problems. We do our best to keep this repository up to date. If you do find a problem's SoTA result is out of date …

8,901 1,298 Updated Jun 25, 2019

Implementations of various VAE-based semi-supervised and generative models in PyTorch

Python 709 123 Updated Mar 2, 2020

TensorFlow implementation of the speech model described in Neural Discrete Representation Learning (a.k.a. VQ-VAE)

Python 129 31 Updated Jul 26, 2018

WaveNet vocoder

Python 2,371 493 Updated Jul 29, 2023

Python library for audio and music analysis

Python 8,467 1,054 Updated Jun 15, 2026