Starred repositories

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,276 164 Updated Dec 19, 2025

DACVAE

Python 138 13 Updated Dec 19, 2025

MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model

787 25 Updated Dec 17, 2025

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 248 5 Updated Dec 16, 2025

Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure?

Python 138 7 Updated Dec 15, 2025

[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Python 447 18 Updated Dec 19, 2025
Jupyter Notebook 15 1 Updated Aug 22, 2025

Advanced Signal Processing Notebooks and Tutorials

Jupyter Notebook 171 47 Updated Dec 9, 2021

Suggestions for those interested in developing audio applications of machine learning

14 Updated Jan 10, 2020

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 755 92 Updated Dec 17, 2025

Ravetable synthesis - Latent signal processing

Max 32 Updated Sep 25, 2025

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

Python 1,100 79 Updated Mar 29, 2025
Python 565 56 Updated Sep 23, 2025

An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone

Python 18,508 2,899 Updated Dec 19, 2025

A simple yet effective Audio-to-Midi Automatic Piano Transcription system

Python 268 26 Updated Nov 22, 2024
Python 6 1 Updated Jul 11, 2025
Python 47 6 Updated Aug 31, 2024

UniSpeech - Large Scale Self-Supervised Learning for Speech

Python 474 74 Updated Apr 5, 2024
Python 2 Updated Oct 16, 2024

Variational Autoencoder in the mel-spectrogram domain for one-shot audio synthesis

Jupyter Notebook 145 18 Updated Dec 12, 2021

PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"

Python 92 2 Updated Dec 18, 2025

This is the official implementation for the paper "Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training".

Python 12 1 Updated Dec 7, 2025

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

99 3 Updated Dec 11, 2025

PyTorch implementation of V2Coder

Python 3 2 Updated Jun 27, 2025
OpenEdge ABL 198 20 Updated Nov 22, 2022

Official repository for “DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation”

Python 134 6 Updated Dec 18, 2025
Python 530 33 Updated Feb 13, 2024
Python 41 4 Updated Dec 15, 2025