OpenFLAM: Framewise Language Audio Model

Python 101 6 Updated Jan 14, 2026

PersonaPlex code.

Python 8,830 1,246 Updated Mar 2, 2026

Official implementation of the εar-VAE model, including inference and evaluation code; more details coming soon.

Python 69 6 Updated Feb 13, 2026

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 114 10 Updated Mar 3, 2026

[ACL 2025] Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis

Python 14 Updated Apr 1, 2026

ACE-Step: A Step Towards Music Generation Foundation Model

Python 4,297 539 Updated Feb 15, 2026

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,558 343 Updated Jun 21, 2025

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 125 11 Updated Apr 8, 2026
Python 59 3 Updated Mar 22, 2025

The official implementation of TokenSynth (ICASSP 2025)

Python 81 4 Updated Oct 27, 2025

A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation

Jupyter Notebook 161 16 Updated Nov 30, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 706 51 Updated Jun 5, 2025

Training Large Language Models to Reason in a Continuous Latent Space

Python 1,562 170 Updated Apr 8, 2026

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 424 31 Updated Feb 12, 2026

A suite of image and video neural tokenizers

Jupyter Notebook 1,716 87 Updated Feb 11, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,096 516 Updated Jan 6, 2026

Event Relation in Text-to-Audio (TTA) Generation

Python 20 Updated Feb 26, 2025

[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

Jupyter Notebook 847 77 Updated Jan 28, 2026

LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

93 4 Updated Dec 28, 2024
Python 334 31 Updated Dec 17, 2024

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,142 251 Updated Feb 23, 2026

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 93 4 Updated Dec 3, 2024

Official repository of Wavehax vocoder

Python 67 7 Updated Dec 20, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,891 120 Updated Feb 20, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,001 930 Updated Mar 4, 2026

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 214 18 Updated Sep 19, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,712 128 Updated Dec 10, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,283 110 Updated Mar 2, 2025

The official implementation of PeriodWave and PeriodWave-Turbo

Python 220 17 Updated Apr 14, 2025

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 13,152 1,876 Updated Dec 19, 2025