i-MaTh
🎯
Focusing
  • East China Normal University
  • Shanghai

Highlights

  • Pro

Starred repositories

Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

Python 344 22 Updated Dec 25, 2025

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 793 99 Updated Dec 17, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

19,149 1,998 Updated Dec 12, 2025

Voice Activity Detector (VAD): low-latency, high-performance and lightweight

C 1,828 143 Updated Dec 23, 2025

Open-source framework for conversational voice AI agents

Python 9,414 1,102 Updated Dec 25, 2025
Python 7,882 465 Updated Dec 25, 2025

T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech synthesis with zero-shot capabilities.

Python 28 5 Updated Nov 7, 2025

dLLM: Simple Diffusion Language Modeling

Python 1,511 155 Updated Dec 25, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,512 214 Updated Dec 16, 2025

Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Python 402 29 Updated Apr 22, 2025

A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.

TypeScript 1,736 81 Updated Nov 18, 2025

Precision Alignment, Infinite Possibilities

Python 100 7 Updated Dec 22, 2025

A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 791 53 Updated Dec 22, 2025

Official implementation of "Continuous Autoregressive Language Models"

Python 677 81 Updated Dec 1, 2025

The official implementation of PeriodWave and PeriodWave-Turbo

Python 215 16 Updated Apr 14, 2025

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 93 5 Updated Oct 15, 2025

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 879 338 Updated Dec 24, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,822 352 Updated Dec 11, 2025

Training, inference, and testing of the SAC speech codec model.

Python 92 6 Updated Nov 1, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 607 52 Updated Oct 29, 2025

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 140 3 Updated Oct 20, 2025

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 118 10 Updated Mar 27, 2025

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

924 74 Updated Dec 15, 2025

LongCat Audio Tokenizer and Detokenizer

Python 264 18 Updated Dec 15, 2025

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 106 Updated Oct 17, 2025

PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users (narrow band and wide band)

C 617 104 Updated Sep 5, 2024

FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego, an omnimodal model with native full duplexity.

Python 53 8 Updated Dec 9, 2025

Speech To Speech: an effort toward an open-source and modular GPT-4o

Python 4,266 487 Updated Apr 15, 2025

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.

Python 1,020 121 Updated Dec 15, 2025

Official implementation of DNSMOS Pro (accepted at INTERSPEECH 2024).

Python 73 9 Updated Jun 8, 2025