HaoFengyuan / X-TF-GridNet

The implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", which has been accepted by Information Fusion.

Python · 86 stars · 13 forks · Updated Sep 2, 2025

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python · 2,020 stars · 238 forks · Updated Nov 30, 2025

jzq2000 / MoonCast

Python · 336 stars · 42 forks · Updated Apr 11, 2025

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

313 stars · 17 forks · Updated Nov 28, 2024

anicolson / DeepXi

Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.

MATLAB · 519 stars · 126 forks · Updated Feb 17, 2022

exa-labs / exa-mcp-server

Exa MCP for web search and web crawling!

TypeScript · 3,442 stars · 263 forks · Updated Dec 22, 2025

orca3 / MiniAutoML

Source code for "Engineering Deep Learning Platforms"

Java · 56 stars · 14 forks · Updated May 4, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python · 18,981 stars · 1,652 forks · Updated Nov 19, 2025

Ldoun / DeepSinger

Jupyter Notebook · 36 stars · 7 forks · Updated Jul 15, 2023

stakira / OpenUtau

Open singing synthesis platform / Open source UTAU successor

C# · 3,415 stars · 426 forks · Updated Nov 29, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python · 11,806 stars · 1,083 forks · Updated Dec 23, 2025

livekit / livekit

End-to-end realtime stack for connecting humans and AI

Go · 16,219 stars · 1,636 forks · Updated Dec 23, 2025

openvpi / DiffSinger

Forked from MoonInTheRiver/DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Python · 3,037 stars · 319 forks · Updated Dec 10, 2025

AltasK

Lists (1)

🚀 My stack

Stars

refly-ai / refly

facebookresearch / omnilingual-asr

karpathy / nanochat

GeekOrangeLuYao / multimodal_pairwise_constrained_speaker_diarization

liutaocode / talking-face-arxiv-daily

BUTSpeechFIT / DiariZen

FireRedTeam / FireRedChat

xiaomi-research / dasheng-lm

pipecat-ai / smart-turn

TEN-framework / ten-vad

aaronng91 / semantic-turn-detection

HaoFengyuan / X-TF-GridNet

gabolsgabs / DALI

bensapirstein / lyrics-alignment

SesameAILabs / csm

jasonppy / VoiceCraft

lucidrains / vector-quantize-pytorch

ASLP-lab / DiffRhythm