Vibe Workflow Platform for Non-technical Creators.

TypeScript 5,753 530 Updated Dec 23, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,503 214 Updated Dec 16, 2025

The best ChatGPT that $100 can buy.

Python 39,145 4,956 Updated Dec 23, 2025

MultiModal Pairwise Constrained Speaker Diarization System

Python 3 Updated Jul 25, 2025

🎓 Update Talking-Face Research Papers Daily

Python 367 28 Updated Dec 24, 2025

A toolkit for speaker diarization.

Jupyter Notebook 349 39 Updated Dec 9, 2025

A Fully Self-Hosted Solution for Full-Duplex Voice Interaction

Python 456 34 Updated Sep 28, 2025

Efficient audio understanding with general audio captions

Python 391 39 Updated Nov 3, 2025
Python 1,171 66 Updated Dec 3, 2025

Voice Activity Detector (VAD): low-latency, high-performance, and lightweight

C 1,788 142 Updated Dec 23, 2025

Script to demonstrate how to use a Language Model for Semantic Turn Detection. Refer to blog post for full details.

Python 16 1 Updated May 9, 2025

The implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", which has been accepted by Information Fusion.

Python 86 13 Updated Sep 2, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,020 238 Updated Nov 30, 2025
Python 336 42 Updated Apr 11, 2025

A Survey of Spoken Dialogue Models (60 pages)

313 17 Updated Nov 28, 2024

Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.

MATLAB 519 126 Updated Feb 17, 2022

Exa MCP for web search and web crawling!

TypeScript 3,442 263 Updated Dec 22, 2025

Source code for "Engineering Deep Learning Platforms"

Java 56 14 Updated May 4, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,981 1,652 Updated Nov 19, 2025
Jupyter Notebook 36 7 Updated Jul 15, 2023

Open singing synthesis platform / Open source UTAU successor

C# 3,415 426 Updated Nov 29, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,806 1,083 Updated Dec 23, 2025

End-to-end realtime stack for connecting humans and AI

Go 16,219 1,636 Updated Dec 23, 2025

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Python 3,037 319 Updated Dec 10, 2025

DALI: a large Dataset of synchronised Audio, LyrIcs and vocal notes.

Python 375 36 Updated Jun 11, 2020

A tool for real-time lyrics alignment and visualization, integrating audio processing, phoneme-level synchronization, and interactive variable font typography.

Jupyter Notebook 4 Updated Mar 20, 2025

A Conversational Speech Generation Model

Python 14,369 1,458 Updated May 27, 2025

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,445 798 Updated Mar 15, 2025

Vector (and Scalar) Quantization, in Pytorch

Python 3,781 309 Updated Dec 16, 2025

Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Python 2,160 250 Updated Nov 27, 2025