Stars
The ultimate training toolkit for finetuning diffusion models
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
kaldi-asr/kaldi is the official location of the Kaldi project.
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
SSSegmentation: An Open Source Supervised Semantic Segmentation Toolbox Based on PyTorch.
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
⚡️ 10x - Up to 20x faster AI coding with multi-step Superpowers. Open-source agent with smart model routing, BYOK, fully self-hosted.
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
An Open Source text-to-speech system built by inverting Whisper.
Noise suppression using deep filtering
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
Robust Speech Recognition via Large-Scale Weak Supervision
A playbook for systematically maximizing the performance of deep learning models.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A collection of libraries to optimise AI model performance
A collection of research materials on explainable AI/ML
LaTeX code for making neural network diagrams
PyTorch implementation of Darknet53