Stars
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context
LAVIS - A One-stop Library for Language-Vision Intelligence
SALMONN family: A suite of advanced multi-modal LLMs
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
Tools for handling multimodal data in machine learning projects.
Speech-to-text server framework with next-gen Kaldi
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…
kaldi-asr/kaldi is the official location of the Kaldi project.
FSA/FST algorithms, differentiable, with PyTorch compatibility.
personal website + blog for every github user
Simple text generator with an OpenAI GPT-2 PyTorch implementation
TensorFlow r2.1 reimplementation of Model-Agnostic Meta-Learning
Deep Learning and PyTorch hands-on introductory video tutorial: accompanying source code and slides (PPT)