asr

Star

Here are 802 public repositories matching this topic...

m-bain / whisperX

Sponsor

Star

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

speech speech-recognition speech-to-text whisper asr

Updated Mar 25, 2026
Python

NVIDIA-NeMo / NeMo

Star

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr speech-translation speaker-diariazation generative-ai

Updated Mar 28, 2026
Python

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Updated Mar 16, 2026
Python

speechbrain / speechbrain

Star

A PyTorch-based Speech Toolkit

Updated Mar 27, 2026
Python

FunAudioLLM / SenseVoice

Star

Multilingual Voice Understanding Model

multilingual python ai pytorch speech-recognition speech-to-text asr cross-lingual speech-emotion-recognition audio-event-classification aigc llm gpt-4o

Updated Dec 30, 2025
Python

jdepoix / youtube-transcript-api

Sponsor

Star

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!

python cli youtube youtube-video youtube-api captions subtitles transcript subtitle transcripts asr youtube-subtitles youtube-transcripts youtube-captions youtube-transcript translating-transcripts youtube-asr

Updated Mar 23, 2026
Python

wzpan / wukong-robot

Sponsor

Star

🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目，支持ChatGPT多轮对话能力，还可能是首个支持脑机交互的开源智能音箱项目。

alexa ai amazon-echo muse tts openai google-home unit bci speaker homeassistant snowboy asr anyq raspeberry-pi gpt3 chatgpt

Updated Oct 25, 2024
Python

wenet-e2e / wenet

Star

Production First and Production Ready End-to-End Speech Recognition Toolkit

pytorch transformer speech-recognition automatic-speech-recognition production-ready whisper asr conformer e2e-models

Updated Dec 19, 2025
Python

PeterH0323 / Streamer-Sales

Star

Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁，一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、FastAPI 搭建后端🗝️、Docker-compose 打包部署🐋

chat chatbot text-generation tts gpt chat-application asr rag digital-human llm chatgpt internlm-chat-7b internlm2 meta-human

Updated Mar 8, 2025
Python

ahmetoner / whisper-asr-webservice

Sponsor

Star

OpenAI Whisper ASR Webservice API

docker speech speech-recognition automatic-speech-recognition speech-to-text asr openai-whisper

Updated Nov 23, 2025
Python

CheshireCC / faster-whisper-GUI

Star

faster_whisper GUI with PySide6

openai vad whisper asr transcribe voice-transcription faster-whisper whisperx

Updated Dec 8, 2024
Python

tensorflow / lingvo

Star

Lingvo

nlp research translation tensorflow machine-translation speech distributed tts speech-synthesis mnist speech-recognition lm seq2seq speech-to-text gpu-computing language-model asr

Updated Mar 28, 2026
Python

linto-ai / whisper-timestamped

Star

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Updated Sep 9, 2025
Python

mravanelli / pytorch-kaldi

Star

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

deep-neural-networks deep-learning speech dnn pytorch recurrent-neural-networks lstm gru speech-recognition rnn kaldi rnn-model asr lstm-neural-networks multilayer-perceptron-network timit dnn-hmm

Updated Mar 14, 2022
Python

harry0703 / AudioNotes

Star

快速提取音视频内容，整理成一份结构化的markdown笔记

python ai whisper asr ollama qwen2 funasr

Updated Jul 26, 2024
Python

FireRedTeam / FireRedASR

Star

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

open-source transformer speech-recognition automatic-speech-recognition asr conformer llm industrial-grade multimodal-llm speechllm