huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio python nlp machine-learning natural-language-processing deep-learning pytorch transformer speech-recognition glm pretrained-models hacktoberfest gemma vlm pytorch-transformers model-hub llm qwen deepseek

Updated Nov 12, 2025
Python

sl5net / SL5-aura-service

Your offline, privacy-first voice assistant framework. Transform speech into commands and actions with a powerful, scriptable rule engine.

automation framework cross-platform rule-engine offline voice-commands speech-recognition extensible speech-to-text languagetool text-processing dictation scripting-engine pluggable stt assistive-technology voice-assistant privacy-first vosk

Updated Nov 12, 2025
Python

michaelborck-education / deep-brief

A video analysis application that helps students, educators, and professionals analyze presentations by combining speech transcription, visual analysis, and AI-powered feedback. The app processes videos to provide actionable insights on speaking performance, visual effectiveness, and overall presentation quality.

python computer-vision web-application speech-recognition cli-tool video-analysis ai-feedback presentation-analysis

Updated Nov 12, 2025
Python

JosefAlbers / whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

deep-learning speech-recognition speech-to-text whisper asr mlx whisper-turbo

Updated Nov 12, 2025
Python

cubist38 / mlx-openai-server

A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI framework, it provides an efficient, scalable, and user-friendly solution for running MLX-based vision and language models locally with an OpenAI-compatible interface.

flux queue speech-recognition image-generation whisper vision-api mlx fastapi apple-silicon structured-outputs mlx-lm mlx-vlm openai-compatible mlx-openai-server

Updated Nov 12, 2025
Python

Onuronon-lab / Shrutik

Open-source voice data collection platform for building inclusive voice datasets. Collaborative transcription with quality consensus. FastAPI + React + PostgreSQL.

Updated Nov 12, 2025
Python

silalahi / bisik

Improve pronunciation with real-time AI feedback

python education flask language-learning pronunciation speech-recognition ipa phonetics speech-processing audio-processing pronunciation-evaluation openai-whisper

Updated Nov 12, 2025
Python

Ghalwash123 / MiMo-Audio-Training

🔊 Train audio models efficiently with MiMo-Audio-Training, a toolkit designed for straightforward implementation and enhanced performance in audio processing tasks.

python open-source machine-learning deep-learning signal-processing modeling dataset feature-extraction speech-recognition neural-networks data-analysis performance-evaluation mimo audio-research audio-training

Updated Nov 12, 2025
Python

falvarop / jarvis

🎤 Control your world with Jarvis, a voice-activated AI assistant that simplifies tasks and enhances productivity.

linux chat bot home-automation raspberry-pi webpack ai deep-learning messenger python-programming assistant voice-recognition speech-recognition openai personal-assistant virtual-assistant tauri jarvis-ai

Updated Nov 12, 2025
Python

Damijan123 / TatvaX-AI-PROTOTYPE

📚 Transform learning with TatvaX, an AI platform providing personalized education in 8 Indian languages, breaking down language barriers for millions.

edtech speech-recognition flask-application voice-assistant smart-education smart-learning educational-technology translation-api python-chatbot indian-languages ai-education voice-enabled nlp-project multilingual-learning ai-chatbot-project student-learning ai-tutor educational-chatbot

Updated Nov 12, 2025
Python

SANTANC / speech-to-owl

🎤 Transform spoken phrases into OWL ontologies, making it easy to create structured data from voice. Ideal for developers and researchers alike.

python flask rdf owl ontology rdflib speech-recognition openai speech-to-text whisper audio-processing rdfxml voice-interface

Updated Nov 12, 2025
Python

visu123s / MimicKit

🤖 Learn motion imitation with MimicKit, a framework offering advanced methods to train motion controllers using state-of-the-art algorithms and techniques.

open-source machine-learning deep-learning signal-processing python-library speech-recognition neural-networks user-interface data-augmentation audio-processing generative-models voice-synthesis sound-design mimic-kit real-time-synthesis

Updated Nov 12, 2025
Python

zkazuya / MiMo-Audio-Eval

🔊 Evaluate audio performance with the MiMo-Audio-Eval toolkit, designed for accurate assessment and streamlined analysis in audio processing tasks.

python open-source machine-learning research deep-learning signal-processing data-visualization audio-analysis dataset speech-recognition neural-networks evaluation-metrics real-time-processing mimo audio-evaluation

Updated Nov 12, 2025
Python

engosoro / BanglaSTT

🎤 Convert Bangla audio files to text accurately with BanglaSTT, a cross-platform speech-to-text tool powered by OpenAI Whisper.

nlp open-source machine-learning ai speech-recognition openai bangla speech-to-text transcription whisper cli-tool python-project

Updated Nov 12, 2025
Python

20XD099 / mlx-lm

📄 Generate and fine-tune large language models on Apple silicon effortlessly with MLX LM, integrating seamlessly with the Hugging Face Hub.

python training chat apple ai speech-recognition openai image-generation neovim-plugin copilot vision-api mlx supervised-machine-learning openai-api finetuning-llms structured-outputs pydantic-ai mlx-lm

Updated Nov 12, 2025
Python

NotAbhinavGamerz / emotion-aware-automatic-speech-recognition

🎤 Enhance speech recognition by detecting emotions in spoken language, combining OpenAI's Whisper and emotion analysis for deeper insights.

nlp sentiment-analysis speech-recognition speech-to-text audio-processing emotion-detection audio-processing-with-python python-projects sentiment-analysis-model speech-recognition-model emotion-ai emotion-detection-emotion-classification whisper-asr-model

Updated Nov 12, 2025
Python

atsu12345 / WhisperAlign-CLI

🗣️ Align audio with text seamlessly on macOS, generating accurate timestamps and subtitles in multiple formats for better accessibility.

python macos cli pytorch subtitles speech-recognition vtt speech-to-text mps whisper asr tqdm forced-alignment pydub ffmpg audio-transcription apple-silicon stable-ts

Updated Nov 12, 2025
Python

wannabehacker512 / ai-toolkit

🛠️ Train diffusion models with ease using this all-in-one toolkit, designed for image and video on consumer-grade hardware. Run it as a GUI or CLI.

python nlp ai deep-learning mcp gemini speech-recognition medical-image-processing ncnn streamlit tnn monai mnn-model yolox robustvideomatting stable-diffusion yolov8 generative-ai

Updated Nov 12, 2025
Python

Jahangirbd23 / WenetSpeech-Yue

📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich annotations, designed for advancing speech recognition research.

multilingual python open-source machine-learning natural-language-processing research deep-learning dataset speech-recognition neural-networks transcription audio-processing wenetspeech yue-dialect voice-technology chinese-languages

Updated Nov 12, 2025
Python

espnet / espnet

End-to-End Speech Processing Toolkit

text-to-speech deep-learning chainer end-to-end machine-translation pytorch speech-synthesis speech-recognition kaldi voice-conversion speaker-diarization speech-separation speech-enhancement spoken-language-understanding speech-translation singing-voice-synthesis

Updated Nov 12, 2025
Python

speech-recognition

Here are 2,614 public repositories matching this topic...
