🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Updated Nov 12, 2025 - Python
Your offline, privacy-first voice assistant framework. Transform speech into commands and actions with a powerful, scriptable rule engine.
A video analysis application that helps students, educators, and professionals analyze presentations by combining speech transcription, visual analysis, and AI-powered feedback. The app processes videos to provide actionable insights on speaking performance, visual effectiveness, and overall presentation quality.
Blazing-fast Whisper Turbo for ASR (speech-to-text) tasks
A high-performance API server that exposes OpenAI-compatible endpoints for MLX models. Built in Python on the FastAPI framework, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally.
Open-source voice data collection platform for building inclusive voice datasets. Collaborative transcription with quality consensus. FastAPI + React + PostgreSQL.
Improve pronunciation with real-time AI feedback
🔊 Train audio models efficiently with MiMo-Audio-Training, a toolkit designed for straightforward implementation and enhanced performance in audio processing tasks.
🎤 Control your world with Jarvis, a voice-activated AI assistant that simplifies tasks and enhances productivity.
📚 Transform learning with TatvaX, an AI platform providing personalized education in 8 Indian languages, breaking down language barriers for millions.
🎤 Transform spoken phrases into OWL ontologies, making it easy to create structured data from voice. Ideal for developers and researchers alike.
🤖 Learn motion imitation with MimicKit, a framework offering advanced methods to train motion controllers using state-of-the-art algorithms and techniques.
🔊 Evaluate audio performance with the MiMo-Audio-Eval toolkit, designed for accurate assessment and streamlined analysis in audio processing tasks.
🎤 Convert Bangla audio files to text accurately with BanglaSTT, a cross-platform speech-to-text tool powered by OpenAI Whisper.
📄 Generate and fine-tune large language models on Apple silicon effortlessly with MLX LM, integrating seamlessly with the Hugging Face Hub.
🎤 Enhance speech recognition by detecting emotions in spoken language, combining OpenAI's Whisper and emotion analysis for deeper insights.
🗣️ Align audio with text seamlessly on macOS, generating accurate timestamps and subtitles in multiple formats for better accessibility.
🛠️ Train diffusion models with ease using this all-in-one toolkit, designed for image and video on consumer-grade hardware. Run it as a GUI or CLI.
📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich annotations, designed for advancing speech recognition research.
End-to-End Speech Processing Toolkit