End-to-End Speech Processing Toolkit
Updated Nov 12, 2025 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Python SDK for Palabra AI's real-time speech-to-speech translation API. Break down language barriers and enable seamless communication across 25+ languages.
The application uses SpeechRecognition, GoogleTranslator, and gTTS to convert spoken English or Tamil into the opposite language, display the translated text, and play the audio output.
Repository containing the open source code of works published at the FBK MT unit.
Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"
StreamUni is a framework that efficiently enables unified Large Speech-Language Models to accomplish streaming speech translation in a cohesive manner.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Code for GMU's submission to IWSLT 2025 Low-Resource Speech Translation Shared Task
Official Repository for our IWSLT 2025 paper "Streaming Sequence Transduction through Dynamic Compression"
ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Simultaneous Speech-to-Text and Speech Translation using Azure AI.
🚀 Framework for seamless fine-tuning of the Whisper model on a multilingual dataset and deployment to production.
Code from the paper "Towards Speech-to-Pictograms Translation" (Interspeech 2024)
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
Pushing the Limits of Zero-shot End-to-End Speech Translation