A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Updated Nov 11, 2025 (Python)
Python SDK for Palabra AI's real-time speech-to-speech translation API. Break down language barriers and enable seamless communication across 25+ languages.
End-to-End Speech Processing Toolkit
The application uses SpeechRecognition, GoogleTranslator, and gTTS to convert spoken English or Tamil into the opposite language, display the translated text, and play the audio output.
Repository containing the open source code of works published at the FBK MT unit.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
Code for the papers: "Efficient Speech Translation through Model Compression and Knowledge Distillation" and "Iterative Layer Pruning for Efficient Translation Inference"
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.
The official repo for the paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"
StreamUni is a framework that efficiently enables unified Large Speech-Language Models to accomplish streaming speech translation in a cohesive manner.
This repository contains the data resources for the LacunaFund supported project, Multimodal datasets for the Bemba Language of Zambia.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation (INTERSPEECH 2022)
AI Video Translator and Subtitler
ESO speech dataset: an English-language speech corpus of the oncology domain for ASR training and benchmarking and MT benchmarking.
Code for GMU's submission to IWSLT 2025 Low-Resource Speech Translation Shared Task
Official Repository for our IWSLT 2025 paper "Streaming Sequence Transduction through Dynamic Compression"
ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.
Speech To Speech: an effort toward an open-source, modular GPT-4o