A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 13,956 1,446 Updated Dec 17, 2025

stepfun-ai / Step-Audio

Python 4,571 370 Updated Jun 12, 2025

ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Python 3,485 410 Updated Nov 12, 2025

snakers4 / silero-models

Silero Models: pre-trained text-to-speech models made embarrassingly simple

Jupyter Notebook 5,662 358 Updated Dec 5, 2025

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,094 643 Updated Aug 10, 2024

chentuochao / Target-Conversation-Extraction

This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamics"

Python 55 6 Updated Aug 15, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,638 12,029 Updated Dec 17, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,007 155 Updated Apr 21, 2025

rainavyas / prepend_acoustic_attack

Prepend universal audio attack segment to mute Whisper

Python 31 12 Updated Jan 22, 2025

popcornell / ASRLightningFT

Python 7 1 Updated Jun 19, 2024

speechbrain / benchmarks

This repository contains the SpeechBrain Benchmarks

Python 133 46 Updated Jul 15, 2025

unslothai / unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 49,545 4,087 Updated Dec 17, 2025

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 23,106 1,423 Updated Dec 17, 2025

mlabonne / llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

70,505 8,061 Updated Jun 4, 2025

idiap / translation-aided-slu

Reference code for the paper The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation.

Python 1 Updated May 28, 2025

amazon-science / stac-speech-translation

Python 12 1 Updated Mar 6, 2024

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,214 2,044 Updated Oct 21, 2025

JuanPZuluaga / accent-recog-slt2022

Repository for Accent Recognition (Hackathon @SLT2022)

Jupyter Notebook 37 13 Updated May 12, 2024

mt-upc / SHAS

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Python 40 5 Updated Feb 9, 2023

dustinkirkland / byobu

text window manager, shell multiplexer, integrated DevOps environment

Shell 1,468 132 Updated Mar 30, 2025

lixin4ever / Conference-Acceptance-Rate

Acceptance rates for the major AI conferences

Jupyter Notebook 4,691 315 Updated Sep 23, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,309 3,236 Updated Dec 17, 2025

idiap / atco2-corpus

A Corpus for Research on Robust Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

Python 76 8 Updated Mar 24, 2023

mzboito / IWSLT2022_Tamasheq_data

Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IWSLT2022.

18 8 Updated Nov 30, 2022

Juan Pablo Zuluaga JuanPZuluaga

Lists (1)

Speech Research

Stars

lingjzhu / zipa

joke2k / faker

linkedin / Liger-Kernel

dusty-nv / jetson-containers

vllm-project / compressed-tensors

k2-fsa / k2

modelscope / FunASR