- IDIAP
- Switzerland
- https://juanpzuluaga.github.io/
- @PabloGomez3
Stars
A family of efficient speech models for multilingual phone recognition
Faker is a Python package that generates fake data for you.
Efficient Triton Kernels for LLM Training
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
A safetensors extension to efficiently store sparse quantized tensors on disk
FSA/FST algorithms, differentiable, with PyTorch compatibility.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Whisper realtime streaming for long speech-to-text transcription and translation
Silero Models: pre-trained text-to-speech models made embarrassingly simple
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
This is the code and dataset repo for the Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamics"
A high-throughput and memory-efficient inference and serving engine for LLMs
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Prepend universal audio attack segment to mute Whisper
This repository contains the SpeechBrain Benchmarks
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Reference code for the paper The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Repository for Accent Recognition (Hackathon @SLT2022)
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
text window manager, shell multiplexer, integrated DevOps environment
Acceptance rates for the major AI conferences
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Corpus for Research on Robust Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IWSLT2022.