Brno University of Technology
Stars
The official implementation of GTCRN, an ultra-lightweight SE model.
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
WeDefense: A Toolkit to Defend Against Fake Audio
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Open Source framework for voice and multimodal conversational AI
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Official inference library for Mistral models
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
The baselines of ARC-Challenge-Interspeech2026
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
CHiME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
This repo contains a PyTorch implementation of the paper: "Evidential Deep Learning to Quantify Classification Uncertainty"
Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models