vectominist
🎯
Focusing

Highlights

  • Pro

Organizations

@s3prl


Create SQL that matches your selection (with explainable AI), not the other way around

Python 15 1 Updated Jun 17, 2026

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 161 11 Updated Mar 26, 2026

A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]

Python 171 16 Updated Apr 29, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 19,323 1,687 Updated Nov 19, 2025

Gruvbox with Material Palette

Vim Script 2,596 192 Updated Apr 15, 2026

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 1,146 88 Updated Dec 23, 2024

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,586 1,187 Updated Jun 11, 2026

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python 3,411 300 Updated Jun 18, 2026

Open-Source Frontier Voice AI

Python 49,476 5,516 Updated May 6, 2026

Alignment files of LibriTTS.

70 7 Updated Mar 16, 2020

Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset

Python 182 24 Updated Mar 25, 2019

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 10,140 1,072 Updated Jun 17, 2026

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 9,365 786 Updated Mar 26, 2026

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 4,191 360 Updated May 25, 2026

[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.

Python 34 2 Updated Feb 27, 2026

DACVAE

Python 226 18 Updated Dec 22, 2025

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python 72 8 Updated Mar 22, 2026

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook 171 16 Updated Jun 8, 2026

A benchmark for evaluating audio encoders on various audio tasks.

Python 53 8 Updated Apr 27, 2026

State-of-the-art pretrained music models for training, evaluation, inference

Python 182 20 Updated Jan 20, 2026

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 1,014 77 Updated Jul 10, 2025

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 930 64 Updated Oct 28, 2024

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook 591 86 Updated Jan 31, 2026

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Python 53 6 Updated Jan 18, 2024

Mamba SSM architecture

Python 18,460 1,758 Updated Jun 15, 2026

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,803 1,174 Updated Apr 8, 2026

FAIR Sequence Modeling Toolkit 2

Python 1,140 143 Updated Jun 16, 2026

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python 6,447 995 Updated Apr 4, 2025

[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas

Python 761 45 Updated May 21, 2025