so-vits-svc fork with realtime support, improved interface and more features.
Updated Dec 16, 2025 - Python
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Phoneme segmentation using pre-trained speech models
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. Then, it uses an HMM decoder to infer singing beats and t…
unsupervised spoken utterances scoring
Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models, to appear in ICAIIC 2026
code for our paper DistilALHuBERT: A Distilled Parameter Sharing Audio Representation Model
The code for the MAPSS measures for source separation evaluation.
Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification, ISCA Interspeech 2025
Speech Keyword detection using Wav2Vec Model
Unofficial PyTorch implementation of Higgs Audio V2 Tokenizer with HuBERT semantic features. Complete training pipeline for semantic-acoustic audio tokenization with 960x downsampling and 8-layer RVQ.
Functionality for speech data processing including time alignment, encoding with speech encoders (tokenizers) and data preprocessing of common datasets
Acoustic Transformer Models for Audio Classification
🐶 Voice Cloning Bark HuBERT - Enables voice cloning from personalized audio samples by processing model's outputs into semantic tokens compatible with text-to-audio system.
This repository contains different approaches I tried for improving ASR systems for accented English speech. All of them use the HuBERT model as a baseline.
Implementation of Speech Emotion Recognition (SER) on the CREMA-D dataset using both log-Mel spectrograms and HuBERT embeddings. Includes preprocessing, feature extraction, CNN/MLP models, training/evaluation scripts, and visualization tools for analyzing accuracy, loss, and confusion matrices.
Pipeline for generating images conditioned on input audio