This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages the WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. Then, it uses an HMM decoder to infer singing beats and t…
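The repository's actual decoder is not reproduced here; as an illustrative sketch of the final step, frame-wise beat activations can be decoded with Viterbi over a simple two-state (beat / no-beat) HMM. The function name, the two-state topology, and the `p_stay` parameter are assumptions for the example, not the system's real decoder:

```python
import numpy as np

def viterbi_beats(act, p_stay=0.9):
    """Decode a beat/no-beat state sequence from frame-wise beat
    activations `act` (values in [0, 1]) with a 2-state HMM."""
    T = len(act)
    eps = 1e-9
    # emission log-probs: state 0 = no beat, state 1 = beat
    log_em = np.log(np.stack([1.0 - act, act]) + eps)          # (2, T)
    log_tr = np.log(np.array([[p_stay, 1 - p_stay],
                              [1 - p_stay, p_stay]]) + eps)    # (prev, cur)
    delta = np.zeros((2, T))
    back = np.zeros((2, T), dtype=int)
    delta[:, 0] = np.log(0.5) + log_em[:, 0]                   # uniform prior
    for t in range(1, T):
        scores = delta[:, t - 1, None] + log_tr                # (prev, cur)
        back[:, t] = np.argmax(scores, axis=0)
        delta[:, t] = np.max(scores, axis=0) + log_em[:, t]
    # backtrace the most likely state path
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[:, -1])
    for t in range(T - 2, -1, -1):
        path[t] = back[path[t + 1], t + 1]
    return path
```

Frames decoded as state 1 would then be grouped into beat events; real systems typically use a richer state space that also tracks tempo and beat phase.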
This repository contains different approaches I tried for improving ASR systems for accented English speech. All of them use the HuBERT model as a baseline.
Implementation of Speech Emotion Recognition (SER) on the CREMA-D dataset using both log-Mel spectrograms and HuBERT embeddings. Includes preprocessing, feature extraction, CNN/MLP models, training/evaluation scripts, and visualization tools for analyzing accuracy, loss, and confusion matrices.
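The repository's own feature-extraction scripts are not shown here; as a minimal sketch of one of the two front ends it mentions, a log-Mel spectrogram can be computed with plain numpy. The function name and the default parameters (16 kHz audio, 400-sample window, 160-sample hop, 40 Mel bands) are illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(wav, sr=16000, n_fft=400, hop=160, n_mels=40):
    # frame the signal and apply a Hann window
    n_frames = 1 + (len(wav) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wav[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (frames, bins)
    # triangular Mel filterbank spanning 0 .. sr/2
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(power @ fb.T + 1e-10)                       # (frames, n_mels)
```

In practice a library such as librosa or torchaudio would be used instead; the HuBERT-embedding front end additionally requires loading the pre-trained model.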
Unofficial PyTorch implementation of Higgs Audio V2 Tokenizer with HuBERT semantic features. Complete training pipeline for semantic-acoustic audio tokenization with 960x downsampling and 8-layer RVQ.
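The tokenizer's actual RVQ module is not reproduced here; as an illustrative sketch of the residual vector quantization idea it mentions, each layer quantizes the residual left by the previous one. The class name, random (untrained) codebooks, and the dimensions are assumptions for the example:

```python
import numpy as np

class ResidualVQ:
    """Minimal residual vector quantizer: layer i quantizes the
    residual left after layers 0..i-1 (untrained, for illustration)."""
    def __init__(self, n_layers=8, codebook_size=256, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.codebooks = rng.standard_normal((n_layers, codebook_size, dim))

    def encode(self, x):
        codes, residual = [], x.copy()
        for cb in self.codebooks:
            # nearest codeword for every vector in the current residual
            d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
            idx = d.argmin(axis=1)
            codes.append(idx)
            residual = residual - cb[idx]
        return np.stack(codes), residual

    def decode(self, codes):
        # reconstruction is the sum of the selected codewords per layer
        return sum(cb[idx] for cb, idx in zip(self.codebooks, codes))
```

By construction, the input equals the reconstruction plus the final residual; a trained tokenizer learns the codebooks so that the residual after the last layer is small.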
Functionality for speech data processing, including time alignment, encoding with speech encoders (tokenizers), and preprocessing of common datasets.