Stars
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
verl: Volcano Engine Reinforcement Learning for LLMs
Hackable and optimized Transformers building blocks, supporting composable construction.
Foundational Models for State-of-the-Art Speech and Text Translation
A high-throughput and memory-efficient inference and serving engine for LLMs
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Lightweight coding agent that runs in your terminal
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigating Attention Sinks and Massive Activations in Audio-Visual …
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
Generative models for conditional audio generation
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
SALMONN family: A suite of advanced multi-modal LLMs
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Python module for syllabifying English ARPABET transcriptions
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models"; samples are presented on the accompanying demo page.
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Phoneme segmentation using pre-trained speech models
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.