🍄
venture to a bigger world


[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,727 460 Updated Dec 18, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,661 2,860 Updated Dec 21, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,201 746 Updated Dec 12, 2025

An Extensible Deep Learning Library

Python 2,303 392 Updated Dec 11, 2025

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,727 1,167 Updated Nov 14, 2024

PyTorch native post-training library

Python 5,625 692 Updated Dec 21, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,865 12,101 Updated Dec 21, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,478 295 Updated Dec 19, 2025

Lightweight coding agent that runs in your terminal

Rust 54,418 6,907 Updated Dec 21, 2025

Open-source unified multimodal model

Python 5,491 481 Updated Oct 27, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,194 832 Updated Nov 20, 2025

Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigating Attention Sinks and Massive Activations in Audio-Visual …

Python 52 4 Updated Nov 26, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 211 17 Updated Sep 19, 2024

Generative models for conditional audio generation

Python 3,538 405 Updated Oct 9, 2025

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

431 23 Updated Mar 8, 2025

ConMamba for Automatic Speech Recognition

Python 100 10 Updated Aug 12, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,826 113 Updated Sep 27, 2024

SALMONN family: A suite of advanced multi-modal LLMs

1,371 113 Updated Sep 28, 2025

Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 85 13 Updated Jun 12, 2024

Python module for syllabifying English ARPABET transcriptions

Python 71 17 Updated Feb 15, 2019

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,554 772 Updated May 27, 2025

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 189 12 Updated Jul 12, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 636 64 Updated Jun 9, 2024

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.

Python 778 54 Updated May 10, 2022

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Python 1,788 285 Updated Feb 15, 2023

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,927 383 Updated Mar 14, 2024

Phoneme segmentation using pre-trained speech models

Python 55 10 Updated Nov 4, 2022

Multilingual speech aligner

Python 77 6 Updated Nov 19, 2023

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Python 57 3 Updated Apr 20, 2023

Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Jupyter Notebook 39 8 Updated Mar 4, 2024