vvwangvv
Python 4 Updated May 3, 2026
Python 7 Updated Jun 12, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,533 319 Updated May 26, 2026

Versatile Evaluation of Speech and Audio

Python 416 48 Updated May 29, 2026

Official baseline for ICASSP 2026 URGENT Challenge Track 2 (Speech Quality Assessment)

Python 31 3 Updated Jun 8, 2026

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,838 254 Updated Dec 30, 2025
Python 11 1 Updated Oct 23, 2025

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 29,696 3,360 Updated Jun 10, 2026

the missing toolbox for an async world

Python 370 28 Updated Jun 14, 2026

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 511 68 Updated Dec 22, 2025

This is the repository for the Tool Learning survey.

484 15 Updated Aug 9, 2025

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code

Python 823 108 Updated Oct 16, 2024

[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Python 1,156 150 Updated Aug 24, 2025

SoTA LLM for converting natural language questions to SQL queries

Jupyter Notebook 4,035 278 Updated May 23, 2024

LlamaIndex is the leading document agent and OCR platform

Python 50,150 7,565 Updated Jun 12, 2026

Chat language model that can use tools and interpret the results

Python 1,595 119 Updated Dec 3, 2025

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.

Python 5,669 485 Updated May 21, 2025

Transcripts of the DNS Challenge test sets

7 Updated Jul 7, 2023

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 5,019 619 Updated Jul 2, 2024

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Python 7,255 1,061 Updated Aug 5, 2024

[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"

Jupyter Notebook 1,602 144 Updated Aug 15, 2024

A fast, local neural text to speech system

C++ 11,101 1,025 Updated Aug 26, 2025

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Python 247 23 Updated Apr 20, 2024

Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

Python 489 34 Updated Apr 15, 2024
Python 2 Updated Nov 24, 2023

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation

Python 757 106 Updated May 12, 2026

Official code for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Jupyter Notebook 1,833 230 Updated Nov 29, 2022

Accurate stronghold calculator for Minecraft speedrunning.

Java 752 97 Updated Jun 11, 2026

📖 A curated list of resources dedicated to talking face.

1,541 121 Updated Dec 23, 2024