UESTC PhD, TJU Master's
Starred repositories
Implementation of the Sparsemax activation in PyTorch
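Sparsemax replaces softmax with a Euclidean projection onto the probability simplex, so it can assign exactly zero probability to low-scoring entries. A minimal NumPy sketch of the closed-form projection (an illustration of the operation, not the repo's PyTorch API):

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Project a 1-D logit vector onto the probability simplex (sparsemax)."""
    z_sorted = np.sort(z)[::-1]          # logits in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # support: largest k with 1 + k * z_(k) > sum of the top-k logits
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z  # threshold subtracted from all logits
    return np.maximum(z - tau, 0.0)

p = sparsemax(np.array([2.0, 1.0, -1.0]))  # puts all mass on the top logit
```

Unlike softmax, the output here is exactly `[1.0, 0.0, 0.0]`: the gap between the top logit and the rest is large enough that the projection lands on a vertex of the simplex.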
Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.
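Table-based conversion of this kind is a per-character (or per-phrase) dictionary lookup. A sketch with a tiny hand-picked table — the real project uses MediaWiki's full conversion tables, which cover thousands of characters and multi-character phrases:

```python
# Tiny illustrative simplified -> traditional table (assumption: the
# real MediaWiki tables are far larger and also handle phrases).
S2T = {"国": "國", "汉": "漢", "语": "語", "书": "書"}

def s2t(text: str) -> str:
    """Convert simplified characters to traditional via table lookup;
    characters not in the table pass through unchanged."""
    return "".join(S2T.get(ch, ch) for ch in text)

print(s2t("汉语书"))  # 漢語書
```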
A python binding for FFmpeg which provides sync and async APIs
Python bindings for FFmpeg - with complex filtering support
daanzu / py-webrtcvad-wheels
Forked from wiseman/py-webrtcvad. Python interface to the WebRTC Voice Activity Detector (VAD) [released with binary wheels!]
Python interface to the WebRTC Voice Activity Detector
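The WebRTC VAD accepts only 10, 20, or 30 ms frames of 16-bit mono PCM at 8/16/32/48 kHz, so callers must chunk audio themselves. A sketch of that frame-splitting step (the actual `webrtcvad.Vad.is_speech` call is left out so the snippet stays self-contained):

```python
def frame_generator(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30):
    """Yield fixed-size frames of 16-bit mono PCM, dropping any
    trailing partial frame (the VAD rejects short frames)."""
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes/sample
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[offset:offset + frame_bytes]

# One second of 16 kHz silence -> 33 full 30 ms frames (the last 10 ms is dropped)
frames = list(frame_generator(b"\x00\x00" * 16000))
```

Each frame would then be passed to `vad.is_speech(frame, sample_rate)`.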
Advanced data structures for handling temporal segments with attached labels.
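The basic object in such a library is an interval with an attached label (e.g. a speaker turn in a diarization output). A minimal sketch of that idea — the names `Segment`, `overlaps`, and `total_duration` are illustrative, not the repo's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A labeled temporal segment [start, end) in seconds."""
    start: float
    end: float
    label: str

    @property
    def duration(self) -> float:
        return self.end - self.start

    def overlaps(self, other: "Segment") -> bool:
        return self.start < other.end and other.start < self.end

def total_duration(segments, label):
    """Total time attributed to one label (segments assumed non-overlapping)."""
    return sum(s.duration for s in segments if s.label == label)

track = [Segment(0.0, 2.5, "spk1"), Segment(2.5, 4.0, "spk2"), Segment(4.0, 5.0, "spk1")]
```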
Identifying "who speaks when" using visual speech input and a pretrained lip-sync expert
A modern implementation of SyncNet (Python 3.9–3.13)
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]
Official implementation of "Speaker-Adaptive Lipreading via Spatio-Temporal Information Learning"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
Code for the Interspeech 2024 paper "MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting"
The official implementation of StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models, accepted by IJCV
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
CAS-VSR-MOV20: A challenging dataset for Chinese visual speech recognition, consisting of video clips from 20 movies.
A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
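The core VQ-VAE operation is nearest-neighbour lookup of encoder outputs in a learned codebook. A NumPy sketch of just the quantization step (training, the straight-through gradient estimator, and the commitment loss are omitted):

```python
import numpy as np

def quantize(z: np.ndarray, codebook: np.ndarray):
    """Map each row of z (N, D) to its nearest codebook vector (K, D)
    under squared Euclidean distance; return quantized vectors and indices."""
    # (N, 1, D) - (1, K, D) broadcasts to (N, K, D); sum over D gives distances
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
zq, idx = quantize(np.array([[0.1, -0.1], [0.9, 1.2]]), codebook)
```

Here the first input snaps to code 0 and the second to code 1; downstream, only the integer indices need to be stored or modelled.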
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Awesome Unified Multimodal Models
This repo contains a from-scratch PyTorch implementation of VQGAN (Taming Transformers for High-Resolution Image Synthesis), with added support for custom datasets, testing, and experiment tracking.
This repository contains the code for our upcoming paper An Investigation of End-to-End Models for Robust Speech Recognition at ICASSP 2021.