UESTC PhD, TJU Master's
Starred repositories
Integrates deep learning models for image classification | A project for learning, comparing, and heavily customizing backbones
Usable implementation of "Bootstrap Your Own Latent" self-supervised learning, from DeepMind, in PyTorch
Official implementation of "Separate Anything You Describe"
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
This is a collection of our NAS and Vision Transformer work.
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
PyTorch library for fast transformer implementations
Deep Clustering for Unsupervised Learning of Visual Features
Code for ALBEF: a new vision-language pre-training method
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Meta-Transformer for Unified Multimodal Learning
This is an official implementation for "Video Swin Transformers".
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
An open source framework for seq2seq models in PyTorch.
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
A PyTorch-based library for semi-supervised learning (NeurIPS'21)
PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
Implementation of Perceiver, General Perception with Iterative Attention, in PyTorch
In defence of metric learning for speaker recognition
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)