LindgeW
🎯 Focusing
  • UESTC PhD, TJU Master's


Starred repositories

598 stars written in Python

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,329 308 Updated Jun 21, 2025

Official DeiT repository

Python 4,279 584 Updated Mar 15, 2024

LPIPS metric. pip install lpips

Python 4,104 518 Updated Jul 2, 2024

Minimal keyword extraction with BERT

Python 4,038 373 Updated Oct 23, 2025

3D ResNets for Action Recognition (CVPR 2018)

Python 4,022 936 Updated Jan 20, 2021

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,825 344 Updated Jan 4, 2024

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,779 295 Updated Nov 6, 2025

OpenMMLab Pre-training Toolbox and Benchmark

Python 3,761 1,106 Updated Nov 1, 2024

Scenic: A Jax Library for Computer Vision Research and Beyond

Python 3,703 465 Updated Nov 6, 2025

Vector (and Scalar) Quantization, in Pytorch

Python 3,668 297 Updated Nov 5, 2025

Whisper realtime streaming for long speech-to-text transcription and translation

Python 3,429 405 Updated Sep 2, 2025

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Python 3,429 263 Updated Oct 18, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,426 294 Updated Nov 5, 2024

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 3,085 282 Updated Jun 4, 2024

Pythonic bindings for FFmpeg's libraries.

Python 3,018 411 Updated Oct 13, 2025

View model summaries in PyTorch!

Python 2,877 131 Updated Nov 3, 2025

RetinaFace reaches 80.99% on the WIDER FACE hard validation set using MobileNet0.25.

Python 2,876 797 Updated Jun 28, 2023

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Python 2,666 295 Updated Jul 31, 2024

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code

Python 2,643 301 Updated Oct 18, 2024

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,542 228 Updated Oct 30, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,440 177 Updated Mar 28, 2025

This library provides common speech features for ASR including MFCCs and filterbank energies.

Python 2,420 620 Updated Oct 20, 2021

label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful

Python 2,253 373 Updated Oct 17, 2024

Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)"

Python 2,194 409 Updated Mar 9, 2023

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,182 333 Updated Sep 10, 2025

The implementation of DeBERTa

Python 2,162 237 Updated Sep 29, 2023

Speech Recognition using DeepSpeech2.

Python 2,136 629 Updated Dec 13, 2022

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 2,103 604 Updated Oct 27, 2023

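This project implements Wav2lip video lip synthesis on top of SadTalkers. It generates speech-driven lip shapes from a video file, and applies a configurable enhancement to the facial region to sharpen the synthesized lip (face) area and improve the clarity of the generated lips. It uses the DAIN deep learning frame-interpolation algorithm to fill in frames of the generated video, adding motion transitions between synthesized lip frames so that the lips look smoother, more realistic, and more natural.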

Python 1,995 347 Updated Jun 4, 2023

🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

Python 1,893 331 Updated Nov 7, 2022