Starred repositories
6 stars written in Jupyter Notebook
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
A multi-voice TTS system trained with an emphasis on quality
Interview = résumé guide + algorithm problems + standard interview Q&A + source-code analysis
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation