- Zhejiang University
- Hangzhou, Zhejiang Province, China
Stars
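🥢 Cook like Laoxiangji (Home Original Chicken) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.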
On the Theoretical Limitations of Embedding-Based Retrieval
Audio Dataset for training CLAP and other models
RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2
Python code for handling the Clotho dataset.
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
AudioBench: A Universal Benchmark for Audio Large Language Models
Open source code for supervised learning of bridge bidding.
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Notes on knowledge and interview questions for large language model (LLM) algorithm (application) engineers
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
🔥🔥 First-ever hour-scale video understanding models
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Brief guides for ZJU freshmen. [site](https://zjuers.com/welcome/)
Train transformer language models with reinforcement learning.
An open source implementation of CLIP.
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
OpenAI CLIP text encoders for multiple languages!