- Zhejiang University
- Hangzhou, Zhejiang Province, China
Stars
The world's simplest facial recognition api for Python and the command line
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Train transformer language models with reinforcement learning.
An open source implementation of CLIP.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Retrieval and Retrieval-augmented LLMs
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025
FireRed-OpenStoryline is an AI video editing agent that transforms manual editing into intention-driven directing through natural language interaction, LLM-powered planning, and precise tool orches…
A high-performance topological machine learning toolbox in Python
This repository contains notes for the ZJU Data Structures and Algorithms course ("Data Structures - Zhejiang University")
Audio Dataset for training CLAP and other models
Kepler Mapper: A flexible Python implementation of the Mapper algorithm.
🔥🔥 First-ever hour-scale video understanding models
Open-source CapCut (Jianying) assistant | CapCut API | Coze plugin | Open-source CapCut automation toolkit to generate & download draft files. | skills
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
[ICLR 2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
AudioBench: A Universal Benchmark for Audio Large Language Models
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps