Stars
🔥[IJCAI 2022, Official Code] Official code for the paper "Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks". Official weights and demos provided. The first aesthetics assessment dataset, algorithm, and benchmark for multi-theme scenarios.
🎁 6,500,000+ Unsplash images made available for research and machine learning
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹
A collection of multimodal reasoning papers, code, datasets, benchmarks, and resources.
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
A Survey of Reinforcement Learning for Large Reasoning Models
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Reference PyTorch implementation and models for DINOv3
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
SGLang is a high-performance serving framework for large language models and multimodal models.
Real-time webcam demo with SmolVLM and llama.cpp server
Muon is an optimizer for hidden layers in neural networks
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2023 Highlight] Perspective Fields for Single Image Camera Calibration
🚀 [Large Models] 🌏 Train a 67M-parameter vision-language model (VLM) from scratch in just 1 hour!
🚀🚀 [Large Models] 🌏 Train a small 64M-parameter GPT entirely from scratch in just 2 hours!
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
A Framework of Small-scale Large Multimodal Models
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.