Stars
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
Source-code walkthrough of the lightweight LLM MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, and DPO
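For orientation, here is a minimal sketch of one component that walkthroughs like this cover, Rotary Position Embedding (RoPE), in plain PyTorch; the function name, shapes, and layout here are illustrative assumptions, not MiniMind's actual code:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply RoPE to x of shape (batch, seq_len, num_heads, head_dim)."""
    b, t, h, d = x.shape
    # Per-pair rotation frequency: theta_i = base^(-2i/d)
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split head_dim into even/odd pairs
    # Rotate each (x1, x2) pair by its position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos[None, :, None, :] - x2 * sin[None, :, None, :]
    out[..., 1::2] = x1 * sin[None, :, None, :] + x2 * cos[None, :, None, :]
    return out
```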
Use an interactive notebook to break down the MiniMind code and learn it from scratch.
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
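At its core, a from-scratch run like this reduces to a next-token-prediction training step; a hedged sketch, where `model`, `batch`, and `optimizer` are placeholder objects rather than the repo's actual code:

```python
import torch.nn.functional as F

def train_step(model, batch, optimizer):
    # batch: (B, T+1) token ids; predict each token from its prefix.
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)  # assumed to return (B, T, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```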
A Survey of Reinforcement Learning for Large Reasoning Models
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
Implement a reasoning LLM in PyTorch from scratch, step by step
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspi…
The best workflows and configurations I've developed, having heavily used Claude Code since the day of its release. Workflows are based on applied learnings from our AI-native startup.
Companion code for the book 《解构大语言模型:从线性回归到通用人工智能》 (Deconstructing Large Language Models: From Linear Regression to Artificial General Intelligence)
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
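For context, FlexAttention (torch.nn.attention.flex_attention, PyTorch 2.5+) lets you express attention variants as a `score_mod` function; below is the standard causal-masking example run eagerly, not a reproduction of this engine's internals:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep attention only where the query position is at or after the key.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = k = v = torch.randn(1, 8, 128, 64)  # (batch, heads, seq, head_dim)
out = flex_attention(q, k, v, score_mod=causal)
```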
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
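To illustrate the technique (a generic sketch, not minLoRA's actual API): LoRA freezes a pretrained layer and trains only a low-rank additive update, so a wrapper around `nn.Linear` might look like this:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        # Low-rank factors: B @ A has shape (out_features, in_features).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # y = Wx + scale * B(Ax); only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```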
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grained visual understanding".
Context engineering is the new vibe coding - it's the way to actually make AI coding assistants work. Claude Code is the best for this, so that's what this repo is centered around, but you can apply…
Advanced Python Mastery (course by @dabeaz)
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
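As a reminder of what the distillation piece typically looks like, here is the classic soft-target loss (Hinton et al.), sketched under the assumption of matching student/teacher logit shapes; the repo's actual recipe may differ:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Soft loss: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to be independent of T
    # Hard loss: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```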